Discriminant Analysis For Risk Classification and Prediction

1. Discriminant analysis can be used to classify individuals into groups such as good/bad lending risks based on characteristics like age, income, and years married.
2. An analysis is performed on data from 18 credit card customers classified as low or high risk. A discriminant function is built that correctly classifies 94.4% of the cases.
3. The analysis indicates which of the variables (age, income, or years married) are relatively better at discriminating between low- and high-risk applicants. It also provides a decision rule and cutoff score to classify new applicants.


Discriminant Analysis for Risk Classification and Prediction

Ajay Kumar Chauhan


Application Areas

 When we want to distinguish between two or three sets of objects/people, based on knowledge of some of their characteristics. E.g.: the selection process for a job, the admission process of an educational programme in a college, or dividing a group of people into potential buyers and non-buyers.

 Used by credit rating agencies to rate individuals, classifying them into good lending risks or bad lending risks.

 Linear DA can be used to classify objects into 2 or more groups based on knowledge of some variables related to them. Typically, these groups would be users vs. non-users, potentially successful vs. potentially unsuccessful salesmen, high-risk vs. low-risk consumers, or on similar lines.
Methods, Data etc.

1. Similar to multiple regression. The form of the equation in a 2-variable DA is:

   Y = a + k1x1 + k2x2

2. This is called the discriminant function. As in regression analysis, Y is the dependent variable and x1 & x2 are the independent variables; k1 & k2 are the coefficients of the independent variables, and a is a constant. In practice, there may be any number of x variables.

3. Y is a categorical variable (in regression analysis, it is continuous). x1 & x2 are, however, continuous (metric) variables. k1 & k2 are determined by the algorithm in the computer package used, but the underlying objective is that these coefficients should maximise the separation, or difference, between the two groups of Y.

4. Y will have 2 possible values in a 2-group DA, 3 values in a 3-group DA, and so on.

5. k1 & k2 are also called unstandardised discriminant function coefficients.

6. Y is a classification into 2 or more groups and is therefore a 'grouping' variable in the terminology of DA, i.e. groups are formed on the basis of existing data and coded as 1 & 2, similar to dummy-variable coding.

7. The independent (x) variables are continuous-scale variables, used as predictors of the group to which an object will belong. Therefore, to be able to use DA, we need some data on the Y and x variables from experience and/or past records.
Building a Model for Prediction/Classification

 Assuming we have data on both the Y & x variables, we estimate the coefficients of the model and use them to calculate the Y value (discriminant score) for any new data point that we want to classify into one of the groups. A decision rule is formulated for this process to determine the cutoff score, which is usually the midpoint of the mean discriminant scores of the two groups.
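The cutoff computation described above can be sketched in plain Python. This is a minimal illustration, not any package's API; the function and label names are mine:

```python
def cutoff_score(scores_group1, scores_group2):
    """Midpoint of the two groups' mean discriminant scores."""
    mean1 = sum(scores_group1) / len(scores_group1)
    mean2 = sum(scores_group2) / len(scores_group2)
    return (mean1 + mean2) / 2

def classify(d_score, cutoff, low_label="low risk", high_label="high risk"):
    # Convention used here: scores below the cutoff fall in the first group.
    return low_label if d_score < cutoff else high_label
```

If the two group means are symmetric about zero, the cutoff comes out as 0, which is exactly the situation in the SBB example later in this deck.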

Accuracy of Classification:

 Then the classification of the existing data points is done using the equation, and the accuracy of the model is determined. This output is given by the classification matrix (also called the confusion matrix), which tells us what percentage of the existing data points is correctly classified by the model. This percentage is somewhat analogous to R² in regression analysis (the percentage of variation in the dependent variable explained by the model). Of course, the actual predictive accuracy of the discriminant model on new data may be less than the figure obtained by applying it to the data points on which it was built.
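A minimal sketch of how a classification (confusion) matrix and its accuracy figure are tallied; the group labels below are illustrative:

```python
from collections import Counter

def classification_matrix(observed, predicted):
    """Counts of (observed, predicted) pairs; diagonal cells are correct."""
    return Counter(zip(observed, predicted))

def hit_ratio(observed, predicted):
    """Share of cases on the diagonal of the classification matrix."""
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return hits / len(observed)
```

With 18 cases of which 1 is misclassified, hit_ratio returns 17/18, i.e. about 0.9444.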

Stepwise / Fixed Model:

 Just as in regression, we have the option of entering one variable at a time (stepwise) into the discriminant equation, or entering all the variables we plan to use. Depending on the correlations between the independent variables, and on the objective of the study (exploratory or predictive/confirmatory), the choice is left to the student.
Relative Importance of Independent Variables

1. Suppose we have two independent variables, x1 and x2. How do we know which one is more important in discriminating between the groups?

2. The coefficients of x1 & x2 provide the answer, but not the raw (unstandardised) coefficients. To overcome the problem of different measurement units, we must obtain the standardised discriminant coefficients. These are available in the computer output.

3. The higher the absolute value of a variable's standardised discriminant coefficient, the higher its discriminating power.
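The usual conversion from raw to standardised coefficients multiplies each raw coefficient by the pooled within-group standard deviation of its predictor, which removes the effect of measurement units. A sketch under that assumption (function name is mine):

```python
def standardized_coefficients(raw_coeffs, pooled_within_sds):
    """Standardised coefficient = raw coefficient x pooled within-group SD
    of that predictor, making coefficients comparable across units."""
    return [k * s for k, s in zip(raw_coeffs, pooled_within_sds)]
```

In practice the statistical package reports these directly, so this is only to show why a variable measured in large units (like income in rupees) can have a tiny raw coefficient yet a large standardised one.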
A Priori Probability of Classification into Groups

 DA requires us to assign an a priori (before the analysis) probability of a given case belonging to each of the groups. There are two ways of doing this:

• We can assign an equal probability of assignment to all groups. Thus, in a 2-group discriminant analysis, we can assign 0.5 as the probability of a case being assigned to either group.

• We can formulate any other rule for the assignment of probabilities. For example, the probabilities could be proportional to the group sizes in the sample data. If two-thirds of the sample is in one group, the a priori probability of a case being in that group would be 0.66 (two-thirds).
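The second rule (priors proportional to group size) amounts to a one-line computation; the function name is mine:

```python
def proportional_priors(group_sizes):
    """A priori probabilities proportional to group sizes in the sample."""
    total = sum(group_sizes)
    return [n / total for n in group_sizes]
```

For a sample with 12 cases in one group and 6 in the other, this gives priors of two-thirds and one-third.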
Statistics Associated with DA

 Canonical Correlation: measures the association between the D-scores and the groups, i.e. between the single discriminant function and the set of dummy variables that defines group membership.

 Centroid: the mean value of the D-scores for a particular group.

 Classification Matrix: contains the numbers of correctly classified and misclassified cases. Correctly classified cases appear on the diagonal.

 Hit Ratio: the sum of the diagonal elements divided by the total number of cases.

 Discriminant function coefficients (unstandardised): the multipliers of the variables when the variables are in their original units of measurement.

 D-Scores: the unstandardised coefficients are multiplied by the values of the variables, the products are summed, and the constant term is added.

 Eigenvalue: the ratio of the between-group to the within-group sum of squares.

 F-Value and its significance: calculated using ANOVA.

 Group means and group standard deviations.

 Standardised discriminant function coefficients: give the relative influence of the predictors in discriminating between the groups.

 Structure correlations: the simple correlations between the predictors and the discriminant function.

 Wilks' Lambda: the ratio of the within-group sum of squares to the total sum of squares.
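Several of these statistics are linked: for a single discriminant function, the eigenvalue and Wilks' lambda can both be computed from the D-scores alone, and lambda = 1/(1 + eigenvalue). A minimal sketch (function names are mine):

```python
def eigenvalue_from_scores(scores_by_group):
    """Eigenvalue = between-group SS / within-group SS of the D-scores."""
    all_scores = [s for group in scores_by_group for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in scores_by_group
    )
    ss_within = sum(
        (s - sum(g) / len(g)) ** 2 for g in scores_by_group for s in g
    )
    return ss_between / ss_within

def wilks_lambda(eigenvalue):
    """For a single function, Wilks' lambda = WSS/TSS = 1 / (1 + eigenvalue)."""
    return 1.0 / (1.0 + eigenvalue)
```

As a sanity check, the eigenvalue of 2.136 reported later in this deck gives wilks_lambda(2.136) of about 0.3189, matching the Wilks' Lambda in the same output.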
Problem

 Suppose SBB wants to start a credit card division. They want to use discriminant analysis to set up a system to screen applicants and classify them as either 'low risk' or 'high risk' (risk of default on credit card bill payments), based on information collected from their applications for a credit card.

 Suppose SBB has managed to get from SBI, its sister bank, some data on SBI's credit card holders who turned out to be 'low risk' (no default) and 'high risk' (defaulting on payments) customers. These data on 18 customers are given in Fig. 1.
Fig. 1

 No.  RISK  AGE  INCOME  YRSMARRIED
  1     1    35   40000       8
  2     1    33   45000       6
  3     1    29   36000       5
  4     2    22   32000       0
  5     2    26   30000       1
  6     1    28   35000       6
  7     2    30   31000       7
  8     2    23   27000       2
  9     1    32   48000       6
 10     2    24   12000       4
 11     2    26   15000       3
 12     1    38   25000       7
 13     1    40   20000       5
 14     2    32   18000       4
 15     1    36   24000       3
 16     2    31   17000       5
 17     2    28   14000       3
 18     1    33   18000       6

 (RISK: 1 = low risk, 2 = high risk; INCOME in Rs. per month)
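The Fig. 1 data can be re-entered as machine-readable records for analysis. A minimal sketch, assuming the incomes are read in rupees per month; the variable and function names are mine:

```python
# Column order: case no., risk group (1 = low, 2 = high), age,
# monthly income (Rs.), years married -- as decoded from Fig. 1.
CUSTOMERS = [
    (1, 1, 35, 40000, 8), (2, 1, 33, 45000, 6), (3, 1, 29, 36000, 5),
    (4, 2, 22, 32000, 0), (5, 2, 26, 30000, 1), (6, 1, 28, 35000, 6),
    (7, 2, 30, 31000, 7), (8, 2, 23, 27000, 2), (9, 1, 32, 48000, 6),
    (10, 2, 24, 12000, 4), (11, 2, 26, 15000, 3), (12, 1, 38, 25000, 7),
    (13, 1, 40, 20000, 5), (14, 2, 32, 18000, 4), (15, 1, 36, 24000, 3),
    (16, 2, 31, 17000, 5), (17, 2, 28, 14000, 3), (18, 1, 33, 18000, 6),
]

def group_means(data, group):
    """Mean age, income and years married for one risk group."""
    rows = [r for r in data if r[1] == group]
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in (2, 3, 4))
```

A quick look at group_means shows the low-risk group is, on average, older, better-paid, and longer-married than the high-risk group, which is what the discriminant analysis will exploit.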
 We will perform a DA and advise SBB on how to set up its system to screen potential good customers (low risk) from bad customers (high risk). In particular, we will build a discriminant function (model) and find out:

 The percentage of customers that it is able to classify correctly.

 The statistical significance of the discriminant function.

 Which variables (age, income, or years of marriage) are relatively better at discriminating between 'low' and 'high' risk applicants.

 How to classify a new credit card applicant into one of the two groups, 'low risk' or 'high risk', by building a decision rule and a cutoff score.

 Input data are given in Fig. 1.

Interpretation of Computer Output:

Q1. How good is the model? How many of the 18 data points does it classify correctly?

 Fig. 3 is the relevant part of the DA output from a computer package such as SPSS.

Fig. 3 : Classification Matrix

STAT. DISCRIM. ANALYSIS
Classification Matrix (discrbkl.sta)
Rows: Observed classifications; Columns: Predicted classifications

                 Percent     G_1        G_2
                 Correct   (P=.5000)  (P=.5000)
 G_1 (Observed)  100.0000      9          0
 G_2 (Observed)   88.8889      1          8
 Total            94.4444     10          8
 This output is called the classification matrix (confusion matrix), and it indicates that the discriminant function is able to classify 94.44% of the 18 objects correctly. This figure is in the 'Percent Correct' column of the classification matrix.

 More specifically, it also says that of the 10 cases predicted to be in group 1, 9 were observed to be in group 1 and 1 in group 2 (from column G_1). Similarly, from column G_2, we understand that all 8 cases predicted to be in group 2 were found to be in group 2. Thus, on the whole, only 1 case out of 18 was misclassified by the discriminant model, giving us a classification (or prediction) accuracy of (18-1)/18, or 94.44%.
Q2. How significant (statistically) is the discriminant function?

 This is answered by Wilks' Lambda and the p-value of the F test shown below:

Discriminant Function Analysis Results
No. of variables in model: 3
Wilks' Lambda: .3188764
approx. F (3, 14) = 9.968056
p < .00089

 The value of Wilks' Lambda is 0.318. This value lies between 0 and 1, and a low value (closer to 0) indicates better discriminating power of the model. Thus, 0.318 is an indicator of the model being good. The p-value of the F test indicates that the discrimination between the 2 groups is highly significant: since p < .00089, the F test is significant at a confidence level above 99.9 percent.
Q3. We have 3 predictor variables: age, income, and number of years married. Which of these is a better predictor of a person being a low credit risk or a high credit risk?

 Look at the standardised coefficients in the output. These are shown below:

Fig. 5.

STAT. DISCRIM. ANALYSIS
Standardized Coefficients (discrbkl.sta) for Canonical Variables

 Variable    Root 1
 AGE        -.923955
 INCOME     -.774780
 YRSMARID   -.151298
 Eigenval   2.136012
 Cum.Prop   1.000000

 This output shows that age is the best predictor, with a coefficient of -0.92, followed by income, with a coefficient of -0.77; years of marriage is last, with a coefficient of -0.15. Please recall that it is the absolute value of the standardised coefficient that indicates a variable's discriminating power.
Q4. How do we classify a new credit card applicant into either a 'high risk' or 'low risk' category, and make a decision on accepting or refusing him a credit card?

 This is the most important question to be answered: SBB wished to have a decision model for screening credit card applicants.

 The way to do this is to use the outputs in Fig. 4 (the unstandardised coefficients of the D-function) and Fig. 6 (the means of the canonical variables). Fig. 6, the means of the canonical variables, gives us the new means for the transformed group centroids.
Fig. 6.

STAT. DISCRIM. ANALYSIS
Means of Canonical Variables (discrbkl.sta)

 Group    Root 1
 G_1:1   -1.37793
 G_2:2    1.37792

 Thus, the new mean for group 1 (low risk) is -1.37793, and the new mean for group 2 (high risk) is +1.37792, so the midpoint of the two is 0. This is clear when we plot the two means on a straight line and locate their midpoint, as shown below:

      -1.37             0             +1.37
 Mean of Group 1               Mean of Group 2
   (Low Risk)                    (High Risk)
 This gives a decision rule for classifying any new case: if an applicant's D-score falls to the right of the midpoint, we classify him as 'high risk'; if it falls to the left, we classify him as 'low risk'. In this case the midpoint is 0. Therefore, any positive (greater than 0) D-score leads to classification as 'high risk', and any negative (less than 0) D-score leads to classification as 'low risk'.

 But how do we compute an applicant's D-score? We use the applicant's age, income, and years of marriage (from his application) and plug them into the unstandardised D-function. This gives us his D-score.
Fig. 4.

STAT. DISCRIM. ANALYSIS
Raw Coefficients (discrbkl.sta) for Canonical Variables

 Variable    Root 1
 AGE        -.24560
 INCOME     -.00008
 YRSMARID   -.08465
 Constant   10.00335
 Eigenval    2.13601
 Cum.Prop    1.00000

 From Fig. 4 (reproduced above), the unstandardised (or raw) discriminant function is

 Y = 10.0036 - Age (.24560) - Income (.00008) - Yrs. Married (.08465)

 where Y gives us the discriminant score of any person whose age, income, and years married are known.
Example: consider a credit card application from a customer aged 40, with an income of Rs. 25,000 per month, married for 15 years. Plugging these values into the D-function above, we find his D-score to be

 Y = 10.0036 - 40 (.24560) - 25000 (.00008) - 15 (.08465) = -3.09015

 According to the decision rule, any D-score to the left of the midpoint of 0 leads to classification in the low-risk group. Therefore, we should give this person a credit card, as he is a low-risk customer. The same process is followed for any new applicant: if his D-score is to the right of the midpoint of 0, he should be denied a credit card, as he is a 'high risk' customer.
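The whole screening rule can be sketched in a few lines of Python, using the raw coefficients from Fig. 4 and the cutoff of 0 (the midpoint of the group centroids in Fig. 6). The function and label names are mine, not part of any package:

```python
def d_score(age, income, yrs_married):
    """Discriminant score from the raw (unstandardised) coefficients."""
    return 10.0036 - 0.24560 * age - 0.00008 * income - 0.08465 * yrs_married

def screen_applicant(age, income, yrs_married, cutoff=0.0):
    """Scores left of the midpoint (negative here) mean 'low risk'."""
    if d_score(age, income, yrs_married) < cutoff:
        return "low risk"
    return "high risk"
```

For the applicant in the example (age 40, income Rs. 25,000 per month, married 15 years), d_score(40, 25000, 15) reproduces the score of -3.09015, so screen_applicant returns "low risk" and the card would be granted.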
