Discriminant Analysis For Risk Classification and Prediction

1. Discriminant analysis can be used to classify individuals into groups such as good/bad lending risks based on characteristics like age, income, and years married.
2. An analysis is performed on data from 18 credit card customers classified as low or high risk. A discriminant function is built that correctly classifies 94.4% of the cases.
3. The analysis indicates which of the variables (age, income, or years married) are relatively better at discriminating between low- and high-risk applicants. It also provides a decision rule and cutoff score to classify new applicants.


Discriminant Analysis for Risk Classification and Prediction

Ajay Kumar Chauhan


Application Areas

 When we want to distinguish between two or three sets of objects/people, based on knowledge of some of their characteristics. E.g.: the selection process for a job, the admission process of an educational programme in a college, or dividing a group of people into potential buyers and non-buyers.

 Used by credit rating agencies to rate individuals, classifying them into good lending risks or bad lending risks.

 Linear DA can be used to classify objects into 2 or more groups based on knowledge of some variables related to them. Typically, these groups would be users vs. non-users, potentially successful vs. potentially unsuccessful salesmen, high-risk vs. low-risk consumers, or on similar lines.
Methods, Data etc.

1. Similar to multiple regression. The form of the equation in a 2-variable DA is:

   Y = a + k1x1 + k2x2

2. This is called the discriminant function. As in regression analysis, Y is the dependent variable and x1 & x2 are the independent variables; k1 & k2 are the coefficients of the independent variables, and a is a constant. In practice, there may be any number of x variables.

3. Y is a categorical variable (in regression analysis, it is continuous). x1 & x2 are, however, continuous (metric) variables. k1 & k2 are determined by the algorithm in the computer package used, but the underlying objective is that these coefficients should maximise the separation, or difference, between the two groups of Y.

4. Y will have 2 possible values in a 2-group DA, 3 values in a 3-group DA, and so on.

5. k1 & k2 are also called unstandardised discriminant function coefficients.

6. Y is a classification into 2 or more groups and is therefore a 'grouping' variable in the terminology of DA, i.e. groups are formed on the basis of existing data and coded as 1 & 2, similar to dummy-variable coding.

7. The independent (x) variables are continuous-scale variables, used as predictors of the group to which an object will belong. Therefore, to be able to use DA, we need some data on the Y and x variables from experience and/or past records.
Building a Model for Prediction/Classification

 Assuming we have data on both the Y & x variables, we estimate the coefficients of the model and use them to calculate the Y value (discriminant score) for any new data point that we want to classify into one of the groups. A decision rule is formulated for this process to determine the cutoff score, which is usually the midpoint of the mean discriminant scores of the two groups.
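The cutoff computation described above can be sketched in plain Python. This is a minimal illustration, not any package's API; the function and label names are mine:

```python
def cutoff_score(scores_group1, scores_group2):
    """Midpoint of the two groups' mean discriminant scores."""
    mean1 = sum(scores_group1) / len(scores_group1)
    mean2 = sum(scores_group2) / len(scores_group2)
    return (mean1 + mean2) / 2

def classify(d_score, cutoff, low_label="low risk", high_label="high risk"):
    # Convention used here: scores below the cutoff fall in the first group.
    return low_label if d_score < cutoff else high_label
```

If the two group means are symmetric about zero, the cutoff comes out as 0, which is exactly the situation in the SBB example later in this deck.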

Accuracy of Classification:

 Then the classification of the existing data points is done using the equation, and the accuracy of the model is determined. This output is given by the classification matrix (also called the confusion matrix), which tells us what percentage of the existing data points is correctly classified by the model. This percentage is somewhat analogous to R² in regression analysis (the percentage of variation in the dependent variable explained by the model). Of course, the actual predictive accuracy of the discriminant model on new data may be less than the figure obtained by applying it to the data points on which it was built.
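A minimal sketch of how a classification (confusion) matrix and its accuracy figure are tallied; the group labels below are illustrative:

```python
from collections import Counter

def classification_matrix(observed, predicted):
    """Counts of (observed, predicted) pairs; diagonal cells are correct."""
    return Counter(zip(observed, predicted))

def hit_ratio(observed, predicted):
    """Share of cases on the diagonal of the classification matrix."""
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return hits / len(observed)
```

With 18 cases of which 1 is misclassified, hit_ratio returns 17/18, i.e. about 0.9444.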

Stepwise / Fixed Model:

 Just as in regression, we have the option of entering one variable at a time (stepwise) into the discriminant equation, or entering all the variables we plan to use. Depending on the correlations between the independent variables, and on the objective of the study (exploratory or predictive/confirmatory), the choice is left to the student.
Relative Importance of Independent Variables

1. Suppose we have two independent variables, x1 and x2. How do we know which one is more important in discriminating between the groups?

2. The coefficients of x1 & x2 provide the answer, but not the raw (unstandardised) coefficients. To overcome the problem of different measurement units, we must obtain the standardised discriminant coefficients. These are available in the computer output.

3. The higher the absolute value of a variable's standardised discriminant coefficient, the higher its discriminating power.
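The usual conversion from raw to standardised coefficients multiplies each raw coefficient by the pooled within-group standard deviation of its predictor, which removes the effect of measurement units. A sketch under that assumption (function name is mine):

```python
def standardized_coefficients(raw_coeffs, pooled_within_sds):
    """Standardised coefficient = raw coefficient x pooled within-group SD
    of that predictor, making coefficients comparable across units."""
    return [k * s for k, s in zip(raw_coeffs, pooled_within_sds)]
```

In practice the statistical package reports these directly, so this is only to show why a variable measured in large units (like income in rupees) can have a tiny raw coefficient yet a large standardised one.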
A Priori Probability of Classification into Groups

 DA requires us to assign an a priori (before the analysis) probability of a given case belonging to each of the groups. There are two ways of doing this:

• We can assign an equal probability of assignment to all groups. Thus, in a 2-group discriminant analysis, we can assign 0.5 as the probability of a case being assigned to either group.

• We can formulate any other rule for the assignment of probabilities. For example, the probabilities could be proportional to the group sizes in the sample data. If two-thirds of the sample is in one group, the a priori probability of a case being in that group would be 0.66 (two-thirds).
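The second rule (priors proportional to group size) amounts to a one-line computation; the function name is mine:

```python
def proportional_priors(group_sizes):
    """A priori probabilities proportional to group sizes in the sample."""
    total = sum(group_sizes)
    return [n / total for n in group_sizes]
```

For a sample with 12 cases in one group and 6 in the other, this gives priors of two-thirds and one-third.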
Statistics Associated with DA

 Canonical Correlation: measures the association between the D-scores and the groups, i.e. between the single discriminant function and the set of dummy variables that defines group membership.

 Centroid: the mean value of the D-scores for a particular group.

 Classification Matrix: contains the numbers of correctly classified and misclassified cases. Correctly classified cases appear on the diagonal.

 Hit Ratio: the sum of the diagonal elements divided by the total number of cases.

 Discriminant function coefficients (unstandardised): the multipliers of the variables when the variables are in their original units of measurement.

 D-Scores: the unstandardised coefficients are multiplied by the values of the variables, the products are summed, and the constant term is added.

 Eigenvalue: the ratio of the between-group to the within-group sum of squares.

 F-Value and its significance: calculated using ANOVA.

 Group means and group standard deviations.

 Standardised discriminant function coefficients: give the relative influence of the predictors in discriminating between the groups.

 Structure correlations: the simple correlations between the predictors and the discriminant function.

 Wilks' Lambda: the ratio of the within-group sum of squares to the total sum of squares.
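Several of these statistics are linked: for a single discriminant function, the eigenvalue and Wilks' lambda can both be computed from the D-scores alone, and lambda = 1/(1 + eigenvalue). A minimal sketch (function names are mine):

```python
def eigenvalue_from_scores(scores_by_group):
    """Eigenvalue = between-group SS / within-group SS of the D-scores."""
    all_scores = [s for group in scores_by_group for s in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in scores_by_group
    )
    ss_within = sum(
        (s - sum(g) / len(g)) ** 2 for g in scores_by_group for s in g
    )
    return ss_between / ss_within

def wilks_lambda(eigenvalue):
    """For a single function, Wilks' lambda = WSS/TSS = 1 / (1 + eigenvalue)."""
    return 1.0 / (1.0 + eigenvalue)
```

As a sanity check, the eigenvalue of 2.136 reported later in this deck gives wilks_lambda(2.136) of about 0.3189, matching the Wilks' Lambda in the same output.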
Problem

 Suppose SBB wants to start a credit card division. They want to use discriminant analysis to set up a system to screen applicants and classify them as either 'low risk' or 'high risk' (risk of default on credit card bill payments), based on information collected from their applications for a credit card.

 Suppose SBB has managed to get from SBI, its sister bank, some data on SBI's credit card holders who turned out to be 'low risk' (no default) and 'high risk' (defaulting on payments) customers. These data on 18 customers are given in Fig. 1.
Fig. 1

 No.  RISK  AGE  INCOME  YRSMARRIED
  1     1    35   40000       8
  2     1    33   45000       6
  3     1    29   36000       5
  4     2    22   32000       0
  5     2    26   30000       1
  6     1    28   35000       6
  7     2    30   31000       7
  8     2    23   27000       2
  9     1    32   48000       6
 10     2    24   12000       4
 11     2    26   15000       3
 12     1    38   25000       7
 13     1    40   20000       5
 14     2    32   18000       4
 15     1    36   24000       3
 16     2    31   17000       5
 17     2    28   14000       3
 18     1    33   18000       6

 (RISK: 1 = low risk, 2 = high risk; INCOME in Rs. per month)
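The Fig. 1 data can be re-entered as machine-readable records for analysis. A minimal sketch, assuming the incomes are read in rupees per month; the variable and function names are mine:

```python
# Column order: case no., risk group (1 = low, 2 = high), age,
# monthly income (Rs.), years married -- as decoded from Fig. 1.
CUSTOMERS = [
    (1, 1, 35, 40000, 8), (2, 1, 33, 45000, 6), (3, 1, 29, 36000, 5),
    (4, 2, 22, 32000, 0), (5, 2, 26, 30000, 1), (6, 1, 28, 35000, 6),
    (7, 2, 30, 31000, 7), (8, 2, 23, 27000, 2), (9, 1, 32, 48000, 6),
    (10, 2, 24, 12000, 4), (11, 2, 26, 15000, 3), (12, 1, 38, 25000, 7),
    (13, 1, 40, 20000, 5), (14, 2, 32, 18000, 4), (15, 1, 36, 24000, 3),
    (16, 2, 31, 17000, 5), (17, 2, 28, 14000, 3), (18, 1, 33, 18000, 6),
]

def group_means(data, group):
    """Mean age, income and years married for one risk group."""
    rows = [r for r in data if r[1] == group]
    n = len(rows)
    return tuple(sum(r[i] for r in rows) / n for i in (2, 3, 4))
```

A quick look at group_means shows the low-risk group is, on average, older, better-paid, and longer-married than the high-risk group, which is what the discriminant analysis will exploit.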
 We will perform a DA and advise SBB on how to set up its system to screen potential good customers (low risk) from bad customers (high risk). In particular, we will build a discriminant function (model) and find out:

 The percentage of customers that it is able to classify correctly.

 The statistical significance of the discriminant function.

 Which variables (age, income, or years of marriage) are relatively better at discriminating between 'low' and 'high' risk applicants.

 How to classify a new credit card applicant into one of the two groups, 'low risk' or 'high risk', by building a decision rule and a cutoff score.

 Input data are given in Fig. 1.

Interpretation of Computer Output:

Q1. How good is the model? How many of the 18 data points does it classify correctly?

 Fig. 3 is the relevant part of the DA output from a computer package such as SPSS.

Fig. 3 : Classification Matrix

STAT. DISCRIM. ANALYSIS
Classification Matrix (discrbkl.sta)
Rows: Observed classifications; Columns: Predicted classifications

                 Percent     G_1        G_2
                 Correct   (P=.5000)  (P=.5000)
 G_1 (Observed)  100.0000      9          0
 G_2 (Observed)   88.8889      1          8
 Total            94.4444     10          8
 This output is called the classification matrix (confusion matrix), and it indicates that the discriminant function is able to classify 94.44% of the 18 objects correctly. This figure is in the 'Percent Correct' column of the classification matrix.

 More specifically, it also says that of the 10 cases predicted to be in group 1, 9 were observed to be in group 1 and 1 in group 2 (from column G_1). Similarly, from column G_2, we understand that all 8 cases predicted to be in group 2 were found to be in group 2. Thus, on the whole, only 1 case out of 18 was misclassified by the discriminant model, giving us a classification (or prediction) accuracy of (18-1)/18, or 94.44%.
Q2. How significant (statistically) is the discriminant function?

 This is answered by Wilks' Lambda and the p-value of the F test shown below:

Discriminant Function Analysis Results
No. of variables in model: 3
Wilks' Lambda: .3188764
approx. F (3, 14) = 9.968056
p < .00089

 The value of Wilks' Lambda is 0.318. This value lies between 0 and 1, and a low value (closer to 0) indicates better discriminating power of the model. Thus, 0.318 is an indicator of the model being good. The p-value of the F test indicates that the discrimination between the 2 groups is highly significant: since p < .00089, the F test is significant at a confidence level above 99.9 percent.
Q3. We have 3 predictor variables: age, income, and number of years married. Which of these is a better predictor of a person being a low credit risk or a high credit risk?

 Look at the standardised coefficients in the output. These are shown below:

Fig. 5.

STAT. DISCRIM. ANALYSIS
Standardized Coefficients (discrbkl.sta) for Canonical Variables

 Variable    Root 1
 AGE        -.923955
 INCOME     -.774780
 YRSMARID   -.151298
 Eigenval   2.136012
 Cum.Prop   1.000000

 This output shows that age is the best predictor, with a coefficient of -0.92, followed by income, with a coefficient of -0.77; years of marriage is last, with a coefficient of -0.15. Please recall that it is the absolute value of the standardised coefficient that indicates a variable's discriminating power.
Q4. How do we classify a new credit card applicant into either a 'high risk' or 'low risk' category, and make a decision on accepting or refusing him a credit card?

 This is the most important question to be answered: SBB wished to have a decision model for screening credit card applicants.

 The way to do this is to use the outputs in Fig. 4 (the unstandardised coefficients of the D-function) and Fig. 6 (the means of the canonical variables). Fig. 6, the means of the canonical variables, gives us the new means for the transformed group centroids.
Fig. 6.

STAT. DISCRIM. ANALYSIS
Means of Canonical Variables (discrbkl.sta)

 Group    Root 1
 G_1:1   -1.37793
 G_2:2    1.37792

 Thus, the new mean for group 1 (low risk) is -1.37793, and the new mean for group 2 (high risk) is +1.37792, so the midpoint of the two is 0. This is clear when we plot the two means on a straight line and locate their midpoint, as shown below:

      -1.37             0             +1.37
 Mean of Group 1               Mean of Group 2
   (Low Risk)                    (High Risk)
 This gives a decision rule for classifying any new case: if an applicant's D-score falls to the right of the midpoint, we classify him as 'high risk'; if it falls to the left, we classify him as 'low risk'. In this case the midpoint is 0. Therefore, any positive (greater than 0) D-score leads to classification as 'high risk', and any negative (less than 0) D-score leads to classification as 'low risk'.

 But how do we compute an applicant's D-score? We use the applicant's age, income, and years of marriage (from his application) and plug them into the unstandardised D-function. This gives us his D-score.
Fig. 4.

STAT. DISCRIM. ANALYSIS
Raw Coefficients (discrbkl.sta) for Canonical Variables

 Variable    Root 1
 AGE        -.24560
 INCOME     -.00008
 YRSMARID   -.08465
 Constant   10.00335
 Eigenval    2.13601
 Cum.Prop    1.00000

 From Fig. 4 (reproduced above), the unstandardised (or raw) discriminant function is

 Y = 10.0036 - Age (.24560) - Income (.00008) - Yrs. Married (.08465)

 where Y gives us the discriminant score of any person whose age, income, and years married are known.
Example: consider a credit card application from a customer aged 40, with an income of Rs. 25,000 per month, married for 15 years. Plugging these values into the D-function above, we find his D-score to be

 Y = 10.0036 - 40 (.24560) - 25000 (.00008) - 15 (.08465) = -3.09015

 According to the decision rule, any D-score to the left of the midpoint of 0 leads to classification in the low-risk group. Therefore, we should give this person a credit card, as he is a low-risk customer. The same process is followed for any new applicant: if his D-score is to the right of the midpoint of 0, he should be denied a credit card, as he is a 'high risk' customer.
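The whole screening rule can be sketched in a few lines of Python, using the raw coefficients from Fig. 4 and the cutoff of 0 (the midpoint of the group centroids in Fig. 6). The function and label names are mine, not part of any package:

```python
def d_score(age, income, yrs_married):
    """Discriminant score from the raw (unstandardised) coefficients."""
    return 10.0036 - 0.24560 * age - 0.00008 * income - 0.08465 * yrs_married

def screen_applicant(age, income, yrs_married, cutoff=0.0):
    """Scores left of the midpoint (negative here) mean 'low risk'."""
    if d_score(age, income, yrs_married) < cutoff:
        return "low risk"
    return "high risk"
```

For the applicant in the example (age 40, income Rs. 25,000 per month, married 15 years), d_score(40, 25000, 15) reproduces the score of -3.09015, so screen_applicant returns "low risk" and the card would be granted.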
