
Chapter Eighteen

Discriminant and Logit Analysis



Direct marketing scenario

• Every year, the DM company sends out one annual
catalog, four seasonal catalogs, and a number of
catalogs for holiday seasons.
• The company has a list of 10 million potential
buyers.
• The response rate is on average 5%.
• Who should receive a catalog? (those most likely to buy)
• How are buyers different from nonbuyers?
• Who are most likely to default?...



Similarities and Differences between ANOVA,
Regression, and Discriminant Analysis
Table 18.1

                          ANOVA        REGRESSION   DISCRIMINANT/LOGIT

Similarities
Number of dependent       One          One          One
variables
Number of independent     Multiple     Multiple     Multiple
variables

Differences
Nature of the             Metric       Metric       Categorical
dependent variable
Nature of the             Categorical  Metric       Metric (or binary dummies)
independent variables



Discriminant Analysis

Discriminant analysis is a technique for analyzing data when


the criterion or dependent variable is categorical and the
predictor or independent variables are interval in nature.

The objectives of discriminant analysis are as follows:


• Development of discriminant functions, or linear
combinations of the predictor or independent variables, which
will best discriminate between the categories of the criterion or
dependent variable (groups). (buyers vs. nonbuyers)
• Examination of whether significant differences exist among the
groups, in terms of the predictor variables.
• Determination of which predictor variables contribute to most
of the intergroup differences.
• Classification of cases to one of the groups based on the values
of the predictor variables.
• Evaluation of the accuracy of classification.



Discriminant Analysis

• When the criterion variable has two categories, the technique


is known as two-group discriminant analysis.
• When three or more categories are involved, the technique is
referred to as multiple discriminant analysis.
• The main distinction is that, in the two-group case, it is
possible to derive only one discriminant function. In multiple
discriminant analysis, more than one function may be
computed. In general, with G groups and k predictors, it is
possible to estimate up to the smaller of G − 1 and k
discriminant functions.
• The first function has the highest ratio of between-groups to
within-groups sum of squares. The second function,
uncorrelated with the first, has the second highest ratio, and
so on. However, not all the functions may be statistically
significant.



Geometric Interpretation

Fig. 18.1

[Scatter plot: cases from group G1 (plotted as 1s) and group G2 (plotted
as 2s) on the axes X1 and X2; the two groups overlap considerably on each
variable taken alone.]



Discriminant Analysis Model

The discriminant analysis model involves linear combinations of


the following form:
D = b0 + b1X1 + b2X2 + b3X3 + . . . + bkXk
Where:
D = discriminant score
b 's = discriminant coefficient or weight
X 's = predictor or independent variable

• The coefficients, or weights (b), are estimated so that the groups


differ as much as possible on the values of the discriminant function.
• This occurs when the ratio of between-group sum of squares to
within-group sum of squares for the discriminant scores is at a
maximum.
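As an illustration, a two-group discriminant function of this form can be estimated in code. This is a minimal sketch, assuming scikit-learn is available; to keep it short it uses only two predictors (income and household size) and only cases 1, 2, 16, and 17 from Table 18.2:

```python
# Minimal sketch of estimating D = b0 + b1*X1 + b2*X2, assuming scikit-learn.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[50.2, 3], [70.3, 4],    # group 1: resort visitors
              [32.1, 3], [36.2, 2]])   # group 2: non-visitors
y = np.array([1, 1, 2, 2])

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.intercept_, lda.coef_)   # b0 and the weights b1, b2
print(lda.predict(X))              # classify each case into group 1 or 2
```

Because this sketch uses a tiny data subset and scikit-learn's own scaling convention, the coefficients will not match the SPSS output in Table 18.4 numerically, but the classification logic is the same.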
Statistics Associated with
Discriminant Analysis

• Canonical correlation. Canonical correlation measures


the extent of association between the discriminant scores
and the groups. It is a measure of association between the
single discriminant function and the set of dummy variables
that define the group membership.
• Centroid. The centroid is the mean value of the
discriminant scores for a particular group; there is one
centroid for each group. The means for a group on all the
functions are the group centroids.
• Classification matrix. Sometimes also called confusion or
prediction matrix, the classification matrix contains the
number of correctly classified and misclassified cases.



Statistics Associated with Discriminant
Analysis

• Discriminant function coefficients. The discriminant


function coefficients (unstandardized) are the multipliers of
variables, when the variables are in the original units of
measurement.
• Discriminant scores. The unstandardized coefficients are
multiplied by the values of the variables. These products
are summed and added to the constant term to obtain the
discriminant scores.
• Eigenvalue. For each discriminant function, the Eigenvalue
is the ratio of between-group to within-group sums of
squares. Large Eigenvalues imply superior functions.



Statistics Associated with Discriminant Analysis

• F values and their significance. These are calculated from


a one-way ANOVA, with the grouping variable serving as the
categorical independent variable. Each predictor, in turn,
serves as the metric dependent variable in the ANOVA.

• Group means and group standard deviations. These are


computed for each predictor for each group.

• Pooled within-group correlation matrix. The pooled


within-group correlation matrix is computed by averaging
the separate covariance matrices for all the groups.



Statistics Associated with Discriminant Analysis

• Standardized discriminant function coefficients. The


standardized discriminant function coefficients are the discriminant
function coefficients used as the multipliers when the variables
have been standardized to a mean of 0 and a variance of 1.

• Structure correlations. Also referred to as discriminant loadings,


the structure correlations represent the simple correlations between
the predictors and the discriminant function.

• Total correlation matrix. If the cases are treated as if they were


from a single sample and the correlations computed, a total correlation
matrix is obtained.

• Wilks' λ. Sometimes also called the U statistic, Wilks' λ for each


predictor is the ratio of the within-group sum of squares to the total
sum of squares. Its value varies between 0 and 1. Large values
of λ (near 1) indicate that group means do not seem to be different.
Small values of λ (near 0) indicate that the group means seem to be
different.



Conducting Discriminant Analysis

Fig. 18.2
Formulate the Problem

Estimate the Discriminant Function Coefficients

Determine the Significance of the Discriminant Function

Interpret the Results

Assess Validity of Discriminant Analysis



Conducting Discriminant Analysis
Formulate the Problem

• Identify the objectives, the criterion variable, and the


independent variables.
• The criterion variable must consist of two or more mutually
exclusive and collectively exhaustive categories.
• The predictor variables should be selected based on a
theoretical model or previous research, or the experience of
the researcher.
• One part of the sample, called the estimation or analysis
sample, is used for estimation of the discriminant function.
• The other part, called the holdout or validation sample, is
reserved for validating the discriminant function.
• Often the distribution of the number of cases in the analysis
and validation samples follows the distribution in the total
sample.



Information on Resort Visits: Analysis
Sample
Table 18.2

Columns: No., Resort Visit, Annual Family Income ($000), Attitude Toward
Travel, Importance Attached to Family Vacation, Household Size, Age of Head
of Household, Amount Spent on Family Vacation (L = 1, M = 2, H = 3)

1 1 50.2 5 8 3 43 M (2)
2 1 70.3 6 7 4 61 H (3)
3 1 62.9 7 5 6 52 H (3)
4 1 48.5 7 5 5 36 L (1)
5 1 52.7 6 6 4 55 H (3)
6 1 75.0 8 7 5 68 H (3)
7 1 46.2 5 3 3 62 M (2)
8 1 57.0 2 4 6 51 M (2)
9 1 64.1 7 5 4 57 H (3)
10 1 68.1 7 6 5 45 H (3)
11 1 73.4 6 7 5 44 H (3)
12 1 71.9 5 8 4 64 H (3)
13 1 56.2 1 8 6 54 M (2)
14 1 49.3 4 2 3 56 H (3)
15 1 62.0 5 6 2 58 H (3)



Information on Resort Visits: Analysis Sample

Table 18.2, cont. (columns as in the previous slide)

16 2 32.1 5 4 3 58 L (1)
17 2 36.2 4 3 2 55 L (1)
18 2 43.2 2 5 2 57 M (2)
19 2 50.4 5 2 4 37 M (2)
20 2 44.1 6 6 3 42 M (2)
21 2 38.3 6 6 2 45 L (1)
22 2 55.0 1 2 2 57 M (2)
23 2 46.1 3 5 3 51 L (1)
24 2 35.0 6 4 5 64 L (1)
25 2 37.3 2 7 4 54 L (1)
26 2 41.8 5 1 3 56 M (2)
27 2 57.0 8 3 2 36 M (2)
28 2 33.4 6 8 2 50 L (1)
29 2 37.5 3 2 3 48 L (1)
30 2 41.3 3 3 2 42 L (1)



Information on Resort Visits:
Holdout Sample
Table 18.3
Columns as in Table 18.2.

1 1 50.8 4 7 3 45 M(2)
2 1 63.6 7 4 7 55 H (3)
3 1 54.0 6 7 4 58 M(2)
4 1 45.0 5 4 3 60 M(2)
5 1 68.0 6 6 6 46 H (3)
6 1 62.1 5 6 3 56 H (3)
7 2 35.0 4 3 4 54 L (1)
8 2 49.6 5 3 5 39 L (1)
9 2 39.4 6 5 3 44 H (3)
10 2 37.0 2 6 5 51 L (1)
11 2 54.5 7 3 3 37 M(2)
12 2 38.2 2 2 3 49 L (1)



Conducting Discriminant Analysis
Estimate the Discriminant Function Coefficients

• The direct method involves estimating the


discriminant function so that all the
predictors are included simultaneously.

• In stepwise discriminant analysis, the


predictor variables are entered sequentially,
based on their ability to discriminate among
groups.



Results of Two-Group Discriminant Analysis
Table 18.4

Group Means
VISIT   INCOME     TRAVEL    VACATION   HSIZE     AGE
1       60.52000   5.40000   5.80000    4.33333   53.73333
2       41.91333   4.33333   4.06667    2.80000   50.13333
Total   51.21667   4.86667   4.93333    3.56667   51.93333

Group Standard Deviations

1 9.83065 1.91982 1.82052 1.23443 8.77062


2 7.55115 1.95180 2.05171 .94112 8.27101
Total 12.79523 1.97804 2.09981 1.33089 8.57395

Pooled Within-Groups Correlation Matrix


INCOME TRAVEL VACATION HSIZE AGE

INCOME 1.00000
TRAVEL 0.19745 1.00000
VACATION 0.09148 0.08434 1.00000
HSIZE 0.08887 -0.01681 0.07046 1.00000
AGE -0.01431 -0.19709 0.01742 -0.04301 1.00000

Wilks' λ (U statistic) and univariate F ratio with 1 and 28 degrees of freedom

Variable    Wilks' λ   F        Significance
INCOME      0.45310    33.800   0.0000
TRAVEL      0.92479    2.277    0.1425
VACATION    0.82377    5.990    0.0209
HSIZE       0.65672    14.640   0.0007
AGE         0.95441    1.338    0.2572
Results of Two-Group Discriminant Analysis
Table 18.4, cont.
CANONICAL DISCRIMINANT FUNCTIONS

Function   Eigenvalue   % of Variance   Cum %    Canonical Correlation
1*         1.7862       100.00          100.00   0.8007

After Function   Wilks' λ   Chi-square   df   Significance
0                0.3589     26.130       5    0.0001

* Marks the one canonical discriminant function remaining in the analysis.
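For a single discriminant function, the statistics in this table are tied together; as a quick consistency check on the output above:

```latex
\lambda = \frac{1}{1 + \text{Eigenvalue}} = \frac{1}{1 + 1.7862} \approx 0.3589,
\qquad
R_c^2 = \frac{\text{Eigenvalue}}{1 + \text{Eigenvalue}} = \frac{1.7862}{2.7862} \approx 0.641 \approx 0.8007^2
```

so the reported Wilks' λ and canonical correlation are mutually consistent with the eigenvalue.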

Standardized Canonical Discriminant Function Coefficients


FUNC 1

INCOME 0.74301
TRAVEL 0.09611
VACATION 0.23329
HSIZE 0.46911
AGE 0.20922

Structure Matrix:
Pooled within-groups correlations between discriminating variables & canonical discriminant functions
(variables ordered by size of correlation within function)

FUNC 1

INCOME 0.82202
HSIZE 0.54096
VACATION 0.34607
TRAVEL 0.21337
AGE 0.16354
Results of Two-Group Discriminant Analysis
Table 18.4, cont.
Unstandardized Canonical Discriminant Function Coefficients
            FUNC 1
INCOME      0.084767
TRAVEL      0.049645
VACATION    0.120281
HSIZE       0.427389
AGE         0.024544
(constant)  -7.975476
Canonical discriminant functions evaluated at group means (group centroids)

Group FUNC 1
1 1.29118
2 -1.29118
Classification results for cases selected for use in analysis
Predicted Group Membership
Actual Group No. of Cases 1 2

Group 1 15 12 3
80.0% 20.0%

Group 2 15 0 15
0.0% 100.0%
Percent of grouped cases correctly classified: 90.00%

Results of Two-Group Discriminant Analysis
Table 18.4, cont.

Classification Results for cases not selected for use in


the analysis (holdout sample)
Predicted Group Membership
Actual Group No. of Cases 1 2
Group 1 6 4 2
66.7% 33.3%
Group 2 6 0 6
0.0% 100.0%
Percent of grouped cases correctly classified: 83.33%.



Conducting Discriminant Analysis
Interpret the Results

• The interpretation of the discriminant weights, or coefficients, is similar


to that in multiple regression analysis.
• Given the multicollinearity in the predictor variables, there is no
unambiguous measure of the relative importance of the predictors in
discriminating between the groups.
• With this caveat in mind, we can obtain some idea of the relative
importance of the variables by examining the absolute magnitude of the
standardized discriminant function coefficients.
• Some idea of the relative importance of the predictors can also be
obtained by examining the structure correlations, also called canonical
loadings or discriminant loadings. These simple correlations between
each predictor and the discriminant function represent the variance that
the predictor shares with the function.
• Another aid to interpreting discriminant analysis results is to develop a
characteristic profile for each group by describing each group in
terms of the group means for the predictor variables.



Conducting Discriminant Analysis
Assess Validity of Discriminant Analysis

• Many computer programs, such as SPSS, offer a leave-one-


out cross-validation option.
• The discriminant weights, estimated by using the analysis
sample, are multiplied by the values of the predictor
variables in the holdout sample to generate discriminant
scores for the cases in the holdout sample. The cases are
then assigned to groups based on their discriminant scores
and an appropriate decision rule. The hit ratio, or the
percentage of cases correctly classified, can then be
determined by summing the diagonal elements and dividing
by the total number of cases.
• It is helpful to compare the percentage of cases correctly
classified by discriminant analysis to the percentage that
would be obtained by chance. Classification accuracy
achieved by discriminant analysis should be at least 25%
greater than that obtained by chance.
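A minimal sketch of the hit-ratio and chance-comparison arithmetic, assuming numpy, using the holdout classification matrix from Table 18.4:

```python
# Hit ratio = sum of diagonal elements / total number of cases.
import numpy as np

conf = np.array([[4, 2],     # actual group 1: 4 correct, 2 misclassified
                 [0, 6]])    # actual group 2: 6 correct, 0 misclassified

hit_ratio = np.trace(conf) / conf.sum()          # 10 / 12 = 0.8333

# Proportional chance criterion: sum of squared group proportions.
p = conf.sum(axis=1) / conf.sum()                # both groups are 6/12 = 0.5
chance = (p ** 2).sum()                          # 0.50

print(hit_ratio >= 1.25 * chance)                # True: 83.3% > 1.25 * 50%
```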
Results of Three-Group Discriminant Analysis
Table 18.5

Group Means
AMOUNT   INCOME     TRAVEL    VACATION   HSIZE     AGE
1        38.57000   4.50000   4.70000    3.10000   50.30000
2        50.11000   4.00000   4.20000    3.40000   49.50000
3        64.97000   6.10000   5.90000    4.20000   56.00000
Total    51.21667   4.86667   4.93333    3.56667   51.93333

Group Standard Deviations


1 5.29718 1.71594 1.88856 1.19722 8.09732
2 6.00231 2.35702 2.48551 1.50555 9.25263
3 8.61434 1.19722 1.66333 1.13529 7.60117
Total 12.79523 1.97804 2.09981 1.33089 8.57395

Pooled Within-Groups Correlation Matrix


INCOME TRAVEL VACATION HSIZE AGE
INCOME 1.00000
TRAVEL 0.05120 1.00000
VACATION 0.30681 0.03588 1.00000
HSIZE 0.38050 0.00474 0.22080 1.00000
AGE -0.20939 -0.34022 -0.01326 -0.02512 1.00000



All-Groups Scattergram

Fig. 18.3

[All-groups scattergram: cases plotted by their scores on Function 1
(horizontal, roughly −6.0 to 6.0) and Function 2 (vertical, roughly −4.0 to
4.0), labeled by group number (1, 2, 3); * indicates a group centroid.]


Territorial Map

Fig. 18.4

Across: Function 1; Down: Function 2; * indicates a group centroid.

[Territorial map: the Function 1 × Function 2 plane (both axes −8.0 to 8.0)
is partitioned into three classification regions, with the boundaries drawn
as strings of 1s, 2s, and 3s; a case is classified into the group in whose
territory its discriminant scores fall.]


The Logit Model

• The dependent variable is binary and there are several
independent variables that are metric.
• The binary logit model commonly deals with the issue of how
likely an observation is to belong to each group (classification
and prediction, e.g., buyer vs. nonbuyer).
• It estimates the probability of an observation belonging to a
particular group.

Binary Logit Model Formulation

The probability of success may be modeled using the logit model as:

log_e [ P / (1 − P) ] = a0 + a1X1 + a2X2 + ... + akXk

or, more compactly,

log_e [ P / (1 − P) ] = Σ (i = 0 to k) ai Xi,   where X0 = 1


Model Formulation

P = exp( Σ (i = 0 to k) ai Xi ) / [ 1 + exp( Σ (i = 0 to k) ai Xi ) ]

Where:
P = Probability of success
Xi = Independent variable i
ai = parameter to be estimated.
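A minimal sketch of this probability formula, assuming numpy; the coefficient and predictor values below are made up purely for illustration:

```python
import numpy as np

def logit_probability(a, x):
    # a = (a0, a1, ..., ak) and x = (1, X1, ..., Xk), so that
    # a . x = a0 + a1*X1 + ... + ak*Xk is the log odds.
    z = np.dot(a, x)
    return np.exp(z) / (1.0 + np.exp(z))

a = np.array([-2.0, 0.05, 0.4])   # hypothetical a0, a1, a2
x = np.array([1.0, 50.0, 3.0])    # leading 1 multiplies the intercept
print(logit_probability(a, x))    # ≈ 0.85, always strictly between 0 and 1
```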



Properties of the Logit Model

• Although Xi may vary from −∞ to +∞, P is
constrained to lie between 0 and 1.

• When Xi approaches −∞, P approaches 0.

• When Xi approaches +∞, P approaches 1.

• When OLS regression is used, P is not
constrained to lie between 0 and 1.



Estimation and Model Fit

• The estimation procedure is called the maximum likelihood


method.
• Fit: Cox & Snell R Square and Nagelkerke R Square.
• Both these measures are similar to R2 in multiple
regression.
• The Cox & Snell R Square cannot equal 1.0, even if the
fit is perfect.
• This limitation is overcome by the Nagelkerke R Square.
• Compare predicted and actual values of Y to determine the
percentage of correct predictions.
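A sketch of these steps with statsmodels (an assumption; the chapter's output is from SPSS). Cox & Snell and Nagelkerke R² are computed from the fitted and null log-likelihoods, since statsmodels reports McFadden's pseudo-R² by default:

```python
# Maximum likelihood estimation of a binary logit with the two pseudo-R²
# measures; y and x are a small illustrative sample patterned on Table 18.6.
import numpy as np
import statsmodels.api as sm

y = np.array([1, 1, 1, 1, 0, 0, 0, 0])            # loyal vs. not loyal
x = np.array([4., 6., 5., 7., 3., 4., 2., 3.])    # attitude toward brand
X = sm.add_constant(x)

res = sm.Logit(y, X).fit(disp=0)                  # maximum likelihood
n = len(y)
cox_snell = 1 - np.exp(2 * (res.llnull - res.llf) / n)
nagelkerke = cox_snell / (1 - np.exp(2 * res.llnull / n))  # rescaled, max = 1
print(res.params, cox_snell, nagelkerke)
```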



Significance Testing

The significance of the estimated coefficients is based on the Wald statistic:

Wald = (ai / SE_ai)^2

where ai is the logistic coefficient for the predictor variable and SE_ai is
the standard error of that coefficient. The Wald statistic is chi-square
distributed, with 1 degree of freedom if the variable is metric, and with
(number of categories − 1) degrees of freedom if the variable is nonmetric.
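For example, plugging in the Brand coefficient reported later in Table 18.7 (a sketch assuming scipy):

```python
from scipy.stats import chi2

a_i, se = 1.274, 0.479        # Brand coefficient and its S.E. (Table 18.7)
wald = (a_i / se) ** 2        # ≈ 7.07, matching the Wald column
print(chi2.sf(wald, df=1))    # ≈ 0.008, matching the Sig. column
```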



Interpretation of Coefficients

• If Xi is increased by one unit, the log odds


will change by ai units, when the effect of
other independent variables is held
constant.

• The sign of ai will determine whether the


probability increases (if the sign is positive)
or decreases (if the sign is negative) by this
amount.
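Equivalently, a one-unit increase in Xi multiplies the odds by e^(ai), which SPSS reports as Exp(B); using the Brand coefficient from Table 18.7 below:

```latex
\left.\frac{P}{1-P}\right|_{X_i+1} = e^{a_i}\left.\frac{P}{1-P}\right|_{X_i},
\qquad e^{1.274} \approx 3.575
```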



Explaining Brand Loyalty

Table 18.6

No.   Loyalty   Brand   Product   Shopping
1     1         4       3         5
2 1 6 4 4
3 1 5 2 4
4 1 7 5 5
5 1 6 3 4
6 1 3 4 5
7 1 5 5 5
8 1 5 4 2
9 1 7 5 4
10 1 7 6 4
11 1 6 7 2
12 1 5 6 4
13 1 7 3 3
14 1 5 1 4
15 1 7 5 5
16 0 3 1 3
17 0 4 6 2
18 0 2 5 2
19 0 5 2 4
20 0 4 1 3
21 0 3 3 4
22 0 3 4 5
23 0 3 6 3
24 0 4 4 2
25 0 6 3 6
26 0 3 6 3
27 0 4 3 2
28 0 3 5 2
29 0 5 5 3
30 0 1 3 2



Results of Logistic Regression

Table 18.7

Dependent Variable Encoding

Original Value Internal Value


Not Loyal 0
Loyal 1

Model Summary

-2 Log Cox & Snell Nagelkerke R


Step likelihood R Square Square
1 23.471(a) .453 .604
a Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.



Results of Logistic Regression

Table 18.7, cont.


Classification Table a
Predicted

Loyalty to the Brand Percentage


Observed Not Loyal Loyal Correct
Step 1 Loyalty to the Not Loyal 12 3 80.0
Brand Loyal 3 12 80.0
Overall Percentage 80.0

a. The cut value is .500


Variables in the Equation a
B S.E. Wald df Sig. Exp(B)
Step Brand 1.274 .479 7.075 1 .008 3.575
1 Product .186 .322 .335 1 .563 1.205
Shopping .590 .491 1.442 1 .230 1.804
Constant -8.642 3.346 6.672 1 .010 .000

a. Variable(s) entered on step 1: Brand, Product, Shopping.


Results of Three-Group Discriminant Analysis
Multinomial Logistic Regression
Table 18.5

Group Means
AMOUNT   INCOME     TRAVEL    VACATION   HSIZE     AGE
1        38.57000   4.50000   4.70000    3.10000   50.30000
2        50.11000   4.00000   4.20000    3.40000   49.50000
3        64.97000   6.10000   5.90000    4.20000   56.00000
Total    51.21667   4.86667   4.93333    3.56667   51.93333

Group Standard Deviations


1 5.29718 1.71594 1.88856 1.19722 8.09732
2 6.00231 2.35702 2.48551 1.50555 9.25263
3 8.61434 1.19722 1.66333 1.13529 7.60117
Total 12.79523 1.97804 2.09981 1.33089 8.57395

Pooled Within-Groups Correlation Matrix


INCOME TRAVEL VACATION HSIZE AGE
INCOME 1.00000
TRAVEL 0.05120 1.00000
VACATION 0.30681 0.03588 1.00000
HSIZE 0.38050 0.00474 0.22080 1.00000
AGE -0.20939 -0.34022 -0.01326 -0.02512 1.00000
Chapter Nineteen

Factor Analysis
The company asked 20 questions
about casual dining and lifestyle.
How are these questions related
to one another? What are the
important dimensions or factors?



Factor Analysis

• Factor analysis is a general name denoting a class of


procedures primarily used for data reduction and summarization.
• Factor analysis is an interdependence technique in that an
entire set of interdependent relationships is examined without
making the distinction between dependent and independent
variables.
• Factor analysis is used in the following circumstances:
• To identify underlying dimensions, or factors, that explain
the correlations among a set of variables.
• To identify a new, smaller, set of uncorrelated variables to
replace the original set of correlated variables in subsequent
multivariate analysis (regression or discriminant analysis).
• To identify a smaller set of salient variables from a larger set
for use in subsequent multivariate analysis.
Factors Underlying Selected Psychographics
and Lifestyles

Fig. 19.1
[Plot of selected psychographic and lifestyle items (Football, Baseball,
Evening at home, Go to a party, Home is best place, Plays, Movies)
positioned by their loadings on Factor 1 (horizontal) and Factor 2
(vertical).]



Statistics Associated with Factor Analysis

• Bartlett's test of sphericity. Bartlett's test of sphericity is a


test statistic used to examine the hypothesis that the
variables are uncorrelated in the population. In other words,
the population correlation matrix is an identity matrix; each
variable correlates perfectly with itself (r = 1) but has no
correlation with the other variables (r = 0).

• Correlation matrix. A correlation matrix is a lower triangle


matrix showing the simple correlations, r, between all
possible pairs of variables included in the analysis. The
diagonal elements, which are all 1, are usually omitted.



Statistics Associated with Factor Analysis

• Communality. Communality is the amount of variance a


variable shares with all the other variables being considered.
This is also the proportion of variance explained by the
common factors.
• Eigenvalue. The eigenvalue represents the total variance
explained by each factor.
• Factor loadings. Factor loadings are simple correlations
between the variables and the factors.
• Factor loading plot. A factor loading plot is a plot of the
original variables using the factor loadings as coordinates.
• Factor matrix. A factor matrix contains the factor loadings of
all the variables on all the factors extracted.



Statistics Associated with Factor Analysis

• Factor scores. Factor scores are composite scores estimated


for each respondent on the derived factors.
• Kaiser-Meyer-Olkin (KMO) measure of sampling
adequacy. The Kaiser-Meyer-Olkin (KMO) measure of
sampling adequacy is an index used to examine the
appropriateness of factor analysis. High values (between 0.5
and 1.0) indicate factor analysis is appropriate. Values below
0.5 imply that factor analysis may not be appropriate.
• Percentage of variance. The percentage of the total
variance attributed to each factor.
• Residuals are the differences between the observed
correlations, as given in the input correlation matrix, and the
reproduced correlations, as estimated from the factor matrix.
• Scree plot. A scree plot is a plot of the Eigenvalues against
the number of factors in order of extraction.
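Bartlett's test statistic can be computed directly from the sample correlation matrix; a minimal sketch, assuming numpy and scipy, using the usual chi-square approximation:

```python
# Bartlett's test of sphericity: H0 is that the population correlation
# matrix is an identity matrix. R is the p x p sample correlation matrix,
# n the number of respondents.
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, chi2.sf(stat, df)   # test statistic and p-value
```

A small p-value rejects the identity-matrix hypothesis, which supports going ahead with factor analysis.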
Conducting Factor Analysis

Table 19.1
RESPONDENT
NUMBER V1 V2 V3 V4 V5 V6
1 7.00 3.00 6.00 4.00 2.00 4.00
2 1.00 3.00 2.00 4.00 5.00 4.00
3 6.00 2.00 7.00 4.00 1.00 3.00
4 4.00 5.00 4.00 6.00 2.00 5.00
5 1.00 2.00 2.00 3.00 6.00 2.00
6 6.00 3.00 6.00 4.00 2.00 4.00
7 5.00 3.00 6.00 3.00 4.00 3.00
8 6.00 4.00 7.00 4.00 1.00 4.00
9 3.00 4.00 2.00 3.00 6.00 3.00
10 2.00 6.00 2.00 6.00 7.00 6.00
11 6.00 4.00 7.00 3.00 2.00 3.00
12 2.00 3.00 1.00 4.00 5.00 4.00
13 7.00 2.00 6.00 4.00 1.00 3.00
14 4.00 6.00 4.00 5.00 3.00 6.00
15 1.00 3.00 2.00 2.00 6.00 4.00
16 6.00 4.00 6.00 3.00 3.00 4.00
17 5.00 3.00 6.00 3.00 3.00 4.00
18 7.00 3.00 7.00 4.00 1.00 4.00
19 2.00 4.00 3.00 3.00 6.00 3.00
20 3.00 5.00 3.00 6.00 4.00 6.00
21 1.00 3.00 2.00 3.00 5.00 3.00
22 5.00 4.00 5.00 4.00 2.00 4.00
23 2.00 2.00 1.00 5.00 4.00 4.00
24 4.00 6.00 4.00 6.00 4.00 7.00
25 6.00 5.00 4.00 2.00 1.00 4.00
26 3.00 5.00 4.00 6.00 4.00 7.00
27 4.00 4.00 7.00 2.00 2.00 5.00
28 3.00 7.00 2.00 6.00 4.00 3.00
29 4.00 6.00 3.00 7.00 2.00 7.00
30 2.00 3.00 2.00 4.00 7.00 2.00



Correlation Matrix

Table 19.2
Variables V1 V2 V3 V4 V5 V6
V1 1.000
V2 -0.530 1.000
V3 0.873 -0.155 1.000
V4 -0.086 0.572 -0.248 1.000
V5 -0.858 0.020 -0.778 -0.007 1.000
V6 0.004 0.640 -0.018 0.640 -0.136 1.000



Conducting Factor Analysis:
Determine the Method of Factor Analysis

• In principal components analysis, the total variance in the


data is considered. The diagonal of the correlation matrix
consists of unities, and full variance is brought into the factor
matrix. Principal components analysis is recommended when
the primary concern is to determine the minimum number of
factors that will account for maximum variance in the data for
use in subsequent multivariate analysis. The factors are called
principal components (see the sketch after these bullets).

• In common factor analysis, the factors are estimated based


only on the common variance. Communalities are inserted in
the diagonal of the correlation matrix. This method is
appropriate when the primary concern is to identify the
underlying dimensions and the common variance is of interest.
This method is also known as principal axis factoring.



Results of Principal Components Analysis

Table 19.3
Communalities
Variables Initial Extraction
V1 1.000 0.926
V2 1.000 0.723
V3 1.000 0.894
V4 1.000 0.739
V5 1.000 0.878
V6 1.000 0.790

Initial Eigenvalues

Factor   Eigenvalue   % of variance   Cumulat. %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
3 0.442 7.360 89.848
4 0.341 5.688 95.536
5 0.183 3.044 98.580
6 0.085 1.420 100.000



Results of Principal Components Analysis

Table 19.3, cont.


Extraction Sums of Squared Loadings
Factor   Eigenvalue   % of variance   Cumulat. %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
Factor Matrix
Variables Factor 1 Factor 2
V1 0.928 0.253
V2 -0.301 0.795
V3 0.936 0.131
V4 -0.342 0.789
V5 -0.869 -0.351
V6 -0.177 0.871

Rotation Sums of Squared Loadings


Factor Eigenvalue % of variance Cumulat. %
1 2.688 44.802 44.802
2 2.261 37.687 82.488



Results of Principal Components Analysis

Table 19.3, cont.


Rotated Factor Matrix
Variables Factor 1 Factor 2
V1 0.962 -0.027
V2 -0.057 0.848
V3 0.934 -0.146
V4 -0.098 0.845
V5 -0.933 -0.084
V6 0.083 0.885

Factor Score Coefficient Matrix


Variables Factor 1 Factor 2
V1 0.358 0.011
V2 -0.001 0.375
V3 0.345 -0.043
V4 -0.017 0.377
V5 -0.350 -0.059
V6 0.052 0.395
Conducting Factor Analysis: Rotate Factors

• Although the initial or unrotated factor matrix


indicates the relationship between the factors and
individual variables, it seldom results in factors that
can be interpreted, because the factors are
correlated with many variables. Therefore, through
rotation, the factor matrix is transformed into a
simpler one that is easier to interpret.
• In rotating the factors, we would like each factor to
have nonzero, or significant, loadings or
coefficients for only some of the variables.
Likewise, we would like each variable to have
nonzero or significant loadings with only a few
factors, if possible with only one.
• The rotation is called orthogonal rotation if the
axes are maintained at right angles.
Conducting Factor Analysis: Rotate Factors

• The most commonly used method for rotation is


the varimax procedure. This is an orthogonal
method of rotation that minimizes the number of
variables with high loadings on a factor, thereby
enhancing the interpretability of the factors.
Orthogonal rotation results in factors that are
uncorrelated.
• The rotation is called oblique rotation when the
axes are not maintained at right angles, and the
factors are correlated. Sometimes, allowing for
correlations among factors can simplify the factor
pattern matrix. Oblique rotation should be used
when factors in the population are likely to be
strongly correlated.
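A sketch of factor analysis with varimax rotation, assuming scikit-learn 0.24 or later (which added the rotation option); only the first six respondents of Table 19.1 are keyed in to keep the example short, so the loadings will differ from Table 19.3:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

X = np.array([  # respondents 1-6 of Table 19.1 (V1..V6)
    [7, 3, 6, 4, 2, 4],
    [1, 3, 2, 4, 5, 4],
    [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5],
    [1, 2, 2, 3, 6, 2],
    [6, 3, 6, 4, 2, 4],
])

X_std = StandardScaler().fit_transform(X)   # mean 0, variance 1 per variable
fa = FactorAnalysis(n_components=2, rotation='varimax').fit(X_std)
print(fa.components_.T)                     # rotated loadings, one row per variable
```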
Factor Matrix Before and After Rotation

Fig. 19.5

(a) High loadings before rotation
Variables   Factor 1   Factor 2
1           X
2           X          X
3           X
4           X          X
5           X          X
6                      X

(b) High loadings after rotation
Variables   Factor 1   Factor 2
1           X
2                      X
3           X
4                      X
5           X
6                      X
Conducting Factor Analysis: Interpret Factors

• A factor can then be interpreted in terms of


the variables that load high on it.

• Another useful aid in interpretation is to plot


the variables, using the factor loadings as
coordinates. Variables at the end of an axis
are those that have high loadings on only
that factor, and hence describe the factor.



Chapter Twenty

Cluster Analysis

Chapter Outline

1) Overview
2) Basic Concept (e.g., segmentation without prior
known groups)
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
i. Formulating the Problem
ii. Selecting a Distance or Similarity Measure
iii. Selecting a Clustering Procedure
iv. Deciding on the Number of Clusters
v. Interpreting and Profiling the Clusters
vi. Assessing Reliability and Validity



Cluster Analysis

• Cluster analysis is a class of techniques used to classify


objects or cases into relatively homogeneous groups called
clusters. Objects in each cluster tend to be similar to each
other and dissimilar to objects in the other clusters.
Cluster analysis is also called classification analysis, or
numerical taxonomy.
• Both cluster analysis and discriminant analysis are
concerned with classification. However, discriminant
analysis requires prior knowledge of the cluster or group
membership for each object or case included, to develop
the classification rule. In contrast, in cluster analysis there
is no a priori information about the group or cluster
membership for any of the objects. Groups or clusters are
suggested by the data, not defined a priori.



A Practical Clustering Situation

Fig. 20.2

[Scatter plot of objects on Variable 1 and Variable 2: the groupings overlap,
so cluster boundaries are not clear-cut.]



An Ideal Clustering Situation

Fig. 20.1

[Scatter plot of objects on Variable 1 and Variable 2: the objects fall into
well-separated, non-overlapping clusters.]



Statistics Associated with Cluster Analysis

• Agglomeration schedule. An agglomeration schedule


gives information on the objects or cases being combined at
each stage of a hierarchical clustering process.

• Cluster centroid. The cluster centroid is the mean values


of the variables for all the cases or objects in a particular
cluster.

• Cluster centers. The cluster centers are the initial starting


points in nonhierarchical clustering. Clusters are built
around these centers, or seeds.

• Cluster membership. Cluster membership indicates the


cluster to which each object or case belongs.



Statistics Associated with Cluster Analysis

• Dendrogram. A dendrogram, or tree graph, is a graphical


device for displaying clustering results. Vertical lines
represent clusters that are joined together. The position of
the line on the scale indicates the distances at which
clusters were joined. The dendrogram is read from left to
right. Figure 20.8 is a dendrogram.

• Distances between cluster centers. These distances


indicate how separated the individual pairs of clusters are.
Clusters that are widely separated are distinct, and
therefore desirable.



Attitudinal Data For Clustering
Case No. V1 V2 V3 V4 V5 V6
Table 20.1

1 6 4 7 3 2 3
2 2 3 1 4 5 4
3 7 2 6 4 1 3
4 4 6 4 5 3 6
5 1 3 2 2 6 4
6 6 4 6 3 3 4
7 5 3 6 3 3 4
8 7 3 7 4 1 4
9 2 4 3 3 6 3
10 3 5 3 6 4 6
11 1 3 2 3 5 3
12 5 4 5 4 2 4
13 2 2 1 5 4 4
14 4 6 4 6 4 7
15 6 5 4 2 1 4
16 3 5 4 6 4 7
17 4 4 7 2 2 5
18 3 7 2 6 4 3
19 4 6 3 7 2 7
20 2 3 2 4 7 2
Conducting Cluster Analysis:
Select a Distance or Similarity Measure

• The most commonly used measure of similarity is the Euclidean


distance or its square. The Euclidean distance is the square root of
the sum of the squared differences in values for each variable. Other
distance measures are also available. The city-block or Manhattan
distance between two objects is the sum of the absolute differences
in values for each variable. The Chebychev distance between two
objects is the maximum absolute difference in values for any
variable (see the sketch after this list).
• If the variables are measured in vastly different units, the clustering
solution will be influenced by the units of measurement. In these
cases, before clustering respondents, we must standardize the data
by rescaling each variable to have a mean of zero and a standard
deviation of unity. It is also desirable to eliminate outliers (cases
with atypical values).
• Use of different distance measures may lead to different clustering
results. Hence, it is advisable to use different measures and compare
the results.
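A sketch of the three distance measures, assuming scipy, for cases 1 and 2 of Table 20.1:

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, chebyshev

x = np.array([6, 4, 7, 3, 2, 3])   # case 1
y = np.array([2, 3, 1, 4, 5, 4])   # case 2

print(euclidean(x, y))   # square root of sum of squared differences = 8.0
print(cityblock(x, y))   # Manhattan: sum of absolute differences = 16
print(chebyshev(x, y))   # maximum absolute difference = 6
```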



A Classification of Clustering Procedures

Fig. 20.4

Clustering Procedures
- Hierarchical
  - Agglomerative
    - Linkage Methods: Single Linkage, Complete Linkage, Average Linkage
    - Variance Methods: Ward's Method
    - Centroid Methods
  - Divisive
- Nonhierarchical
  - Sequential Threshold
  - Parallel Threshold
  - Optimizing Partitioning
- Other
  - Two-Step
Other Agglomerative Clustering Methods
Fig. 20.6

[Side-by-side illustrations of Ward's procedure, which at each step merges
the pair of clusters producing the smallest increase in total within-cluster
sum of squares, and the centroid method, which merges the pair of clusters
whose centroids are closest.]



Results of Hierarchical Clustering

Table 20.2
Agglomeration Schedule Using Ward’s Procedure
Stage cluster
Clusters combined first appears
Stage Cluster 1 Cluster 2 Coefficient Cluster 1 Cluster 2 Next stage
1 14 16 1.000000 0 0 6
2 6 7 2.000000 0 0 7
3 2 13 3.500000 0 0 15
4 5 11 5.000000 0 0 11
5 3 8 6.500000 0 0 16
6 10 14 8.160000 0 1 9
7 6 12 10.166667 2 0 10
8 9 20 13.000000 0 0 11
9 4 10 15.583000 0 6 12
10 1 6 18.500000 6 7 13
11 5 9 23.000000 4 8 15
12 4 19 27.750000 9 0 17
13 1 17 33.100000 10 0 14
14 1 15 41.333000 13 0 16
15 2 5 51.833000 3 11 18
16 1 3 64.500000 14 5 19
17 4 18 79.667000 12 0 18
18 2 4 172.662000 15 17 19
19 1 2 328.600000 16 18 0
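A schedule like this can be reproduced with scipy's hierarchical clustering; a sketch under the assumption that scipy is available. SPSS produced the table above, and its Ward coefficients are on a different scale than scipy's, so the merge order, rather than the coefficient values, is what should agree:

```python
# Ward's procedure on the attitudinal data of Table 20.1, assuming scipy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([
    [6,4,7,3,2,3], [2,3,1,4,5,4], [7,2,6,4,1,3], [4,6,4,5,3,6],
    [1,3,2,2,6,4], [6,4,6,3,3,4], [5,3,6,3,3,4], [7,3,7,4,1,4],
    [2,4,3,3,6,3], [3,5,3,6,4,6], [1,3,2,3,5,3], [5,4,5,4,2,4],
    [2,2,1,5,4,4], [4,6,4,6,4,7], [6,5,4,2,1,4], [3,5,4,6,4,7],
    [4,4,7,2,2,5], [3,7,2,6,4,3], [4,6,3,7,2,7], [2,3,2,4,7,2],
])

Z = linkage(X, method='ward')                  # agglomeration schedule
print(Z)                                       # rows: cluster1, cluster2, distance, size
print(fcluster(Z, t=3, criterion='maxclust'))  # membership for a 3-cluster solution
```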
Conducting Cluster Analysis:
Decide on the Number of Clusters

• Theoretical, conceptual, or practical considerations may


suggest a certain number of clusters.
• In hierarchical clustering, the distances at which clusters
are combined can be used as criteria. This information can
be obtained from the agglomeration schedule or from the
dendrogram.
• In nonhierarchical clustering, the ratio of total within-group
variance to between-group variance can be plotted against
the number of clusters. The point at which an elbow or a
sharp bend occurs indicates an appropriate number of
clusters (see the sketch after this list).
• The relative sizes of the clusters should be meaningful.
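The elbow heuristic for the nonhierarchical case can be sketched with k-means, assuming scikit-learn and reusing X from the Ward's-procedure sketch above:

```python
from sklearn.cluster import KMeans

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # within-cluster sum of squares; look for the bend
```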



Conducting Cluster Analysis:
Interpreting and Profiling the Clusters

• Interpreting and profiling clusters involves


examining the cluster centroids. The
centroids enable us to describe each cluster
by assigning it a name or label.

• It is often helpful to profile the clusters in


terms of variables that were not used for
clustering. These may include demographic,
psychographic, product usage, media usage,
or other variables.



Cluster Distribution

Table 20.5, cont.

            N    % of Combined   % of Total
Cluster 1   6    30.0%           30.0%
        2   6    30.0%           30.0%
        3   8    40.0%           40.0%
Combined    20   100.0%          100.0%
Total       20                   100.0%



Cluster Profiles

Table 20.5, cont.

              Fun                     Bad for Budget          Eating Out
              Mean    Std. Dev.       Mean    Std. Dev.       Mean    Std. Dev.
Cluster 1     1.67    .516            3.00    .632            1.83    .753
        2     3.50    .548            5.83    .753            3.33    .816
        3     5.75    1.035           3.63    .916            6.00    1.069
Combined      3.85    1.899           4.10    1.410           3.95    2.012

              Best Buys               Don't Care              Compare Prices
              Mean    Std. Dev.       Mean    Std. Dev.       Mean    Std. Dev.
Cluster 1     3.50    1.049           5.50    1.049           3.33    .816
        2     6.00    .632            3.50    .837            6.00    1.549
        3     3.13    .835            1.88    .835            3.88    .641
Combined      4.10    1.518           3.45    1.761           4.35    1.496



