Cluster Analysis: Clusters, Classification Analysis, Numerical Taxonomy
Cluster Analysis
Cluster analysis is a class of techniques used to
classify objects or cases into relatively homogeneous
groups called clusters. Objects in each cluster tend
to be similar to each other and dissimilar to objects
in the other clusters. Cluster analysis is also called
classification analysis or numerical taxonomy.
Both cluster analysis and discriminant analysis are
concerned with classification. However, discriminant
analysis requires prior knowledge of the cluster or
group membership of each object or case in order
to develop the classification rule. In contrast, in
cluster analysis there is no a priori information about
the group or cluster membership for any of the
objects. Groups or clusters are suggested by the
data, not defined a priori.
[Figure: observations plotted on Variable 1 and Variable 2]
[Figure: scatterplot of observations on Variable 1 and Variable 2]
[Figure: agglomerative vs. divisive hierarchical clustering of observations 01 to 08, steps 0 to 7]
Conducting Cluster Analysis
[Figure: linkage methods for the distance between Cluster 1 and Cluster 2. Complete linkage: maximum distance. Average linkage: average distance.]
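The linkage rules differ only in how the member-to-member distances between two clusters are summarized: the shortest one (single linkage), the longest one (complete linkage), or their average (average linkage). A minimal Python sketch, assuming Euclidean distance between members; the function names and the two small clusters are hypothetical, for illustration only:

```python
import math
from itertools import product

def pairwise_distances(cluster1, cluster2):
    """Euclidean distance for every pair (one point from each cluster)."""
    return [math.dist(p, q) for p, q in product(cluster1, cluster2)]

def single_linkage(cluster1, cluster2):
    # Shortest distance between a member of one cluster and a member of the other.
    return min(pairwise_distances(cluster1, cluster2))

def complete_linkage(cluster1, cluster2):
    # Longest (maximum) distance between members of the two clusters.
    return max(pairwise_distances(cluster1, cluster2))

def average_linkage(cluster1, cluster2):
    # Average of the distances over all between-cluster pairs.
    d = pairwise_distances(cluster1, cluster2)
    return sum(d) / len(d)

# Hypothetical two-variable clusters, for illustration only.
cluster1 = [(0.0, 0.0), (0.0, 1.0)]
cluster2 = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(cluster1, cluster2))    # 3.0
print(complete_linkage(cluster1, cluster2))  # sqrt(17), about 4.123
print(average_linkage(cluster1, cluster2))
```

Because single linkage looks only at the closest pair, it tends to chain clusters together, while complete linkage is driven by the most distant pair.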
[Figure: complete linkage (longest distance) vs. single linkage (shortest distance)]
[Figure: Centroid Method]
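In the centroid method, the distance between two clusters is the distance between their centroids (the means of the variables), and the centroid is recomputed each time clusters are combined. A minimal sketch, assuming Euclidean distance between centroids; the clusters are hypothetical:

```python
import math

def centroid(cluster):
    """Mean of each variable over the members of the cluster."""
    n = len(cluster)
    return tuple(sum(values) / n for values in zip(*cluster))

def centroid_distance(cluster1, cluster2):
    # Centroid method: distance between the two cluster means
    # (Euclidean distance assumed here).
    return math.dist(centroid(cluster1), centroid(cluster2))

# Hypothetical clusters, for illustration only.
cluster1 = [(0.0, 0.0), (0.0, 2.0)]
cluster2 = [(3.0, 1.0), (5.0, 1.0)]

print(centroid_distance(cluster1, cluster2))  # 4.0
# After merging, the new cluster's centroid is recomputed from all members.
print(centroid(cluster1 + cluster2))          # (2.0, 1.0)
```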
Clustering Variables
When variables are clustered, the units of analysis are the
variables, and distance measures are computed
for all pairs of variables.
Hierarchical clustering of variables can aid in the
identification of unique variables, or variables that
make a unique contribution to the data.
Clustering can also be used to reduce the number of
variables. Associated with each cluster is a linear
combination of the variables in the cluster, called the
cluster component. A large set of variables can often
be replaced by the set of cluster components with
little loss of information. However, a given number
of cluster components does not generally explain as
much variance as the same number of principal
components.
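To cluster variables, a distance is needed between every pair of variables rather than between cases. The text does not say which measure is used; the sketch below assumes one common choice, 1 minus the absolute Pearson correlation, so strongly related variables (positively or negatively) are close. The variables V1 to V3 and their values are hypothetical:

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long lists of values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variable_distances(data):
    """data maps variable name -> observations (one per case).
    Returns {(u, v): 1 - |r|} for every pair of variables (assumed measure)."""
    return {(u, v): 1 - abs(pearson(data[u], data[v]))
            for u, v in combinations(data, 2)}

# Hypothetical scores of five cases on three variables.
data = {
    "V1": [1, 2, 3, 4, 5],
    "V2": [2, 4, 6, 8, 10],   # perfectly correlated with V1
    "V3": [5, 3, 4, 1, 2],    # negatively correlated with V1 (r = -0.8)
}

d = variable_distances(data)
closest = min(d, key=d.get)
print(closest)  # ('V1', 'V2'): the first pair a hierarchical procedure would combine
```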
Distance matrix:
           Ajit  Balu  Chandra  Dilip
Ajit          0     0        2      7
Balu          0     0        2      7
Chandra       2     2        0      5
Dilip         7     7        5      0

Squared distances:
           Ajit  Balu  Chandra  Dilip
Ajit          0     0        4     49
Balu          0     0        4     49
Chandra       4     4        0     25
Dilip        49    49       25      0
[Figure: data matrix of respondents R1 to R4 on variables X1 to X5, and the corresponding R1 to R4 distance matrix]
Data:
      X1  X2
R1    25  11
R2    33  11
R3    34  13
R4    35  18

Squared differences on X1:
      R1   R2   R3   R4
R1     0   64   81  100
R2    64    0    1    4
R3    81    1    0    1
R4   100    4    1    0

+ Squared differences on X2:
      R1   R2   R3   R4
R1     0    0    4   49
R2     0    0    4   49
R3     4    4    0   25
R4    49   49   25    0

= Squared Euclidean distances:
      R1   R2   R3   R4
R1     0   64   85  149
R2    64    0    5   53
R3    85    5    0   26
R4   149   53   26    0
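The squared Euclidean distance between two respondents is the sum, over the variables, of their squared differences. A short Python sketch reproducing the R1 to R4 matrices from the X1 and X2 values in the data table above (the helper names are hypothetical):

```python
from itertools import combinations

# X1 and X2 values for respondents R1 to R4, from the data table.
data = {"R1": (25, 11), "R2": (33, 11), "R3": (34, 13), "R4": (35, 18)}

def squared_diff(var_index, p, q):
    """Squared difference between two respondents on one variable."""
    return (p[var_index] - q[var_index]) ** 2

def squared_euclidean(p, q):
    """Sum of squared differences over all variables."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

distances = {(a, b): squared_euclidean(data[a], data[b])
             for a, b in combinations(data, 2)}

for a, b in combinations(data, 2):
    x1 = squared_diff(0, data[a], data[b])
    x2 = squared_diff(1, data[a], data[b])
    print(a, b, x1, "+", x2, "=", distances[(a, b)])
# First line printed: R1 R2 64 + 0 = 64
```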
Respondents: Ajit, Balu, Chandra, Dilip, Eswar, Farook, Govind, Hari, Indira, Kumar
Distances between respondents (partial):
           Ajit  Balu  Chandra  Dilip  Eswar  Farook
Ajit          0     4       36     81    196
Balu          4     0       16     49    144
Chandra      36    16        0      9     64      81
Dilip        81    49        9      0     25      36

After Ajit and Balu are combined:
           Ajit & Balu  Chandra  Dilip  Eswar
Chandra             16        0      9     64
Dilip               49        9      0     25

Clusters: Ajit & Balu, Chandra & Dilip, Eswar & Farook
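Only the Ajit, Balu, Chandra and Dilip rows are complete in the matrix above, so the sketch below runs an agglomerative schedule on those four respondents. It assumes single linkage (nearest neighbour); note that the reduced-matrix values 16 and 49 equal min(36, 16) and min(81, 49), which is what single linkage gives after Ajit and Balu are combined. Function names are hypothetical:

```python
from itertools import combinations

# Distances among Ajit, Balu, Chandra and Dilip, copied from the matrix above.
D = {
    ("Ajit", "Balu"): 4, ("Ajit", "Chandra"): 36, ("Ajit", "Dilip"): 81,
    ("Balu", "Chandra"): 16, ("Balu", "Dilip"): 49, ("Chandra", "Dilip"): 9,
}

def dist(a, b):
    """Look up a symmetric distance regardless of argument order."""
    return D[(a, b)] if (a, b) in D else D[(b, a)]

def cluster_dist(c1, c2):
    # Single linkage: distance between the closest pair of members.
    return min(dist(a, b) for a in c1 for b in c2)

def single_linkage_schedule(names):
    """Combine the two closest clusters at each stage; return (members, distance) per merge."""
    clusters = [frozenset([n]) for n in names]
    schedule = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_dist(*pair))
        d = cluster_dist(c1, c2)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        schedule.append((sorted(c1 | c2), d))
    return schedule

schedule = single_linkage_schedule(["Ajit", "Balu", "Chandra", "Dilip"])
for members, d in schedule:
    print(" & ".join(members), "at", d)
# Ajit & Balu at 4
# Chandra & Dilip at 9
# Ajit & Balu & Chandra & Dilip at 16
```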
Clustering respondents vs. clustering variables

Variables   A   B   C   D   E   F   G
V1          3   4   4   2   6   7   6
V2          2   5   7   7   6   7   4
[Figure: scatterplot of observations A to G, V1 on the horizontal axis and V2 on the vertical axis, both 0 to 10]
Euclidean distances between observations:
Observation    A      B      C      D      E      F      G
A
B            3.162
C            5.099  2.000
D            5.099  2.828  2.000
E            5.000  2.236  2.236  4.123
F            6.403  3.606  3.000  5.000  1.414
G            3.606  2.236  3.606  5.000  2.000  3.162
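The distance matrix can be recomputed directly from the V1 and V2 scores in the data table above. A minimal Python sketch using straight-line (Euclidean) distance, rounded to three decimals as in the matrix; recomputing it this way gives 2.236 for the pair B and G, and the smallest distance is E and F (1.414):

```python
import math
from itertools import combinations

# V1 and V2 scores for observations A to G, from the data table.
obs = {"A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
       "E": (6, 6), "F": (7, 7), "G": (6, 4)}

# Euclidean distance for every pair, rounded to three decimals.
dist = {(p, q): round(math.dist(obs[p], obs[q]), 3)
        for p, q in combinations(obs, 2)}

for (p, q), d in dist.items():
    print(f"{p}-{q}: {d:.3f}")

closest = min(dist, key=dist.get)
print(closest)  # ('E', 'F'): the first pair an agglomerative procedure would combine
```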
[Figure: scatterplot of observations A to G with the clusters numbered 1 to 6 in the order they are combined]
[Figure: dendrogram of observations A to G; horizontal axis: distance at combination (0 to 4)]
Problem: Page 590
Clustering of consumers based on attitudes towards shopping:
six attitudinal variables and 20 respondents.
V1: Shopping is fun.
V2: Shopping is bad for your budget.
V3: I combine shopping with eating out.
V4: I try to get the best buys when shopping.
V5: I do not care about shopping.
V6: You can save a lot of money by comparing prices.
Cluster membership of cases:
Case   4 clusters   3 clusters   2 clusters
  1        1            1            1
  2        2            2            2
  3        1            1            1
  4        3            3            2
  5        2            2            2
  6        1            1            1
  7        1            1            1
  8        1            1            1
  9        2            2            2
 10        3            3            2
 11        2            2            2
 12        1            1            1
 13        2            2            2
 14        3            3            2
 15        1            1            1
 16        3            3            2
 17        1            1            1
 18        4            3            2
 19        3            3            2
 20        2            2            2
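The membership table can be summarized by counting cluster sizes in each solution and checking that the solutions are nested, as they should be when clusters are formed hierarchically (each cluster of a finer solution lies inside one cluster of a coarser solution). A sketch in Python, assuming the three columns after the case number are the 4-, 3- and 2-cluster solutions (their largest labels are 4, 3 and 2); the helper names are hypothetical:

```python
from collections import Counter

# Case -> (4-cluster, 3-cluster, 2-cluster) membership, from the table.
membership = {
    1: (1, 1, 1), 2: (2, 2, 2), 3: (1, 1, 1), 4: (3, 3, 2), 5: (2, 2, 2),
    6: (1, 1, 1), 7: (1, 1, 1), 8: (1, 1, 1), 9: (2, 2, 2), 10: (3, 3, 2),
    11: (2, 2, 2), 12: (1, 1, 1), 13: (2, 2, 2), 14: (3, 3, 2), 15: (1, 1, 1),
    16: (3, 3, 2), 17: (1, 1, 1), 18: (4, 3, 2), 19: (3, 3, 2), 20: (2, 2, 2),
}

def sizes(solution_index):
    """Number of cases in each cluster of one solution (0, 1 or 2)."""
    return Counter(m[solution_index] for m in membership.values())

def is_nested(fine, coarse):
    """True if every cluster of the finer solution maps to a single cluster of the coarser one."""
    parent = {}
    for m in membership.values():
        if parent.setdefault(m[fine], m[coarse]) != m[coarse]:
            return False
    return True

print(sizes(1))         # 3-cluster solution: 8, 6 and 6 cases
print(is_nested(0, 1))  # True
print(is_nested(1, 2))  # True
```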
Table 20.3 Cluster Centroids (Means of Variables)
Cluster No.   V1   V2   V3   V4   V5   V6
Variable   Cluster 1   Cluster 2   Cluster 3
V1             4           2           6
V2             6           3           4
V3             3           2           6
V4             6           4           3
V5             4           6           2
V6             6           3           4
Distances between cluster centers:
Cluster     1       2       3
1                 5.568   5.698
2         5.568           6.928
3         5.698   6.928
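Distances between cluster centers are Euclidean distances between the centers' V1 to V6 profiles. The sketch below recomputes them from the whole-number centers shown for Clusters 1 to 3. Only the Cluster 1 to Cluster 2 value (5.568) is reproduced; the others come out as 5.831 and 7.141 rather than 5.698 and 6.928, presumably because the printed centers are rounded and the tabulated distances were computed from unrounded means. Treat this as an illustration of the calculation, not a check of the reported output:

```python
import math
from itertools import combinations

# Cluster centers on V1 to V6, as printed (rounded to whole numbers).
centers = {
    1: (4, 6, 3, 6, 4, 6),
    2: (2, 3, 2, 4, 6, 3),
    3: (6, 4, 6, 3, 2, 4),
}

# Euclidean distance between every pair of centers, rounded to three decimals.
dist = {(a, b): round(math.dist(centers[a], centers[b]), 3)
        for a, b in combinations(centers, 2)}

for (a, b), d in dist.items():
    print(f"Cluster {a} to Cluster {b}: {d:.3f}")
```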
SPSS Windows
To select these procedures in SPSS for Windows, click:
Analyze>Classify>Hierarchical Cluster …
Analyze>Classify>K-Means Cluster …