Hierarchical Clustering
Hierarchical clustering
• Builds a hierarchy of clusters.
• Unlike other clustering methods (such as k-means, which requires specifying the number of clusters in advance), hierarchical clustering automatically creates a structure in which clusters are successively nested within each other.
Hierarchical Clustering

Step 1: Calculate Pairwise Distances

Point   Feature 1   Feature 2
A       1           1
B       2           1
C       4           3
D       5           4
E       3           5

Euclidean distance: d(A, B) = √((F1_B − F1_A)² + (F2_B − F2_A)²)

d(A, B) = 1        d(B, D) = 4.242
d(A, C) = 3.605    d(B, E) = 4.123
d(A, D) = 5        d(C, D) = 1.414
d(A, E) = 4.472    d(C, E) = 2.236
d(B, C) = 2.828    d(D, E) = 2.236

        A       B       C       D       E
A       0       1       3.605   5       4.472
B       1       0       2.828   4.242   4.123
C       3.605   2.828   0       1.414   2.236
D       5       4.242   1.414   0       2.236
E       4.472   4.123   2.236   2.236   0
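As a sanity check, here is a minimal Python sketch (assuming NumPy and SciPy are installed) that reproduces this distance matrix:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# The five example points (Feature 1, Feature 2) from the table above.
points = np.array([[1, 1],   # A
                   [2, 1],   # B
                   [4, 3],   # C
                   [5, 4],   # D
                   [3, 5]])  # E

# pdist gives the condensed pairwise Euclidean distances;
# squareform expands them into the full 5x5 symmetric matrix.
dist_matrix = squareform(pdist(points, metric="euclidean"))
print(np.round(dist_matrix, 3))
# Matches the matrix above up to rounding (the slides truncate,
# e.g. 3.6056 is shown as 3.605).
```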
Hierarchical Clustering

Step 2: Start Clustering (Single Linkage)

        A       B       C       D       E
A       0       1       3.605   5       4.472
B       1       0       2.828   4.242   4.123
C       3.605   2.828   0       1.414   2.236
D       5       4.242   1.414   0       2.236
E       4.472   4.123   2.236   2.236   0
2.1) Initial step: Each point is its own cluster. We start with the clusters:
{A}, {B}, {C}, {D}, {E}

2.2) Find the closest pair of clusters: The closest pair is A and B, with a distance of 1. Merge them into one cluster:
{A, B}, {C}, {D}, {E}

Update the distance matrix by keeping the minimum distance between the new cluster and all other clusters (single linkage), as sketched in code after this step:

Distance between {A, B} and C: Min(d(A, C), d(B, C)) = Min(3.605, 2.828) = 2.828
Distance between {A, B} and D: Min(d(A, D), d(B, D)) = Min(5, 4.242) = 4.242
Distance between {A, B} and E: Min(d(A, E), d(B, E)) = Min(4.472, 4.123) = 4.123

        {A, B}  C       D       E
{A, B}  0       2.828   4.242   4.123
C       2.828   0       1.414   2.236
D       4.242   1.414   0       2.236
E       4.123   2.236   2.236   0
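The update rule fits in a few lines of Python. The helper below (single_linkage_merge is a hypothetical name for this sketch, not a library function) rebuilds the matrix after merging A and B:

```python
# Hypothetical helper (not from any library) for one single-linkage merge.
def single_linkage_merge(dist, i, j):
    """Merge clusters i and j: the merged cluster's distance to every
    remaining cluster k is min(dist[i][k], dist[j][k])."""
    n = len(dist)
    keep = [k for k in range(n) if k not in (i, j)]
    merged_row = [min(dist[i][k], dist[j][k]) for k in keep]
    # Rebuild the matrix with the merged cluster in the first row/column.
    new = [[0.0] + merged_row]
    for a, k in enumerate(keep):
        new.append([merged_row[a]] + [dist[k][m] for m in keep])
    return new

# Merging A (index 0) and B (index 1) from the Step 1 matrix:
D = [[0, 1, 3.605, 5, 4.472],
     [1, 0, 2.828, 4.242, 4.123],
     [3.605, 2.828, 0, 1.414, 2.236],
     [5, 4.242, 1.414, 0, 2.236],
     [4.472, 4.123, 2.236, 2.236, 0]]
for row in single_linkage_merge(D, 0, 1):
    print(row)
# First row: [0.0, 2.828, 4.242, 4.123] -- the {A, B} row shown above.
```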
Hierarchical Clustering

Step 2: Start Clustering (Single Linkage)

        {A, B}  C       D       E
{A, B}  0       2.828   4.242   4.123
C       2.828   0       1.414   2.236
D       4.242   1.414   0       2.236
E       4.123   2.236   2.236   0
2.3) Find the closest pair of clusters: The closest pair is now C and D, with a distance of 1.414. Merge them into one cluster:
{A, B}, {C, D}, {E}

Update the distance matrix by keeping the minimum distance between the new cluster and all other clusters (single linkage):

Distance between {C, D} and {A, B}: Min(d(C, {A, B}), d(D, {A, B})) = Min(2.828, 4.242) = 2.828
Distance between {C, D} and E: Min(d(C, E), d(D, E)) = Min(2.236, 2.236) = 2.236

        {A, B}  {C, D}  E
{A, B}  0       2.828   4.123
{C, D}  2.828   0       2.236
E       4.123   2.236   0
Hierarchical Clustering

        {A, B}  {C, D}  E
{A, B}  0       2.828   4.123
{C, D}  2.828   0       2.236
E       4.123   2.236   0
Step 2: Start Clustering (Single Linkage)
2.4) Find the closest pair of clusters: The next closest pair is {C, D} and E, with a distance of 2.236. We merge them:
{A, B}, {C, D, E}

Update the distance matrix by keeping the minimum distance between the new cluster and all other clusters (single linkage):

Distance between {C, D, E} and {A, B}: Min(d({C, D}, {A, B}), d(E, {A, B})) = Min(2.828, 4.123) = 2.828

            {A, B}  {C, D, E}
{A, B}      0       2.828
{C, D, E}   2.828   0

2.5) Finally, the two remaining clusters {A, B} and {C, D, E} merge at a distance of 2.828, completing the hierarchy.
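SciPy's linkage function (assumed available) performs this whole agglomeration in one call; a short sketch verifying the merge sequence derived above:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# The five example points A..E from Step 1.
points = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 5]])

# Single-linkage agglomeration; linkage() computes the Euclidean
# distances internally and returns one row per merge.
Z = linkage(points, method="single")
print(np.round(Z, 3))
# Columns: cluster_i, cluster_j, merge distance, new cluster size.
# Expected merge order: A+B at 1.0, C+D at 1.414,
# {C, D}+E at 2.236, and the final merge at 2.828.
```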
Hierarchical Clustering

Linkage Method     Definition                            Characteristics
Single Linkage     Minimum distance between clusters     Can form elongated clusters; sensitive to noise
Complete Linkage   Maximum distance between clusters     Produces compact clusters; less sensitive to outliers
Average Linkage    Average distance between all points   Balances single and complete linkage; moderate cluster shapes
Centroid Linkage   Distance between centroids            Sensitive to cluster shape; may yield non-intuitive results
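A quick sketch, again assuming SciPy, showing how the choice of linkage method changes the merge distances on the same five points:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 5]])  # A..E

# The last row of the linkage matrix is the final merge; its third
# column is the distance at which the last two clusters join.
for method in ("single", "complete", "average", "centroid"):
    Z = linkage(points, method=method)
    print(f"{method:>8}: final merge at {Z[-1, 2]:.3f}")
```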
Hierarchical Clustering

When to use hierarchical clustering:
• Hierarchical structure is needed: Useful for building nested groupings, like a taxonomy (e.g., species into genera).
• Unknown number of clusters: No need to predefine the number of clusters; use the dendrogram to decide (see the sketch after this list).
• Visualizing cluster relationships: The dendrogram provides a clear visualization of how clusters form at different levels.
• Small to medium-sized datasets: Suitable for datasets with a few hundred points due to computational complexity.
• Clusters of varying shapes and sizes: Handles irregular, non-spherical clusters better than methods like k-means.
• When outliers are not a concern: Hierarchical clustering is sensitive to outliers, so it should be used when the data is relatively clean.
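A minimal sketch of the dendrogram mentioned above, assuming SciPy and Matplotlib are available:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

points = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 5]])
Z = linkage(points, method="single")

# The y-axis is the merge distance; cutting the tree between 2.236
# and 2.828 yields the two clusters {A, B} and {C, D, E}.
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.ylabel("Merge distance")
plt.show()
```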
Comparative Analysis
Feature             DBSCAN                                DHierarchical Clustering → see below
Cluster Shape       Arbitrary-shaped clusters             Handles clusters of various shapes       Spherical or circular clusters
Number of Clusters  Automatically determined              No need to predefine                     Must be specified in advance
Scalability         Efficient with large datasets         Best for small to medium datasets        Efficient for large datasets
Handling Outliers   Identifies and separates outliers     Sensitive to outliers                    Sensitive to outliers
Distance Metric     Based on neighborhood radius (ε)      Flexible (various linkage types)         Based on Euclidean distance
Parameters          ε (radius) and minPts (min points)    No parameters required initially         k (number of clusters)
Data Size           Large datasets                        Small to medium datasets                 Large datasets
Cluster Size        Handles clusters of different sizes   Can handle clusters of different sizes   Tends to form clusters of similar size
Handling Noise      Can effectively separate noise        Less robust to noise                     Poor handling of noise
Initialization      No random initialization              No random initialization                 Randomly initializes centroids
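A hedged sketch putting the three algorithms side by side with scikit-learn (assumed available); the DBSCAN parameters here are illustrative choices for this toy data, not recommended defaults:

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

X = np.array([[1, 1], [2, 1], [4, 3], [5, 4], [3, 5]])  # A..E

# k-means needs k in advance; hierarchical can be cut at any level;
# DBSCAN infers the number of clusters from eps/min_samples.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(X)
db = DBSCAN(eps=1.5, min_samples=2).fit(X)

print("k-means:     ", km.labels_)
print("hierarchical:", agg.labels_)
print("DBSCAN:      ", db.labels_)  # label -1 marks points treated as noise
```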
Thank You