
Scipy Cluster Hierarchy: Cut Hierarchical Clustering into Flat Clustering
The scipy.cluster.hierarchy module provides functions for hierarchical clustering, including agglomerative clustering. Its routines can be used to −
Cut a hierarchical clustering into flat clusters.
Perform agglomerative clustering.
Compute statistics on hierarchies.
Visualize flat clusters.
Check whether two flat cluster assignments are isomorphic.
Plot the hierarchy as a dendrogram.
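As a rough illustration of the routines listed above, the following sketch runs a few of them on a small made-up data set. The points list and all variable names are assumptions chosen for this illustration; they are not part of the example that follows.

```python
from scipy.cluster.hierarchy import (
    linkage, fcluster, cophenet, is_isomorphic, dendrogram)
from scipy.spatial.distance import pdist

# Hypothetical toy data: two well-separated groups of 2-D points
points = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
dists = pdist(points)  # condensed pairwise Euclidean distances

# Agglomerative clustering with Ward linkage
Z = linkage(dists, method='ward')

# Cut the hierarchy into at most two flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')

# A statistic on the hierarchy: the cophenetic correlation coefficient
c, _ = cophenet(Z, dists)

# Swapping the label names 1 and 2 gives an equivalent (isomorphic) assignment
relabelled = [3 - label for label in labels]
same = is_isomorphic(labels, relabelled)

# Compute the dendrogram layout without drawing it (no matplotlib needed)
tree = dendrogram(Z, no_plot=True)

print(labels, round(float(c), 3), same, tree['leaves'])
```

Passing no_plot=True keeps the sketch free of a plotting dependency; dropping that argument should draw the dendrogram when matplotlib is installed.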
The routine scipy.cluster.hierarchy.fcluster cuts a hierarchical clustering into flat clusters, that is, it assigns each original data point to exactly one cluster. Let’s understand the concept with the help of the example given below −
Example
# Importing the packages
from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist
# The sample data points
A = [
    [0, 0], [0, 1], [1, 0],
    [0, 3], [0, 2], [1, 4],
    [3, 0], [2, 0], [4, 1],
    [3, 3], [2, 3], [4, 3]
]
# The linkage method scipy.cluster.hierarchy.ward generates a linkage matrix as its output
X = ward(pdist(A))
print(X)
Output
[[ 0.   1.   1.          2. ]
 [ 2.   7.   1.          2. ]
 [ 3.   4.   1.          2. ]
 [ 9.  10.   1.          2. ]
 [ 6.   8.   1.41421356  2. ]
 [11.  15.   1.73205081  3. ]
 [ 5.  14.   2.081666    3. ]
 [12.  13.   2.23606798  4. ]
 [16.  17.   3.94968353  5. ]
 [18.  19.   5.15012714  7. ]
 [20.  21.   6.4968857  12. ]]
The matrix X in the above output encodes a dendrogram. Each row describes one merge: the first and second elements are the indices of the two clusters merged at that step, the third element is the distance between those clusters, and the fourth element is the number of original data points in the newly formed cluster. Indices 0 to 11 refer to the original data points, and the cluster formed in row i receives the index 12 + i.
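To make the row layout of the linkage matrix concrete, here is a minimal sketch that walks through X and prints each merge in words. The variable names n, merges, and new_id are introduced only for this illustration.

```python
from scipy.cluster.hierarchy import ward
from scipy.spatial.distance import pdist

# Same 12 points as in the example above
A = [[0, 0], [0, 1], [1, 0], [0, 3], [0, 2], [1, 4],
     [3, 0], [2, 0], [4, 1], [3, 3], [2, 3], [4, 3]]
X = ward(pdist(A))

n = len(A)   # number of original data points
merges = []  # (first cluster, second cluster, new cluster index, size)
for i, (a, b, dist, size) in enumerate(X):
    new_id = n + i  # index given to the cluster formed in row i
    merges.append((int(a), int(b), new_id, int(size)))
    print(f"step {i}: {int(a)} + {int(b)} -> cluster {new_id} "
          f"(distance {dist:.3f}, size {int(size)})")
```

The last row always joins everything into one cluster, so its size equals the number of original points.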
Example
# Flattening the dendrogram with fcluster(): the assignment of the original data points to flat clusters depends on the distance threshold t.
fcluster(X, t=1.5, criterion='distance')  # when t = 1.5
Output
array([6, 6, 7, 4, 4, 5, 1, 7, 1, 2, 2, 3], dtype=int32)
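The label array is easier to read when the original points are grouped by their cluster label. The following is a small sketch of one way to do that; the groups dictionary is an assumption made for this illustration and is not something fcluster returns.

```python
from collections import defaultdict

from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

# Same 12 points as in the example above
A = [[0, 0], [0, 1], [1, 0], [0, 3], [0, 2], [1, 4],
     [3, 0], [2, 0], [4, 1], [3, 3], [2, 3], [4, 3]]
X = ward(pdist(A))

labels = fcluster(X, t=1.5, criterion='distance')

# labels[i] is the flat cluster of A[i]; collect the points under each label
groups = defaultdict(list)
for point, label in zip(A, labels):
    groups[int(label)].append(point)

for label in sorted(groups):
    print(label, groups[label])
```

Labels start at 1, and the order of the entries in the label array follows the order of the points in A.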
Example
fcluster(X, t=0.9, criterion='distance') #when t= 0.9
Output
array([ 9, 10, 11, 6, 7, 8, 1, 12, 2, 3, 4, 5], dtype=int32)
Example
fcluster(X, t=9, criterion='distance') #when t= 9
Output
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)
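The three calls above can be compared side by side by counting the distinct labels for each threshold. The sketch below does this, and also shows criterion='maxclust', which is not used in the examples above and lets you request a maximum number of clusters instead of a distance. The names counts and labels_k are assumptions for this illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import ward, fcluster
from scipy.spatial.distance import pdist

# Same 12 points as in the example above
A = [[0, 0], [0, 1], [1, 0], [0, 3], [0, 2], [1, 4],
     [3, 0], [2, 0], [4, 1], [3, 3], [2, 3], [4, 3]]
X = ward(pdist(A))

# Number of flat clusters obtained for each distance threshold
counts = {}
for t in (0.9, 1.5, 9):
    labels = fcluster(X, t=t, criterion='distance')
    counts[t] = len(np.unique(labels))
    print(f"t={t}: {counts[t]} clusters")

# Alternative criterion: ask for at most 3 flat clusters
labels_k = fcluster(X, t=3, criterion='maxclust')
print(labels_k)
```

The outputs shown above correspond to 12, 7, and 1 clusters. A threshold below the smallest merge distance leaves every point in its own cluster, while a threshold above the largest merge distance puts all points into a single cluster.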