Hierarchical Clustering with Scikit-Learn

Hierarchical Clustering is an unsupervised learning technique that groups data into a hierarchy of clusters based on similarity. It builds a tree-like structure called a dendrogram, which helps visualise relationships and decide the optimal number of clusters.

Does not require pre-selecting the number of clusters
Uses agglomerative (bottom up) or divisive (top down) approaches
Commonly applied in data exploration and pattern discovery
Widely used in pattern recognition, customer segmentation and image grouping
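
To make the bottom-up idea concrete, here is a minimal sketch on toy data (the five points below are made up for illustration and are not part of the tutorial's dataset). It uses SciPy's linkage function to record every merge, so the hierarchy can be read row by row.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Five 1-D points forming two obvious groups (toy data, illustrative only).
points = np.array([[0.0], [0.5], [1.0], [10.0], [10.5]])

# Each row of the linkage matrix records one merge:
# [id of cluster a, id of cluster b, merge distance, size of the new cluster]
merges = linkage(points, method="single")

for a, b, dist, size in merges:
    print(f"merge {int(a)} + {int(b)} at distance {dist:.2f} -> size {int(size)}")
```

With n points there are always n - 1 merges, and the last row describes the single cluster that contains everything. Cutting the hierarchy before that last merge leaves the two groups.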

Figure: Types of hierarchical clustering

Implementing Agglomerative Hierarchical Clustering

Scikit-learn provides a straightforward implementation of agglomerative hierarchical clustering through the AgglomerativeClustering class.

Step 1: Import Required Libraries

Here we import NumPy, pandas, Matplotlib and scikit-learn, along with SciPy's dendrogram and linkage functions for plotting the hierarchy.

Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import dendrogram, linkage

Step 2: Load the Dataset

We use scikit-learn's digits dataset, which contains 1,797 images of handwritten digits. Each row represents an 8×8 image flattened into 64 numerical features.
No labels are used during clustering; this is purely unsupervised.

Python

digits = load_digits()
X = digits.data

Step 3: Feature Scaling

Feature scaling matters in Hierarchical Clustering because the algorithm relies on distance calculations and is highly sensitive to feature magnitudes.

Python

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
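
As a rough illustration of why scaling matters for distance-based methods, the sketch below compares distance ratios before and after standardization. The three-row array X_toy and the helper euclidean are hypothetical, chosen only so that one column is about a thousand times larger than the other.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: column 0 is in the thousands, column 1 lies between 0 and 1.
X_toy = np.array([[1000.0, 0.1],
                  [1010.0, 0.9],
                  [2000.0, 0.1]])

def euclidean(a, b):
    """Euclidean distance between two rows (hypothetical helper)."""
    return float(np.linalg.norm(a - b))

# Ratio of distance(row 0, row 2) to distance(row 0, row 1) on the raw data.
raw_ratio = euclidean(X_toy[0], X_toy[2]) / euclidean(X_toy[0], X_toy[1])

# The same ratio after each column is rescaled to zero mean and unit variance.
X_toy_scaled = StandardScaler().fit_transform(X_toy)
scaled_ratio = (euclidean(X_toy_scaled[0], X_toy_scaled[2])
                / euclidean(X_toy_scaled[0], X_toy_scaled[1]))

print("Ratio before scaling:", round(raw_ratio, 2))
print("Ratio after scaling :", round(scaled_ratio, 2))
```

Before scaling, the large column decides almost entirely which rows look close. After scaling, a big change in the small column counts about as much as a big change in the large one.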

Step 4: Visualizing the Dendrogram

Before selecting the number of clusters, we visualize the hierarchy using a dendrogram.

Python

linked = linkage(X_scaled, method='ward')

plt.figure(figsize=(12, 6))
dendrogram(
    linked,
    truncate_mode='lastp',
    p=30
)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Cluster Size")
plt.ylabel("Distance")
plt.show()

Output: a dendrogram truncated to the last 30 merged clusters.
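
The same linkage matrix can also be cut into flat clusters directly. The sketch below assumes SciPy's fcluster function with the maxclust criterion; the choice of 10 clusters and the name flat_labels are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

# Rebuild the scaled data and the Ward linkage so the block runs on its own.
X_scaled = StandardScaler().fit_transform(load_digits().data)
linked = linkage(X_scaled, method="ward")

# Ask for at most 10 flat clusters; fcluster finds a suitable cut height.
flat_labels = fcluster(linked, t=10, criterion="maxclust")

# Note that fcluster numbers clusters from 1, not from 0.
print("Cluster ids  :", np.unique(flat_labels))
print("Cluster sizes:", np.bincount(flat_labels)[1:])
```

This is handy when the dendrogram has already been computed and several cut levels need to be compared without refitting a model.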

Step 5: Building the Hierarchical Clustering Model

Based on dendrogram inspection, we choose a reasonable number of clusters. Here we use 10 clusters with Ward linkage, the same linkage used to build the dendrogram.

Python

hc_model = AgglomerativeClustering(
    n_clusters=10,
    linkage='ward'
)
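
If we would rather not fix the number of clusters in advance, AgglomerativeClustering also accepts a distance_threshold. The sketch below is one way to use it; the threshold of 60 is an arbitrary illustrative value (in practice it would typically be a height read off the dendrogram), and the names threshold_model and threshold_labels are hypothetical.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)

# n_clusters must be None when distance_threshold is set.
# The value 60.0 is an assumption for illustration, not a tuned setting.
threshold_model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=60.0,
    linkage="ward"
)
threshold_labels = threshold_model.fit_predict(X_scaled)

# The number of clusters is now an outcome of the fit, not an input.
print("Clusters found:", threshold_model.n_clusters_)
```

A lower threshold stops the merging earlier and yields more, smaller clusters; a higher one yields fewer, larger clusters.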

Step 6: Fit the Model and Assign Clusters

This step fits the hierarchical clustering model to the scaled data and assigns a cluster label to each data point. Each label represents the cluster formed from the hierarchical structure defined by the dendrogram.

Python

cluster_labels = hc_model.fit_predict(X_scaled)

Step 7: Cluster Distribution Analysis

This step shows how data points are distributed across clusters. It provides quick insight into cluster balance, helps detect over-fragmentation and highlights dominant groupings, making it an important checkpoint before using the clusters downstream.

Python

pd.Series(cluster_labels).value_counts()

Output:

Figure: Number of points assigned to each cluster

Step 8: Evaluating Clustering Quality

The Silhouette Score evaluates how well clusters are formed by comparing cohesion within clusters to separation between clusters.
It ranges from -1 to +1: scores closer to +1 indicate well-defined clusters, while values near 0 or below suggest overlap or poor grouping.

Python

sil_score = silhouette_score(X_scaled, cluster_labels)
print("Score :", round(sil_score, 2))

Output:

Score : 0.13
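
To see what the score measures, the sketch below computes the silhouette of each point by hand on toy data and checks it against scikit-learn's silhouette_samples. For a point, a is the mean distance to the other members of its own cluster (cohesion), b is the smallest mean distance to any other cluster (separation), and the silhouette is (b - a) / max(a, b). The toy arrays and the helper silhouette_of_point are hypothetical.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

# Toy 1-D data with two clusters (illustrative values only).
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
toy_labels = np.array([0, 0, 0, 1, 1])

def silhouette_of_point(i, X, labels):
    """Silhouette of point i: (b - a) / max(a, b)."""
    dists = np.linalg.norm(X - X[i], axis=1)

    # Cohesion: mean distance to the other points of the same cluster.
    same = labels == labels[i]
    same[i] = False
    a = dists[same].mean()

    # Separation: smallest mean distance to any other cluster.
    b = min(dists[labels == c].mean()
            for c in np.unique(labels) if c != labels[i])

    return (b - a) / max(a, b)

manual = np.array([silhouette_of_point(i, X_toy, toy_labels)
                   for i in range(len(X_toy))])
reference = silhouette_samples(X_toy, toy_labels)

print("Manual   :", manual.round(3))
print("Reference:", reference.round(3))
```

silhouette_score is the mean of these per-point values, so a low overall score can come from many mildly ambiguous points or from a few badly placed ones.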

Implementing Divisive Hierarchical Clustering

Scikit-learn has no counterpart to AgglomerativeClustering that builds a divisive hierarchy and exposes the resulting tree. Its BisectingKMeans estimator (available from version 1.1) does split clusters top-down, but it returns flat labels only. Here the approach is implemented manually by applying a top-down recursive splitting strategy, using K-Means from scikit-learn to divide clusters step by step.

Step 1: Import Required Libraries

Here we import NumPy, Matplotlib (needed for the tree plot in Step 9) and the required scikit-learn modules.

Python

from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np
import matplotlib.pyplot as plt

Step 2: Load the Dataset

Each row represents an image flattened into numerical features.
No labels are used during clustering; this is purely unsupervised.

Python

digits = load_digits()
X = digits.data

Step 3: Feature Scaling

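K-Means splits the data based on Euclidean distances, so scaling the features keeps those with larger ranges from dominating each split and leads to more stable, meaningful partitions.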

Python

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Step 4: Defining Divisive Clustering Function

This function applies a top down divisive clustering strategy, where the dataset is repeatedly split into smaller clusters to reveal finer patterns.

Starts by treating the entire dataset as one cluster
Splits the cluster into two using K-Means
Recursively repeats the same process on each sub-cluster
Stops when the maximum depth or minimum cluster size is reached

Python

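def divisive_clustering(X, depth=0, max_depth=3, min_size=50):
    # Stop splitting once the depth limit is reached or the cluster is small enough
    if depth == max_depth or len(X) <= min_size:
        return [X]

    # Split the current cluster into two with K-Means
    kmeans = KMeans(n_clusters=2, random_state=42)
    labels = kmeans.fit_predict(X)

    cluster_1 = X[labels == 0]
    cluster_2 = X[labels == 1]

    # Recurse into each half and concatenate the resulting lists of clusters
    return (
        divisive_clustering(cluster_1, depth + 1, max_depth, min_size) +
        divisive_clustering(cluster_2, depth + 1, max_depth, min_size)
    )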
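
The function above returns the rows of each cluster but not their positions in the original array. One possible variant, sketched below under the hypothetical name divisive_indices, carries row indices through the recursion so that labels can later be written back in the original order. The blob data is made up for illustration, and n_init=10 is set explicitly as an assumption to keep behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

def divisive_indices(X, idx=None, depth=0, max_depth=3, min_size=50):
    """Hypothetical variant: return one array of row indices per final cluster."""
    if idx is None:
        idx = np.arange(len(X))

    # Same stopping rule as before, applied to the index set.
    if depth == max_depth or len(idx) <= min_size:
        return [idx]

    # Split only the rows that belong to the current cluster.
    split = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X[idx])

    return (
        divisive_indices(X, idx[split == 0], depth + 1, max_depth, min_size)
        + divisive_indices(X, idx[split == 1], depth + 1, max_depth, min_size)
    )

# Toy data: three well-separated 2-D blobs (illustrative only).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in (0, 5, 10)])

index_groups = divisive_indices(X_demo, max_depth=2, min_size=10)

# Write each cluster id back at the original row positions.
demo_labels = np.empty(len(X_demo), dtype=int)
for cluster_id, members in enumerate(index_groups):
    demo_labels[members] = cluster_id

print("Cluster sizes:", [len(members) for members in index_groups])
```

Because every index appears in exactly one group, demo_labels lines up row for row with X_demo and can be passed straight to a metric such as silhouette_score.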

Step 5: Execute Divisive Clustering

This step applies divisive clustering to the scaled data using the defined depth and minimum size.

Python

final_clusters = divisive_clustering(
    X_scaled,
    max_depth=3,
    min_size=50
)

Step 6: Analyze Cluster Sizes

This step calculates the number of data points in each final cluster to understand how the data has been split.

Python

cluster_sizes = [len(cluster) for cluster in final_clusters]
cluster_sizes

Output:

[39, 568, 643, 369, 178]

Step 7: Assign Flat Cluster Labels

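This step transforms the hierarchical, tree-based clustering output into flat cluster labels, which most machine learning pipelines require for evaluation, visualization and deployment. Because final_clusters stores the rows of each cluster rather than their positions, every row is matched back to its original index in X_scaled so that the labels line up with the data.

Python

labels = np.empty(X_scaled.shape[0], dtype=int)

# Map each row's raw bytes to the indices where that row occurs in X_scaled
row_to_indices = {}
for idx, row in enumerate(X_scaled):
    row_to_indices.setdefault(row.tobytes(), []).append(idx)

# Give every row of cluster i the label i at its original position
for i, cluster in enumerate(final_clusters):
    for row in cluster:
        labels[row_to_indices[row.tobytes()]] = i

Step 8: Evaluate Cluster Quality

The Silhouette Score evaluates how well clusters are formed by comparing cohesion within clusters to separation between clusters.
It ranges from -1 to +1: scores closer to +1 indicate well-defined clusters, while values near 0 or below suggest overlap or poor grouping.

Python

sil_score = silhouette_score(X_scaled, labels)
print("Score :", round(sil_score, 2))

Output:

The code prints a single silhouette score. Its value depends on the splits K-Means finds, which can vary with the scikit-learn version.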

Step 9: Visualizing Divisive Hierarchical Clustering Tree

Now we will visualize the divisive clustering process as a tree.

Begins with the entire dataset as a single cluster (Root).
Divides it into two clusters using KMeans.
Recursively splits each resulting cluster into smaller groups.
Stops when the maximum depth is reached or the cluster size becomes too small.

Python

def build_tree(X, depth=0, max_depth=3, min_size=50, node_name="Root"):
    
    if depth == max_depth or len(X) <= min_size:
        return {
            "name": f"{node_name}\n(size={len(X)})",
            "children": []
        }

    kmeans = KMeans(n_clusters=2, random_state=42)
    labels = kmeans.fit_predict(X)

    cluster_1 = X[labels == 0]
    cluster_2 = X[labels == 1]

    return {
        "name": f"{node_name}\n(size={len(X)})",
        "children": [
            build_tree(cluster_1, depth + 1, max_depth, min_size, node_name + " → C1"),
            build_tree(cluster_2, depth + 1, max_depth, min_size, node_name + " → C2")
        ]
    }

tree = build_tree(X_scaled, max_depth=3, min_size=50)
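
The nested dictionary returned by build_tree can be walked with a few lines of recursion. As a small sketch, the hypothetical helper collect_leaves below lists the leaf nodes, which correspond to the final clusters. The hand-written toy_tree uses the same "name" and "children" keys, with made-up sizes.

```python
# Hand-written toy tree in the same shape as build_tree's output (sizes are made up).
toy_tree = {
    "name": "Root\n(size=10)",
    "children": [
        {"name": "Root → C1\n(size=4)", "children": []},
        {"name": "Root → C2\n(size=6)", "children": [
            {"name": "Root → C2 → C1\n(size=1)", "children": []},
            {"name": "Root → C2 → C2\n(size=5)", "children": []},
        ]},
    ],
}

def collect_leaves(node):
    """Return the names of all nodes without children (hypothetical helper)."""
    if not node["children"]:
        return [node["name"]]

    leaves = []
    for child in node["children"]:
        leaves.extend(collect_leaves(child))
    return leaves

leaf_names = collect_leaves(toy_tree)
for name in leaf_names:
    # Replace the newline so each leaf prints on one line.
    print(name.replace("\n", " "))
```

The same traversal pattern is what the position and edge helpers in this step rely on.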

Next, we assign x and y positions to each node: depth controls the vertical placement, and child nodes are spread horizontally, with the available width halved at each level to maintain proper spacing.

Python

def compute_positions(node, depth=0, x=0, positions=None, width=8):
    
    if positions is None:
        positions = {}

    positions[node["name"]] = (x, -depth)

    children = node["children"]
    if children:
        dx = width / len(children)
        start_x = x - width/2 + dx/2

        for i, child in enumerate(children):
            compute_positions(child,
                              depth + 1,
                              start_x + i * dx,
                              positions,
                              width / 2)
    return positions
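
The horizontal spacing rule is easier to follow with numbers. The sketch below isolates the arithmetic from compute_positions in a hypothetical helper, child_x_positions, and evaluates it for the root (x=0, width=8) and for its left child (x=-2, width=4).

```python
def child_x_positions(x, width, n_children):
    """Mirror the spacing arithmetic used in compute_positions (hypothetical helper)."""
    dx = width / n_children          # horizontal slot reserved for each child
    start_x = x - width / 2 + dx / 2  # centre of the left-most slot
    return [start_x + i * dx for i in range(n_children)]

# Children of the root: the width of 8 is shared between two slots.
print(child_x_positions(0, 8, 2))    # [-2.0, 2.0]

# Children of the left child: the width was halved to 4 on the way down.
print(child_x_positions(-2, 4, 2))   # [-3.0, -1.0]
```

Halving the width at each level keeps the subtrees of sibling nodes from overlapping.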

Now we extract the parent-child relationships by recursively traversing the tree. For each node, we store an edge to each of its children so that these relationships can later be drawn as lines in the visualization.

Python

def extract_edges(node, edges=None):
    
    if edges is None:
        edges = []

    for child in node["children"]:
        edges.append((node["name"], child["name"]))
        extract_edges(child, edges)

    return edges

Now we draw the tree by connecting each parent node to its children and displaying the cluster size inside every node. Colours distinguish the root, the internal nodes that were split further and the leaves that form the final clusters. This creates a clear top-down view of how the data was split during divisive clustering.

Python

def plot_tree(tree):

    positions = compute_positions(tree)
    edges = extract_edges(tree)
    # Any node that appears as a parent in an edge was split further
    parents = {parent for parent, _ in edges}

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.axis('off')

    # Draw the parent-child connections
    for parent, child in edges:
        x1, y1 = positions[parent]
        x2, y2 = positions[child]
        ax.plot([x1, x2], [y1, y2], 'k-')

    # Draw each node as a coloured text box
    for node, (x, y) in positions.items():

        # Every node name starts with "Root", so test for the path arrow instead
        if "→" not in node:
            color = "lightblue"      # root
        elif node in parents:
            color = "lightgreen"     # internal node
        else:
            color = "lightyellow"    # leaf (final cluster)

        ax.text(x, y, node,
                ha='center',
                va='center',
                bbox=dict(boxstyle="round",
                          facecolor=color,
                          edgecolor="black"))

    plt.title("Divisive Hierarchical Clustering Tree", fontsize=14)
    plt.tight_layout()
    plt.show()

plot_tree(tree)

Output:

Figure: Divisive hierarchical clustering tree

