Hierarchical Clustering with Scikit-Learn

Last Updated : 13 Feb, 2026

Hierarchical Clustering is an unsupervised learning technique that groups data into a hierarchy of clusters based on similarity. It builds a tree-like structure called a dendrogram, which helps visualise relationships and decide the optimal number of clusters.

  • Does not require pre-selecting the number of clusters
  • Uses agglomerative (bottom up) or divisive (top down) approaches
  • Commonly applied in data exploration and pattern discovery
  • Widely used in pattern recognition, customer segmentation and image grouping
Figure: Types of hierarchical clustering

Implementing Agglomerative Hierarchical Clustering

scikit-learn provides a straightforward implementation of agglomerative hierarchical clustering through the AgglomerativeClustering class.

Step 1: Import Required Libraries

Here we import NumPy, pandas, Matplotlib and scikit-learn, plus SciPy for building and drawing the dendrogram.

Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
from scipy.cluster.hierarchy import dendrogram, linkage

Step 2: Load the Dataset

  • Each row represents an image flattened into numerical features.
  • No labels are used during clustering; this is purely unsupervised.
Python
digits = load_digits()
X = digits.data
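
To make the row layout concrete, the short check below is a minimal sketch (the name first_image is illustrative, not part of the original walkthrough). It confirms the shape of the feature matrix and shows that reshaping one row recovers the 8×8 image stored in digits.images.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data

# One row per image, one column per pixel (8 x 8 = 64 features)
print(X.shape)

# Reshaping a row recovers the original 8x8 image
first_image = X[0].reshape(8, 8)
print(np.array_equal(first_image, digits.images[0]))
```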

Step 3: Feature Scaling

Feature scaling matters in Hierarchical Clustering because the algorithm relies on distance calculations and is highly sensitive to feature magnitudes.

Python
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
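
The effect of feature magnitude on distances can be seen on a small example. The sketch below uses hypothetical toy data (not the digits dataset) in which the first column is far larger than the second, and compares the nearest neighbour of the first row before and after standardization.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.spatial.distance import pdist, squareform

# Hypothetical toy data: column 0 is on a much larger scale than column 1
toy = np.array([
    [1000.0, 1.0],
    [1200.0, 1.2],
    [1010.0, 9.0],
    [3000.0, 5.0],
    [5000.0, 9.5],
])

raw_dist = squareform(pdist(toy))
scaled_dist = squareform(pdist(StandardScaler().fit_transform(toy)))

# Index 0 of argsort is the row itself (distance 0), so take index 1
raw_nn = int(np.argsort(raw_dist[0])[1])
scaled_nn = int(np.argsort(scaled_dist[0])[1])

print("Nearest neighbour of row 0 on raw data:   ", raw_nn)
print("Nearest neighbour of row 0 on scaled data:", scaled_nn)
```

On the raw data, row 2 is nearest because only the large-valued column matters. After scaling, row 1 is nearest because it is close to row 0 in both features.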

Step 4: Visualizing the Dendrogram

Before selecting the number of clusters, we visualize the hierarchy using a dendrogram.

Python
linked = linkage(X_scaled, method='ward')

plt.figure(figsize=(12, 6))
dendrogram(
    linked,
    truncate_mode='lastp',
    p=30
)
plt.title("Hierarchical Clustering Dendrogram")
plt.xlabel("Cluster Size")
plt.ylabel("Distance")
plt.show()

Output:

Figure: Dendrogram
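
Once the linkage matrix exists, the same tree can be cut at several levels without recomputing it. The following is a minimal sketch on hypothetical toy data, using SciPy's fcluster with the maxclust criterion; the names points and linked_toy are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

linked_toy = linkage(points, method='ward')

# Cut the same tree into at most k flat clusters for several values of k
for k in (2, 3, 4):
    flat = fcluster(linked_toy, t=k, criterion='maxclust')
    print(k, np.bincount(flat)[1:])  # fcluster labels start at 1

flat3 = fcluster(linked_toy, t=3, criterion='maxclust')
```

Comparing the cluster sizes for neighbouring values of k is one way to check whether an extra cut separates a real group or only splits an existing one.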

Step 5: Building the Hierarchical Clustering Model

Based on dendrogram inspection, we choose a reasonable number of clusters. Here we use 10, which also matches the ten digit classes in the dataset.

Python
hc_model = AgglomerativeClustering(
    n_clusters=10,
    linkage='ward'
)

Step 6: Fit the Model and Assign Clusters

This step fits the hierarchical clustering model to the scaled data and assigns a cluster label to each data point. Each label represents the cluster formed from the hierarchical structure defined by the dendrogram.

Python
cluster_labels = hc_model.fit_predict(X_scaled)
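
As an alternative to fixing n_clusters, AgglomerativeClustering can stop merging at a distance threshold and report how many clusters result. The sketch below uses hypothetical toy data, and the threshold of 5.0 is an assumption chosen for that data, not a recommended value for the digits dataset.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: two tight groups far apart
rng = np.random.default_rng(0)
toy = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(15, 2)),
    rng.normal(loc=8.0, scale=0.3, size=(15, 2)),
])

# n_clusters must be None when distance_threshold is set
model = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=5.0,  # assumed cut height for this toy data
    linkage='ward'
)
toy_labels = model.fit_predict(toy)

# The number of clusters is found by the cut instead of being fixed in advance
print(model.n_clusters_)
```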

Step 7: Cluster Distribution Analysis

This step shows how data points are distributed across clusters. It gives quick insight into cluster balance, helps detect over-fragmentation and highlights dominant groupings, making it a useful checkpoint before using the clusters downstream.

Python
pd.Series(cluster_labels).value_counts()

Output:

Figure: Points assigned to each cluster

Step 8: Evaluating Clustering Quality

  • The Silhouette Score evaluates how well clusters are formed by comparing cohesion within clusters to separation between clusters.
  • Scores closer to +1 indicate well-defined clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.
Python
sil_score = silhouette_score(X_scaled, cluster_labels)
print("Score :", round(sil_score, 2))

Output:

Score : 0.13
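
To show what the score measures, the sketch below computes the silhouette of each point by hand on hypothetical toy data and compares it with scikit-learn's result. For a point, a is the mean distance to the other points in its own cluster (cohesion), b is the smallest mean distance to the points of any other cluster (separation), and the silhouette is (b - a) / max(a, b). The helper silhouette_of is illustrative.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score
from scipy.spatial.distance import cdist

# Hypothetical toy data with two clusters of three points each
pts = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
toy_labels = np.array([0, 0, 0, 1, 1, 1])

def silhouette_of(i, pts, labels):
    """Silhouette of sample i: (b - a) / max(a, b)."""
    d = cdist(pts[i:i + 1], pts)[0]
    same = labels == labels[i]
    same[i] = False                      # exclude the point itself
    a = d[same].mean()                   # cohesion within its own cluster
    b = min(d[labels == c].mean()        # separation from the nearest other cluster
            for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

manual = np.array([silhouette_of(i, pts, toy_labels) for i in range(len(pts))])

# The overall score is the mean of the per-sample values
print(np.allclose(manual, silhouette_samples(pts, toy_labels)))
print(np.isclose(manual.mean(), silhouette_score(pts, toy_labels)))
```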

Implementing Divisive Hierarchical Clustering

scikit-learn does not provide a general-purpose estimator for divisive hierarchical clustering; the closest built-in is BisectingKMeans, available from version 1.1. Here the approach is implemented manually with a top-down recursive splitting strategy, using K Means from scikit-learn to divide clusters step by step.

Step 1: Import Required Libraries

Here we import NumPy, scikit-learn and Matplotlib (used for the tree plot in Step 9).

Python
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np
import matplotlib.pyplot as plt

Step 2: Load the Dataset

  • Each row represents an image flattened into numerical features.
  • No labels are used during clustering; this is purely unsupervised.
Python
digits = load_digits()
X = digits.data

Step 3: Feature Scaling

K Means splits are also based on Euclidean distance, so the features are standardized to keep large-valued features from dominating the partitioning.

Python
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Step 4: Defining Divisive Clustering Function

This function applies a top down divisive clustering strategy, where the dataset is repeatedly split into smaller clusters to reveal finer patterns.

  • Starts by treating the entire dataset as one cluster
  • Splits the cluster into two using K Means
  • Recursively repeats the same process on each sub cluster
  • Stops when the maximum depth or minimum cluster size is reached
Python
def divisive_clustering(X, depth=0, max_depth=3, min_size=50):
    # Stop splitting at the depth limit or when the cluster is small enough
    if depth == max_depth or len(X) <= min_size:
        return [X]

    # Split the current cluster into two with K Means
    kmeans = KMeans(n_clusters=2, random_state=42)
    labels = kmeans.fit_predict(X)

    cluster_1 = X[labels == 0]
    cluster_2 = X[labels == 1]

    # Recurse into both halves and concatenate their final clusters
    return (
        divisive_clustering(cluster_1, depth + 1, max_depth, min_size) +
        divisive_clustering(cluster_2, depth + 1, max_depth, min_size)
    )

Step 5: Execute Divisive Clustering

This step applies divisive clustering to the scaled data using the defined depth and minimum size.

Python
final_clusters = divisive_clustering(
    X_scaled,
    max_depth=3,
    min_size=50
)

Step 6: Analyze Cluster Sizes

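This step calculates the number of data points in each final cluster to understand how the data has been split. With max_depth=3 there can be at most 2^3 = 8 final clusters; a branch stops earlier once it holds min_size points or fewer.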

Python
cluster_sizes = [len(cluster) for cluster in final_clusters]
cluster_sizes

Output:

[39, 568, 643, 369, 178]

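Step 7: Assign Flat Cluster Labels

This step converts the tree-based clustering output into flat cluster labels, which most machine learning pipelines need for evaluation, visualization and deployment. The function from Step 4 returns the rows of each final cluster but not their positions in X_scaled, and the final clusters do not follow the original row order, so labels cannot be assigned by slicing X_scaled in sequence. The helper below repeats the same splits while tracking row indices, so each label lands on the correct row.

Python
def split_indices(X, idx, depth=0, max_depth=3, min_size=50):
    # Same splits as divisive_clustering, but returns row indices instead of rows
    if depth == max_depth or len(idx) <= min_size:
        return [idx]
    split = KMeans(n_clusters=2, random_state=42).fit_predict(X[idx])
    return (split_indices(X, idx[split == 0], depth + 1, max_depth, min_size) +
            split_indices(X, idx[split == 1], depth + 1, max_depth, min_size))

labels = np.empty(X_scaled.shape[0], dtype=int)
for i, idx in enumerate(split_indices(X_scaled, np.arange(X_scaled.shape[0]))):
    labels[idx] = i  # label each row by its position in the original data

Step 8: Evaluate Cluster Quality

  • The Silhouette Score evaluates how well clusters are formed by comparing cohesion within clusters to separation between clusters.
  • Scores closer to +1 indicate well-defined clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster.
Python
sil_score = silhouette_score(X_scaled, labels)
print("Score :", round(sil_score, 2))

Output:

The printed score can vary with the scikit-learn version, because K Means defaults such as n_init have changed between releases.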

Step 9: Visualizing Divisive Hierarchical Clustering Tree

Now we will visualize the divisive clustering process as a tree.

  • Begins with the entire dataset as a single cluster (Root).
  • Divides it into two clusters using KMeans.
  • Recursively splits each resulting cluster into smaller groups.
  • Stops when the maximum depth is reached or the cluster size becomes too small.
Python
def build_tree(X, depth=0, max_depth=3, min_size=50, node_name="Root"):
    
    if depth == max_depth or len(X) <= min_size:
        return {
            "name": f"{node_name}\n(size={len(X)})",
            "children": []
        }

    kmeans = KMeans(n_clusters=2, random_state=42)
    labels = kmeans.fit_predict(X)

    cluster_1 = X[labels == 0]
    cluster_2 = X[labels == 1]

    return {
        "name": f"{node_name}\n(size={len(X)})",
        "children": [
            build_tree(cluster_1, depth + 1, max_depth, min_size, node_name + " → C1"),
            build_tree(cluster_2, depth + 1, max_depth, min_size, node_name + " → C2")
        ]
    }

tree = build_tree(X_scaled, max_depth=3, min_size=50)
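
The nested dictionary is easy to traverse. As a minimal sketch, the hypothetical helper collect_leaves below walks a hand-built tree in the same {"name", "children"} shape and lists the final clusters. The toy tree is illustrative and is not the output of build_tree on the digits data.

```python
# Hand-built tree in the same {"name", "children"} shape that build_tree returns
toy_tree = {
    "name": "Root\n(size=10)",
    "children": [
        {"name": "Root → C1\n(size=4)", "children": []},
        {"name": "Root → C2\n(size=6)", "children": [
            {"name": "Root → C2 → C1\n(size=3)", "children": []},
            {"name": "Root → C2 → C2\n(size=3)", "children": []},
        ]},
    ],
}

def collect_leaves(node):
    """Return the names of all nodes without children (hypothetical helper)."""
    if not node["children"]:
        return [node["name"]]
    leaves = []
    for child in node["children"]:
        leaves.extend(collect_leaves(child))
    return leaves

for leaf in collect_leaves(toy_tree):
    print(leaf.replace("\n", " "))
```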

Next, we assign X and Y positions to each node: depth controls the vertical placement, and child nodes are spread horizontally to maintain proper spacing.

Python
def compute_positions(node, depth=0, x=0, positions=None, width=8):
    
    if positions is None:
        positions = {}

    positions[node["name"]] = (x, -depth)

    children = node["children"]
    if children:
        dx = width / len(children)
        start_x = x - width/2 + dx/2

        for i, child in enumerate(children):
            compute_positions(child,
                              depth + 1,
                              start_x + i * dx,
                              positions,
                              width / 2)
    return positions
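
The horizontal spacing rule can be checked with plain arithmetic. The sketch below assumes a full binary tree of depth 3 and reproduces only the x-coordinate formula from compute_positions, halving the available width at each level so that sibling subtrees do not overlap.

```python
width = 8.0          # same default width as compute_positions
xs = [0.0]           # the root sits at x = 0
levels = {0: xs}

for depth in range(1, 4):
    dx = width / 2   # two children per split
    xs = [x - width / 2 + dx / 2 + i * dx for x in xs for i in range(2)]
    levels[depth] = xs
    width /= 2       # each child lays out its own children in half the width

for depth, row in levels.items():
    print(depth, row)
```

The children of the root land at x = -2 and x = 2, and the eight nodes at depth 3 are spaced one unit apart from -3.5 to 3.5.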

Now we extract the parent-child relationships by recursively traversing the tree. For each node, we store its connection to its children so that these relationships can later be drawn as lines in the visualization. This step builds the structural backbone required to clearly represent the hierarchical clustering tree.

Python
def extract_edges(node, edges=None):
    
    if edges is None:
        edges = []

    for child in node["children"]:
        edges.append((node["name"], child["name"]))
        extract_edges(child, edges)

    return edges

Now we draw the tree by connecting each parent node to its children and displaying the cluster size inside every node. Colours are used to make the structure easier to read. This creates a clear top down view of how the data was split during divisive clustering.

Python
def plot_tree(tree):

    positions = compute_positions(tree)
    edges = extract_edges(tree)

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.axis('off')

    
    for parent, child in edges:
        x1, y1 = positions[parent]
        x2, y2 = positions[child]
        ax.plot([x1, x2], [y1, y2], 'k-')

    
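    # Nodes that appear as a parent in an edge were split further; the rest are leaves
    parents = {parent for parent, _ in edges}
    for node, (x, y) in positions.items():

        if "→" not in node:
            color = "lightblue"     # root: the only name without a path arrow
        elif node in parents:
            color = "lightgreen"    # internal cluster that was split again
        else:
            color = "lightyellow"   # final (leaf) cluster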

        ax.text(x, y, node,
                ha='center',
                va='center',
                bbox=dict(boxstyle="round",
                          facecolor=color,
                          edgecolor="black"))

    plt.title("Divisive Hierarchical Clustering Tree", fontsize=14)
    plt.tight_layout()
    plt.show()

plot_tree(tree)

Output:

Figure: Divisive clustering tree

