Hierarchical Classification

Last Updated : 17 May, 2025

Hierarchical classification is a task in machine learning where the goal is to assign an instance to one or more classes organized in a hierarchy, rather than choosing from a flat label set. This structure can improve prediction accuracy and make outputs more interpretable.

The labels form a structured taxonomy in which they may have parent-child relationships. Instead of treating categories as independent, a hierarchical model exploits these relationships to better reflect the data's semantics.

Types of Hierarchical Structures

1) Tree Hierarchy

  • Each node has exactly one parent (except the root).
  • Every instance is assigned a unique path from the root to a leaf.
  • Example: Animal → Mammal → Dog

2) DAG (Directed Acyclic Graph)

  • A node can have multiple parents.
  • Useful when concepts belong to multiple categories.
  • Example: "Tablet" can belong to both "Electronics" and "Computing Devices"

3) Taxonomy

  • A domain-specific organizational structure that can be a tree or DAG.
  • Adds semantic meaning to the labels (e.g., product taxonomy in retail, medical coding in healthcare).
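
Whether tree or DAG, a hierarchy reduces to a parent map. Below is a minimal Python sketch (label names are taken from the examples above; the encoding is just one reasonable choice) with a helper that walks the map to collect a node's ancestors:

```python
# Parent map: tree nodes have one parent, DAG nodes may have several.
PARENTS = {
    "Animal": [],
    "Mammal": ["Animal"],
    "Dog": ["Mammal"],
    "Electronics": [],
    "Computing Devices": [],
    "Tablet": ["Electronics", "Computing Devices"],  # DAG: two parents
}

def ancestors(label):
    """Return all ancestors of a label by walking the parent map."""
    seen, stack = set(), list(PARENTS[label])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen

print(ancestors("Dog"))     # {'Mammal', 'Animal'}
print(ancestors("Tablet"))  # {'Electronics', 'Computing Devices'}
```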

Why Use Hierarchical Classification?

Aspect           | Flat Classification  | Hierarchical Classification
-----------------|----------------------|--------------------------------------------------
Output           | Single label         | Label with hierarchy (e.g., a root-to-leaf path)
Error penalty    | Equal for all errors | Penalizes mistakes at higher levels more
Interpretability | Moderate             | High (provides structured output)

Use Cases and Applications

  • Medical diagnosis (ICD coding)
  • Product categorization in e-commerce
  • Document topic classification
  • Biological classification (taxonomy)
  • News categorization by topics and subtopics

Methods of Hierarchical Classification

1. Local Classifier per Node

  • A binary classifier is trained for each node to decide whether an instance belongs to that class.
  • Prediction proceeds top-down from the root.
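
A minimal sketch of this scheme, assuming a toy hierarchy and placeholder training labels (everything named here is illustrative, not a fixed API):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical hierarchy for the sketch.
CHILDREN = {"root": ["Mammal", "Bird"], "Mammal": ["Dog", "Cat"], "Bird": []}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# One binary classifier per node: "does this instance belong to the
# node's subtree?" (placeholder 0/1 membership labels for the sketch).
node_clf = {node: LogisticRegression().fit(X, rng.integers(0, 2, size=200))
            for node in ["Mammal", "Bird", "Dog", "Cat"]}

def predict_top_down(x, threshold=0.5):
    """Walk from the root, descending into the highest-scoring child
    whose binary classifier fires; stop when none fires."""
    node, path = "root", []
    while CHILDREN.get(node):
        scores = {c: node_clf[c].predict_proba(x.reshape(1, -1))[0, 1]
                  for c in CHILDREN[node]}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        node = best
        path.append(node)
    return path

print(predict_top_down(X[0]))  # e.g. ['Mammal', 'Dog']
```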

2. Local Classifier per Parent Node

  • For each internal node, a multi-class classifier is trained to distinguish among its child nodes.
  • This reduces the number of classifiers but may increase complexity at each node.
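
A sketch of the per-parent variant under the same toy setup; the only change is that each internal node gets one multi-class classifier over its children:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One multi-class classifier per internal (parent) node, trained only
# to choose among that node's children (placeholder labels).
CHILDREN = {"root": ["Mammal", "Bird"], "Mammal": ["Dog", "Cat"]}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
parent_clf = {parent: LogisticRegression().fit(X, rng.choice(kids, size=200))
              for parent, kids in CHILDREN.items()}

def predict_path(x):
    """Route the instance down the tree, one decision per parent node."""
    node, path = "root", []
    while node in parent_clf:
        node = parent_clf[node].predict(x.reshape(1, -1))[0]
        path.append(node)
    return path

print(predict_path(X[0]))  # e.g. ['Mammal', 'Cat']
```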

3. Local Classifier per Level

  • One classifier per hierarchy level.
  • Useful when the hierarchy is well-balanced, though predictions at different levels can be mutually inconsistent.
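
A per-level sketch under the same assumptions; note in the output how the independently predicted levels are free to contradict each other:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# One multi-class classifier per depth, each predicting the node at
# its level independently (placeholder labels for the sketch).
LEVELS = {1: ["Mammal", "Bird"], 2: ["Dog", "Cat", "Eagle"]}
level_clf = {d: LogisticRegression().fit(X, rng.choice(labels, size=200))
             for d, labels in LEVELS.items()}

x = X[0].reshape(1, -1)
print({d: clf.predict(x)[0] for d, clf in level_clf.items()})
# e.g. {1: 'Bird', 2: 'Dog'} -- levels can disagree, which is why a
# consistency step (see constraint-based models below) is often added.
```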

4. Global Classifier

  • A single model is trained to consider the full hierarchy.
  • Often requires custom loss functions (such as the hierarchical cross-entropy loss below) to enforce structural constraints.

5. Constraint-Based Models

  • Use the hierarchy during inference (and optionally during training) to enforce logical constraints.
  • Example: If a child node is predicted, all its ancestors must also be predicted.
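
A sketch of the simplest such constraint, closing the predicted label set under the ancestor relation (the parent map is hypothetical; lists allow DAG hierarchies):

```python
# If a node is predicted positive, force all of its ancestors positive.
PARENTS = {"Animal": [], "Mammal": ["Animal"], "Dog": ["Mammal"]}

def enforce_ancestors(predicted):
    """Add every ancestor of each predicted label to the prediction."""
    closed, stack = set(predicted), list(predicted)
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in closed:
                closed.add(parent)
                stack.append(parent)
    return closed

print(enforce_ancestors({"Dog"}))  # {'Dog', 'Mammal', 'Animal'}
```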

Hierarchical Cross-Entropy Loss

To account for the hierarchical structure in the loss function, we can use a hierarchical cross-entropy loss. It sums the negative log-probability over every ancestor of the true label, so a mistake high in the hierarchy corrupts every term on the path below it and is penalized more heavily:

L = -\sum_{i=1}^{N} \sum_{j \in \mathcal{A}(y_i)} \log P(j \mid x_i)

where:

  • N is the number of training samples,
  • y_i is the true label for instance x_i,
  • \mathcal{A}(y_i) is the set of ancestors of y_i, including y_i itself,
  • P(j \mid x_i) is the model's predicted probability that x_i belongs to node j.
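
A minimal PyTorch sketch of this loss, assuming one sigmoid output per hierarchy node and a precomputed ancestor set \mathcal{A}(y) per label (both are modeling choices, not fixed conventions):

```python
import torch
import torch.nn.functional as F

# Hypothetical ancestor sets A(y), including y itself, for the path
# Animal (0) -> Mammal (1) -> Dog (2).
ANCESTORS = {0: [0], 1: [0, 1], 2: [0, 1, 2]}

def hierarchical_cross_entropy(logits, targets):
    """L = -sum_i sum_{j in A(y_i)} log P(j | x_i), with P(j | x_i)
    given by one sigmoid per hierarchy node."""
    log_p = F.logsigmoid(logits)      # log P(j | x_i) for every node j
    loss = logits.new_zeros(())
    for i, y in enumerate(targets.tolist()):
        loss = loss - log_p[i, ANCESTORS[y]].sum()
    return loss / len(targets)

logits = torch.randn(2, 3, requires_grad=True)  # batch of 2, 3 nodes
targets = torch.tensor([2, 1])                  # true labels: Dog, Mammal
print(hierarchical_cross_entropy(logits, targets))
```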

Evaluation Metrics

  • Hierarchical Precision / Recall: Precision and recall computed after expanding the predicted and true labels with their ancestors, so partially correct paths earn partial credit.
  • H-loss: Penalizes an error only at the highest node where prediction and truth diverge, rather than re-penalizing all of its descendants.
  • Path Accuracy: Accuracy of the entire predicted path.
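
One common way to compute hierarchical precision and recall is as set overlap on ancestor-expanded labels; a sketch for a single instance under a hypothetical parent map:

```python
PARENTS = {"Animal": [], "Mammal": ["Animal"],
           "Dog": ["Mammal"], "Cat": ["Mammal"]}

def with_ancestors(label):
    """Return {label} together with all of its ancestors."""
    out, stack = {label}, [label]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in out:
                out.add(parent)
                stack.append(parent)
    return out

def h_precision_recall(pred, true):
    p, t = with_ancestors(pred), with_ancestors(true)
    overlap = len(p & t)
    return overlap / len(p), overlap / len(t)

# Predicting the sibling 'Cat' for a true 'Dog' keeps partial credit
# for the shared ancestors Mammal and Animal.
print(h_precision_recall("Cat", "Dog"))  # (0.666..., 0.666...)
```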

Tools and Libraries

  • scikit-multilearn for hierarchical multi-label classification
  • keras-han (for hierarchical attention networks)
  • Custom architectures using PyTorch or TensorFlow
  • Graph Neural Networks: To learn hierarchical embeddings over DAGs

Challenges

  1. Data sparsity at deeper levels of the hierarchy.
  2. Error propagation in top-down models: a mistake near the root cannot be corrected further down.
  3. Scalability for large taxonomies.
  4. Class imbalance, since instances are rarely distributed evenly across branches.
