
How to choose α in cost-complexity pruning?

Last Updated : 09 Jan, 2025

Cost-complexity pruning is a method for balancing accuracy against complexity in decision trees, which helps prevent overfitting. Its key parameter is alpha (α), which sets how heavily tree size is penalized: a higher alpha prunes more and yields a smaller, simpler tree, while a lower alpha keeps more of the original structure. Pruning acts like a trimming tool. After the tree is fully grown, it looks for branches that add little value and cuts them off. The challenge is finding the alpha that gives the best trade-off between error and complexity.
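For reference, the trade-off can be written as a single objective. This is the standard minimal cost-complexity formulation, the one scikit-learn documents for its `ccp_alpha` parameter. The symbols $R(T)$, $|\widetilde{T}|$ and $T_t$ are introduced here for illustration and are not used elsewhere in this article.

```latex
% Cost-complexity measure of a tree T:
%   R(T)              = total (sample-weighted) impurity of the leaves of T
%   |\widetilde{T}|   = number of leaves of T
R_\alpha(T) = R(T) + \alpha\,\lvert \widetilde{T} \rvert

% Effective alpha of an internal node t, where T_t is the subtree rooted at t
% and R(t) is the impurity of t if it were collapsed into a single leaf:
\alpha_{\mathrm{eff}}(t) = \frac{R(t) - R(T_t)}{\lvert \widetilde{T}_t \rvert - 1}
```

Pruning repeatedly removes the subtree with the smallest effective alpha (the "weakest link") until the smallest remaining effective alpha exceeds the chosen α. With α = 0 nothing is penalized and the full tree is kept.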

Quick Example: Imagine you have a decision tree that predicts whether a customer will buy a product based on age and income. If the tree is too complex, it might perfectly classify the training data but fail to generalize to new data (overfitting). By adjusting alpha, you can prune unnecessary branches, simplifying the tree while maintaining accuracy. For instance, with a higher alpha, you may prune branches that only slightly improve accuracy but add complexity.

[Figure: Decision Tree Pruning Example]

How to choose α in cost-complexity pruning?

Step 1: Using Cross-Validation to Tune Alpha

  • Perform k-fold cross-validation on the dataset for a range of alpha values.
  • For each α, compute the average validation score across the folds and note where accuracy is highest.
  • Choose the alpha that yields the highest validation score, or the point where the score plateaus, which indicates the tree is pruned appropriately without losing predictive power.
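The three steps above can be sketched with scikit-learn as follows. This is a minimal sketch, not the only way to do it: the synthetic dataset, the 5-fold setting, and the choice to take candidate alphas from `cost_complexity_pruning_path` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (an assumption, not this article's dataset)
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate alphas: the effective alphas at which the fully grown tree loses a subtree.
# The last value prunes everything down to the root, so it is dropped.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
candidate_alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0, None))  # guard against tiny negatives

# 5-fold cross-validation: each alpha gets an average validation accuracy across folds
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"ccp_alpha": candidate_alphas},
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_alpha = search.best_params_["ccp_alpha"]
print(f"Best alpha: {best_alpha:.5f}")
print(f"Mean CV accuracy at best alpha: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {search.score(X_test, y_test):.3f}")
```

Taking the candidates from the pruning path avoids guessing a grid: between two consecutive effective alphas the pruned tree does not change, so nothing is gained by trying values in between.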

Step 2: Finding the Ideal Balance

  • As alpha increases, pruning becomes more aggressive, reducing tree complexity but potentially lowering model accuracy.
  • Start with a small alpha and gradually increase it while monitoring validation accuracy.
  • Select the α value just before further pruning begins to degrade validation accuracy, so that the tree generalizes well without overfitting.
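One way to carry out this sweep is shown below. It is a sketch under stated assumptions: the dataset is synthetic, a single train/validation split stands in for full cross-validation, and `tolerance` is a hypothetical threshold for what counts as "degraded" accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only
X, y = make_classification(n_samples=800, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

# Alphas in increasing order, from no pruning up to the root-only tree
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_train, y_train)
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))  # np.unique returns sorted values

val_scores, leaf_counts = [], []
for alpha in alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=1).fit(X_train, y_train)
    val_scores.append(tree.score(X_val, y_val))   # validation accuracy at this alpha
    leaf_counts.append(tree.get_n_leaves())       # tree size at this alpha
val_scores = np.array(val_scores)

# Hypothetical rule: accept any alpha whose validation accuracy is within
# `tolerance` of the best, then take the largest one (the simplest such tree).
tolerance = 0.01
within = np.flatnonzero(val_scores >= val_scores.max() - tolerance)
chosen = within[-1]

print(f"Chosen alpha: {alphas[chosen]:.5f}")
print(f"Validation accuracy: {val_scores[chosen]:.3f} (best: {val_scores.max():.3f})")
print(f"Leaves: {leaf_counts[chosen]} (unpruned: {leaf_counts[0]})")
```

Preferring the largest alpha within tolerance, rather than the single best score, trades a negligible loss in validation accuracy for a smaller tree. Setting `tolerance = 0` recovers the plain "pick the best score" rule, with ties broken toward the simpler tree.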

Example: Step-by-Step Tree Pruning with Increasing Alpha

Visual Example: Imagine you prune a tree step-by-step, increasing α:

  • α=0.01 : Large tree, slightly better accuracy (may overfit).
  • α=0.1 : Moderate-sized tree, decent accuracy.
  • α=1.0 : Very small tree, low accuracy (underfitting).

Choose α=0.1 if it provides the best balance between tree size and accuracy.
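To see how tree size responds to these three values in practice, the sketch below fits one tree per alpha and reports its size and validation accuracy. The dataset is synthetic and chosen only for illustration, so the sizes and scores it prints will not match the illustrative figures above. Note that with the default Gini criterion on a binary problem, total impurity never exceeds 0.5, so α=1.0 always collapses the tree to a single node; useful alphas are usually much smaller.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3,
                           n_redundant=0, flip_y=0.1, random_state=7)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=7)

results = {}
for alpha in (0.01, 0.1, 1.0):
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=7).fit(X_train, y_train)
    # Record (number of leaves, depth, validation accuracy) for this alpha
    results[alpha] = (tree.get_n_leaves(), tree.get_depth(), tree.score(X_val, y_val))
    leaves, depth, acc = results[alpha]
    print(f"alpha={alpha:<5} leaves={leaves:<4} depth={depth:<3} val_accuracy={acc:.3f}")
```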

A useful diagnostic is a plot of the total impurity of the leaves against the effective alpha values, with alpha on the x-axis and total impurity on the y-axis. It shows how the complexity of the tree changes with alpha: each increase in alpha prunes more of the tree, so the total impurity of the remaining leaves rises.
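A plot of this kind can be produced from `cost_complexity_pruning_path`, which returns the effective alphas together with the total leaf impurity at each pruning step. The sketch below assumes a synthetic dataset and writes the figure to a hypothetical file name, `impurity_vs_alpha.png`; replace the `savefig` call with `plt.show()` to view it interactively.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only
X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Effective alphas and the corresponding total leaf impurities along the pruning path
path = DecisionTreeClassifier(random_state=3).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

fig, ax = plt.subplots()
# The last point is the trivial root-only tree; dropping it keeps the axis readable
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total impurity vs effective alpha (training set)")
fig.savefig("impurity_vs_alpha.png")  # hypothetical output file name
```

This curve is computed on the training set only, so it describes tree complexity rather than generalization. It helps pick a sensible range of alphas to evaluate, not the final value.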

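The optimal alpha is the one that maximizes accuracy on held-out data. Ideally this is a validation set or cross-validation folds rather than the final test set, so that the test score remains an honest estimate of performance on unseen data. Choosing alpha this way helps the model generalize while avoiding overfitting.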

Let's observe the effects of alpha pruning by an example:

Code Example for Alpha Pruning:

Python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score
from collections import Counter

# Two-feature, imbalanced dataset. flip_y=0.5 assigns a random label to half
# of the samples, so the problem is deliberately noisy.
X, y = make_classification(n_samples=2000, n_features=2, n_redundant=0, n_clusters_per_class=1,
                           weights=[0.7], flip_y=0.5, random_state=42)

print("Class distribution:", Counter(y))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Label -> ccp_alpha. Labels are printed verbatim as the report headings.
alpha_configs = {
    'No Pruning (alpha=0)': 0,
    '(alpha=0.001)': 0.003,  # label says 0.001, but the value actually passed is 0.003
    '(alpha=0.01)': 0.01,
    'alpha=0.1)': 0.1
}

my_dpi = 90
fig, axes = plt.subplots(1, 4, figsize=(1100/my_dpi, (1100/4)/my_dpi))
cmap_light = ListedColormap(['#FFAAAA', '#AAAAFF'])
cmap_bold = ListedColormap(['#FF0000', '#0000FF'])

# Grid covering the feature space, used to draw each decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300), np.linspace(y_min, y_max, 300))

for ax, (label, alpha) in zip(axes, alpha_configs.items()):
    # Train one tree per pruning level
    clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42)
    clf.fit(X_train, y_train)

    # Evaluate on the held-out test set
    y_pred = clf.predict(X_test)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    print(f"Classification report with {label} (ROC AUC: {auc:.2f}):")
    print(classification_report(y_test, y_pred, zero_division=0))

    # Decision boundary with the test points overlaid
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.pcolormesh(xx, yy, Z, cmap=cmap_light, shading='auto')
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cmap_bold, s=5)
    ax.set_title(label)

plt.tight_layout()
plt.show()


This code generates an imbalanced classification dataset and trains a Decision Tree classifier with varying levels of pruning (controlled by `ccp_alpha` values). It visualizes the effect of each pruning level on the decision boundaries and calculates the ROC AUC and classification report for each model. The plot illustrates how pruning impacts model complexity and generalization.

Output:

Classification report with No Pruning (alpha=0) (ROC AUC: 0.67):
              precision    recall  f1-score   support

           0       0.71      0.80      0.75       358
           1       0.63      0.50      0.56       242

    accuracy                           0.68       600
   macro avg       0.67      0.65      0.66       600
weighted avg       0.68      0.68      0.67       600

Classification report with (alpha=0.001) (ROC AUC: 0.68):
              precision    recall  f1-score   support

           0       0.71      0.80      0.75       358
           1       0.64      0.52      0.57       242

    accuracy                           0.69       600
   macro avg       0.67      0.66      0.66       600
weighted avg       0.68      0.69      0.68       600

Classification report with (alpha=0.01) (ROC AUC: 0.67):
              precision    recall  f1-score   support

           0       0.69      0.83      0.76       358
           1       0.65      0.45      0.53       242

    accuracy                           0.68       600
   macro avg       0.67      0.64      0.65       600
weighted avg       0.67      0.68      0.67       600

Classification report with alpha=0.1) (ROC AUC: 0.50):
              precision    recall  f1-score   support

           0       0.60      1.00      0.75       358
           1       0.00      0.00      0.00       242

    accuracy                           0.60       600
   macro avg       0.30      0.50      0.37       600
weighted avg       0.36      0.60      0.45       600

The difference can be further observed using the decision boundaries.

[Figure: Effect of various alpha parameters on the decision boundaries]

