
Implementing the AdaBoost Algorithm From Scratch

Last Updated : 05 Apr, 2025

AdaBoost (Adaptive Boosting) is a powerful ensemble learning technique that combines multiple weak classifiers into a strong classifier. It adds classifiers sequentially, and each new classifier is trained to correct the errors of the previous ones by giving more weight to the data points they misclassified.
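To see the reweighting concretely, here is one boosting round on five toy samples. The labels and predictions are made up for illustration, and the labels are encoded as -1 and +1. The error, alpha and weight-update formulas are the same ones used in the fit method later in the article:

```python
import numpy as np

# Hypothetical toy data: true labels and one weak classifier's predictions (in {-1, +1})
y = np.array([1, 1, -1, -1, 1])
pred = np.array([1, -1, -1, -1, 1])   # sample 1 is misclassified
w = np.ones(5) / 5                     # start with uniform weights

err = np.sum(w * (pred != y)) / np.sum(w)   # weighted error = 0.2
alpha = 0.5 * np.log((1 - err) / err)       # 0.5 * ln(4) = ln(2), about 0.693

# Correct samples (y * pred = +1) are multiplied by exp(-alpha) = 0.5,
# the misclassified one (y * pred = -1) by exp(+alpha) = 2
w = w * np.exp(-alpha * y * pred)
w = w / np.sum(w)                           # renormalize to sum to 1

print(np.round(w, 3))  # misclassified sample now holds 0.5, the others 0.125 each
```

After a single round, the one misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.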

In this article, we will implement the AdaBoost algorithm from scratch. Building it ourselves gives a deeper understanding of how AdaBoost works and of the key principles behind it.

(Figure: Boosting Algorithms)

Python implementation of AdaBoost 

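Libraries such as scikit-learn already provide a ready-made AdaBoost implementation (sklearn.ensemble.AdaBoostClassifier). Here we build the algorithm ourselves and use scikit-learn only for the decision-stump weak learners, the synthetic dataset and the evaluation metrics.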
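For comparison, scikit-learn's built-in AdaBoostClassifier can be run on the same kind of synthetic data in a few lines. This is a sketch for cross-checking the from-scratch version, not part of it. The exact score depends on the scikit-learn version, so none is quoted here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as the from-scratch example
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# With no estimator given, AdaBoostClassifier boosts depth-1 decision trees (stumps)
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"scikit-learn AdaBoost accuracy: {accuracy:.3f}")
```

Unlike the from-scratch class, the library version handles 0/1 labels directly, so no label conversion is needed.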

To test the implementation, we create a synthetic binary classification dataset.
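One detail matters for this dataset. The weight update used in AdaBoost, w * exp(-alpha * y * predictions), assumes labels encoded as -1 and +1, so that y * predictions is +1 for a correct prediction and -1 for a mistake. make_classification returns labels 0 and 1, for which that product is 0 whenever either value is 0, so those weights would never change. A minimal check and conversion:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
print(np.unique(y))       # the two classes come out as 0 and 1

# Map class 0 to -1 and keep class 1 as +1, as the AdaBoost update rule expects
y_pm = np.where(y == 0, -1, 1)
print(np.unique(y_pm))
```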

1. Import Libraries

Let’s begin by importing the libraries we need for the classification task:

Python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

2. Defining the AdaBoost Class

Python
class AdaBoost:
    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.alphas = []
        self.models = []
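  • The AdaBoost class is initialized with the number of weak learners to train (n_estimators).
  • self.alphas: Stores the weight (alpha) of each weak classifier, computed from its weighted error.
  • self.models: Stores the trained weak classifiers (decision stumps, i.e. depth-1 decision trees).
  • The class expects binary labels encoded as -1 and +1, because the weight update in fit multiplies the true labels by the predictions.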

3. Training the AdaBoost Model (Fit Method)

Python
    def fit(self, X, y):
        n_samples, n_features = X.shape
        w = np.ones(n_samples) / n_samples
  • n_samples, n_features: Retrieves the number of samples and features from the dataset.
  • w: Initializes sample weights uniformly.
Python
        for _ in range(self.n_estimators):
            model = DecisionTreeClassifier(max_depth=1)
            model.fit(X, y, sample_weight=w)
            predictions = model.predict(X)
            err = np.sum(w * (predictions != y)) / np.sum(w)
            alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
            self.alphas.append(alpha)
            self.models.append(model)
            w = w * np.exp(-alpha * y * predictions)
            w = w / np.sum(w)
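  • err: Computes the weighted error rate, i.e. the total weight of the misclassified samples divided by the total weight.
  • alpha: Calculates the model weight from its error as 0.5 * ln((1 - err) / err); the small constant 1e-10 avoids division by zero. Models with lower error receive a higher alpha, and a model no better than chance (err = 0.5) gets an alpha close to 0.
  • self.alphas.append(alpha): Appends the model’s weight to the list.
  • self.models.append(model): Appends the trained weak classifier to the list.
  • w: Updates the sample weights. With labels in {-1, +1}, y * predictions is +1 for correctly classified samples (weight multiplied by exp(-alpha)) and -1 for misclassified ones (weight multiplied by exp(alpha)). The weights are then renormalized to sum to 1.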
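In formula form, with labels y_i ∈ {-1, +1}, weak classifier h_m at round m and M = n_estimators rounds, the training loop and the predict method implement the standard (discrete) AdaBoost updates. The code additionally adds 1e-10 to the denominator of α_m to avoid division by zero:

```latex
\begin{aligned}
\varepsilon_m &= \frac{\sum_i w_i \, \mathbf{1}[h_m(x_i) \neq y_i]}{\sum_i w_i} \\
\alpha_m &= \frac{1}{2} \ln \frac{1 - \varepsilon_m}{\varepsilon_m} \\
w_i &\leftarrow \frac{w_i \, e^{-\alpha_m y_i h_m(x_i)}}{\sum_j w_j \, e^{-\alpha_m y_j h_m(x_j)}} \\
H(x) &= \operatorname{sign}\Big( \sum_{m=1}^{M} \alpha_m h_m(x) \Big)
\end{aligned}
```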

4. Making Predictions

Python
    def predict(self, X):
        strong_preds = np.zeros(X.shape[0])
        for model, alpha in zip(self.models, self.alphas):
            strong_preds += alpha * model.predict(X)
        return np.sign(strong_preds).astype(int)
  • strong_preds: Accumulates the alpha-weighted predictions of all weak classifiers; np.sign of this sum gives the final label (-1 or +1).
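A small hand-made example shows why the vote is weighted: a classifier with a large alpha can overrule two weaker ones that disagree with it. The alphas and votes below are hypothetical, not produced by the model in this article:

```python
import numpy as np

# Hypothetical alphas and votes of three weak classifiers on four samples
alphas = np.array([0.9, 0.4, 0.3])
votes = np.array([
    [ 1,  1, -1, -1],   # classifier 1 (most accurate, largest alpha)
    [-1,  1,  1, -1],   # classifier 2
    [-1,  1,  1,  1],   # classifier 3
])

# Same aggregation as predict: alpha-weighted sum of the votes, then the sign
strong_preds = alphas @ votes       # approximately [0.2, 1.6, -0.2, -1.0]
final = np.sign(strong_preds).astype(int)
print(final)                        # [ 1  1 -1 -1]
```

On samples 0 and 2, classifiers 2 and 3 together outvote classifier 1 by count, but the combined weight 0.4 + 0.3 = 0.7 is smaller than 0.9, so classifier 1 decides the label.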

5. Example Usage

Python
if __name__ == "__main__":

    X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
    # make_classification returns labels 0/1; the AdaBoost weight update needs -1/+1
    y = np.where(y == 0, -1, 1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    adaboost = AdaBoost(n_estimators=50)
    adaboost.fit(X_train, y_train)

    predictions = adaboost.predict(X_test)

    accuracy = accuracy_score(y_test, predictions)
    precision = precision_score(y_test, predictions)  # positive class is +1
    recall = recall_score(y_test, predictions)
    f1 = f1_score(y_test, predictions)
    # predict() returns hard labels, so this ROC-AUC is computed from two score values
    # only and equals the mean of the true-positive and true-negative rates
    roc_auc = roc_auc_score(y_test, predictions)

    # Print results
    print(f"Accuracy: {accuracy * 100}%")
    print(f"Precision: {precision}")
    print(f"Recall: {recall}")
    print(f"F1 Score: {f1}")
    print(f"ROC-AUC: {roc_auc}")

Output:

Accuracy: 84.0%
Precision: 0.8364779874213837
Recall: 0.8580645161290322
F1 Score: 0.8471337579617835
ROC-AUC: 0.839377085650723

In this run, the model reaches 84% accuracy on the 300 test samples. Precision (0.836) and recall (0.858) are well balanced: about 84% of the samples predicted as positive really are positive, and the model finds about 86% of the actual positives. The F1 score (0.847) is the harmonic mean of these two measures. Because roc_auc_score receives hard class labels here rather than continuous scores, the ROC-AUC (0.839) reduces to the average of the true-positive and true-negative rates (balanced accuracy), so it says little beyond the other metrics about how well the model ranks the two classes. Overall, these metrics indicate solid performance for an ensemble of 50 decision stumps.
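To get a ROC-AUC that reflects ranking quality, the score passed to roc_auc_score should be the alpha-weighted sum before the sign is taken. The sketch below is a self-contained copy of the class from this article with one hypothetical extra method, decision_function (not part of the original class), that returns that raw sum. No resulting numbers are quoted because they were not part of the original run:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


class AdaBoost:
    """Same algorithm as in the article, plus a hypothetical decision_function."""

    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.alphas = []
        self.models = []

    def fit(self, X, y):
        # y must be encoded as -1 / +1
        n_samples = X.shape[0]
        w = np.ones(n_samples) / n_samples
        for _ in range(self.n_estimators):
            model = DecisionTreeClassifier(max_depth=1)
            model.fit(X, y, sample_weight=w)
            predictions = model.predict(X)
            err = np.sum(w * (predictions != y)) / np.sum(w)
            alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
            self.alphas.append(alpha)
            self.models.append(model)
            w = w * np.exp(-alpha * y * predictions)
            w = w / np.sum(w)

    def decision_function(self, X):
        # Hypothetical addition: the continuous alpha-weighted vote, before np.sign
        return sum(alpha * model.predict(X) for model, alpha in zip(self.models, self.alphas))

    def predict(self, X):
        return np.sign(self.decision_function(X)).astype(int)


X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
y = np.where(y == 0, -1, 1)   # AdaBoost's update rule needs -1 / +1 labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = AdaBoost(n_estimators=50)
model.fit(X_train, y_train)

scores = model.decision_function(X_test)
labels = model.predict(X_test)

auc_from_scores = roc_auc_score(y_test, scores)   # ranks samples by confidence
auc_from_labels = roc_auc_score(y_test, labels)   # only two distinct values
print(f"Accuracy: {accuracy_score(y_test, labels):.3f}")
print(f"ROC-AUC from scores: {auc_from_scores:.3f}")
print(f"ROC-AUC from labels: {auc_from_labels:.3f}")
```

Keeping the raw score in a separate method means predict still returns class labels for accuracy, precision and recall, while ROC-AUC gets the confidence information it needs.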
