Implementation of Stacking - ML

Stacking is an ensemble technique in machine learning where we combine the predictions of multiple models to create a new model that often makes better predictions than any of the individual models.

In stacking, we first train several base models (also called first-layer models) on the training data.
Then, a meta-model (also called the final estimator) is trained using the predictions of the base models as its input features.
The core idea is that if one model is sometimes right and another model is right in other cases, combining them intelligently can improve overall accuracy.
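To make the two-layer idea concrete, here is a minimal sketch of stacking done by hand. It is an illustration under assumptions, not the method used later in this article: the data is synthetic (make_classification), and the meta-features are built from out-of-fold predictions via cross_val_predict, which is one common way to keep the meta-model from seeing predictions made on rows the base models were trained on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the dataset used in this article).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

base_models = [KNeighborsClassifier(), GaussianNB()]

# First layer: out-of-fold class-1 probabilities for every training row.
# Each row is predicted by a model copy that did not see it during fitting.
train_meta = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])

# Refit each base model on the full training set to build test-time meta-features.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for m in base_models
])

# Second layer: the meta-model learns how to weigh the base models' outputs.
meta_model = LogisticRegression().fit(train_meta, y_train)
stacked_pred = meta_model.predict(test_meta)

print("Meta-feature shape:", train_meta.shape)
print("Stacked accuracy:", np.mean(stacked_pred == y_test))
```

Each column of train_meta is one base model's opinion about each row, so the meta-model effectively learns when to trust which base model.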

Step 1: Importing the required Libraries

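We will import pandas for data handling, matplotlib for visualization, scikit-learn for the base models, scaling and metrics, and mlxtend (a separate package from scikit-learn) for the StackingClassifier.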

python

import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.classifier import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.metrics import accuracy_score

Step 2: Loading the Dataset

We will load the dataset into a pandas DataFrame and separate features from the target variable.

pd.read_csv(): Reads the dataset from a CSV file.
drop(): Removes the target column from features.
df['target']: Selects the target column for prediction.

You can download the dataset from this link: Heart Dataset.

python

df = pd.read_csv('heart.csv')    
                 
X = df.drop('target', axis = 1)
y = df['target']

df.head()

Output: the first five rows of the DataFrame (table not shown).
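Before moving on, it can help to confirm that the feature/target separation did what we expect. The sketch below uses a tiny in-memory CSV so it runs without heart.csv; the rows are hypothetical and only the column names echo this article's dataset.

```python
import io

import pandas as pd

# Hypothetical rows for illustration; not taken from the real dataset.
csv_text = """age,chol,target
63,233,1
37,250,1
56,236,0
57,354,0
"""
df_demo = pd.read_csv(io.StringIO(csv_text))

X_demo = df_demo.drop("target", axis=1)  # features only
y_demo = df_demo["target"]               # labels only

print("Feature columns:", list(X_demo.columns))
print("Missing values:", int(df_demo.isna().sum().sum()))
print(y_demo.value_counts(normalize=True))  # class balance
```

The same three checks (column list, missing values, class balance) can be run on the real df once it is loaded.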

Step 3: Splitting the Data into Training and Testing Sets

We will split the dataset into training and testing sets so we can train models and evaluate their performance.

train_test_split(): Splits data into train and test sets.
test_size = 0.2: Specifies that 20% of the data should be used for testing, leaving 80% for training.
random_state = 42: Ensures reproducibility by setting a fixed seed for random number generation.

python

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
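As an optional variation that this article does not use, train_test_split also accepts a stratify argument that keeps the class proportions the same in both splits. A minimal sketch on toy data (the DataFrame below is hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 100 rows, 30% of them in the positive class.
toy = pd.DataFrame({"feature": range(100), "target": [1] * 30 + [0] * 70})
X_toy = toy.drop("target", axis=1)
y_toy = toy["target"]

# stratify=y_toy asks for the same class ratio in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("Train/test sizes:", len(X_tr), len(X_te))
print("Positive rate in train:", y_tr.mean())
print("Positive rate in test:", y_te.mean())
```

With a small test set, stratifying removes one source of noise from the accuracy comparison between models.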

Step 4: Standardizing the Data

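We will standardize numerical features so they have a mean of 0 and standard deviation of 1. This matters most for distance-based models such as K-Nearest Neighbors, where features with large numeric ranges would otherwise dominate the distance calculation.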

StandardScaler(): Standardizes features.
fit_transform(): Learns scaling parameters from training data and applies them.
transform(): Applies learned scaling to test data.
var_transform: Specifies the list of feature columns that need to be standardized.
X_train[var_transform]: Applies the fit_transform method to standardize the selected columns in the training data.
X_test[var_transform]: Applies the transform method to standardize the corresponding columns in the test data using the scaling parameters from the training data.

python

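sc = StandardScaler()

# Explicit copies avoid a possible SettingWithCopyWarning on the assignments below
X_train, X_test = X_train.copy(), X_test.copy()

var_transform = ['thalach', 'age', 'trestbps', 'oldpeak', 'chol']
X_train[var_transform] = sc.fit_transform(X_train[var_transform])  # fit on training data only
X_test[var_transform] = sc.transform(X_test[var_transform])        # reuse training statistics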

X_train.head()

Output: the first five rows of X_train with the selected columns standardized (table not shown).
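As a quick sanity check on the scaling step, the sketch below shows why we call fit_transform on the training data and only transform on the test data. The numbers are randomly generated for illustration and are not from the heart dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical feature matrix with a large offset and spread.
values = rng.normal(loc=150.0, scale=20.0, size=(200, 2))
train_vals, test_vals = train_test_split(values, test_size=0.2, random_state=42)

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_vals)  # learns mean/std from train only
test_scaled = scaler.transform(test_vals)        # reuses the training statistics

print("Train mean/std:", train_scaled.mean(axis=0), train_scaled.std(axis=0))
print("Test mean/std: ", test_scaled.mean(axis=0), test_scaled.std(axis=0))
```

The training columns come out with mean 0 and standard deviation 1 (up to floating-point error), while the test columns are only close to those values. That is expected: the test data is scaled with statistics it did not contribute to, which mirrors how the model will see new data.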

Step 5: Building First Layer Estimators

We will create base models that will form the first layer of our stacking model. For this example we’ll use K-Nearest Neighbors classifier and Naive Bayes classifier.

KNeighborsClassifier(): A model based on nearest neighbors.
GaussianNB(): A Naive Bayes classifier assuming Gaussian distribution.

python

KNC = KNeighborsClassifier()   
NB = GaussianNB()

Step 6: Training and Evaluating KNeighborsClassifier

We will train the KNN model and check its accuracy on the test set.

fit(): Trains the model.
predict(): Makes predictions on test data.
accuracy_score(): Calculates the fraction of test predictions that match the true labels.

python

model_kNeighborsClassifier = KNC.fit(X_train, y_train)  
pred_knc = model_kNeighborsClassifier.predict(X_test)

acc_knc = accuracy_score(y_test, pred_knc)  
print('Accuracy Score of KNeighbors Classifier:', acc_knc * 100)

Output:

Accuracy Score of KNeighbors Classifier: 86.88524590163934

Step 7: Training and Evaluating Naive Bayes Classifier

Similarly, we will train the Naive Bayes model and check its accuracy.

python

model_NaiveBayes = NB.fit(X_train, y_train)
pred_nb = model_NaiveBayes.predict(X_test)

acc_nb = accuracy_score(y_test, pred_nb)
print('Accuracy of Naive Bayes Classifier:', acc_nb * 100)

Output:

Accuracy of Naive Bayes Classifier: 86.88524590163934
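Two models with the same accuracy are not necessarily making the same mistakes, and stacking only has room to help when they differ. The sketch below measures this on synthetic data (an assumption; the same comparison could be run on pred_knc and pred_nb from the steps above).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary-classification data for illustration only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pred_a = KNeighborsClassifier().fit(X_train, y_train).predict(X_test)
pred_b = GaussianNB().fit(X_train, y_train).predict(X_test)

# Fraction of test rows where the two models give different labels.
disagreement = np.mean(pred_a != pred_b)

# Rows where exactly one model is right are where a meta-model can add value.
only_a_right = np.sum((pred_a == y_test) & (pred_b != y_test))
only_b_right = np.sum((pred_b == y_test) & (pred_a != y_test))
both_wrong = np.sum((pred_a != y_test) & (pred_b != y_test))

print("Disagreement rate:", disagreement)
print("Only KNN right:", only_a_right)
print("Only Naive Bayes right:", only_b_right)
print("Both wrong:", both_wrong)
```

Rows counted under "both wrong" cannot be fixed by a meta-model that only sees these two predictions, so a low disagreement rate is a hint that stacking will add little.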

Step 8: Implementing the Stacking Classifier

Now, we will combine the base models using a Stacking Classifier. The meta-model will be a logistic regression model which will take the predictions of KNN and Naive Bayes as input.

StackingClassifier(): Combines base models and a meta-model.
classifiers: List of base learners.
meta_classifier: Model that learns from base learners’ predictions.
use_probas=True: Passes probability outputs to the meta-model instead of class labels.

python

base_learners = [
    KNeighborsClassifier(),
    GaussianNB()
]
meta_model = LogisticRegression()

stacking_model = StackingClassifier(classifiers=base_learners, meta_classifier=meta_model, use_probas=True)
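scikit-learn also ships its own StackingClassifier in sklearn.ensemble, with differently named parameters (estimators, final_estimator, stack_method, cv). The sketch below shows a roughly equivalent setup on synthetic data. It is an alternative under assumptions, not a drop-in copy of the mlxtend model above: the scikit-learn version trains the meta-model on cross-validated predictions, and the import alias is our own choice to avoid a name clash.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier as SklearnStackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (an assumption; not the heart dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

stack = SklearnStackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",  # comparable to use_probas=True in mlxtend
    cv=5,                          # meta-model is fit on out-of-fold predictions
)
stack.fit(X_train, y_train)
acc = stack.score(X_test, y_test)

print("Accuracy of scikit-learn stacking sketch:", acc * 100)
```

Because the base models are given as (name, estimator) pairs, the fitted copies can be inspected afterwards through stack.named_estimators_.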

Step 9: Training Stacking Classifier

Next, we will train the stacking classifier and evaluate its accuracy on the test set.

python

model_stack = stacking_model.fit(X_train, y_train)   
pred_stack = model_stack.predict(X_test)

acc_stack = accuracy_score(y_test, pred_stack)
print('Accuracy Score of Stacked Model:', acc_stack * 100)

Output:

Accuracy Score of Stacked Model: 88.52459016393442

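Both individual models (KNN and Naive Bayes) achieved an accuracy of approximately 86.89%, while the stacked model achieved around 88.52%. These values are consistent with 53 and 54 correct predictions out of 61 test rows, so the gap is small. The result suggests that combining the predictions of multiple models using stacking can slightly improve performance compared to a single model, but a difference of this size on one train/test split should not be read as conclusive.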
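To put the accuracies printed above in perspective, we can estimate how much an accuracy figure is expected to vary on a test set of this size. The sketch below uses the binomial standard error. The test-set size of 61 is an assumption inferred from the printed values (53/61 and 54/61 reproduce them), not something stated in the outputs.

```python
import math

n_test = 61  # assumption: inferred from the printed accuracies, not stated in the outputs
acc_base = 53 / n_test   # matches the KNN and Naive Bayes accuracy
acc_stack = 54 / n_test  # matches the stacked model accuracy


def standard_error(acc, n):
    """Binomial standard error of an accuracy estimated from n test rows."""
    return math.sqrt(acc * (1 - acc) / n)


se_base = standard_error(acc_base, n_test)
gap = acc_stack - acc_base

print("Base accuracy (%):", acc_base * 100)
print("Stacked accuracy (%):", acc_stack * 100)
print("Gap (percentage points):", gap * 100)
print("Standard error of base accuracy (percentage points):", se_base * 100)
```

Under this assumption the gap is smaller than one standard error, which is why a larger test set or cross-validation would be needed before concluding that the stacked model is reliably better.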
