Experiment No. 3
Aim: To implement ensemble learning using Bagging and Boosting.
Objective: LO3: To demonstrate ensemble techniques to combine predictions from different models.
Theory:
What is an Ensemble?
An ensemble combines the predictions of several base models (for example, decision trees) to obtain better overall performance than any single model on its own.
Figure 1. Bagging and Boosting
The bias and variance tradeoff is one of the key concerns when working
with machine learning algorithms. Fortunately, there are some Ensemble
Learning based techniques that machine learning practitioners can take
advantage of in order to tackle the bias and variance tradeoff: Bagging
and Boosting. This section explains how Bagging and Boosting work, what
their components are, and how you can implement them in your ML problem.
It is divided into the following parts:
What is Bagging?
What is Boosting?
AdaBoost
What is Bagging?
Bagging (Bootstrap Aggregating) trains several copies of a base learner
on random subsets of the training data drawn with replacement, and then
combines their predictions, for example by majority vote. In the code
snippet shown in the Implementation section below, a bagging-based model
is created for the well-known breast cancer dataset. A Decision Tree is
used as the base learner; 5 subsets are created randomly with replacement
from the training set (to train 5 decision tree models), and the number
of items per subset is 50. By running it we get:
Train score: 0.9583568075117371
Test score: 0.941048951048951
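To make the sampling and voting steps concrete, the sketch below builds the same kind of ensemble by hand instead of using scikit-learn's BaggingClassifier: it draws bootstrap samples with replacement, fits one decision tree per sample, and combines the trees by majority vote. This is only an illustrative sketch; the variable names (n_subsets, subset_size, etc.) are made up for the example and are not part of the code used in this experiment.

# Illustrative sketch: bagging "by hand" with bootstrap samples and majority voting.
# Names like n_subsets and subset_size are assumptions made for this example.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=23)

n_subsets, subset_size = 5, 50
rng = np.random.default_rng(23)
trees = []
for _ in range(n_subsets):
    # Draw a bootstrap sample: indices chosen with replacement
    idx = rng.integers(0, len(x_train), size=subset_size)
    tree = DecisionTreeClassifier(max_depth=3, random_state=23)
    trees.append(tree.fit(x_train[idx], y_train[idx]))

# Majority vote over the individual trees' predictions (labels are 0/1)
votes = np.stack([t.predict(x_test) for t in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Manual bagging test accuracy:", (y_pred == y_test).mean())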
Great, so far we have seen what bagging is and how it works. Now let's
see what boosting is, what its components are, and how it is related
to bagging.
What is Boosting?
Boosting trains base learners sequentially: each new learner focuses on
the examples that the previous learners handled poorly, and the final
prediction is a weighted combination of all learners. The boosting
technique has been studied and improved over the years, and several
variations have been added to the core idea; some of the most popular
are AdaBoost (Adaptive Boosting), Gradient Boosting and XGBoost (Extreme
Gradient Boosting). The key differentiator between boosting-based
techniques is the way in which errors are penalized (by modifying sample
weights or by minimizing a loss function) as well as how the data is
sampled.
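Of the variants listed above, Gradient Boosting is the one that penalizes errors by minimizing a loss function rather than by re-weighting samples. Below is a minimal sketch using scikit-learn's GradientBoostingClassifier on the same breast cancer dataset; the hyperparameter values are illustrative choices for this sketch, not settings used in the experiment.

# Illustrative sketch: the loss-minimization flavour of boosting (Gradient Boosting).
# Hyperparameter values here are assumptions for the example, not tuned settings.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=23)

# Each new tree is fit to the gradient of the loss of the current ensemble
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                max_depth=3, random_state=23)
gb.fit(x_train, y_train)
print(f"Train score: {gb.score(x_train, y_train)}")
print(f"Test score: {gb.score(x_test, y_test)}")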
AdaBoost
AdaBoost (Adaptive Boosting) fits a sequence of weak learners (typically shallow decision trees); after each round it increases the weights of the training samples that were misclassified, so that the next learner concentrates on the hardest examples, and the final prediction is a weighted vote of all the learners.
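To show the re-weighting step concretely, here is a minimal sketch of a single AdaBoost round with a decision stump: it computes the weighted error, the estimator weight alpha, and then boosts the weights of the misclassified samples. The variable names are made up for this sketch; scikit-learn's AdaBoostClassifier, used in the Implementation section below, performs these updates internally.

# Illustrative sketch of one AdaBoost round (variable names are assumptions for the example).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
y_signed = np.where(y == 1, 1, -1)       # AdaBoost works with labels in {-1, +1}

w = np.full(len(x), 1 / len(x))          # start with uniform sample weights
stump = DecisionTreeClassifier(max_depth=1, random_state=23)
stump.fit(x, y_signed, sample_weight=w)
pred = stump.predict(x)

err = w[pred != y_signed].sum()          # weighted error of this weak learner
alpha = 0.5 * np.log((1 - err) / err)    # its weight in the final ensemble
w = w * np.exp(-alpha * y_signed * pred) # misclassified samples get larger weights
w = w / w.sum()                          # renormalize so the weights sum to 1

print(f"weighted error: {err:.3f}, estimator weight: {alpha:.3f}")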
Implementation
Bagging
# For this basic implementation, we only need these modules
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
# Load the breast cancer dataset and split it into train and test sets
x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=23)
# For simplicity, we are going to use as base estimator a Decision Tree with fixed parameters
tree = DecisionTreeClassifier(max_depth=3, random_state=23)
# Bagging initialization: 5 estimators, each trained on a subset of 50 samples
# drawn randomly with replacement from the training set
bagging = BaggingClassifier(base_estimator=tree, n_estimators=5, max_samples=50,
                            bootstrap=True, random_state=23)
# Training
bagging.fit(x_train, y_train)
# Evaluating
print(f"Train score: {bagging.score(x_train, y_train)}")
print(f"Test score: {bagging.score(x_test, y_test)}")
Output:
Train score: 0.9583568075117371
Test score: 0.941048951048951
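To verify that five separate trees were actually trained, the fitted BaggingClassifier exposes the individual learners and the bootstrap sample indices as attributes. A short check, continuing from the fitted bagging model above, could look like this:

# Optional check on the fitted model (continues from the snippet above)
print(len(bagging.estimators_))             # -> 5 fitted decision trees
print(bagging.estimators_samples_[0][:10])  # first few indices (drawn with replacement) of subset 1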
Boosting
# For this basic implementation, we only need these modules
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
# Load the breast cancer dataset and split it into train and test sets
x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25,
                                                    random_state=23)
# The same fixed-parameter decision tree is used as the base learner
tree = DecisionTreeClassifier(max_depth=3, random_state=23)
# AdaBoost initialization:
# - the decision tree is the base learner
# - the number of estimators is 5
# - the learning rate, which shrinks the contribution of each estimator, is 0.1
adaboost = AdaBoostClassifier(base_estimator=tree, n_estimators=5,
                              learning_rate=0.1, random_state=23)
# Train!
adaboost.fit(x_train, y_train)
# Evaluation
print(f"Train score: {adaboost.score(x_train, y_train)}")
print(f"Test score: {adaboost.score(x_test, y_test)}")
Output:
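The AdaBoost scores are not reproduced in this write-up. One way to inspect how the ensemble's accuracy evolves as estimators are added is scikit-learn's staged_score generator; a short sketch, continuing from the fitted adaboost model above:

# Optional: score the ensemble after each boosting stage (continues from the snippet above)
for i, score in enumerate(adaboost.staged_score(x_test, y_test), start=1):
    print(f"Test score after {i} estimator(s): {score}")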
Conclusion: Thus, we implemented and evaluated ensemble learning using Bagging and Boosting.