
Classification Models

• In supervised learning:

1. Logistic Regression
2. Practical Implementation of Logistic Regression
3. Support Vector Machine (SVM)
4. Practical Implementation of SVM
5. K-Nearest Neighbor (KNN)
6. A Numerical of KNN
7. Practical Implementation of KNN
Ensemble Learning
(An Introduction to Ensemble Learning)
Dr. Virendra Singh Kushwah
Assistant Professor Grade-II
School of Computing Science and Engineering
[email protected]
7415869616
Ensemble Methods in Machine Learning
• Ensemble = “a group of items viewed as a whole rather than
individually”

• Ensemble methods are machine learning techniques that combine
several base models in order to produce one optimal predictive
model. To better understand this definition, let’s take a step back
and consider the ultimate goal of machine learning and model
building.
• Let’s understand the concept of ensemble learning
with an example. Suppose you are a movie director
and you have created a short movie on a very
important and interesting topic. Now, you want to
take preliminary feedback (ratings) on the movie
before making it public. What are the possible ways by
which you can do that?
• A: You may ask one of your friends to rate the movie for you.

• Now it’s entirely possible that the person you have chosen
loves you very much and doesn’t want to break your heart
by providing a 1-star rating to the horrible work you have
created.
• B: Another way could be by asking 5 colleagues of yours to
rate the movie.

• This should provide a better idea of the movie. This method
may provide honest ratings for your movie. But a problem
still exists. These 5 people may not be “Subject Matter
Experts” on the topic of your movie. Sure, they might
understand the cinematography, the shots, or the audio, but
at the same time may not be the best judges of dark humor.
• C: How about asking 50 people to rate the movie?

• Some of them can be your friends, some can be your
colleagues, and some may even be total strangers.
• The responses, in this case, would be more generalized and
diversified since now you have people with different sets of
skills. And as it turns out – this is a better approach to get
honest ratings than the previous cases we saw.

• With these examples, you can infer that a diverse group of
people is likely to make better decisions than individuals.
The same is true for a diverse set of models in comparison to
single models. This diversification in Machine Learning is
achieved by a technique called Ensemble Learning.
Simple Ensemble Techniques
1. Max Voting
2. Averaging
3. Weighted Averaging
Max Voting
• The max voting method is generally used for classification
problems. In this technique, multiple models are used to make
predictions for each data point. The predictions by each model are
considered as a ‘vote’. The predictions which we get from the
majority of the models are used as the final prediction.

• For example, when you asked 5 of your colleagues to rate your
movie (out of 5), we’ll assume three of them rated it as 4 while
two of them gave it a 5. Since the majority gave a rating of 4, the
final rating will be taken as 4. You can consider this as taking the
mode of all the predictions. A minimal code sketch follows the
table below.
The result of max voting would be something like this:

          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating    5             4             5             4             4             4
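As a minimal sketch (in Python, assuming the five colleague ratings above stand in for the models' predictions), max voting simply takes the mode of the votes:

from statistics import mode

# Predictions ("votes") from five models -- the colleague ratings above.
votes = [5, 4, 5, 4, 4]

# The final prediction is the most common vote (the mode).
final_rating = mode(votes)
print(final_rating)  # -> 4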
Averaging
• Similar to the max voting technique, multiple predictions are made for each
data point in averaging. In this method, we take an average of predictions
from all the models and use it to make the final prediction. Averaging can be
used for making predictions in regression problems or while calculating
probabilities for classification problems.

• For example, in the below case, the averaging method would take the
average of all the values.

• i.e. (5+4+5+4+4)/5 = 4.4


          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Rating    5             4             5             4             4             4.4
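A minimal Python sketch of the same calculation, assuming the five ratings from the table above:

# Predictions from five models -- the colleague ratings above.
ratings = [5, 4, 5, 4, 4]

# The final prediction is the simple average of all predictions.
final_rating = sum(ratings) / len(ratings)
print(final_rating)  # -> 4.4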
Weighted Average
• This is an extension of the averaging method. All models are assigned
different weights defining the importance of each model for prediction. For
instance, if two of your colleagues are critics, while others have no prior
experience in this field, then the answers by these two friends are given
more importance as compared to the other people.

• The result is calculated as [(5*0.23) + (4*0.23) + (5*0.18) +
(4*0.18) + (4*0.18)] = 4.41.
          Colleague 1   Colleague 2   Colleague 3   Colleague 4   Colleague 5   Final rating
Weight    0.23          0.23          0.18          0.18          0.18
Rating    5             4             5             4             4             4.41
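A minimal Python sketch of the weighted average above, using the illustrative weights from the table (they sum to 1):

# Ratings and per-model weights from the table above.
ratings = [5, 4, 5, 4, 4]
weights = [0.23, 0.23, 0.18, 0.18, 0.18]

# The final prediction is the weight-scaled sum of the predictions.
final_rating = sum(w * r for w, r in zip(weights, ratings))
print(round(final_rating, 2))  # -> 4.41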
Ensemble Learning Types
Stacking
• Stacking is an ensemble learning
technique that uses predictions from
multiple models (for example decision
tree, knn or svm) to build a new
model. This model is used for making
predictions on the test set. Below is a
step-wise explanation for a simple
stacked ensemble:

1. The train set is split into 10 parts.
2. A base model (suppose a decision tree) is fitted on 9 parts and
   predictions are made for the 10th part. This is done for each
   part of the train set.
3. The base model (in this case, the decision tree) is then fitted
   on the whole train dataset.
4. Using this model, predictions are made on the test set.
5. Steps 2 to 4 are repeated for another base model (say knn),
   resulting in another set of predictions for the train set and
   test set.
6. The predictions from the train set are used as features to build
   a new model.
7. This model is used to make the final predictions on the test set.
   A minimal code sketch of this procedure follows the list.
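A minimal sketch of this procedure, assuming scikit-learn is available; the decision tree and KNN base models and the logistic-regression meta-model are illustrative choices, and StackingClassifier produces the out-of-fold predictions internally through its cv argument:

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data standing in for a real train/test split.
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),  # new model built on the base-model predictions
    cv=10,  # 10-fold out-of-fold predictions, mirroring the 10-part split above
)
stack.fit(X_train, y_train)
print("test accuracy:", stack.score(X_test, y_test))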
Bagging
• The idea behind bagging is combining the results of multiple models (for instance,
all decision trees) to get a generalized result. Here’s a question: if you create all the
models on the same set of data and combine them, will it be useful? There is a high
chance that these models will give the same result since they are getting the same
input. So how can we solve this problem? One of the techniques is bootstrapping.

• Bootstrapping is a sampling technique in which we create subsets of observations
from the original dataset, with replacement. The size of the subsets is the same as
the size of the original set.

• The Bagging (or Bootstrap Aggregating) technique uses these subsets (bags) to get a
fair idea of the distribution (complete set). In practice, the size of the subsets created
for bagging may be smaller than the original set.
• Multiple subsets are created from the original
dataset, selecting observations with replacement.
• A base model (weak model) is created on each of
these subsets.
• The models run in parallel and are independent of
each other.
• The final predictions are determined by combining the predictions
from all the models. A minimal code sketch of these steps follows
this list.
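A minimal sketch of these steps, assuming scikit-learn is available; BaggingClassifier uses a decision tree as its default base model and draws bootstrap samples with replacement:

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Illustrative data standing in for the original dataset.
X, y = make_classification(n_samples=500, random_state=0)

bag = BaggingClassifier(
    n_estimators=50,   # number of bootstrap subsets / base models
    bootstrap=True,    # sample observations with replacement
    n_jobs=-1,         # base models are independent, so train them in parallel
    random_state=0,
)
bag.fit(X, y)
# Final predictions combine the votes of all base models.
print(bag.predict(X[:5]))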
Boosting
• Before we go further, here’s another question for you: If a
data point is incorrectly predicted by the first model, and
then the next (probably all models), will combining the
predictions provide better results? Such situations are taken
care of by boosting.

• Boosting is a sequential process, where each subsequent model
attempts to correct the errors of the previous model. The
succeeding models are dependent on the previous model.
• Let’s understand the way boosting works in the steps below.

• A subset is created from the original dataset.
• Initially, all data points are given equal weights.
• A base model is created on this subset.
• This model is used to make predictions on the whole dataset.
• Errors are calculated using the actual values and the predicted
values.
• The observations which are incorrectly predicted are given higher
weights. (In the slide’s illustration, the three misclassified
blue-plus points are given higher weights.)
• Another model is created and predictions are made on the dataset.
(This model tries to correct the errors from the previous model.)
• Similarly, multiple models are created, each correcting the errors
of the previous model.

• The final model (strong learner) is the weighted mean of all the
models (weak learners).
• Thus, the boosting algorithm
combines a number of weak
learners to form a strong
learner. The individual models
would not perform well on the
entire dataset, but they work
well for some part of the
dataset. Thus, each model
actually boosts the performance
of the ensemble.
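A minimal sketch of this idea, assuming scikit-learn is available; AdaBoostClassifier fits weak learners (decision stumps by default) sequentially and re-weights misclassified points so that later learners focus on them:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# Illustrative data standing in for the original dataset.
X, y = make_classification(n_samples=500, random_state=0)

boost = AdaBoostClassifier(
    n_estimators=100,   # number of sequential weak learners
    learning_rate=0.5,  # shrinks each learner's contribution to the weighted vote
    random_state=0,
)
boost.fit(X, y)
# The strong learner is a weighted combination of all the weak learners.
print(boost.predict(X[:5]))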
Algorithms based on Bagging and Boosting
• Bagging and Boosting are two of the most commonly used techniques in machine learning. In this
section, we will look at them in detail. Following are the algorithms we will be focusing on (a short
instantiation sketch follows the list):

• Bagging algorithms:

• Bagging meta-estimator
• Random forest

• Boosting algorithms:

• AdaBoost
• GBM
• XGBoost (XGBM)
• LightGBM
• CatBoost
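As a rough sketch of where these algorithms live: the first four are available in scikit-learn, while XGBoost, LightGBM and CatBoost come from the separate xgboost, lightgbm and catboost packages (assumed to be installed); all of them expose a similar fit/predict interface:

from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from xgboost import XGBClassifier        # requires the xgboost package
from lightgbm import LGBMClassifier      # requires the lightgbm package
from catboost import CatBoostClassifier  # requires the catboost package

# One default-configured instance of each algorithm listed above.
models = {
    "Bagging meta-estimator": BaggingClassifier(),
    "Random forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "GBM": GradientBoostingClassifier(),
    "XGBoost": XGBClassifier(),
    "LightGBM": LGBMClassifier(),
    "CatBoost": CatBoostClassifier(verbose=0),
}
for name, model in models.items():
    print(name, "->", type(model).__name__)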
