Ensemble Learning Methods

Ensemble learning methods use multiple machine learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. There are three main categories of ensemble methods: bagging, boosting, and stacking. Bagging methods like random forests build diverse models by training base learners on random subsets of data. Boosting methods incrementally build an ensemble by training each new model to be accurate on data instances previously misclassified. Gradient boosting works by training models sequentially to minimize predictive error.

Ensemble Methods
• Problem Statement:
• Mining complex knowledge from complex data, such as enterprise data, has been one of the most challenging problems in the knowledge discovery process.
• In such situations, it is observed that no single learning method is able to provide the most informative knowledge across all kinds of datasets.
• The word 'ensemble' literally means 'group'.
• Ensemble methods involve a group of predictive models to achieve better accuracy and model stability.
• Ensemble methods are known to give a significant boost to tree-based models.
Bias and Variance
• Bias means 'how much, on average, the predicted values differ from the actual values'.
• Variance means 'how different the predictions of the model would be at the same point if different samples were taken from the same population'.
• If you build a small tree, you will get a model with low variance and high bias. How do you manage to balance the trade-off between bias and variance?
• Increasing the complexity of the model reduces the prediction error, due to lower bias in the model.
• As the model gets more complex, it ends up over-fitting, i.e. the model starts suffering from high variance.
• A successful model is one that maintains a balance between these two types of errors. This is known as the trade-off management of bias-variance errors.
• Ensemble learning is one way to carry out this trade-off analysis (see the sketch below).
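• A minimal sketch of this trade-off (an assumed illustration in Python with scikit-learn, not from the slides): training accuracy keeps rising as a decision tree gets deeper, while cross-validated accuracy can fall again once variance takes over.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, for illustration only
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for depth in (1, 3, None):  # None lets the tree grow fully
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    train_acc = tree.fit(X, y).score(X, y)              # shallow tree: lower train accuracy (high bias)
    cv_acc = cross_val_score(tree, X, y, cv=5).mean()   # very deep tree: cv accuracy may drop (high variance)
    print(f"max_depth={depth}: train={train_acc:.2f}, cv={cv_acc:.2f}")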
Categories of Ensemble Methods
• Some of the commonly used ensemble methods include:

  • Bagging
  • Boosting
  • Stacking
Bagging
• Bagging (bootstrap aggregating) is a technique used to reduce the variance of predictions by combining the results of multiple classifiers modeled on different sub-samples of the same data set.
Pseudocode of Bagging Algorithm
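• A minimal sketch of the bagging procedure (an assumed illustration, not the slides' original pseudocode), using a scikit-learn decision tree as the unstable base learner and a simple majority vote; scikit-learn's BaggingClassifier packages the same idea.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=10, random_state=0):
    # X, y are NumPy arrays; train each base learner on a bootstrap sample (drawn with replacement)
    rng = np.random.default_rng(random_state)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    # Aggregate by majority vote (assumes integer class labels)
    votes = np.stack([m.predict(X) for m in models])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)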

Note: Bagging works especially well for unstable algorithms, where small changes in the training data can result in significant changes in the output classifiers.
• However, this algorithm yields poor ensembles for stable algorithms.
• Stable algorithms include k-nearest neighbours, whereas decision trees, rule learners and neural networks are considered unstable.
Bagging (Random Forest Model)
• Random Forest is often considered a panacea for data science problems.
• Random Forest is a versatile machine learning method capable of performing both regression and classification tasks.
• It also handles dimensionality reduction, treats missing values and outlier values, and covers other essential steps of data exploration, and does a fairly good job.
• It is a type of ensemble learning method, where a group of weak models combine to form a powerful model (strong model).
How Random Forest Works
• Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction (see the sketch below).
• In order to classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class.
• The forest chooses the classification having the most votes (over all the trees in the forest); in the case of regression, it takes the average of the outputs of the different trees.
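• A minimal sketch (an assumed example, not from the slides) of a random forest in scikit-learn; the classifier aggregates the trees' votes, while RandomForestRegressor averages their outputs for regression.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 decision trees, each grown on a bootstrap sample with random feature subsets
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))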
Difference Between Decision Trees and Random Forest
• In contrast to a decision tree, Random Forest adds additional randomness to the model while growing the trees.
• Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features.
• This results in a wide diversity that generally leads to a better model.
• Therefore, in Random Forest, only a random subset of the features is taken into consideration by the algorithm for splitting a node.
• Trees can be made even more random by additionally using random thresholds for each feature, rather than searching for the best possible thresholds (as a normal decision tree does); see the sketch below.
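• In scikit-learn terms (an assumed mapping, not stated in the slides), the random feature subset corresponds to the max_features parameter, and fully random split thresholds are what the Extra-Trees variant uses.

from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

rf = RandomForestClassifier(max_features="sqrt", random_state=0)  # random subset of features at each split
et = ExtraTreesClassifier(max_features="sqrt", random_state=0)    # additionally uses random split thresholds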
Pros and Cons
Pros:
• Handles large data sets with high dimensionality.
• An effective method for estimating missing data.
• Balances errors in data sets where classes are imbalanced.

Cons:
• Does a good job at classification, but not as well for regression problems, as it does not give precise continuous predictions.
• Random Forest can feel like a black-box approach for statistical modelers – it provides very little control over what the model does.
Boosting
• Boosting is a machine learning ensemble meta-algorithm for primarily reducing bias, and also variance, in supervised learning, and a family of machine learning algorithms that convert weak learners to strong ones. (Wikipedia)
• This is done by building a model from the training data, then creating a second model that attempts to correct the errors of the first model.
• Models are added until the training set is predicted perfectly or a maximum number of models has been added.
How Boosting Works
• For example, consider the problem of spam email identification.
• We can identify 'spam' and 'not spam' emails using the following criteria. If:
• the email has only one image file (a promotional image), it's SPAM
• the email has only link(s), it's SPAM
• the email body consists of a sentence like "You won prize money of $ xxxxxx", it's SPAM
• the email is from our official domain "SMIU", it's not SPAM
• the email is from a known source, it's not SPAM
• Do you think these rules individually are strong enough to successfully classify an email?
• Yes or No???
Weak Learners
• Individually, these rules are not powerful enough to classify an email as 'spam' or 'not spam'. Therefore, these rules are called weak learners.
• To convert weak learners into a strong learner, we combine the prediction of each weak learner using methods like:
• using an average / weighted average
• considering the prediction with the higher vote
• For example: above, we defined 5 weak learners. Out of these 5, 3 vote 'SPAM' and 2 vote 'not SPAM'.
• In this case, by default, we'll consider the email as SPAM because we have the higher vote count (3) for 'SPAM' (see the small example below).
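• A tiny illustration of that vote (an assumed example): five weak rules each cast a label, and the majority label wins.

from collections import Counter

votes = ["SPAM", "SPAM", "SPAM", "NOT SPAM", "NOT SPAM"]  # outputs of the 5 weak rules
prediction = Counter(votes).most_common(1)[0][0]
print(prediction)  # SPAM (3 votes vs. 2)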
How does it work?
• As mentioned earlier, boosting combines weak learners, a.k.a. base learners, to form a strong rule.
• An immediate question: 'How does boosting identify weak rules?'
• To find weak rules, we apply base learning (ML) algorithms with a different distribution each time.
• Each time the base learning algorithm is applied, it generates a new weak prediction rule.
• This is an iterative process.
• After many iterations, the boosting algorithm combines these weak rules into a single strong prediction rule.
Algorithm Steps in Boosting
• Step 1: The base learner takes the initial distribution and assigns equal weight, or attention, to each observation.
• Step 2: If there are prediction errors caused by the first base learning algorithm, we pay higher attention to the observations with prediction errors. Then we apply the next base learning algorithm.
• Step 3: Iterate Step 2 until the limit of the base learning algorithm is reached or a higher accuracy is achieved.
• Finally, boosting combines the outputs of the weak learners and creates a strong learner, which eventually improves the predictive power of the model.
• Boosting pays higher attention to examples which are misclassified or have higher errors under the preceding weak rules (see the sketch below).
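• These steps correspond to AdaBoost-style boosting; a minimal sketch with scikit-learn (an assumed example, not from the slides), whose default base learner is a depth-1 decision stump and which reweights misclassified observations at each round.

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# 50 boosting rounds; each round pays more attention to previously misclassified points
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X, y)
print("training accuracy:", boost.score(X, y))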
Pictorial View of Bagging and Boosting
Flow of Algorithms
Gradient Boosting Algorithm
• Gradient boosting is a machine learning technique for
regression and classification problems, which produces a
prediction model in the form of an ensemble of weak
prediction models, typically decision trees. (Wikipedia
definition)

• The main objective lies in the minimization of a loss function in a supervised learning algorithm. For example, assume mean squared error (MSE) as the loss, which can be defined as MSE = (1/N) Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ the predicted value for the i-th observation.
Nutshell (GBA)
• We first model the data with simple models and analyze the data for errors.
• These errors signify data points that are difficult to fit with a simple model.
• Then, for later models, we particularly focus on those hard-to-fit data points to get them right.
• In the end, we combine all the predictors by giving some weight to each predictor.
Building a Gradient Boosting Model
• Suppose we have data as shown in a scatter plot, with one input (x) and one output (y) variable.
• Fit a simple linear regressor or decision tree on the data; here, consider using a decision tree.
Main Steps of GB Algorithm
• 1. Fit a decision tree on the data [input X, output Y].
• 2. Calculate the error residuals: actual target value minus predicted target value [e1 = y - y_predicted1].
• 3. Fit a new model on the error residuals as the target variable, with the same input variables [call its predictions e1_predicted].
• 4. Add the predicted residuals to the previous predictions [y_predicted2 = y_predicted1 + e1_predicted].
• 5. Fit another model on the residuals that are still left, i.e. [e2 = y - y_predicted2], and repeat steps 2 to 5 until the model starts overfitting or the sum of the residuals becomes constant.
• Overfitting can be controlled by consistently checking accuracy on validation data (a minimal sketch of these steps follows below).
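• A minimal from-scratch sketch of steps 1-5 (an assumed illustration, not the slides' own code), fitting small regression trees on the residuals of the running prediction.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)  # noisy 1-D toy data

prediction = np.zeros_like(y)     # start from a trivial initial model
learning_rate = 0.3
for step in range(50):
    residuals = y - prediction                                    # step 2: error residuals
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)   # step 3: model the residuals
    prediction += learning_rate * tree.predict(X)                 # step 4: add predicted residuals
    # step 5: repeat; in practice, stop when accuracy on validation data stops improving
print("final training MSE:", np.mean((y - prediction) ** 2))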
Visualization of a Working Gradient Boosting Tree
1) Blue dots (left plot) are input (x) vs. output (y).
2) The red line (left plot) shows the values predicted by the decision tree.
3) Green dots (right plot) show residuals vs. input (x) for the i-th iteration.
4) The iteration number represents the sequential order of fitting the gradient boosting trees.
Visualizations of GBA at the 20th and 50th Iterations

• It can be observed from the figures that at the 50th and 20th iterations, the residuals vs. x plots look similar.
• But at the 50th iteration, the model has become more complex and its predictions are overfitting the training data, trying to learn each training point.
• So, it would have been better to stop at the 20th iteration (see the sketch below for choosing the stopping iteration on validation data).
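• A minimal sketch of choosing the stopping iteration on validation data (an assumed example using scikit-learn's GradientBoostingRegressor and its staged_predict method).

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after each boosting iteration
val_errors = [mean_squared_error(y_val, pred) for pred in gbr.staged_predict(X_val)]
best_iteration = int(np.argmin(val_errors)) + 1
print("best number of boosting iterations on validation data:", best_iteration)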
• Thank you
