Ensemble Learning Methods
Ensemble Methods
• Problem Statement:
• Mining complex knowledge from complex data, such as the data handled by
enterprise data mining solutions, has been one of the most
challenging problems in the knowledge discovery process.
• In such situations, it is observed that no single learning
method is able to provide the most informative knowledge across all
kinds of datasets.
• The word ‘ensemble’ literally means a group.
• Ensemble methods combine a group of predictive models to
achieve better accuracy and model stability.
• Ensemble methods are known to give a substantial boost to
tree-based models.
Bias and Variance
• Bias means: ‘how much, on average, the predicted values
differ from the actual values.’
• Variance means: ‘how different the predictions of the model will be
at the same point if different samples are drawn from the same
population.’
• If you build a small tree, you get a model with low variance
and high bias. How do you manage the trade-off
between bias and variance?
• Increasing the complexity of the model reduces the prediction error,
due to lower bias in the model.
• However, as the model gets more complex, it ends up over-fitting, i.e. the
model starts suffering from high variance.
• A successful model is one that maintains a balance between these
two types of errors. This is known as the trade-off management of
bias-variance errors.
• Ensemble learning is one way to manage this trade-off, as the sketch below illustrates.
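As a rough illustration of the trade-off (a minimal sketch on a synthetic dataset; the tree depths, sample sizes and scikit-learn usage below are illustrative assumptions, not taken from the slides), a shallow tree tends to score lower but more consistently across folds, while a fully grown tree fits the training data closely and varies more:

```python
# Sketch: compare a shallow (high-bias) tree with a fully grown (high-variance) tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for depth in (2, None):  # depth=2 -> simple tree; depth=None -> fully grown tree
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth, random_state=0),
                             X, y, cv=10)
    print(f"max_depth={depth}: mean accuracy={scores.mean():.3f}, "
          f"std across folds={scores.std():.3f}")
```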
Categories of Ensemble Methods
• Some of the commonly used ensemble methods include:
Bagging
Boosting
Stacking
Bagging
• Bagging, which stands for bootstrap aggregating, is a technique used
to reduce the variance of predictions by combining the
results of multiple classifiers modeled on different sub-samples
of the same data set.
Pseudocode of Bagging Algorithm
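The pseudocode itself is not reproduced here; as a minimal Python sketch of the same idea (the base learner, number of rounds, and use of decision trees are illustrative assumptions), bagging trains each base classifier on a bootstrap sample and aggregates by majority vote:

```python
# Bagging sketch: train base learners on bootstrap samples, aggregate by majority vote.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, random_state=0):
    """Train n_estimators decision trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.RandomState(random_state)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = rng.randint(0, n, size=n)  # sample n rows with replacement
        models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate the base classifiers' predictions by majority vote."""
    votes = np.array([m.predict(X) for m in models]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```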
Note: Bagging works especially well for unstable algorithms, where small
changes in the training data can result in significantly different output
classifiers.
• However, this algorithm yields poor ensembles for stable algorithms.
• Stable algorithms include k-nearest neighbor, whereas decision trees, rule
learners and neural networks are considered unstable; see the sketch after this note.
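As a hedged illustration of stability (the dataset, number of resamples, and estimator settings are assumptions for demonstration only), one can measure how often each learner's predictions on fixed test points disagree across bootstrap resamples of the same training set:

```python
# Sketch: prediction disagreement across bootstrap resamples for a stable
# learner (k-NN) versus an unstable one (an unpruned decision tree).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_test = X[:50]  # fixed evaluation points
rng = np.random.RandomState(0)

for name, Model in [("k-NN (stable)", KNeighborsClassifier),
                    ("decision tree (unstable)", DecisionTreeClassifier)]:
    preds = []
    for _ in range(20):
        idx = rng.randint(0, len(X), size=len(X))  # bootstrap resample
        preds.append(Model().fit(X[idx], y[idx]).predict(X_test))
    preds = np.array(preds)
    # How often do individual runs disagree with the run-wise majority vote?
    majority = np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
    print(f"{name}: disagreement rate = {np.mean(preds != majority):.3f}")
```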
Bagging (Random Forest Model)
• Random Forest is often regarded as a panacea for all data
science problems.
• Cons:
• It does a good job at classification, but is not as good for regression
problems, as it does not give precise predictions of a continuous
nature.
• Random Forest can feel like a black-box approach for statistical
modelers, since it provides very little control over what the model does.
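For reference, a minimal Random Forest fit in scikit-learn looks like the following sketch (the synthetic dataset and hyperparameters are illustrative choices, not from the slides):

```python
# Minimal Random Forest sketch with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
# Feature importances give some (limited) insight into the "black box".
print("Largest feature importance:", rf.feature_importances_.max())
```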
Boosting
• Boosting is a machine learning ensemble meta-algorithm
primarily for reducing bias, and also variance, in
supervised learning, and a family of machine learning
algorithms that convert weak learners into strong ones.
(Wikipedia)
• We first model the data with simple models and analyze the data for
errors.
• Fit a simple linear regressor or decision tree on the data; here, a decision tree is used.
Main Steps of GB Algorithm
• 1. Fit a decision tree on the data [input X, output Y]
• 2. Calculate the error residuals: actual value minus predicted value
• 3. Fit a new model on the error residuals as the target variable, with the same input
variables [call it e1_predicted]
• 4. Add the predicted residuals to the previous predictions
• 5. Fit another model on the residuals that still remain, and repeat until the model
starts over-fitting; a sketch of this loop is shown below.
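The following is a minimal sketch of this loop using shallow regression trees as the weak learners (the learning rate, tree depth, and number of rounds are illustrative assumptions, not values from the slides):

```python
# Gradient-boosting loop sketch: repeatedly fit trees to the remaining residuals.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_rounds=50, learning_rate=0.1, max_depth=2):
    # Step 1: fit an initial decision tree on [X, y]
    first_tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, y)
    prediction = first_tree.predict(X)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction                                          # step 2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)  # step 3
        prediction = prediction + learning_rate * tree.predict(X)            # step 4
        trees.append(tree)                                                   # step 5: repeat
    return first_tree, trees, learning_rate

def gradient_boost_predict(first_tree, trees, learning_rate, X):
    """Combine the initial tree with the accumulated residual corrections."""
    return first_tree.predict(X) + learning_rate * sum(t.predict(X) for t in trees)
```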
• It can be observed from the residuals-vs-x plots (not reproduced here) that the 50th
iteration and the 20th iteration look similar.
• But at the 50th iteration the model has become more complex: its predictions overfit the
training data and try to learn each individual training point.
• So it would have been better to stop at the 20th iteration.
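One practical way to pick that stopping point (a sketch using scikit-learn's GradientBoostingRegressor and its staged_predict method; the data split and parameters are illustrative assumptions) is to track validation error over boosting iterations and keep the iteration with the lowest error:

```python
# Sketch: choose the number of boosting iterations by validation error.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

gbm = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
val_errors = [mean_squared_error(y_val, pred) for pred in gbm.staged_predict(X_val)]
best_iter = int(np.argmin(val_errors)) + 1
print(f"Lowest validation error at iteration {best_iter} of {gbm.n_estimators}")
```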
• Thank you