MODULE-3
Decision by Committee: Ensemble Learning; Boosting: AdaBoost, Stumping; Bagging: Subagging,
Random Forests, Comparison with Boosting, Different Ways to Combine Classifiers.
Unsupervised Learning: The k-Means Algorithm: Dealing with Noise, The k-Means Neural
Network, Normalisation, A Better Weight Update Rule, Using Competitive Learning for Clustering
Decision by Committee:
Ensemble Learning
Ensemble learning is a widely used machine learning technique in which multiple individual
models, often called base models, are combined to produce a more effective and robust
prediction model. The Random Forest algorithm is an example of ensemble learning.
Boosting
Boosting is another ensemble procedure for creating a collection of predictors. Successive
models (usually trees) are fitted, often on weighted or resampled versions of the data, and at
each step the goal is to reduce the net error left by the previous models.
If a given input is misclassified by the current hypothesis, its weight is increased so that the next
hypothesis is more likely to classify it correctly. Combining the whole set of hypotheses in this
way eventually converts weak learners into a more powerful model.
Gradient Boosting is an extension of the boosting procedure.
Gradient Boosting = Gradient Descent + Boosting
Advantages of using Gradient Boosting methods
• It supports different loss functions.
• It works well with interactions.
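As a quick illustration, here is a minimal sketch assuming scikit-learn is installed; the class and parameter names are scikit-learn's, not prescribed by the syllabus:
    # Minimal gradient boosting sketch (scikit-learn assumed).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each new tree is fitted to the gradient of the loss, i.e. the errors left by
    # the trees built so far (Gradient Descent + Boosting).
    model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))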
Boosting Algorithm Steps
1. Train a classifier A1 that best classifies the data with respect to accuracy.
2. Identify the regions where A1 makes errors, add weight to those samples, and train a second
classifier A2.
3. Collect the samples on which A1 and A2 give different results and train a third classifier A3 on
them. Repeat step 2 to obtain each new classifier (a sketch of this loop is shown below).
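A minimal sketch of this reweighting loop, assuming scikit-learn decision stumps as the weak classifiers (the doubling factor for misclassified points is an illustrative choice, not the exact AdaBoost update):
    # Sketch of the boosting loop: train, find errors, reweight, train again.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boost(X, y, n_rounds=3):
        n = len(y)
        weights = np.ones(n) / n                       # start with equal weights
        classifiers = []
        for t in range(n_rounds):
            clf = DecisionTreeClassifier(max_depth=1)  # weak learner (decision stump)
            clf.fit(X, y, sample_weight=weights)
            wrong = clf.predict(X) != y                # regions where this classifier errs
            weights[wrong] *= 2.0                      # add weight to misclassified samples
            weights /= weights.sum()                   # renormalise
            classifiers.append(clf)
        return classifiers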
AdaBoost
Boosting is a technique for turning weak learners into a strong learner: each new tree is fitted
on a modified version of the original dataset.
AdaBoost (Adaptive Boosting) was the first boosting algorithm to be widely adopted in practice.
It combines multiple weak classifiers into a single strong classifier.
How AdaBoost Works (Step by Step)
Step 1
Assign equal weights to every data point and apply a decision stump (D1) to classify them as ‘+’
(plus) and ‘-’ (minus); being a stump, the tree has only a single internal node. D1 incorrectly
predicts three ‘+’ (plus) points, so their weights are increased and another decision stump is added.
Step 2
The three incorrectly predicted ‘+’ (plus) points now carry much larger weights than the rest of
the data points, so the second decision stump (D2) tries to predict them correctly.
The vertical decision boundary of D2 classifies the three previously misclassified ‘+’ (plus) points
correctly, but in doing so D2 misclassifies three ‘-’ (minus) points.
Step 3
D3 assigns higher weights to the three misclassified ‘-’ (minus) points.
A horizontal decision boundary is then generated to separate ‘+’ (plus) and ‘-’ (minus), driven by
the higher weights of the misclassified observations.
Step 4
D1, D2 and D3 are combined to form a strong classifier whose prediction rule is more complex
than that of any individual weak learner.
AdaBoost Algorithm
Algorithm Summary:
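In its standard formulation (for N training points with labels ±1 and T boosting rounds), AdaBoost proceeds as follows:
1. Initialise all sample weights to w_i = 1/N.
2. For t = 1, …, T:
   a. Train a weak classifier h_t (e.g. a decision stump) on the weighted data.
   b. Compute the weighted error ε_t as the sum of the weights of the misclassified points.
   c. Compute the classifier weight α_t = ½ ln((1 − ε_t) / ε_t).
   d. Update the sample weights: multiply the weights of misclassified points by e^(α_t) and the
      weights of correctly classified points by e^(−α_t), then renormalise so they sum to 1.
3. Output the combined classifier H(x) = sign(Σ_t α_t h_t(x)).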
Stumping:
A decision stump is a simple machine learning model that acts as a one-level decision tree. It makes a
decision based on a single attribute, splitting the input space into two regions using a threshold.
Stumps are extremely weak learners on their own, often giving poor classification performance if used
individually. However, they become powerful when combined using ensemble methods like AdaBoost.
In boosting, multiple stumps are trained sequentially, and each stump focuses on correcting the errors
made by the previous ones. The process begins with all training examples having equal weights. After
each stump is trained, the weights of the misclassified examples are increased, so that the next stump
focuses more on those difficult examples. Over several iterations, the boosted model builds a strong
classifier by combining the outputs of these simple stumps, each weighted according to its accuracy.
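A decision stump is simple enough to write directly. The sketch below (function names are hypothetical; the best split is found by exhaustive search over features and thresholds, assuming labels of −1 and +1) shows the idea:
    # A decision stump: one feature, one threshold, two output regions.
    import numpy as np

    def fit_stump(X, y):
        """Find the single (feature, threshold, polarity) split with the lowest error."""
        best = None
        for f in range(X.shape[1]):
            for thresh in np.unique(X[:, f]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, f] - thresh) > 0, 1, -1)
                    err = np.mean(pred != y)
                    if best is None or err < best[0]:
                        best = (err, f, thresh, polarity)
        return best  # (error, feature index, threshold, polarity)

    def predict_stump(stump, X):
        _, f, thresh, polarity = stump
        return np.where(polarity * (X[:, f] - thresh) > 0, 1, -1)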
Bagging
Bagging, also known as bootstrap aggregating, is an ensemble learning technique that
helps improve the performance and accuracy of machine learning algorithms. It is used
to manage the bias-variance trade-off and, in particular, to reduce the variance of the
prediction model. Bagging reduces overfitting and is used for both regression and
classification models, most commonly with decision tree algorithms.
Example:
The Random Forest model uses bagging with decision trees, which individually have high
variance. Each tree is grown using a random selection of features, and several such random
trees together make up a Random Forest.
Implementation of Bagging
• Multiple subsets of equal size are created from the original dataset by selecting
observations with replacement (bootstrap sampling).
• A base model is created on each of these subsets.
• Each model is trained in parallel on its own training set, independently of the others.
• The final predictions are determined by combining the predictions from all the
models (see the sketch below).
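A minimal from-scratch sketch of these steps, assuming scikit-learn decision trees as the base models and integer class labels (function names are hypothetical):
    # Bagging sketch: bootstrap sampling + independent base models + majority vote.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=25):
        models = []
        n = len(y)
        for _ in range(n_models):
            idx = np.random.choice(n, size=n, replace=True)  # bootstrap sample, same size, with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        votes = np.array([m.predict(X) for m in models])     # shape: (n_models, n_samples)
        # majority vote over the models for each sample
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)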
Advantages of Bagging
• Bagging minimizes the overfitting of data
• It improves the model’s accuracy
• It deals with higher-dimensional data efficiently
Random Forest
Random Forest is a popular ensemble learning algorithm, which is an extension of the vanilla
bagging algorithm.
The first algorithm for random decision forests was created in 1995 by Tin Kam Ho. In this
algorithm, he introduced the idea of random feature selection in a high-dimensional feature
space, which is the key difference between vanilla bagging and random forests.
The algorithm for random forests is similar to that of bagging methods. However, in random
forests, a feature subset of pre-decided length is formed from the original feature space for each
bootstrapped dataset.
The feature subset is chosen randomly without replacement. The length of the feature subset,
ξf, is a hyperparameter.
A decision tree is built for each dataset and its corresponding feature subset, and each tree
produces a prediction. The final prediction follows the same rules as in bagging, i.e. taking the
mean for regression and the mode (majority vote) for classification.
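A minimal sketch using scikit-learn (assumed to be installed); here max_features plays the role of the feature-subset size ξf, although the library samples the feature subset at each split rather than once per tree:
    # Random forest sketch (scikit-learn assumed).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # n_estimators bootstrapped trees; "sqrt" means each split considers sqrt(n_features) features.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:5]))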
Random Forest, in general, is a good choice when we want a high-performing model with low
variance and low bias. It is particularly useful when we have a large number of strongly correlated
features, as the feature subsampling helps to decorrelate the individual trees. However, when the
sample space or feature space is small, or when we need to find co-dependencies or want strong
interpretability, it is often more useful to use a simpler algorithm such as a decision tree or a
support vector machine. Nevertheless, Random Forest is one of the most powerful machine learning
algorithms available and has been used in several complicated real-life problems.
Random Forest is used in the banking and finance industry for tasks such as credit risk analysis,
fraud detection, and loan approval processes.
In e-commerce, it is used for tasks such as customer segmentation, personalized product
recommendations, and fraud detection.
Different Ways To Combine Classifiers.
In ensemble learning, combining the outputs of multiple classifiers is a critical step. There are several
methods to do this, depending on the type of task and the structure of the ensemble. The most basic
and widely used method is majority voting (for classification), where each base classifier gives a
predicted class, and the class that gets the most votes becomes the final output. For example, if three
classifiers predict the following for a sample: Class A, Class B, and Class A — the majority vote will
select Class A as the final prediction.
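A minimal sketch of majority voting for the example above (plain Python, names hypothetical):
    # Majority voting: the most common predicted class wins.
    from collections import Counter

    predictions = ["A", "B", "A"]                      # outputs of three base classifiers for one sample
    final = Counter(predictions).most_common(1)[0][0]
    print(final)                                       # -> "A"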
A refinement of this is weighted voting, where each classifier’s vote is weighted based on its past
performance or confidence level. For instance, if Classifier 1 is 90% accurate, Classifier 2 is 70%, and
Classifier 3 is 60%, their predictions can be weighted accordingly. Suppose their predictions are A, A,
and B — but the first classifier is more reliable — then Class A will dominate the final output due to its
higher weight.
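A corresponding sketch of weighted voting, using the accuracies above as (hypothetical) weights:
    # Weighted voting: each classifier's vote counts in proportion to its weight.
    from collections import defaultdict

    predictions = ["A", "A", "B"]
    weights = [0.9, 0.7, 0.6]                          # e.g. validation accuracies of the classifiers

    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    final = max(scores, key=scores.get)
    print(final)                                       # "A" wins with 1.6 versus 0.6 for "B"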
For regression tasks, classifier outputs are numerical. Here, simple averaging is commonly used.
Suppose three regression models output predicted values: 3.2, 3.5, and 3.8. The final prediction will be
the average:
Prediction = (3.2 + 3.5 + 3.8) / 3 = 3.5
Again, this can be extended to weighted averaging, where better-performing models on validation
data contribute more to the final output.
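For example, a minimal sketch with hypothetical weights (NumPy assumed):
    # Weighted averaging of regression outputs.
    import numpy as np

    preds = np.array([3.2, 3.5, 3.8])
    weights = np.array([0.5, 0.3, 0.2])                # hypothetical weights from validation performance
    print(np.average(preds, weights=weights))          # 0.5*3.2 + 0.3*3.5 + 0.2*3.8 = 3.41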
Another more sophisticated approach is stacking (or stacked generalization). In this method, the
predictions from base classifiers are treated as inputs for a meta-learner, which learns how to best
combine them. For example, if we use a Decision Tree, a k-NN classifier, and an SVM to classify data,
and their predictions on an input are: A, B, A — then a Logistic Regression model can be trained as a
meta-learner to decide the final class. This is done by learning from the pattern of base classifier
outputs on a separate validation set. Marsland gives an example where this method outperforms both
bagging and boosting, especially when base models are heterogeneous.
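A minimal stacking sketch using scikit-learn (assumed to be installed), with the same three base classifiers and a logistic regression meta-learner:
    # Stacking: base classifiers' predictions become inputs to a meta-learner.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    base_models = [
        ("tree", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC()),
    ]
    # The meta-learner is trained on the base models' cross-validated predictions.
    stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
    stack.fit(X, y)
    print(stack.score(X, y))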
In addition, Bayesian model averaging combines models probabilistically, weighting each model’s
prediction by its posterior probability. This method is powerful but computationally expensive and
less commonly used in standard ensemble setups.
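In symbols, if M_1, …, M_K are the candidate models and D is the training data, the averaged
prediction is P(y | x, D) = Σ_k P(y | x, M_k) · P(M_k | D), so each model contributes in proportion
to its posterior probability given the data.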