MODULE-3
Decision by Committee: Ensemble Learning; Boosting: AdaBoost, Stumping; Bagging: Subagging,
Random Forests, Comparison with Boosting, Different Ways to Combine Classifiers.
Unsupervised Learning: The k-Means Algorithm: Dealing with Noise, The k-Means Neural
Network, Normalisation, A Better Weight Update Rule, Using Competitive Learning for Clustering
Decision by Committee:
Ensemble Learning
Ensemble learning is a widely used machine learning technique in which multiple individual
models, often called base models, are combined to produce a more effective and robust
prediction model. The Random Forest algorithm is an example of ensemble learning.
Boosting
Boosting is another ensemble procedure for creating a collection of predictors. Successive
models (usually trees) are fitted, often on weighted or resampled versions of the data, and at
each step the goal is to reduce the net error left by the previous models.
If a given input is misclassified by the current hypothesis, its weight is increased so that the next
hypothesis is more likely to classify it correctly. Combining the whole set of hypotheses in this
way eventually converts weak learners into a more powerful model.
Gradient Boosting is an extension of the boosting procedure.
Gradient Boosting = Gradient Descent + Boosting
Advantages of using Gradient Boosting methods
• It supports different loss functions.
• It works well with interactions.
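As a quick illustration, here is a minimal sketch assuming scikit-learn is installed; the class and parameter names are scikit-learn's, not prescribed by the syllabus:
    # Minimal gradient boosting sketch (scikit-learn assumed).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Each new tree is fitted to the gradient of the loss, i.e. the errors left by
    # the trees built so far (Gradient Descent + Boosting).
    model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))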
Boosting Algorithm Steps
1. Train a classifier A1 that best classifies the data with respect to accuracy.
2. Identify the regions where A1 makes errors, add weight to those samples, and train a second
classifier A2.
3. Collect the samples on which A1 and A2 give different results and train a third classifier A3 on
them. Repeat step 2 to obtain each new classifier (a sketch of this loop is shown below).
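A minimal sketch of this reweighting loop, assuming scikit-learn decision stumps as the weak classifiers (the doubling factor for misclassified points is an illustrative choice, not the exact AdaBoost update):
    # Sketch of the boosting loop: train, find errors, reweight, train again.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def boost(X, y, n_rounds=3):
        n = len(y)
        weights = np.ones(n) / n                       # start with equal weights
        classifiers = []
        for t in range(n_rounds):
            clf = DecisionTreeClassifier(max_depth=1)  # weak learner (decision stump)
            clf.fit(X, y, sample_weight=weights)
            wrong = clf.predict(X) != y                # regions where this classifier errs
            weights[wrong] *= 2.0                      # add weight to misclassified samples
            weights /= weights.sum()                   # renormalise
            classifiers.append(clf)
        return classifiers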
AdaBoost
Boosting is a technique for turning weak learners into a strong learner: each new tree is fitted
on a modified version of the original dataset.
AdaBoost (Adaptive Boosting) was the first boosting algorithm to be widely adopted in practice.
It combines multiple weak classifiers into a single strong classifier.
How AdaBoost Works (Step by Step)
Step 1
Assign equal weights to every data point and apply a decision stump (D1) to classify them as ‘+’
(plus) and ‘-’ (minus); being a stump, the tree has only a single internal node. D1 incorrectly
predicts three ‘+’ (plus) points, so their weights are increased and another decision stump is added.
Step 2
The three incorrectly predicted ‘+’ (plus) points now carry much larger weights than the rest of
the data points, so the second decision stump (D2) tries to predict them correctly.
The vertical decision boundary of D2 classifies the three previously misclassified ‘+’ (plus) points
correctly, but in doing so D2 misclassifies three ‘-’ (minus) points.
Step 3
D3 assigns higher weights to the three misclassified ‘-’ (minus) points.
A horizontal decision boundary is then generated to separate ‘+’ (plus) and ‘-’ (minus), driven by
the higher weights of the misclassified observations.
Step 4
D1, D2 and D3 are combined to form a strong classifier whose prediction rule is more complex
than that of any individual weak learner.
AdaBoost Algorithm
Algorithm Summary:
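In its standard formulation (for N training points with labels ±1 and T boosting rounds), AdaBoost proceeds as follows:
1. Initialise all sample weights to w_i = 1/N.
2. For t = 1, …, T:
   a. Train a weak classifier h_t (e.g. a decision stump) on the weighted data.
   b. Compute the weighted error ε_t as the sum of the weights of the misclassified points.
   c. Compute the classifier weight α_t = ½ ln((1 − ε_t) / ε_t).
   d. Update the sample weights: multiply the weights of misclassified points by e^(α_t) and the
      weights of correctly classified points by e^(−α_t), then renormalise so they sum to 1.
3. Output the combined classifier H(x) = sign(Σ_t α_t h_t(x)).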
Stumping:
A decision stump is a simple machine learning model that acts as a one-level decision tree. It makes a
decision based on a single attribute, splitting the input space into two regions using a threshold.
Stumps are extremely weak learners on their own, often giving poor classification performance if used
individually. However, they become powerful when combined using ensemble methods like AdaBoost.
In boosting, multiple stumps are trained sequentially, and each stump focuses on correcting the errors
made by the previous ones. The process begins with all training examples having equal weights. After
each stump is trained, the weights of the misclassified examples are increased, so that the next stump
focuses more on those difficult examples. Over several iterations, the boosted model builds a strong
classifier by combining the outputs of these simple stumps, each weighted according to its accuracy.
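A decision stump is simple enough to write directly. The sketch below (function names are hypothetical; the best split is found by exhaustive search over features and thresholds, assuming labels of −1 and +1) shows the idea:
    # A decision stump: one feature, one threshold, two output regions.
    import numpy as np

    def fit_stump(X, y):
        """Find the single (feature, threshold, polarity) split with the lowest error."""
        best = None
        for f in range(X.shape[1]):
            for thresh in np.unique(X[:, f]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, f] - thresh) > 0, 1, -1)
                    err = np.mean(pred != y)
                    if best is None or err < best[0]:
                        best = (err, f, thresh, polarity)
        return best  # (error, feature index, threshold, polarity)

    def predict_stump(stump, X):
        _, f, thresh, polarity = stump
        return np.where(polarity * (X[:, f] - thresh) > 0, 1, -1)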
Bagging
Bagging, also known as bootstrap aggregating, is an ensemble learning technique that
helps improve the performance and accuracy of machine learning algorithms. It is used
to manage the bias-variance trade-off and, in particular, to reduce the variance of the
prediction model. Bagging reduces overfitting and is used for both regression and
classification models, most commonly with decision tree algorithms.
Example:
The Random Forest model uses bagging with decision trees, which individually have high
variance. Each tree is grown using a random selection of features, and several such random
trees together make up a Random Forest.
Implementation of Bagging
• Multiple subsets of equal size are created from the original dataset by selecting
observations with replacement (bootstrap sampling).
• A base model is created on each of these subsets.
• Each model is trained in parallel on its own training set, independently of the others.
• The final predictions are determined by combining the predictions from all the
models (see the sketch below).
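A minimal from-scratch sketch of these steps, assuming scikit-learn decision trees as the base models and integer class labels (function names are hypothetical):
    # Bagging sketch: bootstrap sampling + independent base models + majority vote.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def bagging_fit(X, y, n_models=25):
        models = []
        n = len(y)
        for _ in range(n_models):
            idx = np.random.choice(n, size=n, replace=True)  # bootstrap sample, same size, with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        votes = np.array([m.predict(X) for m in models])     # shape: (n_models, n_samples)
        # majority vote over the models for each sample
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)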
Advantages of Bagging
• Bagging minimizes the overfitting of data
• It improves the model’s accuracy
• It deals with higher-dimensional data efficiently
Random Forest
Random Forest is a popular ensemble learning algorithm, which is an extension of the vanilla
bagging algorithm.
The first algorithm for random decision forests was created in 1995 by Tin Kam Ho. In this
algorithm, he introduced the idea of random feature selection in a high-dimensional feature
space, which is the key difference between vanilla bagging and random forests.
The algorithm for random forests is similar to that of bagging methods. However, in random
forests, a feature subset of pre-decided length is formed from the original feature space for each
bootstrapped dataset.
The feature subset is chosen randomly without replacement. The length of the feature subset,
ξf, is a hyperparameter.
A decision tree is built for each dataset and its corresponding feature subset, and each tree
produces a prediction. The final prediction follows the same rules as in bagging, i.e. taking the
mean for regression and the mode (majority vote) for classification.
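A minimal sketch using scikit-learn (assumed to be installed); here max_features plays the role of the feature-subset size ξf, although the library samples the feature subset at each split rather than once per tree:
    # Random forest sketch (scikit-learn assumed).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # n_estimators bootstrapped trees; "sqrt" means each split considers sqrt(n_features) features.
    forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:5]))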
Random Forest, in general, is a good choice when we want a high-performing model with low
variance and low bias. It is particularly useful when we have a large number of strongly correlated
features, as the feature subsampling helps to decorrelate the individual trees. However, when the
sample space or feature space is small, or when we need to find co-dependencies or want strong
interpretability, it is often more useful to use a simpler algorithm such as a decision tree or a
support vector machine. Nevertheless, Random Forest is one of the most powerful machine learning
algorithms available and has been used in several complicated real-life problems.
Random Forest is used in the banking and finance industry for tasks such as credit risk analysis,
fraud detection, and loan approval processes.
In e-commerce, it is used for tasks such as customer segmentation, personalized product
recommendations, and fraud detection.
Different Ways To Combine Classifiers.
In ensemble learning, combining the outputs of multiple classifiers is a critical step. There are several
methods to do this, depending on the type of task and the structure of the ensemble. The most basic
and widely used method is majority voting (for classification), where each base classifier gives a
predicted class, and the class that gets the most votes becomes the final output. For example, if three
classifiers predict the following for a sample: Class A, Class B, and Class A — the majority vote will
select Class A as the final prediction.
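A minimal sketch of majority voting for the example above (plain Python, names hypothetical):
    # Majority voting: the most common predicted class wins.
    from collections import Counter

    predictions = ["A", "B", "A"]                      # outputs of three base classifiers for one sample
    final = Counter(predictions).most_common(1)[0][0]
    print(final)                                       # -> "A"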
A refinement of this is weighted voting, where each classifier’s vote is weighted based on its past
performance or confidence level. For instance, if Classifier 1 is 90% accurate, Classifier 2 is 70%, and
Classifier 3 is 60%, their predictions can be weighted accordingly. Suppose their predictions are A, A,
and B — but the first classifier is more reliable — then Class A will dominate the final output due to its
higher weight.
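A corresponding sketch of weighted voting, using the accuracies above as (hypothetical) weights:
    # Weighted voting: each classifier's vote counts in proportion to its weight.
    from collections import defaultdict

    predictions = ["A", "A", "B"]
    weights = [0.9, 0.7, 0.6]                          # e.g. validation accuracies of the classifiers

    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    final = max(scores, key=scores.get)
    print(final)                                       # "A" wins with 1.6 versus 0.6 for "B"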
For regression tasks, classifier outputs are numerical. Here, simple averaging is commonly used.
Suppose three regression models output predicted values: 3.2, 3.5, and 3.8. The final prediction will be
the average:
Prediction = (3.2 + 3.5 + 3.8) / 3 = 3.5
Again, this can be extended to weighted averaging, where better-performing models on validation
data contribute more to the final output.
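For example, a minimal sketch with hypothetical weights (NumPy assumed):
    # Weighted averaging of regression outputs.
    import numpy as np

    preds = np.array([3.2, 3.5, 3.8])
    weights = np.array([0.5, 0.3, 0.2])                # hypothetical weights from validation performance
    print(np.average(preds, weights=weights))          # 0.5*3.2 + 0.3*3.5 + 0.2*3.8 = 3.41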
Another more sophisticated approach is stacking (or stacked generalization). In this method, the
predictions from base classifiers are treated as inputs for a meta-learner, which learns how to best
combine them. For example, if we use a Decision Tree, a k-NN classifier, and an SVM to classify data,
and their predictions on an input are: A, B, A — then a Logistic Regression model can be trained as a
meta-learner to decide the final class. This is done by learning from the pattern of base classifier
outputs on a separate validation set. Marsland gives an example where this method outperforms both
bagging and boosting, especially when base models are heterogeneous.
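A minimal stacking sketch using scikit-learn (assumed to be installed), with the same three base classifiers and a logistic regression meta-learner:
    # Stacking: base classifiers' predictions become inputs to a meta-learner.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    base_models = [
        ("tree", DecisionTreeClassifier()),
        ("knn", KNeighborsClassifier()),
        ("svm", SVC()),
    ]
    # The meta-learner is trained on the base models' cross-validated predictions.
    stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression())
    stack.fit(X, y)
    print(stack.score(X, y))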
In addition, Bayesian model averaging combines models probabilistically, weighting each model’s
prediction by its posterior probability. This method is powerful but computationally expensive and
less commonly used in standard ensemble setups.
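In symbols, if M_1, …, M_K are the candidate models and D is the training data, the averaged
prediction is P(y | x, D) = Σ_k P(y | x, M_k) · P(M_k | D), so each model contributes in proportion
to its posterior probability given the data.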