Random Forest
Instability
Ensemble Learning
https://medium.com/@ilyurek/ensemble-learning-random-forests-bagging-random-subspace-and-boosting-713c7dbe6823
Random Forest?
It can be used for both classification and regression problems.
Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features.
https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/national/two-tree-random-forest.png
Random Forest Models vs. Decision Trees
Decision Tree:
Uses the entire dataset to create a single set of rules
Can easily overfit when trees are deep (too many rules)
Random Forest:
Builds multiple decision trees from random subsets of data and features
Combines results from all trees to improve accuracy
Reduces the risk of overfitting by using smaller trees
Slower to compute due to multiple trees, but gives more robust predictions
A real-life example
Andrew wants to decide where to go during his one-year vacation, so he asks the people who know him best for suggestions.
The first friend he seeks out asks him about the likes and dislikes of his past travels. Based on the answers, he gives Andrew some advice.
[Figure: the friend's advice drawn as a decision tree. Questions such as "Do you like beaches?", "Do you like mountains?", "Do you like cities?", and "Do you prefer warm climates?" lead to recommendations: Hawaii, Alaska, Switzerland, Japan.]
Train Decision Trees: train each decision tree on a unique sample from the bootstrap sampling process.
Random Feature Selection: each tree only considers a random subset of features at each split, increasing diversity among trees.
Aggregation of Results:
Classification: each tree votes for a class, and the most common class among trees is the final prediction (majority voting).
Regression: the average prediction across trees is used as the final result.
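As a concrete illustration, here is a minimal Python sketch of these three steps, using scikit-learn's DecisionTreeClassifier as the base learner on a synthetic dataset; the dataset, tree count, and variable names are illustrative assumptions, not part of the original material.

```python
# Minimal sketch: bootstrap sampling, random feature selection at each split
# (delegated to max_features), and majority-vote aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)
trees = []

# Steps 1 and 2: train each tree on a bootstrap sample, considering only a
# random subset of features at every split (max_features="sqrt").
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: aggregate the individual predictions by majority vote.
votes = np.array([t.predict(X) for t in trees])   # shape: (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("Training accuracy of the hand-rolled forest:", (majority == y).mean())
```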
Key Concepts and Parameters
Number of Trees (n_estimators): Controls the number of trees in the forest. More trees can
improve performance but increase computation.
Max Features (max_features): Sets the maximum number of features considered at each split,
which helps in controlling overfitting and tree diversity.
Out-of-Bag (OOB) Error: the data points left out of the bootstrap samples can be used to estimate the model's accuracy without cross-validation.
Tree Depth: Controlling the depth of each tree affects the model’s complexity and risk of
overfitting.
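These concepts map directly onto scikit-learn's RandomForestClassifier. The sketch below is an illustrative configuration (the parameter values are assumptions, not recommendations) showing where n_estimators, max_features, max_depth, and the OOB estimate appear.

```python
# Illustrative Random Forest configuration with an out-of-bag accuracy estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # features considered at each split
    max_depth=10,         # maximum depth of each tree
    oob_score=True,       # estimate accuracy from out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```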
Out-of-bag error example
We will build a Random Forest with 3 decision trees. Each tree is trained on a bootstrap sample (sampling with replacement) from this dataset:
Student | Class label
1 | Yes
2 | No
4 | No
5 | Yes
6 | No
7 | Yes
8 | No
Step 1: Training each tree on a bootstrap sample
Tree 1 is trained on samples {1, 2, 3, 4, 5, 6}; OOB samples for Tree 1: {7, 8}
Tree 2 is trained on samples {2, 3, 5, 6, 7, 8}; OOB samples for Tree 2: {1, 4}
Tree 3 is trained on samples {1, 3, 4, 5, 7, 8}; OOB samples for Tree 3: {2, 6}
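For intuition, here is a small NumPy sketch of how one bootstrap sample and its OOB rows could be drawn. The resulting sets depend on the random seed, so they will not exactly match the hand-picked sets above.

```python
# Draw one bootstrap sample (with replacement) from the 8 students and
# identify the out-of-bag rows, i.e. the students that were never drawn.
import numpy as np

students = np.array([1, 2, 3, 4, 5, 6, 7, 8])
rng = np.random.default_rng(0)

bootstrap = rng.choice(students, size=len(students), replace=True)
oob = np.setdiff1d(students, bootstrap)   # never drawn -> out-of-bag

print("Bootstrap sample:", sorted(bootstrap.tolist()))
print("OOB samples:", oob.tolist())
```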
Step 2: Making predictions for out-of-bag samples
Tree 1 OOB predictions:
Sample 7: Predicts Yes (correct)
Sample 8: Predicts Yes (incorrect)
Tree 2 OOB predictions:
Sample 1: Predicts Yes (correct)
Sample 4: Predicts No (correct)
Tree 3 OOB predictions:
Sample 2: Predicts No (correct)
Sample 6: Predicts Yes (incorrect)
Step 3: Checking each OOB prediction against the true label
Sample 1: OOB prediction from Tree 2 is Yes (correct)
Sample 2: OOB prediction from Tree 3 is No (correct)
Sample 4: OOB prediction from Tree 2 is No (correct)
Sample 6: OOB prediction from Tree 3 is Yes (incorrect)
Sample 7: OOB prediction from Tree 1 is Yes (correct)
Sample 8: OOB prediction from Tree 1 is Yes (incorrect)
Step 4: Calculating the OOB error
Out of the six samples with OOB predictions, four were correctly classified, and two were
misclassified.
The OOB error rate is 2/6 ≈ 33.3%.
The OOB error rate is 33.3%, which gives an estimate of the Random Forest's generalization
error without needing a separate test set
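A tiny Python tally reproducing the arithmetic above from the six OOB predictions listed in this example.

```python
# Six samples received an OOB prediction: four correct, two incorrect.
oob_results = {
    1: "correct",    # Tree 2 predicted Yes
    2: "correct",    # Tree 3 predicted No
    4: "correct",    # Tree 2 predicted No
    6: "incorrect",  # Tree 3 predicted Yes
    7: "correct",    # Tree 1 predicted Yes
    8: "incorrect",  # Tree 1 predicted Yes
}

errors = sum(result == "incorrect" for result in oob_results.values())
oob_error = errors / len(oob_results)
print(f"OOB error rate: {oob_error:.1%}")   # 33.3%
```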
Random forest example
Suppose we have a dataset of students with two features:
Hours Studied
Test Preparation Quality (rated as "Good" or "Poor")
Our target is to predict if a student will Pass (Class 1) or Fail (Class 0) based on these features.
Student | Hours studied | Preparation quality | Pass/Fail
A | 5 | Good | Pass (1)
B | 2 | Poor | Fail (0)
C | 8 | Good | Pass (1)
D | 1 | Poor | Fail (0)
E | 7 | Poor | Pass (1)
F | 3 | Good | Fail (0)
Step 1: Bootstrapping (Sampling with Replacement)
Tree 1: Sampled Data
Student | Hours studied | Preparation quality | Pass/Fail
A | 5 | Good | Pass (1)
C | 8 | Good | Pass (1)
E | 7 | Poor | Pass (1)
B | 2 | Poor | Fail (0)
Step 2: The rules learned by each tree
Tree 1: if Hours Studied > 6, predict Pass (1); if Hours Studied ≤ 6, predict Fail (0).
Tree 2: if Preparation Quality is Good, predict Pass (1); if Preparation Quality is Poor, predict Fail (0).
Tree 3: if Hours Studied > 4, predict Pass (1); if Hours Studied ≤ 4, predict Fail (0).
Step 3: Predicting for a new student
Suppose we want to classify a new student who studied for 4 hours and has a Preparation Quality of "Good".
Each tree in the forest makes its prediction:
Tree 1: since 4 ≤ 6, Tree 1 predicts Fail (0)
Tree 2: since Preparation Quality is "Good", Tree 2 predicts Pass (1)
Tree 3: since 4 ≤ 4, Tree 3 predicts Fail (0)
Step 4: Aggregating Predictions
Take the majority vote of all trees for the final prediction: two trees predict Fail (0) and one predicts Pass (1), so the Random Forest predicts Fail (0).
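The whole toy example fits in a few lines of Python: the three functions below encode the toy trees from the previous step, and the vote reproduces the Fail (0) prediction.

```python
# The three toy trees from this example, applied to the new student
# (4 hours studied, "Good" preparation), followed by the majority vote.
def tree_1(hours, prep):           # splits on Hours Studied > 6
    return 1 if hours > 6 else 0

def tree_2(hours, prep):           # splits on Preparation Quality
    return 1 if prep == "Good" else 0

def tree_3(hours, prep):           # splits on Hours Studied > 4
    return 1 if hours > 4 else 0

votes = [tree(4, "Good") for tree in (tree_1, tree_2, tree_3)]   # [0, 1, 0]
prediction = max(set(votes), key=votes.count)                    # majority vote
print("Votes:", votes, "-> final prediction:", "Pass" if prediction else "Fail")
```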
Parameters and impact
n_estimators: number of decision trees in the forest.
Impact on Performance:
Accuracy: Increasing the number of trees generally improves the model’s accuracy because it
reduces the overall variance
Overfitting: Having too many trees rarely leads to overfitting since Random Forest is naturally
resistant to it. However, after a certain point, adding more trees yields diminishing returns, and
accuracy improvements plateau
Computational Cost: Higher n_estimators increase computational cost and memory usage since
each tree requires computation and storage.
Typical Values: Common values are 100-500, but it depends on the dataset size and computational
resources
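A sketch of how this plateau can be observed in practice: sweeping n_estimators on a synthetic dataset and reading off the OOB accuracy estimate. The dataset and tree counts below are illustrative assumptions.

```python
# OOB accuracy typically improves quickly and then plateaus as trees are added.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for n_trees in (25, 100, 300, 500):
    forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=0)
    forest.fit(X, y)
    print(f"n_estimators={n_trees:>3}: OOB accuracy = {forest.oob_score_:.3f}")
```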
Parameters and impact
max_features: controls the maximum number of features considered for splitting at each node in a tree
Impact on Performance
Diversity and Overfitting: A lower max_features increases the diversity among trees, making them more independent,
which helps prevent overfitting and improves generalization. If max_features is too high (close to the total number of
features), each tree becomes more similar, reducing the benefit of having multiple trees
Accuracy: The optimal max_features value balances accuracy and independence among trees. Setting max_features to the
square root of the total features for classification or one-third of the total features for regression is a common rule of
thumb
Computational Efficiency: Lowering max_features can speed up training since fewer features are evaluated at each
split. However, setting it too low may decrease accuracy as each tree might lack sufficient information to make accurate
splits.
Typical Values
Classification: Often set to sqrt(total_features)
Regression: Often set to total_features / 3
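The rules of thumb above, written out explicitly for an assumed feature count of 30 (scikit-learn also accepts the string "sqrt" for max_features).

```python
# Common max_features defaults: sqrt(features) for classification,
# features / 3 for regression; the feature count here is made up.
import math
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

n_features = 30   # illustrative feature count

clf = RandomForestClassifier(max_features=int(math.sqrt(n_features)))   # ~5 features per split
reg = RandomForestRegressor(max_features=n_features // 3)               # ~10 features per split
print(clf.max_features, reg.max_features)
```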
Parameters and impact
max_depth: maximum number of levels (depth) allowed for each tree
Impact on Performance
Overfitting vs. Underfitting: A higher max_depth allows trees to learn more complex patterns but also
increases the risk of overfitting, especially if the trees become too deep and learn noise in the training data.
Conversely, a low max_depth may lead to underfitting, as each tree might not capture enough information to make
accurate predictions
Model Complexity and Interpretability: Higher depths lead to more complex trees that are harder to
interpret. Restricting max_depth can help simplify the model and reduce overfitting, especially in cases with limited
data
Computational Cost: Deeper trees require more computation. Limiting max_depth can reduce the training time
and memory usage, making the model more efficient.
Typical Values
Values between 10-20 are common, though they vary depending on the dataset size and complexity
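An illustrative comparison of shallow versus unrestricted trees on a synthetic dataset, scored with cross-validation; the depths chosen below are arbitrary, not recommendations.

```python
# Deeper trees can fit more complex patterns but cost more and may track noise.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=20, random_state=0)

for depth in (3, 10, None):   # None leaves tree depth unrestricted
    forest = RandomForestClassifier(n_estimators=100, max_depth=depth, random_state=0)
    scores = cross_val_score(forest, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```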
Grid Search and Cross-Validation
Cross-Validation:
a technique for assessing how a machine learning model generalizes to unseen data.
In K-fold cross-validation, the dataset is split into K subsets (or folds)
For each fold, the model trains on K−1 folds and tests on the remaining fold
This process repeats K times, with each fold serving once as the test set
The model’s performance metrics (like accuracy or RMSE) are averaged across all K runs to get a
more reliable estimate
In the case of a Random Forest classifier, cross-validation helps in evaluating the model's accuracy and
robustness
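A minimal example of 5-fold cross-validation for a Random Forest classifier using scikit-learn's cross_val_score; the synthetic dataset and settings are illustrative.

```python
# 5-fold cross-validation: each fold serves once as the test set, and the
# fold accuracies are averaged for a more reliable performance estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(forest, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```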
Grid Search and Cross-Validation
Grid Search:
a method for finding the best combination of hyperparameters for a model by exhaustively
searching through a specified parameter grid
Each possible combination of these hyperparameters is tested, and the model’s performance is
evaluated (usually with cross-validation) for each combination
The result is the combination of parameters that yields the best performance
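A minimal GridSearchCV sketch over the three parameters discussed in this deck; the grid values and dataset are illustrative assumptions.

```python
# Exhaustive grid search over n_estimators, max_features and max_depth,
# scored with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", None],
    "max_depth": [10, 20, None],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```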
Thank you for
your attention