
A Deep Dive into Random Forests

By: Amit Kumar (EC21B004)
    Anurudh Kumar (EC21B010)
    Abhishek Kumar (EC21B001)
Random Forests May Seem Scary…
But They’re Actually Not Too Bad!
Plan
• Decision Tree
• Random Forests (+ Bagging)
• Tree Optimization and Feature Importance (Gini Criterion)
• Model Regularization
• Closing Notes
Decision Tree
Properties
• A feature can appear more than once in different branches (e.g. windy), as illustrated in the sketch below.
• A node can have both a branch and a leaf stemming from it.
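Not part of the original slides, but as a rough illustration of these properties: a minimal scikit-learn sketch that fits a small tree on a made-up weather dataset and prints its structure (the data and feature names are invented purely for illustration).

# Minimal sketch: fit a small decision tree and print its structure.
# The toy "weather" data below is invented for illustration only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: outlook (0=sunny, 1=overcast, 2=rain), humidity, windy (0/1)
X = np.array([[0, 85, 0], [0, 90, 1], [1, 78, 0], [2, 96, 0],
              [2, 80, 1], [1, 65, 1], [0, 70, 0], [2, 70, 1]])
y = np.array([0, 0, 1, 1, 0, 1, 1, 0])  # 1 = play, 0 = don't play

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["outlook", "humidity", "windy"]))
# The same feature (e.g. "windy") may show up in several branches,
# and an internal node can lead to both a further split and a leaf.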
Decision Tree – Pros and Cons

Pros:
• Non-linear decision boundaries
• Easy to interpret
• Numerical & Categorical Data
Decision Tree – Pros and Cons

Cons:
• Easy to Overfit
• High Variance (i.e. unstable).
Random Forests
What is Random Forest?
• Random Forest is an ensemble learning method used for classification and regression tasks.
• It builds multiple decision trees and merges them to get a more accurate and stable prediction.
• Key Term: Ensemble Learning – combining multiple models to improve accuracy.
How Does Random Forest Work?
• Random Forest creates multiple decision trees from different subsets of the data.
• Each decision tree makes a prediction.
• For classification: the forest uses majority voting.
• For regression: it averages the outputs of all decision trees (see the sketch below).
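A minimal scikit-learn sketch of this voting/averaging behaviour (my own illustration on synthetic data, not code from the slides):

# Minimal sketch: classification (majority vote) vs. regression (averaging).
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

Xc, yc = make_classification(n_samples=200, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)
print(clf.predict(Xc[:3]))          # majority vote over 100 trees

Xr, yr = make_regression(n_samples=200, n_features=10, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:3]))          # average of 100 tree outputs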


Random Forests – Many Decision Trees
• We can guess that a Random Forest = many decision trees. But how?
• Many copies of the exact same tree would be useless…
Random Forests – Many Decision Trees
• OK, so we want some tree variation, but how…
• Vary the trees such that the overall variance is reduced.
• STATS101: Given a set of $n$ independent, uncorrelated observations $X_1, \dots, X_n$, each with variance $\sigma^2$, the variance of their average $\bar{X}$ is $\sigma^2 / n$ (see the formula below for the correlated case).
• This is why a forest of identical trees is useless: identical trees are perfectly correlated, so averaging them does not reduce variance.
• This is also why ensembling many (sufficiently uncorrelated) models together generally improves results.
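A standard extension of this variance argument, not spelled out on the slides: if the trees are positively correlated with pairwise correlation $\rho$, the variance of their average no longer shrinks to zero as more trees are added.

\[
\operatorname{Var}\!\left(\bar{X}\right) = \rho\,\sigma^{2} + \frac{1-\rho}{n}\,\sigma^{2}
\]

As $n$ grows, the second term vanishes but the first does not, so reducing the correlation $\rho$ between trees is exactly what the randomization described next is designed to do.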
Random Forests – Many Decision Trees
• How do we make trees as independent and uncorrelated as possible?
Random Forests – Randomize Data
• Each tree is trained on a random subset of the data, and each tree only sees a random subset of the features (sketched below).
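A rough sketch of what this randomization looks like, written for illustration rather than taken from the slides; the function names (train_forest, predict_forest) and parameters are made up, and features are subsampled per tree to match the framing used on the next slides.

# Minimal sketch of the randomization behind a random forest:
# bootstrap the rows and subsample the features for each tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def train_forest(X, y, n_trees=50, max_features=3):
    forest = []
    n_rows, n_cols = X.shape
    for _ in range(n_trees):
        rows = rng.integers(0, n_rows, size=n_rows)                  # bootstrap sample (with replacement)
        cols = rng.choice(n_cols, size=max_features, replace=False)  # random feature subset
        tree = DecisionTreeClassifier().fit(X[np.ix_(rows, cols)], y[rows])
        forest.append((tree, cols))
    return forest

def predict_forest(forest, X):
    votes = np.array([tree.predict(X[:, cols]) for tree, cols in forest])
    return (votes.mean(axis=0) >= 0.5).astype(int)  # majority vote, assumes binary 0/1 labels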
Random Forests – Intuition Check
• What happens if you assign more/less data per tree?
  Less: trees are more uncorrelated, but at some point too little data hurts training.
  More: trees become more correlated, but each individual tree is trained better.
• What happens if you select more/less of the total features per tree? (The sketch below shows the corresponding scikit-learn knobs.)
  Less: trees are more uncorrelated, but at some point many trees become “dead”, i.e. entire trees are fit on unimportant features.
  More: trees become more correlated, but each individual tree is trained better.
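In scikit-learn these two dials correspond roughly to the max_samples and max_features parameters of RandomForestClassifier; a minimal sketch with arbitrarily chosen values:

# Minimal sketch: the "how much data / how many features per tree" dials.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    max_samples=0.7,   # each tree sees a bootstrap sample of 70% of the rows
    max_features=5,    # each split considers only 5 of the 20 features
    random_state=0,
).fit(X, y)
print(forest.score(X, y))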
Tree Optimization and Feature Importance
Tree Optimization – Greedy Criterion
• Trees are grown greedily: at each node, the locally best split is chosen.
• Criterion: Gini impurity, Information Gain.
Short Aside – Greedy Algorithm Example
Example: find the largest-sum path. Taking the locally largest branch at each step does not necessarily give the globally best path; this is the trade-off a greedy criterion accepts.
Tree Optimization – Greedy Criterion
• Note: the criterion governing tree growth is different from the global cost function (e.g. precision-recall, accuracy, etc.), which measures how well your entire model is doing.
Tree Optimization – Gini Impurity
“Gini impurity is a measure of how often a randomly chosen element
from the set would be incorrectly labeled if it was randomly labeled
according to the distribution of labels in the subset.”

$$ G_n = 1 - \sum_{i=1}^{C_n} \left( \frac{N_{n,i}}{N_n} \right)^2 $$

where $C_n$ is the number of classes present at node $n$, $N_{n,i}$ is the number of samples of class $i$ at node $n$, and $N_n$ is the total number of samples at node $n$.
Tree Optimization – Gini Impurity Example
Splits are chosen such that the (weighted) Gini impurity of the resulting child nodes is locally minimized; a small computational sketch follows.
Model Regularization
Model Regularization
• The main complexity parameter is the max_depth of the tree (see the sketch below).
• Deep trees can split the data up more, leading to overfitting.
• In the deep example tree, some nodes contain only a single sample!
Model Regularization
[Figure: a deep, unregularized tree vs. a shallower, regularized tree]

Advantages of Random Forest
1. High Accuracy:
Random Forests usually provide more accurate predictions than a single decision tree because they combine the outputs of many trees, reducing overfitting.

2. Handles Both Categorical and Numerical Data:
Random Forests can work with both types of data, making them versatile in various applications.

3. Robust to Overfitting:
While a single decision tree may overfit to noisy data, Random Forests reduce this risk by averaging the predictions of multiple trees, which smooths out irregularities.

4. Handles Missing Data Well:
Random Forests can handle missing data through the use of surrogate splits, which allows the model to still make predictions when some feature values are missing.
Disadvantages of Random Forest
1. Computationally Intensive:
Training a Random Forest can be computationally expensive, especially when there are a large number of trees or when the dataset is very large.

2. Slower Prediction Time:
Because it averages the output of many trees, a Random Forest may take longer to make predictions than simpler models like logistic regression or individual decision trees.

3. Not Suitable for Real-Time Predictions:
Due to its complexity and slower prediction time, Random Forests may not be the best choice for real-time or low-latency applications.

4. Less Effective for Small Datasets:
If the dataset is small, Random Forests may not perform significantly better than simpler algorithms, as the gain from ensembling multiple trees is minimal when there isn't much data to work with.
Applications of Random Forest
• Classification: Email spam detection, image classification, sentiment analysis.
• Regression: Stock market prediction, weather forecasting, medical diagnosis.
• Feature Selection: Identifying important features in datasets.
Conclusion
• Random Forest is a powerful algorithm for classification and
regression.

• It is robust to overfitting and works well with diverse datasets.

• However, it can be resource-intensive and difficult to interpret.


The End