
Random Forest

Introduction to Random Forest



Definition: Random Forest is an ensemble machine learning algorithm that uses multiple decision
trees to make a prediction, often improving accuracy and robustness.

Purpose: Used for both classification and regression tasks, Random Forest reduces overfitting
compared to individual decision trees.

Why Use It?: It provides high accuracy, handles large datasets well, and can manage missing values
and noisy data effectively.
Motivation

Limitations of Decision Trees
• Overfitting
• Instability
• Sensitivity to data variations

Ensemble Learning
• A powerful technique that combines multiple models to improve predictive performance

https://medium.com/@ilyurek/ensemble-learning-random-forests-bagging-random-subspace-and-boosting-713c7dbe6823
Random Forest?

• It can be used for both classification and regression problems
• Instead of searching for the most important feature when splitting a node, each tree searches for the best feature among a random subset of features

https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/national/two-tree-random-forest.png
Random Forest Models vs. Decision Trees

Decision Tree
• Uses the entire dataset to create a single set of rules
• Can easily overfit when trees are deep (too many rules)

Random Forest
• Builds multiple decision trees from random subsets of data and features
• Combines results from all trees to improve accuracy
• Reduces the risk of overfitting by using smaller trees
• Slower to compute due to multiple trees, but gives more robust predictions
A real-life example

Andrew wants to decide where to go during his one-year vacation, so he asks the people who know him best for suggestions.

The first friend he seeks out asks him about the likes and dislikes of his past travels. Based on the answers, he gives Andrew some advice.

[Decision tree: Start → "Do you prefer warm climates?" — Yes → "Do you like beaches?" (Yes → Hawaii, No → Alaska); No → "Do you like mountains?" (Yes → Switzerland, No → Japan)]

Decision Tree Approach


A real-life example

Now Andrew asks several friends, and each friend asks him a different question:

[Friend 1: "Do you like cities?" — Yes → Paris, No → Alaska]
[Friend 2: "Do you prefer warm climates?" — Yes → Hawaii, No → Switzerland]
[Friend 3: "Do you like beaches?" — Yes → Maldives, No → Sweden]

Andrew combines their recommendations to make his final decision.

Random Forest Approach


How Random Forest Works

1. Bootstrap Sampling: Generate multiple subsets (samples with replacement) from the training data.

2. Train Decision Trees: Train each decision tree on a unique sample from the bootstrap sampling process.

3. Random Feature Selection: Each tree only considers a random subset of features at each split, increasing diversity among the trees.

4. Aggregation of Results:
   • Classification: Each tree votes for a class, and the most common class among trees is the final prediction (majority voting).
   • Regression: The average prediction across trees is used as the final result.
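This whole pipeline — bootstrap sampling, per-tree training, random feature selection at each split, and vote aggregation — is what scikit-learn's RandomForestClassifier performs internally. A minimal sketch; the synthetic dataset and parameter values are illustrative assumptions, not part of the slides:

```python
# Minimal sketch: scikit-learn handles bootstrapping, random feature
# selection, and majority voting internally. The data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees (one bootstrap sample each)
    max_features="sqrt",  # random subset of features at each split
    random_state=42,
)
forest.fit(X_train, y_train)           # each tree trained on its own bootstrap sample
print(forest.predict(X_test[:5]))      # predictions = majority vote across trees
print(forest.score(X_test, y_test))    # overall test accuracy
```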
Key Concepts and Parameters

Number of Trees (n_estimators): Controls the number of trees in the forest. More trees can
improve performance but increase computation.

Max Features (max_features): Sets the maximum number of features considered at each split,
which helps in controlling overfitting and tree diversity.

Out-of-Bag (OOB) Error: The data points left out of each bootstrap sample can be used to estimate the model's accuracy without cross-validation.

Tree Depth: Controlling the depth of each tree affects the model’s complexity and risk of
overfitting.
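These concepts map directly onto the constructor arguments of scikit-learn's RandomForestClassifier; the values below are illustrative placeholders, not recommendations from the slides:

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # features considered at each split
    max_depth=15,         # cap on tree depth to limit complexity
    oob_score=True,       # estimate accuracy from out-of-bag samples
    random_state=0,
)
# After fitting, forest.oob_score_ holds the OOB accuracy estimate.
```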
Out-of-bag error example

We will build a Random Forest with 3 decision trees. Each tree is trained on a bootstrap sample (sampling with replacement) from this dataset:

Student | Hours studied
1       | Yes
2       | No
4       | No
5       | Yes
6       | No
7       | Yes
8       | No
Step 1: Training each tree on a bootstrap sample

Tree 1 is trained on samples {1, 2, 3, 4, 5, 6}
• OOB samples for Tree 1: {7, 8}

Tree 2 is trained on samples {2, 3, 5, 6, 7, 8}
• OOB samples for Tree 2: {1, 4}

Tree 3 is trained on samples {1, 3, 4, 5, 7, 8}
• OOB samples for Tree 3: {2, 6}
Step 2: Making predictions for out-of-bag samples

Tree 1 OOB predictions:
• Sample 7: Predicts Yes (correct)
• Sample 8: Predicts Yes (incorrect)

Tree 2 OOB predictions:
• Sample 1: Predicts Yes (correct)
• Sample 4: Predicts No (correct)

Tree 3 OOB predictions:
• Sample 2: Predicts No (correct)
• Sample 6: Predicts Yes (incorrect)
Step 3: Aggregating OOB predictions for each sample

Sample 1:
• OOB prediction from Tree 2: Yes (correct)
• Final prediction: Yes → Correct

Sample 2:
• OOB prediction from Tree 3: No (correct)
• Final prediction: No → Correct

Sample 4:
• OOB prediction from Tree 2: No (correct)
• Final prediction: No → Correct

Sample 6:
• OOB prediction from Tree 3: Yes (incorrect)
• Final prediction: Yes → Incorrect

Sample 7:
• OOB prediction from Tree 1: Yes (correct)
• Final prediction: Yes → Correct

Sample 8:
• OOB prediction from Tree 1: Yes (incorrect)
• Final prediction: Yes → Incorrect
Step 4: Calculating the OOB error

Out of the six samples with OOB predictions, four were correctly classified and two were misclassified.

The OOB error rate is
• 2/6 = 33.3%

This OOB error rate of 33.3% gives an estimate of the Random Forest's generalization error without needing a separate test set.
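In scikit-learn the same estimate comes essentially for free by setting oob_score=True; a small sketch on synthetic data (the dataset is an illustrative stand-in, not the 8-student table above):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=1)
forest.fit(X, y)

oob_accuracy = forest.oob_score_   # accuracy measured on out-of-bag samples
oob_error = 1.0 - oob_accuracy     # analogous to the 2/6 = 33.3% above
print(f"OOB error: {oob_error:.3f}")
```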
Random forest example

Suppose we have a dataset of students with two features:
• Hours Studied
• Test Preparation Quality (rated as "Good" or "Poor")

Our target is to predict whether a student will Pass (Class 1) or Fail (Class 0) based on these features.

Student | Hours studied | Preparation quality | Pass/Fail
A       | 5             | Good                | Pass (1)
B       | 2             | Poor                | Fail (0)
C       | 8             | Good                | Pass (1)
D       | 1             | Poor                | Fail (0)
E       | 7             | Poor                | Pass (1)
F       | 3             | Good                | Fail (0)
Step 1: Bootstrapping (Sampling with Replacement)

Tree 1: Sampled Data
Student | Hours studied | Preparation quality | Pass/Fail
A       | 5             | Good                | Pass (1)
C       | 8             | Good                | Pass (1)
E       | 7             | Poor                | Pass (1)
B       | 2             | Poor                | Fail (0)

Tree 2: Sampled Data
Student | Hours studied | Preparation quality | Pass/Fail
D       | 1             | Poor                | Fail (0)
F       | 3             | Good                | Fail (0)
A       | 5             | Good                | Pass (1)
C       | 8             | Good                | Pass (1)

Tree 3: Sampled Data
Student | Hours studied | Preparation quality | Pass/Fail
E       | 7             | Poor                | Pass (1)
B       | 2             | Poor                | Fail (0)
F       | 3             | Good                | Fail (0)
C       | 8             | Good                | Pass (1)
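Bootstrapping itself is nothing more than drawing rows with replacement; a tiny sketch of how such subsets could be generated (the helper function, seeds, and resulting samples are illustrative, not the exact draws shown above):

```python
import random

students = ["A", "B", "C", "D", "E", "F"]

def bootstrap_sample(rows, k, seed):
    """Draw k rows with replacement (one bootstrap sample)."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in range(k)]

for tree_id in range(3):
    sample = bootstrap_sample(students, k=4, seed=tree_id)
    oob = [s for s in students if s not in sample]   # out-of-bag rows for this tree
    print(f"Tree {tree_id + 1}: sample={sample}, OOB={oob}")
```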
Step 2: Building Individual Decision Trees

Tree 1 (trained on A, C, E, B):
• If Hours Studied > 6, predict Pass (1).
• If Hours Studied ≤ 6, predict Fail (0).

Tree 2 (trained on D, F, A, C):
• If Preparation Quality is Good, predict Pass (1).
• If Preparation Quality is Poor, predict Fail (0).

Tree 3 (trained on E, B, F, C):
• If Hours Studied > 4, predict Pass (1).
• If Hours Studied ≤ 4, predict Fail (0).
Step 3: Classifying a New Data Point

Suppose we want to classify a new student who studied for 4 hours and has a Preparation Quality of "Good".

Each tree in the forest makes its prediction:
• Tree 1 (Hours Studied > 6 → Pass, else Fail): since 4 ≤ 6, Tree 1 predicts Fail (0)
• Tree 2 (Preparation Quality Good → Pass, else Fail): since Preparation Quality is "Good", Tree 2 predicts Pass (1)
• Tree 3 (Hours Studied > 4 → Pass, else Fail): since 4 ≤ 4, Tree 3 predicts Fail (0)
Step 4: Aggregating Predictions

Take the majority vote of all trees for the final prediction:
• Tree 1 predicts Fail (0)
• Tree 2 predicts Pass (1)
• Tree 3 predicts Fail (0)

The majority prediction is Fail (0), so the final prediction for this new student is Fail (0).
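The three hand-built rules and the majority vote can be written out directly; a minimal sketch (the function and variable names are illustrative):

```python
from collections import Counter

def tree1(hours, prep):   # If Hours Studied > 6 → Pass (1), else Fail (0)
    return 1 if hours > 6 else 0

def tree2(hours, prep):   # If Preparation Quality is Good → Pass (1), else Fail (0)
    return 1 if prep == "Good" else 0

def tree3(hours, prep):   # If Hours Studied > 4 → Pass (1), else Fail (0)
    return 1 if hours > 4 else 0

# New student: 4 hours studied, "Good" preparation quality
votes = [tree(4, "Good") for tree in (tree1, tree2, tree3)]
prediction = Counter(votes).most_common(1)[0][0]   # majority vote
print(votes, "->", prediction)                     # [0, 1, 0] -> 0 (Fail)
```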
Parameters and impact

n_estimators: the number of decision trees in the forest.

Impact on performance:
• Accuracy: Increasing the number of trees generally improves the model's accuracy because it reduces the overall variance.
• Overfitting: Having too many trees rarely leads to overfitting, since Random Forest is naturally resistant to it. However, after a certain point, adding more trees yields diminishing returns and accuracy improvements plateau.
• Computational Cost: A higher n_estimators increases computational cost and memory usage, since each tree requires computation and storage.
• Typical Values: Common values are 100-500, but this depends on the dataset size and computational resources.
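A quick way to see the plateau is to sweep n_estimators and compare cross-validated accuracy; a sketch on synthetic data (the dataset and the grid of values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for n in (10, 50, 100, 300):
    forest = RandomForestClassifier(n_estimators=n, random_state=0)
    scores = cross_val_score(forest, X, y, cv=5)   # 5-fold CV accuracy
    print(f"n_estimators={n:4d}  mean accuracy={scores.mean():.3f}")
```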
Parameters and impact

max_features: controls the maximum number of features considered for splitting at each node in a tree.

Impact on performance:
• Diversity and Overfitting: A lower max_features increases the diversity among trees, making them more independent, which helps prevent overfitting and improves generalization. If max_features is too high (close to the total number of features), the trees become more similar, reducing the benefit of having multiple trees.
• Accuracy: The optimal max_features value balances accuracy and independence among trees. Setting max_features to the square root of the total number of features for classification, or one third of the total features for regression, is a common rule of thumb.
• Computational Efficiency: Lowering max_features can speed up training, since fewer features are evaluated at each split. However, setting it too low may decrease accuracy, as each tree might lack sufficient information to make accurate splits.

Typical values:
• Classification: often sqrt(total_features)
• Regression: often total_features / 3
Parameters and impact

max_depth: the maximum number of levels (depth) allowed for each tree.

Impact on performance:
• Overfitting vs. Underfitting: A higher max_depth allows trees to learn more complex patterns but also increases the risk of overfitting, especially if the trees become too deep and learn noise in the training data. Conversely, a low max_depth may lead to underfitting, as each tree might not capture enough information to make accurate predictions.
• Model Complexity and Interpretability: Higher depths lead to more complex trees that are harder to interpret. Restricting max_depth can help simplify the model and reduce overfitting, especially with limited data.
• Computational Cost: Deeper trees require more computation. Limiting max_depth can reduce training time and memory usage, making the model more efficient.

Typical values:
• Values between 10 and 20 are common, though they vary depending on the dataset size and complexity.
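Putting the rules of thumb from the last three slides together, a hedged sketch of how these parameters might be set for classification versus regression (the specific numbers follow the heuristics above and are not tuned results):

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: sqrt(total_features) considered at each split, moderate depth cap
clf = RandomForestClassifier(
    n_estimators=300,
    max_features="sqrt",
    max_depth=15,
    random_state=0,
)

# Regression: roughly one third of the features considered at each split
reg = RandomForestRegressor(
    n_estimators=300,
    max_features=1 / 3,   # fraction of total features used per split
    max_depth=15,
    random_state=0,
)
```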
Grid Search and Cross-Validation

Cross-Validation:
• A technique for assessing how a machine learning model generalizes to unseen data.
• In K-fold cross-validation, the dataset is split into K subsets (folds).
• For each fold, the model trains on the other K−1 folds and tests on the remaining fold.
• This process repeats K times, with each fold serving once as the test set.
• The model's performance metrics (such as accuracy or RMSE) are averaged across all K runs to get a more reliable estimate.

For a Random Forest classifier, cross-validation helps evaluate the model's accuracy and robustness.
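A minimal sketch of 5-fold cross-validation for a Random Forest classifier in scikit-learn (the synthetic dataset and K=5 are illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

# Each fold serves once as the test set; scores are averaged for a reliable estimate.
scores = cross_val_score(forest, X, y, cv=5, scoring="accuracy")
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```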
Grid Search and Cross-Validation

Grid Search:
• A method for finding the best combination of hyperparameters for a model by exhaustively searching through a specified parameter grid.
• Each possible combination of these hyperparameters is tested, and the model's performance is evaluated (usually with cross-validation) for each combination.
• The result is the combination of parameters that yields the best performance.
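Combining the two ideas, scikit-learn's GridSearchCV runs cross-validation for every combination in the grid; a sketch where the grid values are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", 0.5],
    "max_depth": [10, 20, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                  # 5-fold cross-validation for each combination
    scoring="accuracy",
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```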
Thank you for your attention
