Random Forest
Instability
Ensemble Learning
https://medium.com/@ilyurek/ensemble-learning-random-forests-bagging-random-subspace-and-boosting-713c7dbe6823
Random Forest?
It can be used for both classification and regression problems.
Instead of searching for the most important feature while splitting a node, it searches for the best feature among a random subset of features.
https://builtin.com/sites/www.builtin.com/files/styles/ckeditor_optimize/public/inline-images/national/two-tree-random-forest.png
Random Forest Models vs. Decision Trees
Decision Tree:
Uses the entire dataset to create a single set of rules
Can easily overfit when trees are deep (too many rules)
Random Forest:
Builds multiple decision trees from random subsets of data and features
Combines results from all trees to improve accuracy
Reduces the risk of overfitting by using smaller trees
Slower to compute due to multiple trees, but gives more robust predictions
A real-life example
Andrew wants to decide where to go during his one-year vacation, so he asks the people who know him best for suggestions.
The first friend he seeks out asks him about the likes and dislikes of his past travels. Based on the answers, he gives Andrew some advice.
[Figure: the friend's advice drawn as a decision tree. Questions such as "Do you like beaches?", "Do you like mountains?", "Do you like cities?", and "Do you prefer warm climates?" lead to recommendations: Hawaii, Alaska, Switzerland, Japan.]
Train Decision Trees: train each decision tree on a unique sample from the bootstrap sampling process.
Random Feature Selection: each tree only considers a random subset of features at each split, increasing diversity among trees.
Aggregation of Results:
Classification: each tree votes for a class, and the most common class among trees is the final prediction (majority voting).
Regression: the average prediction across trees is used as the final result.
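As a concrete illustration, here is a minimal Python sketch of these three steps, using scikit-learn's DecisionTreeClassifier as the base learner on a synthetic dataset; the dataset, tree count, and variable names are illustrative assumptions, not part of the original material.

```python
# Minimal sketch: bootstrap sampling, random feature selection at each split
# (delegated to max_features), and majority-vote aggregation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
rng = np.random.default_rng(0)
trees = []

# Steps 1 and 2: train each tree on a bootstrap sample, considering only a
# random subset of features at every split (max_features="sqrt").
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Step 3: aggregate the individual predictions by majority vote.
votes = np.array([t.predict(X) for t in trees])   # shape: (n_trees, n_samples)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("Training accuracy of the hand-rolled forest:", (majority == y).mean())
```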
Key Concepts and Parameters
Number of Trees (n_estimators): Controls the number of trees in the forest. More trees can
improve performance but increase computation.
Max Features (max_features): Sets the maximum number of features considered at each split,
which helps in controlling overfitting and tree diversity.
Out-of-Bag (OOB) Error: the data points left out of the bootstrap samples can be used to estimate the model's accuracy without cross-validation.
Tree Depth: Controlling the depth of each tree affects the model’s complexity and risk of
overfitting.
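These concepts map directly onto scikit-learn's RandomForestClassifier. The sketch below is an illustrative configuration (the parameter values are assumptions, not recommendations) showing where n_estimators, max_features, max_depth, and the OOB estimate appear.

```python
# Illustrative Random Forest configuration with an out-of-bag accuracy estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,     # number of trees in the forest
    max_features="sqrt",  # features considered at each split
    max_depth=10,         # maximum depth of each tree
    oob_score=True,       # estimate accuracy from out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print("OOB accuracy estimate:", forest.oob_score_)
```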
Out-of-bag error example
We will build a Random Forest with 3 decision trees. Each tree is trained on a bootstrap sample (sampling with replacement) from this dataset:
Student | Class label
1 | Yes
2 | No
4 | No
5 | Yes
6 | No
7 | Yes
8 | No
Step 1: Training each tree on a bootstrap sample
Tree 1 is trained on samples {1, 2, 3, 4, 5, 6}; OOB samples for Tree 1: {7, 8}
Tree 2 is trained on samples {2, 3, 5, 6, 7, 8}; OOB samples for Tree 2: {1, 4}
Tree 3 is trained on samples {1, 3, 4, 5, 7, 8}; OOB samples for Tree 3: {2, 6}
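For intuition, here is a small NumPy sketch of how one bootstrap sample and its OOB rows could be drawn. The resulting sets depend on the random seed, so they will not exactly match the hand-picked sets above.

```python
# Draw one bootstrap sample (with replacement) from the 8 students and
# identify the out-of-bag rows, i.e. the students that were never drawn.
import numpy as np

students = np.array([1, 2, 3, 4, 5, 6, 7, 8])
rng = np.random.default_rng(0)

bootstrap = rng.choice(students, size=len(students), replace=True)
oob = np.setdiff1d(students, bootstrap)   # never drawn -> out-of-bag

print("Bootstrap sample:", sorted(bootstrap.tolist()))
print("OOB samples:", oob.tolist())
```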
Step 2: Making predictions for out-of-bag samples
Tree 1 OOB predictions:
Sample 7: Predicts Yes (correct)
Sample 8: Predicts Yes (incorrect)
Tree 2 OOB predictions:
Sample 1: Predicts Yes (correct)
Sample 4: Predicts No (correct)
Tree 3 OOB predictions:
Sample 2: Predicts No (correct)
Sample 6: Predicts Yes (incorrect)
Step 3: Checking each OOB prediction against the true label
Sample 1: OOB prediction from Tree 2 is Yes (correct)
Sample 2: OOB prediction from Tree 3 is No (correct)
Sample 4: OOB prediction from Tree 2 is No (correct)
Sample 6: OOB prediction from Tree 3 is Yes (incorrect)
Sample 7: OOB prediction from Tree 1 is Yes (correct)
Sample 8: OOB prediction from Tree 1 is Yes (incorrect)
Step 4: Calculating the OOB error
Out of the six samples with OOB predictions, four were correctly classified, and two were
misclassified.
The OOB error rate is 2/6 ≈ 33.3%.
The OOB error rate is 33.3%, which gives an estimate of the Random Forest's generalization
error without needing a separate test set
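A tiny Python tally reproducing the arithmetic above from the six OOB predictions listed in this example.

```python
# Six samples received an OOB prediction: four correct, two incorrect.
oob_results = {
    1: "correct",    # Tree 2 predicted Yes
    2: "correct",    # Tree 3 predicted No
    4: "correct",    # Tree 2 predicted No
    6: "incorrect",  # Tree 3 predicted Yes
    7: "correct",    # Tree 1 predicted Yes
    8: "incorrect",  # Tree 1 predicted Yes
}

errors = sum(result == "incorrect" for result in oob_results.values())
oob_error = errors / len(oob_results)
print(f"OOB error rate: {oob_error:.1%}")   # 33.3%
```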
Random forest example
Suppose we have a dataset of students with two features:
Hours Studied
Test Preparation Quality (rated as "Good" or "Poor")
Our target is to predict if a student will Pass (Class 1) or Fail (Class 0) based on these features.
Student | Hours studied | Preparation quality | Pass/Fail
A | 5 | Good | Pass (1)
B | 2 | Poor | Fail (0)
C | 8 | Good | Pass (1)
D | 1 | Poor | Fail (0)
E | 7 | Poor | Pass (1)
F | 3 | Good | Fail (0)
Step 1: Bootstrapping (Sampling with Replacement)
Tree 1: Sampled Data
Student | Hours studied | Preparation quality | Pass/Fail
A | 5 | Good | Pass (1)
C | 8 | Good | Pass (1)
E | 7 | Poor | Pass (1)
B | 2 | Poor | Fail (0)
Step 2: The rules learned by each tree
Tree 1: if Hours Studied > 6, predict Pass (1); if Hours Studied ≤ 6, predict Fail (0).
Tree 2: if Preparation Quality is Good, predict Pass (1); if Preparation Quality is Poor, predict Fail (0).
Tree 3: if Hours Studied > 4, predict Pass (1); if Hours Studied ≤ 4, predict Fail (0).
Step 3: Predicting for a new student
Suppose we want to classify a new student who studied for 4 hours and has a Preparation Quality of "Good".
Each tree in the forest makes its prediction:
Tree 1: since 4 ≤ 6, Tree 1 predicts Fail (0)
Tree 2: since Preparation Quality is "Good", Tree 2 predicts Pass (1)
Tree 3: since 4 ≤ 4, Tree 3 predicts Fail (0)
Step 4: Aggregating Predictions
Take the majority vote of all trees for the final prediction: two trees predict Fail (0) and one predicts Pass (1), so the Random Forest predicts Fail (0).
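The whole toy example fits in a few lines of Python: the three functions below encode the toy trees from the previous step, and the vote reproduces the Fail (0) prediction.

```python
# The three toy trees from this example, applied to the new student
# (4 hours studied, "Good" preparation), followed by the majority vote.
def tree_1(hours, prep):           # splits on Hours Studied > 6
    return 1 if hours > 6 else 0

def tree_2(hours, prep):           # splits on Preparation Quality
    return 1 if prep == "Good" else 0

def tree_3(hours, prep):           # splits on Hours Studied > 4
    return 1 if hours > 4 else 0

votes = [tree(4, "Good") for tree in (tree_1, tree_2, tree_3)]   # [0, 1, 0]
prediction = max(set(votes), key=votes.count)                    # majority vote
print("Votes:", votes, "-> final prediction:", "Pass" if prediction else "Fail")
```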
Parameters and impact
n_estimators: number of decision trees in the forest.
Impact on Performance:
Accuracy: Increasing the number of trees generally improves the model’s accuracy because it
reduces the overall variance
Overfitting: Having too many trees rarely leads to overfitting since Random Forest is naturally
resistant to it. However, after a certain point, adding more trees yields diminishing returns, and
accuracy improvements plateau
Computational Cost: Higher n_estimators increase computational cost and memory usage since
each tree requires computation and storage.
Typical Values: Common values are 100-500, but it depends on the dataset size and computational
resources
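A sketch of how this plateau can be observed in practice: sweeping n_estimators on a synthetic dataset and reading off the OOB accuracy estimate. The dataset and tree counts below are illustrative assumptions.

```python
# OOB accuracy typically improves quickly and then plateaus as trees are added.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

for n_trees in (25, 100, 300, 500):
    forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=0)
    forest.fit(X, y)
    print(f"n_estimators={n_trees:>3}: OOB accuracy = {forest.oob_score_:.3f}")
```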
Parameters and impact
max_features: controls the maximum number of features considered for splitting at each node in a tree
Impact on Performance
Diversity and Overfitting: A lower max_features increases the diversity among trees, making them more independent,
which helps prevent overfitting and improves generalization. If max_features is too high (close to the total number of
features), each tree becomes more similar, reducing the benefit of having multiple trees
Accuracy: The optimal max_features value balances accuracy and independence among trees. Setting max_features to the
square root of the total features for classification or one-third of the total features for regression is a common rule of
thumb
Computational Efficiency: Lowering max_features can speed up training since fewer features are evaluated at each
split. However, setting it too low may decrease accuracy as each tree might lack sufficient information to make accurate
splits.
Typical Values
Classification: Often set to sqrt(total_features)
Regression: Often set to total_features / 3
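The rules of thumb above, written out explicitly for an assumed feature count of 30 (scikit-learn also accepts the string "sqrt" for max_features).

```python
# Common max_features defaults: sqrt(features) for classification,
# features / 3 for regression; the feature count here is made up.
import math
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

n_features = 30   # illustrative feature count

clf = RandomForestClassifier(max_features=int(math.sqrt(n_features)))   # ~5 features per split
reg = RandomForestRegressor(max_features=n_features // 3)               # ~10 features per split
print(clf.max_features, reg.max_features)
```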
Parameters and impact
max_depth: maximum number of levels (depth) allowed for each tree
Impact on Performance
Overfitting vs. Underfitting: A higher max_depth allows trees to learn more complex patterns but also
increases the risk of overfitting, especially if the trees become too deep and learn noise in the training data.
Conversely, a low max_depth may lead to underfitting, as each tree might not capture enough information to make
accurate predictions
Model Complexity and Interpretability: Higher depths lead to more complex trees that are harder to
interpret. Restricting max_depth can help simplify the model and reduce overfitting, especially in cases with limited
data
Computational Cost: Deeper trees require more computation. Limiting max_depth can reduce the training time
and memory usage, making the model more efficient.
Typical Values
Values between 10-20 are common, though they vary depending on the dataset size and complexity
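An illustrative comparison of shallow versus unrestricted trees on a synthetic dataset, scored with cross-validation; the depths chosen below are arbitrary, not recommendations.

```python
# Deeper trees can fit more complex patterns but cost more and may track noise.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=800, n_features=20, random_state=0)

for depth in (3, 10, None):   # None leaves tree depth unrestricted
    forest = RandomForestClassifier(n_estimators=100, max_depth=depth, random_state=0)
    scores = cross_val_score(forest, X, y, cv=5)
    print(f"max_depth={depth}: mean CV accuracy = {scores.mean():.3f}")
```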
Grid Search and Cross-Validation
Cross-Validation:
a technique for assessing how a machine learning model generalizes to unseen data.
In K-fold cross-validation, the dataset is split into K subsets (or folds)
For each fold, the model trains on K−1 folds and tests on the remaining fold
This process repeats K times, with each fold serving once as the test set
The model’s performance metrics (like accuracy or RMSE) are averaged across all K runs to get a
more reliable estimate
In the case of a Random Forest classifier, cross-validation helps in evaluating the model's accuracy and
robustness
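A minimal example of 5-fold cross-validation for a Random Forest classifier using scikit-learn's cross_val_score; the synthetic dataset and settings are illustrative.

```python
# 5-fold cross-validation: each fold serves once as the test set, and the
# fold accuracies are averaged for a more reliable performance estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(forest, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", scores.mean().round(3))
```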
Grid Search and Cross-Validation
Grid Search:
a method for finding the best combination of hyperparameters for a model by exhaustively
searching through a specified parameter grid
Each possible combination of these hyperparameters is tested, and the model’s performance is
evaluated (usually with cross-validation) for each combination
The result is the combination of parameters that yields the best performance
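A minimal GridSearchCV sketch over the three parameters discussed in this deck; the grid values and dataset are illustrative assumptions.

```python
# Exhaustive grid search over n_estimators, max_features and max_depth,
# scored with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=600, n_features=15, random_state=0)

param_grid = {
    "n_estimators": [100, 300],
    "max_features": ["sqrt", None],
    "max_depth": [10, 20, None],
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```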
Thank you for
your attention