Random Forest Presentation
Decision Tree – Pros and Cons
Pros:
• Non-linear decision boundaries
• Easy to interpret
• Handles numerical & categorical data
Cons:
• Easy to overfit
• High variance (i.e. unstable)
Random Forests
What is Random Forest?
• Random Forest is an ensemble learning method used for classification
and regression tasks.
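A minimal sketch of fitting a Random Forest, assuming scikit-learn is available; the dataset and hyperparameter values are illustrative and not from the slides.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative dataset (assumption): a built-in binary classification problem.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of decision trees, each trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))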
Random Forests – Intuition Check
• What happens if you select more or fewer of the total features per tree? (A sketch follows below.)
Fewer: Trees become less correlated with one another, but at some point many trees become “dead”, i.e. entire trees are fit on unimportant features.
More: Trees become more correlated, but each individual tree is trained better.
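A minimal sketch of this trade-off, assuming scikit-learn; note that scikit-learn's max_features resamples the candidate features at each split rather than once per tree, but the intuition is the same. The feature counts below are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)  # 30 features in total
for max_features in (1, 5, 15, 30):
    forest = RandomForestClassifier(
        n_estimators=300,
        max_features=max_features,  # how many features each split may consider
        oob_score=True,             # out-of-bag estimate of generalization
        random_state=0,
    ).fit(X, y)
    print(max_features, round(forest.oob_score_, 3))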
Tree Optimization and Feature Importance
Tree Optimization – Greedy Criterion
• Trees are grown greedily: each split is chosen according to the locally best option.
• Split criterion: Gini impurity or information gain.
Short Aside – Greedy Algorithm Example
Example: find the largest path. (A sketch follows below.)
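A minimal sketch of a greedy algorithm missing the global optimum; the “number triangle” below is an assumed toy version of the largest-path example, with illustrative values.

# Toy problem: find the root-to-leaf path with the largest sum in a triangle.
triangle = [
    [5],
    [3, 9],
    [10, 1, 2],
]

def greedy_path_sum(tri):
    """Greedy: at each level, step to the larger of the two reachable children."""
    total, idx = tri[0][0], 0
    for row in tri[1:]:
        idx = idx if row[idx] >= row[idx + 1] else idx + 1
        total += row[idx]
    return total

def best_path_sum(tri, level=0, idx=0):
    """Exhaustive search over every root-to-leaf path (the global optimum)."""
    if level == len(tri) - 1:
        return tri[level][idx]
    return tri[level][idx] + max(best_path_sum(tri, level + 1, idx),
                                 best_path_sum(tri, level + 1, idx + 1))

print(greedy_path_sum(triangle))  # 16: the locally best steps 5 -> 9 -> 2
print(best_path_sum(triangle))    # 18: the globally best path 5 -> 3 -> 10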
Tree Optimization – Greedy Criterion
• Note: The criterion governing tree growth is different from your global cost function (e.g. precision-recall, accuracy, etc.), which determines how well your entire model is doing. (A sketch follows below.)
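A minimal sketch of this distinction, assuming scikit-learn; the dataset and metric choice are illustrative. The tree is grown with a local split criterion (Gini), while the finished model is judged by a separate global metric (here, recall).

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Local criterion: how each individual split is chosen while growing the tree.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

# Global cost function: how the whole model is evaluated afterwards.
print("Recall on held-out data:", recall_score(y_test, tree.predict(X_test)))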
Tree Optimization – Gini Impurity
“Gini impurity is a measure of how often a randomly chosen element
from the set would be incorrectly labeled if it was randomly labeled
according to the distribution of labels in the subset.”
$$G_n = 1 - \sum_{i=0}^{C_n} \left( \frac{N_{n,i}}{N_n} \right)^2$$
where $N_{n,i}$ is the number of samples of class $i$ at node $n$, $N_n$ is the total number of samples at node $n$, and $C_n$ is the number of classes.
Tree Optimization – Gini Impurity Example
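A minimal sketch of computing the Gini impurity from the class counts at a node; the example counts are illustrative, not the worked example from the slides.

def gini_impurity(counts):
    """G_n = 1 - sum_i (N_{n,i} / N_n)^2, given the class counts N_{n,i} at node n."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_impurity([50, 50]))   # 0.5   -> maximally mixed two-class node
print(gini_impurity([100, 0]))   # 0.0   -> pure node
print(gini_impurity([30, 10]))   # 0.375 -> mostly one class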
3. Robust to Overfitting:
While a single decision tree may overfit to noisy data, Random Forests reduce this risk by averaging the predictions of multiple trees, which smooths out irregularities.
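A minimal sketch of this point, assuming scikit-learn; the noisy synthetic dataset is illustrative. A single unpruned tree memorizes the training noise, while the forest averages many trees and typically generalizes better.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips 20% of the labels, simulating noisy data.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

print("Single tree   train/test:", tree.score(X_train, y_train), tree.score(X_test, y_test))
print("Random forest train/test:", forest.score(X_train, y_train), forest.score(X_test, y_test))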