LECTURE: INTRODUCTION TO RANDOM FOREST AND GRADIENT BOOSTING METHODS
- Presented by Shreyas S.K
30-03-2019
AD RESEARCH GROUP
WHAT IS MACHINE LEARNING ABOUT??
APPLICATIONS OF MACHINE LEARNING
ANATOMY OF DECISION TREE
• Trees that predict categorical outcomes are called decision trees
• At each node, a certain set of rules must be satisfied
• The output at each node is a Boolean (True/False)
• Splitting is the process of dividing a node into two or more sub-nodes
• The root node represents the entire population
• When a sub-node splits into further sub-nodes, it is a decision node
• Nodes that do not split are called terminal nodes (leaf nodes)
[Diagram: example tree labelled with its root node, decision nodes and leaf nodes]
Decision tree for a regression dataset
X[i] :- Input variables in the dataset
MSE :- Mean squared error of all samples in a node
Samples :- Total number of samples in a node
Value :- Average value of the output variable over all samples in a node
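A tree like the one in the figure can be reproduced with any standard decision-tree library; below is a minimal sketch, assuming scikit-learn (the slide names no library) and purely illustrative data, where the printed nodes report the same split rules, sample counts and leaf values described above.

    # Minimal sketch: fit and inspect a small regression tree.
    # The data below is illustrative only, not the dataset shown in the figure.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(100, 2))                          # two input variables X[0], X[1]
    y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, 100)

    tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

    # Each internal node prints its split rule; each leaf prints the average
    # target value of the samples that reach it (the "Value" above).
    print(export_text(tree, feature_names=["X[0]", "X[1]"]))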
DECISION TREES FOR CLASSIFICATION
Predict whether or not to play tennis based on
Temperature, Humidity, Wind and Outlook
• A good decision tree is one that makes correct predictions on unseen data
• The split at each node is made based on the Gini score
• The best split is the one that yields the lowest Gini score (a sketch follows below)
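As a concrete illustration, the following sketch scores a candidate split by the weighted Gini impurity of its two child nodes; the class counts are made up for the play-tennis setting and only serve to show the computation.

    # Sketch: weighted Gini impurity of a candidate split (class counts made up).
    def gini(counts):
        """Gini impurity of a node given its class counts, e.g. [play, no_play]."""
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def split_gini(left_counts, right_counts):
        """Sample-weighted average Gini impurity of the two child nodes."""
        n_left, n_right = sum(left_counts), sum(right_counts)
        n = n_left + n_right
        return (n_left / n) * gini(left_counts) + (n_right / n) * gini(right_counts)

    # Candidate split "Outlook == Sunny": 2 play / 3 no-play go left, 7 / 2 go right.
    print(split_gini([2, 3], [7, 2]))   # lower is better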
DECISION TREE FOR REGRESSION
• Regression trees predict continuous values
• The value at a leaf is the average of all samples in that leaf
• The best split at each node is chosen by MSE or by the weighted average of the child-node standard deviations
Predict the average precipitation based on the
Slope and Elevation of the Himalayan region
BEST SPLIT BASED ON STANDARD DEVIATION
[Figure: weighted standard deviation of candidate splits]
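The criterion in the figure can be computed directly; here is a short sketch with illustrative precipitation values, where each candidate split is scored by the sample-weighted average of the child-node standard deviations and the split with the largest reduction from the parent's standard deviation is preferred.

    # Sketch: weighted standard deviation of a candidate split (values illustrative).
    import numpy as np

    def weighted_std(left, right):
        """Sample-weighted average of the child-node standard deviations."""
        n = len(left) + len(right)
        return (len(left) / n) * np.std(left) + (len(right) / n) * np.std(right)

    parent = np.array([20.0, 25.0, 30.0, 80.0, 90.0, 100.0])   # e.g. precipitation values
    left, right = parent[:3], parent[3:]                        # one candidate split

    reduction = np.std(parent) - weighted_std(left, right)
    print(f"std reduction for this split: {reduction:.2f}")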
HOW LONG TO KEEP SPLITTING??..
• Until:
• Leaf nodes are pure – Only one class remains
• A maximum depth is reached
• A performance metric is achieved
• Problem:
• Decision trees tend to overfit
• Small changes in the data greatly affect the prediction
• Solution:
• Prune the trees
• Restrict the tree from growing to its fullest
• Maintain a minimum number of samples in leaf nodes (see the sketch below)
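These stopping and pruning controls correspond directly to tree hyperparameters; a minimal sketch, assuming scikit-learn as the implementation and with illustrative parameter values:

    # Sketch: restricting growth and pruning a tree in scikit-learn (values illustrative).
    from sklearn.tree import DecisionTreeClassifier

    tree = DecisionTreeClassifier(
        max_depth=4,            # stop splitting at a maximum depth
        min_samples_leaf=5,     # keep a minimum number of samples in each leaf
        ccp_alpha=0.01,         # cost-complexity (post-)pruning strength
    )
    # tree.fit(X_train, y_train)   # X_train, y_train: your own training data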
Pros and Cons of Classification and Regression Trees
Advantages
• Simple to understand, interpret and
visualise
• Can handle both numerical and
categorical data
• Less effort in data preparation
• Non-linear relationships between parameters do not affect tree performance
• Implicitly performs feature selection
Disadvantages
• Prone to creating overly complex trees that lack generalisation capability
• Unstable: small variations in the data can result in a completely different tree
• They create biased trees if some classes dominate
• No guarantee of returning the globally optimal decision tree
Lower the variance of individual trees with ensemble methods such as bagging and boosting
ANALOGY OF ENSEMBLE LEARNING
[Figure: three decision trees predict 2.6, 2.95 and 3.2 for the same sample; averaging gives an ensemble prediction of 2.91 against a desired output of 2.85]
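The ensemble output in the figure is nothing more than the average of the individual tree predictions, as the following one-liner shows.

    # Sketch: the ensemble prediction is the plain average of the tree outputs.
    predictions = [2.6, 2.95, 3.2]
    print(sum(predictions) / len(predictions))   # ≈ 2.92, close to the desired 2.85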
RANDOM FOREST METHOD
[Diagram: the training dataset is resampled into bootstrap samples 1..k; each bootstrap sample has an in-bag part (about 2/3 of the data) used to grow a tree and an out-of-bag part (about 1/3) that is left out; the k trees give predictions 1..k, and the final output is the average of the k predictions]
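The roughly 2/3 in-bag / 1/3 out-of-bag proportions follow from sampling with replacement; a short sketch (plain NumPy, dataset size illustrative) that draws one bootstrap sample and measures both fractions:

    # Sketch: in-bag vs. out-of-bag fractions of one bootstrap sample (size illustrative).
    import numpy as np

    n = 10_000
    rng = np.random.default_rng(0)
    bootstrap_idx = rng.integers(0, n, size=n)   # sample n rows with replacement

    in_bag = np.unique(bootstrap_idx)
    print(len(in_bag) / n)       # ~0.63, i.e. roughly 2/3 of the rows are in-bag
    print(1 - len(in_bag) / n)   # ~0.37, i.e. roughly 1/3 are out-of-bag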
RANDOM FOREST – A BAGGING APPROACH
PSEUDO CODE FOR RANDOM FOREST METHOD
1. Randomly select "k" features from the total "m" features (where k < m)
2. Among the "k" features, calculate the best split point "d"
3. Split the node into daughter nodes using that best split
4. Repeat steps 1 to 3 until a predefined number of nodes is reached
5. Build a forest by repeating steps 1 to 4 "n" times to create "n" trees
6. Take the test features and use the rules of each randomly created tree to predict the output
7. Count the votes for each predicted target
8. The predicted target with the most votes is the final prediction (a sketch of this procedure follows below)
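A hedged sketch of this procedure for regression, where the per-tree learner is assumed to be scikit-learn's DecisionTreeRegressor (its max_features argument performs the "k out of m features" selection at every split) and the tree outputs are averaged rather than voted; parameter values and names are illustrative.

    # Sketch of the random-forest procedure above (regression flavour: average, not vote).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def random_forest_fit(X, y, n_trees=100, k_features=None, rng=None):
        rng = rng or np.random.default_rng(0)
        n, m = X.shape
        k_features = k_features or max(1, int(np.sqrt(m)))    # choose k < m features per split
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, n, size=n)                  # bootstrap (in-bag) sample
            tree = DecisionTreeRegressor(max_features=k_features)
            trees.append(tree.fit(X[idx], y[idx]))
        return trees

    def random_forest_predict(trees, X):
        return np.mean([t.predict(X) for t in trees], axis=0)  # average of the k predictions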
OVERFITTING – HIGH VARIANCE
• High variance
• The outcome can vary even with the tiniest changes in the input
• High-variance models do not generalise well to new data
• High variance compared to "PHYSICAL BALANCE"
• If you are balancing on one foot while standing on solid ground, you're not likely to fall over.
• But what if there are suddenly 100 mph wind gusts? I bet you'd fall over.
• That's because your ability to balance on one leg is highly dependent on the factors in your environment.
• If even one thing changes, it could completely throw you off!
• Likewise, if we change any factor in the model's training data, we could completely change the outcome.
• This is not a stable model, and therefore not a model on which we would want to base decisions.
Don’t fall, lil guy!!
APPLICATIONS OF RANDOM FOREST
1. Banking
• To identify loyal and fraudulent customers
• The growth of a bank depends largely on its loyal customers
• To identify customers who are not profitable to the bank
• The bank can then avoid approving loans to such customers
2. Medicine
• To identify a disease by analysing the patient's medical records
• To identify the correct combination of components when validating a medicine
3. E-commerce
• To estimate the likelihood of a customer liking a recommended product
GRADIENT BOOSTING METHOD
• Iterative loop:
1. Create a decision tree on the known response values
2. Make predictions
3. Calculate the errors (residuals)
4. Fit a new tree using the errors as response values
5. Combine the new tree with the tree from the previous iteration, and repeat
• Tuning parameters:
1. Number of trees
2. Maximum depth of each tree
3. Maximum features at each split
4. Learning rate
5. Minimum samples in leaf
• Builds decision trees sequentially
• More weight is given to mispredicted values at each stage of training
• Builds more accurate models, as the final output combines the predictions of all the decision trees (a sketch of the tuning parameters follows below)
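The five tuning parameters listed above map one-to-one onto the arguments of a typical gradient boosting implementation; a minimal sketch, assuming scikit-learn (the slide names no library) and illustrative values:

    # Sketch: the tuning parameters above in scikit-learn (values illustrative).
    from sklearn.ensemble import GradientBoostingRegressor

    gbm = GradientBoostingRegressor(
        n_estimators=200,       # 1. number of trees
        max_depth=3,            # 2. maximum depth of each tree
        max_features="sqrt",    # 3. maximum features considered at each split
        learning_rate=0.05,     # 4. learning rate
        min_samples_leaf=5,     # 5. minimum samples in a leaf
    )
    # gbm.fit(X_train, y_train); gbm.predict(X_test)   # with your own data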
GRADIENT BOOSTING – A BOOSTING APPROACH
PSEUDO CODE FOR GRADIENT BOOSTING METHOD
1. Initialize the approximation function: $F_0(x) = \arg\min_{\gamma} \sum_{i=1}^{n} L(y_i, \gamma)$
2. For m = 1 to M do:
• Calculate the pseudo-responses: $r_{im} = -\left[\partial L(y_i, F(x_i)) / \partial F(x_i)\right]_{F = F_{m-1}}$
• Fit a regression tree $h_m(x)$ to the pseudo-responses using the training set
• Calculate the step size using a line search: $\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i,\, F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)$
• Update the model: $F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$
3. End the algorithm: $F_M(x)$ is the final output (a from-scratch sketch follows below)
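A minimal from-scratch sketch of this loop for the squared-error case, where the pseudo-responses are simply the residuals and the leaf-wise line search is absorbed into a constant learning rate; the tree depth, shrinkage value and variable names are illustrative.

    # Sketch of the boosting loop above for squared-error loss (values illustrative).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_fit(X, y, M=100, learning_rate=0.1, max_depth=2):
        F = np.full(len(y), y.mean())          # 1. initialise F_0(x) with the mean response
        trees = []
        for _ in range(M):                     # 2. for m = 1..M
            residuals = y - F                  #    pseudo-responses = negative gradient of SE loss
            h = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
            F = F + learning_rate * h.predict(X)   # update F_m = F_{m-1} + gamma * h_m
            trees.append(h)
        return y.mean(), trees                 # 3. F_M: the initial value plus all fitted trees

    def gradient_boost_predict(f0, trees, X, learning_rate=0.1):
        # learning_rate must match the value used during fitting
        return f0 + learning_rate * sum(t.predict(X) for t in trees)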
AN EXAMPLE BASED ON GRADIENT BOOSTING
Predict the age of a person based on whether they play video games, enjoy gardening and
their preference in wearing hats
Objective :- Minimize Squared Error
LOSS FUNCTION :- SQUARED ERROR
$F_0 = \frac{1}{n} \sum_{k=1}^{n} \mathrm{Age}_k$
$\mathrm{PseudoResidual}_0 = \mathrm{Age} - F_0$
$F_1 = F_0 + \gamma_0 \cdot h_0$
$\mathrm{SSE} = \sum_{k=1}^{n} (\mathrm{Age}_k - F_1)^2$
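A tiny numeric sketch of these quantities; the ages and the "plays video games" feature below are made up, not the slide's dataset, and the first tree h0 is a depth-1 regression tree fit to the residuals with gamma0 = 1.

    # Sketch: F0, first residuals, F1 and SSE on made-up data (not the slide's dataset).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    age = np.array([13.0, 14.0, 15.0, 25.0, 35.0, 49.0, 68.0, 71.0, 73.0])
    plays_video_games = np.array([[1], [1], [1], [1], [0], [0], [0], [0], [0]])  # made up

    F0 = age.mean()                                   # F0 = (1/n) * sum(Age)
    residual0 = age - F0                              # PseudoResidual0 = Age - F0

    h0 = DecisionTreeRegressor(max_depth=1).fit(plays_video_games, residual0)
    F1 = F0 + 1.0 * h0.predict(plays_video_games)     # F1 = F0 + gamma0 * h0, with gamma0 = 1

    print("SSE after F0:", np.sum((age - F0) ** 2))
    print("SSE after F1:", np.sum((age - F1) ** 2))   # lower: the first tree reduced the error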
BOOSTING – SEQUENTIAL ACCUMULATION
Tree1 residual = Age − Tree1 prediction
Combined prediction = Tree1 prediction + Tree2 prediction
THANK YOU FOR A PATIENT HEARING!


Editor's Notes

  • #2 Good morning everyone! My name is Shreyas. Now I’ll be giving a presentation on the topic “”
  • #11 The analogy of ensemble methods can be described by comparing the workflow with the popular show "Who Wants to Be a Millionaire?". There are three lifelines in this show, as shown. At each stage of training we build decision trees, and each of them gives an output. Each tree is a weak learner, as its predicted output is only somewhat better than random guessing. Here we combine a set of weak learners into a strong learner by averaging their outputs. The probability of getting the correct answer from a friend is comparably lower than from the audience poll.
  • #12 Data splitting divides the total dataset into training and testing sets. The training set is then further divided into bootstrap samples.
  • #17 The predictions made after adding each new tree are stronger than those of the previous iteration.