
Decision Tree and Random Forest

Topics covered so far
1. Decision Trees
a. Introduction
b. Advantages and Disadvantages
c. Building a Decision Tree
d. Impurity Measures

e. Overfitting

2. Random Forest
a. Bias-Variance Tradeoff
b. Pruning
c. Bagging
d. Random Forest

Discussion Questions
1. What is a decision tree and how does it work?

2. How do we measure the impurity in a decision tree?

Decision Tree
● A decision tree is one of the most popular and effective supervised learning techniques for classification problems. It works well with both categorical and continuous variables.
● It is a graphical representation of all the possible outcomes of a decision, based on certain conditions.
● In this algorithm, the training set is split into two or more sets based on a split condition over the input variables.
● For example: a person has to decide whether or not to go out to play tennis by looking at the weather conditions.
○ If it's cloudy, the person will go out to play.
○ If it's sunny, the person checks the humidity level; if that is normal, the person will go out to play.
○ If it's rainy, the person checks the wind speed; if it is weak, the person will go out to play.

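The tennis example can be reproduced in a few lines of code. Below is a minimal sketch using scikit-learn; the tiny dataset and its integer encoding of the weather conditions are illustrative assumptions, not part of the original example.

```python
# A minimal sketch of the play-tennis decision, fit with scikit-learn.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoded weather observations (illustrative): outlook (0=sunny, 1=cloudy,
# 2=rainy), humidity (0=normal, 1=high), wind (0=weak, 1=strong).
X = pd.DataFrame({
    "outlook":  [1, 1, 0, 0, 0, 2, 2, 2],
    "humidity": [0, 1, 0, 1, 1, 0, 0, 1],
    "wind":     [0, 1, 0, 0, 1, 0, 1, 1],
})
y = ["play", "play", "play", "no", "no", "play", "no", "no"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))  # show the learned splits

# A sunny day with normal humidity and weak wind -> "play", as in the example.
print(tree.predict(pd.DataFrame({"outlook": [0], "humidity": [0], "wind": [0]})))
```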
Impurity Measures in Decision Trees
Decision trees recursively split the data on feature values so as to increase the purity of the target variable in the resulting nodes. The algorithm chooses each split to maximize purity. Impurity and split quality can be measured in several ways, such as the Gini index, entropy, and information gain.

Gini Index (used in classification trees)
● Formula: G = 1 - Σᵢ pᵢ²
● Range: 0 to 0.5, where 0 = most pure and 0.5 = most impure
● Characteristics: easy to compute; non-additive

Entropy (used in classification trees)
● Formula: E = -Σ P(X)·log P(X)
● Range: 0 to 1, where 0 = most pure and 1 = most impure
● Characteristics: computationally intensive; additive

Information Gain (used in classification trees)
● Formula: IG(Y, X) = E(Y) - E(Y|X)
● Range: 0 to 1, where 0 = less gain and 1 = more gain
● Characteristics: computationally intensive

Variance (used in regression trees)
● Formula: V = Σ(x - μ)² / N
● Characteristics: the most common measure of dispersion
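The formulas above are easy to evaluate by hand. Below is a hedged sketch, using only NumPy, that computes the Gini index, entropy, and information gain for a small made-up split; the labels are illustrative.

```python
# Computing the slide's impurity measures for a toy split, NumPy only.
import numpy as np

def gini(labels):
    # G = 1 - sum(p_i^2)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1 - np.sum(p ** 2)

def entropy(labels):
    # E = -sum(P(x) * log2 P(x))
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # IG(Y, X) = E(Y) - E(Y|X), where E(Y|X) is the size-weighted child entropy
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])     # perfectly mixed node
left, right = parent[:4], parent[4:]            # a perfect split
print(gini(parent))                             # 0.5 (most impure)
print(entropy(parent))                          # 1.0 (most impure)
print(information_gain(parent, [left, right]))  # 1.0 (maximum gain)
```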
Discussion Questions
1. What do you mean by Ensemble Learning?

2. What is bootstrap aggregation and how does it work?

3. What is a random forest and how is it useful?

4. What are the advantages and disadvantages of the random forest algorithm?

Ensemble Learning
● Ensemble Learning is a paradigm of machine learning methods for combining predictions from multiple
models.
● The central motivation is rooted in the belief that a committee of experts working together can perform better than a single expert.

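As a concrete illustration of the committee idea, here is a minimal sketch using scikit-learn's VotingClassifier; the choice of the three base models and the synthetic dataset are assumptions made for illustration.

```python
# Three different "experts" vote on each prediction.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

committee = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # final prediction is the majority vote of the three experts
)
committee.fit(X_tr, y_tr)
print(committee.score(X_te, y_te))
```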
Bootstrap Aggregation (Bagging)
● Bagging is a technique that merges the outputs of several models to produce a final result.
● It reduces the chance of overfitting by training each model on a randomly drawn subset (a bootstrap sample) of the training data. Training can be done in parallel.
● It essentially trains a large number of "strong" learners in parallel, each of which may overfit its own subset of the data.
● It then combines these learners (by averaging or majority voting) to "smooth out" their predictions.
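A minimal sketch of bagging with scikit-learn's BaggingClassifier follows; the synthetic dataset is an illustrative assumption, and the base learner defaults to a decision tree.

```python
# Each of the 100 base learners (decision trees by default) is fit on a
# bootstrap sample, in parallel, and predictions are combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
bagged = BaggingClassifier(n_estimators=100, n_jobs=-1, random_state=0).fit(X_tr, y_tr)

print("single tree:", single.score(X_te, y_te))
print("bagged trees:", bagged.score(X_te, y_te))  # typically higher and more stable
```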
Random Forest algorithm
● Random Forest is a supervised machine learning algorithm, which can be used for both classification and
regression.
● It generates decision trees from random samples of the original dataset; the collection of generated decision trees is called the forest. In each tree, at every split, a random subset of the original features is considered, and the best split among them is chosen using an attribute selection indicator such as the Gini index, entropy, or information gain.

The following steps are involved in this algorithm:
1. Select a random sample from the given dataset.
2. Using attribute selection indicators, create a decision tree for each sample and record the prediction outcome from each model.
3. Apply the voting/averaging method over the predicted outcomes of the individual models.
4. Take the final result as the most-voted value (classification) or the average value (regression).
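These steps map directly onto scikit-learn's RandomForestClassifier, as in the minimal sketch below; the breast-cancer dataset and the specific parameter values are illustrative assumptions.

```python
# Each tree is grown on a bootstrap sample, considering a random feature
# subset at every split; the forest votes over the trees' predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # steps 1-2: 100 trees, each on its own bootstrap sample
    max_features="sqrt",  # random subset of features tried at each split
    criterion="gini",     # attribute selection indicator
    random_state=0,
).fit(X_tr, y_tr)

print(forest.score(X_te, y_te))  # steps 3-4: majority vote over the trees
```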
Advantages and Disadvantages of Random Forest
Advantages:

● It can be used to solve classification as well as regression problems.

● It is one of the most accurate algorithms because of the number of decision trees taking part in the process.

● In general, it does not suffer from overfitting.


● It estimates the relative importance of each feature and thus helps with feature selection (see the sketch at the end of this slide).

Disadvantages:
● The Random Forest algorithm is slow compared to many other algorithms because, for every sample, it computes a prediction from each decision tree and then aggregates them by voting, which is time-consuming.

● The model is difficult to interpret compared to a single decision tree, where a decision can easily be traced by following the path down the tree.
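As a brief illustration of the feature-selection advantage, a fitted scikit-learn forest exposes feature_importances_, which ranks inputs by how much they reduce impurity across all trees; the dataset below is an illustrative assumption.

```python
# Rank features by impurity-based importance from a fitted forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(data.data, data.target)

# Print the top five features by importance.
ranked = sorted(zip(forest.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```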
Case Study
Appendix

Pruning
● One of the problems with the decision tree is that it easily overfits the training data and becomes too large
and complex.
● A complex and large tree poorly generalizes to new data, whereas a small tree fails to capture the
information of the training data.
● Pruning can be defined as shortening the branches of the tree. It is the process of reducing the size of the tree by turning some branch nodes into leaf nodes and removing all the nodes beneath them.
● By removing branches, we reduce the complexity of the tree, which helps reduce overfitting.

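A minimal sketch of the problem and the fix: an unpruned scikit-learn tree can memorize noisy training data, while a pruned one (here via the ccp_alpha parameter covered on the next slide) is smaller and often generalizes better. The synthetic dataset and the alpha value are illustrative assumptions.

```python
# Compare an unpruned tree with a pruned one on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise so the unpruned tree is forced to overfit.
X, y = make_classification(n_samples=1000, n_informative=5, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

for name, tree in [("full", full), ("pruned", pruned)]:
    print(name, "nodes:", tree.tree_.node_count,
          "train:", round(tree.score(X_tr, y_tr), 3),
          "test:", round(tree.score(X_te, y_te), 3))
```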
Cost Complexity Pruning
● Cost Complexity Pruning is the most popular pruning technique for decision trees. It takes into account both
the number of errors and the complexity of the tree.
● This technique is parametrized by the cost complexity parameter, ccp_alpha, which controls the number of leaf nodes and thereby the complexity of the tree, eventually reducing overfitting. Greater values of ccp_alpha increase the number of nodes pruned.
● The complexity parameter is used to define the cost-complexity measure Rα(T) of a given tree T:

Rα(T) = R(T) + α|T|

where |T| is the number of terminal nodes and R(T) is the total misclassification rate of the terminal nodes.

● Cost complexity pruning proceeds in the following stages:


○ A sequence of trees (T0, T1, ..., Tk) for increasing values of alpha is built on the training data, where T0 is the original tree before pruning and Tk is the tree consisting of the root node alone.
○ The tree Ti+1 is obtained by replacing one or more of the sub-trees in the predecessor tree Ti with
suitable leaves.
○ The impurity of each pruned tree (T0, T1,..., Tk) is estimated and the best pruned tree is then selected
based on the metric under consideration (using test data).
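The staged procedure maps onto scikit-learn as follows; this is a hedged sketch in which the dataset is an illustrative assumption. cost_complexity_pruning_path returns the effective alphas at which subtrees collapse, one tree is fit per alpha (the sequence T0, ..., Tk), and the best tree is chosen on held-out data.

```python
# Build the pruning sequence T0 ... Tk and select the best tree on test data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Effective alphas at which sub-trees are replaced by leaves.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# One pruned tree per alpha: T0 (alpha=0, the full tree) up to the root-only tree.
trees = [DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
         for a in path.ccp_alphas]

best = max(trees, key=lambda t: t.score(X_te, y_te))
print("best alpha:", best.ccp_alpha, "test accuracy:", best.score(X_te, y_te))
```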
Hyperparameters in Random Forest
1. Number of trees (n_estimators):
● It specifies the number of trees in the forest of the model.
● The default value for this parameter is 100, which means that 100 different decision trees will be
constructed in the random forest.

2. Maximum Depth (max_depth):


● It specifies the maximum depth of the tree.
● The default value is None, which means each tree will expand until every leaf is pure.

3. The minimum number of samples per leaf (min_samples_leaf):


● It specifies the minimum number of samples required to be at a leaf node.
● The default value is 1, which means that every leaf must have at least 1 sample that it classifies.

4. The minimum number of samples to split (min_samples_split):


● It specifies the minimum number of samples required to split a node.
● The default value for this parameter is 2, which means that an internal node must have at least two
samples before it can be split to have a more specific classification.
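Tying the four hyperparameters together, a minimal scikit-learn sketch follows; the specific values are illustrative assumptions, not recommendations.

```python
# A random forest with all four hyperparameters set explicitly.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,      # number of trees (default 100)
    max_depth=10,          # cap tree depth (default None: grow until leaves are pure)
    min_samples_leaf=5,    # every leaf must hold at least 5 samples (default 1)
    min_samples_split=10,  # a node needs at least 10 samples to split (default 2)
    random_state=0,
).fit(X_tr, y_tr)

print(forest.score(X_te, y_te))
```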
Happy Learning!
