
Decision Tree

Sadia Islam
Assistant Professor
Department of Computer Science and Engineering
United International University
Decision Tree
Decision tree (DT) induction is the learning of decision trees from class-labeled
training instances, which is a top-down recursive divide and conquer algorithm.
Advantages:
● Simple to understand.
● Easy to implement.
● Requires little prior knowledge.
● Handles both numerical and categorical data.
● Robust.
● Deals with large and noisy datasets.
● Nonlinear relationships between features do not affect the tree performance.
Decision Tree
A decision tree works by breaking down a dataset into smaller subsets while at the same time an
associated decision tree is incrementally developed. The final result is a tree with decision nodes
and leaf nodes. Process:
● Root Node: This is the topmost node in a decision tree. It represents the entire dataset
● Splitting: This involves dividing a node into two or more sub-nodes. The split is based on
features in the data set. The aim is to ensure that the resulting sub-nodes are as pure
(homogeneous) as possible.
● Decision Nodes: These represent features in the dataset and the possible values they can
take; the data are divided further at these nodes based on certain conditions.
● Leaf/Terminal Nodes: These nodes represent the final output or decision. They are the
nodes where no further splitting is possible or necessary.
● Pruning: Done to reduce the size of the tree.
● Predicting: The final part, predicting outcomes for the test data (a minimal structure sketch
follows this list).
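As a rough illustration (not part of the original slides), the node types above can be represented
as a small tree structure with a prediction walk; the Node fields and the predict helper below are
assumed names chosen for this sketch.

```python
# Minimal sketch of a decision-tree structure and of predicting by walking it.
# Field and function names are illustrative assumptions, not from the slides.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None      # decision node: which feature is tested
    threshold: Optional[float] = None  # decision node: split condition (feature <= threshold)
    left: Optional["Node"] = None      # sub-node where the condition holds
    right: Optional["Node"] = None     # sub-node where it does not
    prediction: Optional[str] = None   # leaf node: the final output


def predict(node: Node, instance: dict) -> str:
    """Walk from the root node to a leaf node, following the split conditions."""
    while node.prediction is None:     # stop once a leaf is reached
        node = node.left if instance[node.feature] <= node.threshold else node.right
    return node.prediction


# A hand-built tree matching the CART example later in these slides; the impure
# left node is labeled with its majority class.
root = Node(feature="Study Hours", threshold=6.5,
            left=Node(prediction="Fail"),
            right=Node(prediction="Pass"))

print(predict(root, {"Study Hours": 4}))   # Fail
print(predict(root, {"Study Hours": 9}))   # Pass
```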
Decision Tree
Building a Decision Tree: Different algorithms can be used like ID3, C4.5, or
CART.
Iterative Dichotomiser 3 (ID3)
● Calculate Entropy of the Target Variable
○ Calculate the randomness
● Calculate Information Gain for Each Attribute
● Select the Best Attribute for the Root Node
○ Choose the attribute with the highest information gain as the root node.
● Split the Dataset
○ Split the dataset into subsets based on the values of the chosen attribute
● Recursively Build the Tree
○ Continue this process until all instances in a subset belong to the same class or no more
attributes are left to split on
● Classify New Instances (a compact code sketch of these steps follows this list)
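The steps above can be sketched compactly in Python. This is a hedged illustration, not the
slides' implementation: the tiny dataset, the nested-dictionary tree representation, and the helper
names are assumptions.

```python
# Minimal ID3 sketch: recursively pick the attribute with the highest
# information gain until a subset is pure or no attributes remain.
import math
from collections import Counter


def entropy(labels):
    """Entropy (randomness) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                  # all instances in one class -> leaf
        return labels[0]
    if not attributes:                         # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    branches = {}
    for value in {r[best] for r in rows}:      # one branch per value of the chosen attribute
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best], target)
    return {best: branches}


# Tiny made-up dataset (not the dataset used in the slides).
data = [
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Play": "No"},
]
print(id3(data, ["Outlook", "Wind"], "Play"))
```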
Iterative Dichotomiser 3 (ID3)
● Information gain of an attribute = information of the output variable - information of that
attribute.
● Here, information is expressed in terms of entropy (randomness): the less entropy an attribute
leaves after splitting, the more effectively it moves the data towards a decisive output, and the
more gain it has.
Iterative Dichotomiser 3 (ID3)
● Entropy for any random variable having k values:
Entropy = -Σ p_i × log2(p_i), summed over i = 1 … k
● Binary random variable (a single probability p of the positive value):
Entropy = -p × log2(p) - (1 - p) × log2(1 - p)
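A small sketch of these two entropy formulas, assuming base-2 logarithms and probabilities
passed in directly:

```python
# Entropy of a discrete random variable with k values:
#   Entropy = -sum(p_i * log2(p_i)), with 0 * log2(0) treated as 0.
import math


def entropy(probabilities):
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))    # 1.0 -> maximum randomness for a binary variable
print(entropy([1.0, 0.0]))    # 0.0 -> no randomness at all
print(entropy([9/14, 5/14]))  # ~0.940
```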
Iterative Dichotomiser 3 (ID3)
Information Gain of an attribute A = Entropy of the output variable - Entropy of that attribute
IG(A) = Info_output - Info_A
Info_output = Entropy_output = -(p / (p + n)) × log2(p / (p + n)) - (n / (p + n)) × log2(n / (p + n))
Where p is the number of positive cases and n is the number of negative cases.
Info_A = Entropy_A = Σ ((p_k + n_k) / (p + n)) × Entropy(p_k, n_k), summed over k = 1 … d
Where p_k and n_k are the number of positive and negative instances respectively for attribute
value = k, d is the total number of values of the attribute, and Entropy(p_k, n_k) is the entropy of
the subset of instances with that value.
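A sketch of these formulas in code; the counts-based interface (a list of (p_k, n_k) pairs, one per
attribute value) is an assumption made for the illustration.

```python
# Information gain from positive/negative counts, following the formulas above:
#   Info_output = Entropy(p, n)
#   Info_A      = sum over values k of ((p_k + n_k) / (p + n)) * Entropy(p_k, n_k)
#   IG(A)       = Info_output - Info_A
import math


def entropy_pn(p, n):
    """Entropy of a set containing p positive and n negative instances."""
    total = p + n
    return sum(-(x / total) * math.log2(x / total) for x in (p, n) if x > 0)


def information_gain(p, n, value_counts):
    """value_counts: list of (p_k, n_k) pairs, one pair per value of the attribute."""
    info_output = entropy_pn(p, n)
    info_attr = sum((pk + nk) / (p + n) * entropy_pn(pk, nk) for pk, nk in value_counts)
    return info_output - info_attr


# Made-up counts: 6 positive / 4 negative overall, split into (4, 1) and (2, 3) by the attribute.
print(information_gain(6, 4, [(4, 1), (2, 3)]))
```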
Example - Dataset
ID3 - Information Gain calculation
Information (entropy) of the target variable: Info_output = 0.940

Information (entropy) of the attribute "Outlook": Info_Outlook = 0.694

Information Gain = Info_output - Info_Outlook = 0.940 - 0.694 = 0.246
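These figures can be reproduced assuming the class counts of the standard 14-instance weather
("play tennis") dataset, since the dataset table itself is not reproduced above: 9 positive and 5
negative instances overall, with Outlook splitting them into (2, 3), (4, 0) and (3, 2). Those counts
are an assumption that is consistent with the 0.940 and 0.694 stated here.

```python
# Reproducing the numbers above; the per-value counts are assumptions consistent
# with the usual 14-instance weather dataset (9 positive / 5 negative overall).
import math


def entropy_pn(p, n):
    total = p + n
    return sum(-(x / total) * math.log2(x / total) for x in (p, n) if x > 0)


info_output = entropy_pn(9, 5)                                   # ~0.940
outlook = [(2, 3), (4, 0), (3, 2)]                               # assumed per-value counts
info_outlook = sum((pk + nk) / 14 * entropy_pn(pk, nk) for pk, nk in outlook)  # ~0.694
print(round(info_output, 3), round(info_outlook, 3), round(info_output - info_outlook, 3))
# prints 0.94 0.694 0.247 (0.246 above comes from subtracting the already-rounded values)
```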


ID3
In the same way, the IG of Outlook = 0.246, Temperature = 0.086, Humidity = 0.154, and
Wind = 0.197.

Since Outlook has the highest information gain, we divide the dataset based on Outlook first.


ID3 - the resulting splits and final decision tree (diagram slides)
C4.5
C4.5 - example for the same dataset
Steps:

● Calculate Information Gain for each attribute.


● Calculate Split Information for each attribute.
● Calculate Gain Ratio and select the attribute with the highest Gain Ratio for
splitting.
C4.5 - example for the same dataset
● Step 1 is already complete: IG(S, Outlook) = 0.246 (see the previous slides).

C4.5 - example for the same dataset
● Step 2: Split Information for "Outlook"
SplitInfo(S, A) = -Σ (|S_k| / |S|) × log2(|S_k| / |S|), summed over the d values of attribute A,
where S_k is the subset of S that has the k-th value of A
C4.5 - example for the same dataset
● Step 3: Gain Ratio for "Outlook"
GainRatio(S, A) = IG(S, A) / SplitInfo(S, A)

Similarly, find GR for all the attributes and select the one with the highest value to split the data.
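A hedged sketch of steps 2 and 3; the Outlook branch sizes 5/4/5 used below are an assumption
consistent with the 14-instance dataset assumed earlier.

```python
# C4.5 penalizes attributes with many values using split information:
#   SplitInfo(S, A) = -sum_k (|S_k| / |S|) * log2(|S_k| / |S|)
#   GainRatio(S, A) = IG(S, A) / SplitInfo(S, A)
import math


def split_information(subset_sizes):
    total = sum(subset_sizes)
    return sum(-(s / total) * math.log2(s / total) for s in subset_sizes if s > 0)


def gain_ratio(information_gain, subset_sizes):
    return information_gain / split_information(subset_sizes)


# Assumed branch sizes for Outlook (Sunny / Overcast / Rain = 5 / 4 / 5 of 14 instances):
print(split_information([5, 4, 5]))   # ~1.577
print(gain_ratio(0.246, [5, 4, 5]))   # ~0.156
```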
CART (Classification and Regression Trees)
CART (Classification and Regression Trees) is a variation of the decision tree
algorithm. It can handle both classification and regression tasks.

Gini index / Gini impurity

The Gini index is the metric CART uses for classification tasks. It is computed from the sum of
squared class probabilities: Gini = 1 - Σ (p_i)^2. It measures the probability that a randomly
chosen instance would be misclassified if it were labeled at random according to the class
distribution of the subset, and it is a variation of the Gini coefficient.
CART (Classification and Regression Trees)

The Gini index takes values between 0 and 1 (see the sketch after this list):

● A value of 0 means that all the elements belong to a single class, i.e. only one class exists
there.
● A Gini index close to 1 means a high level of impurity, where many classes each contain only
a small fraction of the elements.
● The maximum value of 1 - 1/n occurs when the elements are uniformly distributed into n
classes and each class has an equal probability of 1/n. For example, with two classes, the
Gini impurity is 1 - 1/2 = 0.5.
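A small sketch of the Gini index and of the boundary values described in the list above:

```python
# Gini index: 1 minus the sum of squared class probabilities.
def gini(probabilities):
    return 1 - sum(p * p for p in probabilities)


print(gini([1.0]))             # 0.0    -> all elements belong to one class
print(gini([0.5, 0.5]))        # 0.5    -> two classes, uniform: 1 - 1/2
print(gini([1/3, 1/3, 1/3]))   # ~0.667 -> n classes, uniform: 1 - 1/n
```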
CART for Classification
Gini Impurity - Gini impurity measures the probability of misclassifying a randomly chosen
instance from a subset if it were labeled at random according to the subset's class distribution.
Lower Gini impurity means the subset is purer.

Splitting Criteria- The CART algorithm evaluates all potential splits at every node
and chooses the one that best decreases the Gini impurity of the resultant
subsets. This process continues until a stopping criterion is reached, like a
maximum tree depth or a minimum number of instances in a leaf node.
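A minimal sketch of this splitting criterion for a single numeric feature. The midpoint candidate
thresholds and the function names are assumptions, and the study-hours data from the example
below is reused for illustration.

```python
# CART classification split: try candidate thresholds and keep the one with the
# lowest weighted Gini impurity of the two resulting subsets.
from collections import Counter


def gini_of_labels(labels):
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    ordered = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2                               # midpoint between consecutive values
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini_of_labels(left)
                    + len(right) * gini_of_labels(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


hours = [10, 8, 4, 6, 3]
grade = ["Pass", "Pass", "Fail", "Pass", "Fail"]
print(best_threshold(hours, grade))
# prints (5.0, 0.0): an exhaustive search finds a pure split here; the worked
# example below uses the threshold 6.5 for illustration.
```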
CART for Regression
Residual Reduction - Residual reduction measures how much the average squared difference
between the predicted values and the actual values of the target variable is reduced by splitting
the subset. The greater the residual reduction, the better the split fits the data.

Splitting Criteria- CART evaluates every possible split at each node and selects
the one that results in the greatest reduction of residual error in the resulting
subsets. This process is repeated until a stopping criterion is met, such as
reaching the maximum tree depth or having too few instances in a leaf node.
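A minimal sketch of the regression criterion; the toy target values are made up for illustration.

```python
# CART regression split: choose the threshold that most reduces the sum of
# squared differences from the subset means (residual reduction).
def sse(values):
    """Sum of squared errors around the mean of the subset."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    ordered = sorted(set(x))
    best = (None, -1.0)
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        reduction = sse(y) - (sse(left) + sse(right))   # larger reduction = better split
        if reduction > best[1]:
            best = (t, reduction)
    return best


# Made-up example: study hours vs. an exam score (regression target).
hours = [3, 4, 6, 8, 10]
score = [40, 50, 65, 78, 85]
print(best_regression_split(hours, score))   # the best threshold and its residual reduction
```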
CART Example

Study Hours | Attendance (%) | Final Grade (Pass/Fail)
10          | 85             | Pass
8           | 78             | Pass
4           | 50             | Fail
6           | 65             | Pass
3           | 40             | Fail

1. Choose the Splitting Attribute.

Consider Study Hours ≤ 6.5 as the threshold for the split.

● Left Node (Study Hours ≤ 6.5): (3, 40, Fail), (4, 50, Fail), (6, 65, Pass)
● Right Node (Study Hours > 6.5): (8, 78, Pass), (10, 85, Pass)
CART Example
2. Calculate Gini Impurity

Left Node:

Probability of Fail = ⅔

Probability of Pass = ⅓

Gini = 1 - (2/3)^2 - (1/3)^2 ≈ 0.44

Right Node:

Probability of Pass = 2/2 = 1

Gini = 1 - (1)^2 = 0
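These numbers can be checked directly with a short sketch; the weighted impurity of the whole
split is an extra quantity added here for completeness; it is not computed in the slides.

```python
# Reproducing the Gini calculations for the Study Hours <= 6.5 split.
def gini(probabilities):
    return 1 - sum(p * p for p in probabilities)


left_gini = gini([2/3, 1/3])    # left node: 2 Fail, 1 Pass -> ~0.44
right_gini = gini([1.0])        # right node: 2 Pass        -> 0.0
weighted = (3 * left_gini + 2 * right_gini) / 5   # impurity of the split as a whole
print(round(left_gini, 2), right_gini, round(weighted, 2))   # 0.44 0.0 0.27
```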
CART Example

Tree

        Study Hours ≤ 6.5
         /            \
     Yes               No
  (3, 4, 6)         (8, 10)
  Fail/Pass            Pass
Thank You
