Mrs. S. L. JOTHI LAKSHMI,
ASSISTANT PROFESSOR / CSE,
AMRITA COLLEGE OF ENGINEERING AND
TECHNOLOGY, NAGERCOIL.
Agenda
Why supervised learning
What is classification
How classification works
What are decision trees
Advantages of decision trees
Important Terminology related to Decision Trees
How do Decision Trees work?
Attribute Selection Measure
How does ID3 decide which attribute is the best
Implementation of decision tree
Supervised Learning
Algorithms learn from labeled data.
The algorithm determines which label
should be given to new, unseen data.
There are two types of supervised
learning problems:
1. CLASSIFICATION
2. REGRESSION
Supervised Learning model
Steps in Supervised Learning
Determine the type of training examples
(what kind of data will be used)
Gather a training set
Determine the input feature representation of the
learned function
Determine the structure of the learned function and
corresponding learning algorithm
Complete the design
Evaluate the accuracy of the learned function
Classification
Categorize data into a given number of classes.
Identify the category/class under which new data will
fall.
A classification algorithm is a function that weighs the input
features so that the output separates one class into positive
values and the other into negative values.
Terminologies encountered in classification
Classifier
Classification model
Feature
Binary Classification
Multi-class classification
Multi-label classification
The following are the steps involved in building a
classification model (a short code sketch follows the list):
Initialize the classifier to be used.
Train the classifier
Predict the target
Evaluate the model
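A minimal sketch of these four steps, assuming scikit-learn with a DecisionTreeClassifier and its built-in Iris dataset (both are illustrative choices, not prescribed by the slides):
# Minimal sketch of the four model-building steps (illustrative choices)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier()          # 1. Initialize the classifier
clf.fit(X_train, y_train)               # 2. Train the classifier
y_pred = clf.predict(X_test)            # 3. Predict the target
print(accuracy_score(y_test, y_pred))   # 4. Evaluate the model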
Classification process
Classification is a two-step process
Learning step
Prediction step
Types of Classification Algorithm
Logistic Regression
Naïve Bayes
Stochastic Gradient Descent
K-Nearest Neighbors
Decision Tree
Random Forest
Support Vector Machine
Decision Tree
Decision Tree algorithm belongs to the family of
supervised learning algorithms.
It learns simple decision rules inferred from prior
data (the training data).
Prediction starts from the root of the tree.
Advantages of decision trees
Simple to understand and to interpret.
Trees can be visualized.
Requires little data preparation.
The cost of using the tree (i.e., predicting data) is logarithmic
in the number of data points used to train the tree.
Able to handle both numerical and categorical data.
Able to handle multi-output problems.
Uses a white box model.
Possible to validate a model using statistical tests.
Performs well even if its assumptions are somewhat violated
by the true model from which the data were generated
Types of decision trees
Types of decision trees are based on the type of target
variable
Categorical Variable Decision Tree:
has a categorical target variable.
Continuous Variable Decision Tree:
has a continuous target variable.
Important Terminology related to Decision Trees
Root Node
Splitting
Decision Node
Leaf / Terminal Node
Pruning
Branch / Sub-Tree
Parent and Child Node
Assumptions while creating Decision Tree
The whole training set is considered as the root.
Feature values are preferred to be categorical.
Records are distributed recursively on the basis of attribute values.
The order of placing attributes as root or internal nodes is decided by a statistical approach.
Attributes selection measures
These measures identify which attribute to consider as the
root node and at each level of the tree.
Entropy
Information gain
Gini index
Gain Ratio
Reduction in Variance
Chi-Square
Entropy
Entropy is a measure of the impurity or randomness
in the information being processed. The higher the
entropy, the harder it is to draw any conclusions from
that information.
Contd…
Mathematically, entropy for one attribute is represented as:
E(S) = - sum over i of p_i * log2(p_i)
where S → current state, and p_i → probability of an event i of
state S, or the percentage of class i in a node of state S.
Mathematically, entropy for multiple attributes is represented as:
E(T, X) = sum over c in X of P(c) * E(c)
where T → current state and X → the selected attribute.
Information Gain
Information gain (IG) is a statistical property that measures
how well a given attribute separates the training examples
according to their target classification.
Information Gain(T,X)=Entropy(T)-Entropy(T,X)
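A minimal Python sketch of these two measures, computed from class labels; the function names and the toy 9-"Yes" / 5-"No" split (which mirrors the PlayGolf example used later) are illustrative:
# Minimal sketch: entropy and information gain from label lists
from math import log2
from collections import Counter

def entropy(labels):
    # E(S) = -sum(p_i * log2(p_i)) over the classes in S
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, groups):
    # Gain(T, X) = Entropy(T) - sum(|group|/|T| * Entropy(group))
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Example: 9 "Yes" and 5 "No" labels split by a hypothetical attribute
labels = ["Yes"] * 9 + ["No"] * 5
groups = [["Yes"] * 4, ["Yes"] * 3 + ["No"] * 2, ["Yes"] * 2 + ["No"] * 3]
print(round(entropy(labels), 3))                   # ~0.94
print(round(information_gain(labels, groups), 3))  # ~0.247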
Gini Index
Cost function used to evaluate splits in the dataset.
It is calculated by subtracting the sum of the squared
probabilities of each class from one.
Gini Index works with the categorical target variable
“Success” or “Failure”. It performs only Binary splits.
The lower the value of the Gini index, the higher the
homogeneity (a Gini index of 0 means the node is pure).
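A minimal sketch of the Gini calculation from a node's class labels (the function name and toy labels are illustrative):
# Minimal sketch: Gini index of a node from its class labels
from collections import Counter

def gini_index(labels):
    # Gini = 1 - sum(p_i^2) over the classes in the node
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini_index(["Success"] * 10))                   # 0.0 -> pure node
print(gini_index(["Success"] * 5 + ["Failure"] * 5))  # 0.5 -> maximally impure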
Gain ratio & Reduction in variance
Gain ratio is a modification of information gain that
reduces its bias toward attributes with many values.
Gain ratio takes the number and size of branches into
account when choosing an attribute.
Reduction in variance is an algorithm used for
continuous target variables (regression problems).
Chi-Square(Chi-squared Automatic Interaction Detector)
It finds out the statistical significance of the
differences between sub-nodes and the parent node.
Mathematically, chi-square is represented as:
Chi-square = sum over classes of (Actual - Expected)^2 / Expected
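As an illustration (not the slides' own code), SciPy's chi-square test can be applied to the class counts of the sub-nodes to judge whether a split is statistically significant; the table values below are made up:
# Chi-square test of a split: rows = sub-nodes, columns = class counts (Yes / No)
from scipy.stats import chi2_contingency

observed = [[8, 2],
            [3, 7]]
chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, p_value)  # large chi2 / small p-value -> split is significant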
How do Decision Trees work?
Algorithms for classification
ID3
C4.5
CART
ID3(Iterative Dichotomiser 3)
The ID3 algorithm builds decision trees using a top-down
greedy search approach.
Steps in ID3 algorithm:
Calculate entropy for the dataset.
For each attribute/feature
Calculate entropy for all its categorical values.
Calculate information gain for the feature.
Find the feature with maximum information gain.
Repeat it until we get the desired tree.
Example
Decide whether the weather is amenable to
playing GOLF. Over the course of 2 weeks, data
is collected to help ID3 build a decision tree.
The weather attributes are outlook, temperature,
humidity, and wind speed.
They can have the following values:
outlook = { sunny, overcast, rain }
temperature = {hot, mild, cool }
humidity = { high, normal }
wind = {weak, strong }
Play Golf
Consider the table below. It represents factors that affect whether John
would go out to play golf or not. Using the data in the table, build a
decision tree model that can be used to predict whether John will play
golf or not.
Algorithm for Building Decision Trees – The
ID3 Algorithm
Begin
Load the learning set and create the decision tree root node (rootNode); add learning
set S into the root node as its subset
For rootNode, compute Entropy(rootNode.subset) first
If Entropy(rootNode.subset) == 0 (subset is homogeneous)
return a leaf node
If Entropy(rootNode.subset) != 0 (subset is not homogeneous)
compute Information Gain for each attribute left (not yet used for
splitting)
Find attribute A with Maximum(Gain(S, A))
Create child nodes for this root node and add them to rootNode in the decision
tree
For each child of the rootNode
Apply ID3(S, A, V)
Continue until a node with Entropy of 0 or a leaf node is reached
End
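A minimal recursive ID3 sketch in Python, assuming the data is a list of dictionaries and the target is one of the keys; the structure and names are illustrative, not the slides' implementation:
# Minimal recursive ID3 sketch (illustrative data layout)
from math import log2
from collections import Counter

def entropy(rows, target):
    total = len(rows)
    counts = Counter(row[target] for row in rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def information_gain(rows, attribute, target):
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Homogeneous subset (entropy 0) or no attributes left -> leaf node
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with maximum information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

# Tiny illustrative run
data = [
    {"Outlook": "sunny",    "Windy": "false", "PlayGolf": "no"},
    {"Outlook": "sunny",    "Windy": "true",  "PlayGolf": "no"},
    {"Outlook": "overcast", "Windy": "false", "PlayGolf": "yes"},
    {"Outlook": "rainy",    "Windy": "false", "PlayGolf": "yes"},
    {"Outlook": "rainy",    "Windy": "true",  "PlayGolf": "no"},
]
print(id3(data, ["Outlook", "Windy"], "PlayGolf"))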
Step by Step Procedure
Step 1: Determine the Root of the Tree
Step 2: Calculate Entropy for The Classes
Step 3: Calculate Entropy After Split for Each Attribute
Step 4: Calculate Information Gain for each split
Step 5: Perform the Split
Step 6: Perform Further Splits
Step 7: Complete the Decision Tree
Determine the Root of the Tree
What is a good attribute?
A good attribute splits the data so that each successor
node is as pure as possible,
i.e., it has a high degree of "order".
Entropy
Information gain
Calculate Entropy for Other Attributes
After Split
E(PlayGolf, Outlook)
E(PlayGolf, Temperature)
E(PlayGolf, Humidity)
E(PlayGolf, Windy)
To build a decision tree, we need to calculate two types
of entropy using frequency tables as follows
1.Entropy using the frequency table of one attribute
Contd..
Entropy using the frequency table of two attributes:
E(PlayGolf, Temperature) Calculation
E(PlayGolf, Humidity) Calculation
E(PlayGolf, Windy) Calculation
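The per-attribute calculations above are shown as frequency tables on the slides; as a quick numeric check, the Outlook case works out as follows, assuming the standard golf counts (5 sunny, 4 overcast, 5 rainy rows, with overcast all "Yes" and the other two values split 3-vs-2):
# Quick numeric check of E(PlayGolf, Outlook) under the standard golf counts
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

e_sunny    = entropy([3, 2])   # ~0.971
e_overcast = entropy([4, 0])   # 0.0
e_rainy    = entropy([3, 2])   # ~0.971

e_split = 5/14 * e_sunny + 4/14 * e_overcast + 5/14 * e_rainy
print(e_split)  # ~0.6935, the 0.693 used on the next slide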
Calculating Information Gain for Each Split
The information gain is calculated using the formula:
Gain(S,T) = Entropy(S) – Entropy(S,T)
1. Gain(PlayGolf, Outlook) = Entropy(PlayGolf) – Entropy(PlayGolf, Outlook)
   = 0.94 – 0.693 = 0.247
2. Gain(PlayGolf, Temperature) = Entropy(PlayGolf) – Entropy(PlayGolf, Temperature)
   = 0.94 – 0.911 = 0.029
3. Gain(PlayGolf, Humidity) = Entropy(PlayGolf) – Entropy(PlayGolf, Humidity)
   = 0.94 – 0.788 = 0.152
4. Gain(PlayGolf, Windy) = Entropy(PlayGolf) – Entropy(PlayGolf, Windy)
   = 0.94 – 0.892 = 0.048
Perform the First Split
Initial Split using Outlook
Perform Further Splits
Complete the Decision Tree
Demo
Scikit-learn is a free software machine learning
library for the Python programming language,
designed to interoperate with the Python
numerical and scientific libraries NumPy and SciPy.
Graphviz (graph-drawing software)
Pydotplus (renders the decision tree graph)
Matplotlib is a plotting library for the Python
programming language and its numerical
mathematics extension NumPy.
Install packages
!apt install -y graphviz
!pip install graphviz
## import dependencies
from sklearn import tree #For our Decision Tree
import pandas as pd # For our DataFrame
import pydotplus
from IPython.display import Image
import matplotlib.pyplot as plt
import numpy as np
from graphviz import Digraph
from math import log
Create the dataset
# Create the dataset
# create an empty data frame
golf_df = pd.DataFrame()

# add outlook
golf_df['Outlook'] = ['rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'sunny',
                      'overcast', 'rainy', 'rainy', 'sunny', 'rainy', 'overcast',
                      'overcast', 'sunny']
Dataset
Calculate information entropy
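The entropy calculation itself is not reproduced in the text above; a minimal pandas-based sketch of the same computation, assuming a 'PlayGolf' target column with the usual 9 'yes' / 5 'no' labels, might look like this:
# Entropy of the target column (the PlayGolf series here is illustrative)
import pandas as pd
from math import log2

def entropy_of_column(series):
    probabilities = series.value_counts(normalize=True)  # class proportions
    return -sum(p * log2(p) for p in probabilities)

play_golf = pd.Series(['yes'] * 9 + ['no'] * 5)  # 9 'yes', 5 'no' overall
print(entropy_of_column(play_golf))  # ~0.94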
ROOT NODE
NEXT BRANCH
NEXT BRANCH
NEXT BRANCH
FINAL DECISION TREE
Decision tree implementation using Python Sklearn
DecisionTreeClassifier is a class capable of performing multi-class classification on a
dataset.
DecisionTreeClassifier parameters
ccp_alpha
class_weight
criterion
max_depth
max_features
max_leaf_nodes
min_impurity_decrease
min_impurity_split
min_samples_leaf
min_samples_split
min_weight_fraction_leaf
presort
random_state
splitter='best'
Iris dataset
Iris dataset
Decision tree for Iris dataset
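The notebook code for this part is not reproduced in the text; one plausible sketch, fitting a DecisionTreeClassifier on the Iris dataset and rendering the tree with graphviz/pydotplus as imported earlier, is:
# Fit a DecisionTreeClassifier on Iris and render the tree (one plausible version)
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz
import pydotplus
from IPython.display import Image

iris = load_iris()
clf = DecisionTreeClassifier(criterion='entropy', random_state=0)
clf.fit(iris.data, iris.target)

dot_data = export_graphviz(clf, out_file=None,
                           feature_names=iris.feature_names,
                           class_names=iris.target_names,
                           filled=True, rounded=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())  # renders the tree in a notebook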
Accuracy
True Positive: you predicted positive and it's true.
True Negative: you predicted negative and it's true.
False Positive: you predicted positive and it's false.
False Negative: you predicted negative and it's false.
Recall: out of all the actual positive instances, how many we
predicted correctly. It should be as high as possible.
Recall = TP / (TP + FN)
Precision: out of all the instances we predicted as positive,
how many are actually positive.
Precision = TP / (TP + FP)
F-measure = 2 * Recall * Precision / (Recall + Precision)
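A minimal sketch of these metrics using sklearn.metrics; the label vectors are illustrative:
# Confusion matrix, accuracy, recall, precision and F1 on toy labels
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))   # [[TN, FP], [FN, TP]]
print(accuracy_score(y_true, y_pred))     # (TP + TN) / total
print(recall_score(y_true, y_pred))       # TP / (TP + FN)
print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(f1_score(y_true, y_pred))           # 2 * P * R / (P + R)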
DISADVANTAGES
Overfitting.
Decision trees can be unstable
The problem of learning an optimal decision tree is
known to be NP-complete
Some concepts are hard to learn because decision trees
do not express them easily, such as XOR, parity or
multiplexer problems.
Decision tree learners create biased trees if some
classes dominate.
Overfitting
Two ways to reduce overfitting:
Pruning Decision Trees.
Random Forest
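A minimal sketch of both remedies on the Iris data: cost-complexity pruning via ccp_alpha (one of the DecisionTreeClassifier parameters listed earlier) and a RandomForestClassifier; the parameter values are illustrative:
# Two common remedies for overfitting: a pruned tree and a random forest
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, max_depth=3,
                                     random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)

print(pruned_tree.score(X_test, y_test))  # accuracy of the pruned tree
print(forest.score(X_test, y_test))       # accuracy of the ensemble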
APPLICATIONS OF DECISION TREES
Business Management
Customer Relationship Management
Fraudulent Statement Detection
Engineering
Energy Consumption
Thank you