Understanding Decision Trees in Classification

The document discusses building classification models using decision trees. It begins by introducing supervised learning and classification. It then covers key concepts in decision trees like nodes, attributes, information gain, and algorithms like ID3. The document demonstrates calculating information gain and entropy to build a sample decision tree to predict if it is suitable to play golf based on weather attributes. It concludes by showing how to implement a decision tree using scikit-learn and visualize it with Graphviz in Python.


Mrs. S. L. Jothi Lakshmi,
Assistant Professor / CSE,
Amrita College of Engineering and Technology, Nagercoil.
Agenda

 Why supervised learning
 What is classification
 How classification works
 What are decision trees
 Advantages of decision trees
 Important Terminology related to Decision Trees
 How do Decision Trees work?
 Attribute Selection Measures
 How does ID3 decide which attribute is the best?
 Implementation of decision trees
Supervised Learning

 Algorithms learn from labeled data.
 The algorithm determines which label should be given to new data.
 There are two types of supervised learning problems:
1. Classification
2. Regression
Supervised Learning model
Steps in Supervised Learning

 Determine the type of training examples (what kind of data)
 Gather a training set
 Determine the input feature representation of the learned function
 Determine the structure of the learned function and the corresponding learning algorithm
 Complete the design
 Evaluate the accuracy of the learned function
Classification
 Categorize data into a given number of classes.
 Identify the category/class into which new data will fall.
 A classification algorithm is a function that weighs the input features so that the output separates one class into positive values and the other into negative values.
Terminologies encountered in classification
 Classifier
 Classification model
 Feature
 Binary Classification
 Multi-class classification
 Multi-label classification
The following are the steps involved in building a classification model (a minimal scikit-learn sketch follows this list):
 Initialize the classifier to be used.
 Train the classifier.
 Predict the target.
 Evaluate the model.
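A minimal sketch of these four steps, assuming scikit-learn is installed; it uses the built-in iris dataset (revisited in the demo later) and a DecisionTreeClassifier, with an illustrative train/test split:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a labeled dataset and split it into training and test sets
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier()          # 1. Initialize the classifier
clf.fit(X_train, y_train)               # 2. Train the classifier
y_pred = clf.predict(X_test)            # 3. Predict the target
print(accuracy_score(y_test, y_pred))   # 4. Evaluate the model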
Classification process

Classification is a two-step process:
 Learning step
 Prediction step
Types of Classification Algorithms

 Logistic Regression
 Naïve Bayes
 Stochastic Gradient Descent
 K-Nearest Neighbors
 Decision Tree
 Random Forest
 Support Vector Machine
Decision Tree
 The Decision Tree algorithm belongs to the family of supervised learning algorithms.
 It learns simple decision rules inferred from prior data (training data).
 Prediction starts from the root of the tree and follows branches until a leaf is reached.
Advantages of decision trees
 Simple to understand and to interpret.
 Trees can be visualized.
 Requires little data preparation.
 The cost of using the tree (i.e., predicting data) is logarithmic
in the number of data points used to train the tree.
 Able to handle both numerical and categorical data.
 Able to handle multi-output problems.
 Uses a white box model.
 Possible to validate a model using statistical tests.
 Performs well even if its assumptions are somewhat violated
by the true model from which the data were generated
Types of decision trees

 Types of decision trees are based on the type of target variable.
 Categorical Variable Decision Tree: the target variable is categorical.
 Continuous Variable Decision Tree: the target variable is continuous.
Important Terminology related to Decision Trees

 Root Node
 Splitting
 Decision Node
 Leaf / Terminal Node
 Pruning
 Branch / Sub-Tree
 Parent and Child Node
Assumptions while creating Decision Tree

 The whole training set is considered as the root.
 Feature values are preferred to be categorical.
 Records are distributed recursively on the basis of attribute values.
 Attributes are placed as root or internal nodes in an order decided by a statistical measure.
Attribute selection measures
 Used to identify which attribute to place at the root node and at each level of the tree.
 Entropy
 Information gain
 Gini index
 Gain Ratio
 Reduction in Variance
 Chi-Square
Entropy
 Entropy is a measure of the impurity or randomness
in the information being processed. The higher the
entropy, the harder it is to draw any conclusions from
that information.
Contd…
 Mathematically, entropy for one attribute is represented as:

E(S) = − Σ pᵢ log₂(pᵢ), summed over the classes i in S

where S → current state, and pᵢ → probability of an event i of state S, or the percentage of class i in a node of state S.

 Mathematically, entropy for multiple attributes is represented as:

E(T, X) = Σ P(c) E(c), summed over the values c of attribute X

where T → current state and X → selected attribute.
Information Gain
 Information gain (IG) is a statistical property that measures how well a given attribute separates the training examples according to their target classification (a short Python sketch of both formulas follows).

Information Gain(T, X) = Entropy(T) − Entropy(T, X)
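A short sketch of how these formulas can be computed in Python; the function and variable names are illustrative, not the author's demo code:

from collections import Counter
from math import log2

def entropy(labels):
    # E(S) = -sum(p_i * log2(p_i)) over the classes in `labels`
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    # Gain(T, X) = Entropy(T) - weighted entropy of T split by attribute X
    total = len(labels)
    split_entropy = 0.0
    for value in set(attribute_values):
        subset = [lab for lab, v in zip(labels, attribute_values) if v == value]
        split_entropy += (len(subset) / total) * entropy(subset)
    return entropy(labels) - split_entropy

# Example: 9 "yes" and 5 "no" gives an entropy of about 0.94
print(entropy(["yes"] * 9 + ["no"] * 5))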
Gini Index
 A cost function used to evaluate splits in the dataset.
 It is calculated by subtracting the sum of the squared probabilities of each class from one: Gini = 1 − Σ pᵢ² (see the sketch after this list).
 The Gini index works with a categorical target variable such as "Success" or "Failure". It performs only binary splits (as used by CART).
 The lower the Gini index, the higher the homogeneity; a pure node has a Gini index of 0.
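A minimal sketch of this formula; the function name is an illustrative choice:

from collections import Counter

def gini_index(labels):
    # Gini = 1 - sum(p_i^2); 0 means the node is pure
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

print(gini_index(["yes"] * 9 + ["no"] * 5))  # ~0.459 for a 9/5 split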
Gain ratio & Reduction in variance
 Gain ratio is a modification of information gain that reduces its bias toward attributes with many values.
 Gain ratio takes the number and size of branches into account when choosing an attribute.
 Reduction in variance is an algorithm used for continuous target variables (regression problems).
Chi-Square (Chi-squared Automatic Interaction Detector, CHAID)
 It finds the statistical significance of the differences between sub-nodes and the parent node.
 Mathematically, chi-square is represented as:

Chi-square = Σ (Actual − Expected)² / Expected

summed over classes and sub-nodes, where the expected counts follow the parent node's class distribution (an illustrative calculation follows).
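A small sketch of this calculation with hypothetical sub-node counts (not taken from the slides):

# Hypothetical counts for a split: rows = sub-nodes, columns = (yes, no)
observed = {
    "sunny":    (2, 3),
    "overcast": (4, 0),
    "rainy":    (3, 2),
}

total_yes = sum(y for y, n in observed.values())
total_no = sum(n for y, n in observed.values())
grand_total = total_yes + total_no

chi_square = 0.0
for value, (yes, no) in observed.items():
    node_total = yes + no
    # Expected counts assume the sub-node follows the parent's class ratio
    expected_yes = node_total * total_yes / grand_total
    expected_no = node_total * total_no / grand_total
    chi_square += (yes - expected_yes) ** 2 / expected_yes
    chi_square += (no - expected_no) ** 2 / expected_no

print(chi_square)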
How do Decision Trees work?
Algorithms for classification
 ID3
 C4.5
 CART
ID3 (Iterative Dichotomiser 3)
 The ID3 algorithm builds decision trees using a top-down, greedy search approach.
Steps in ID3 algorithm:
 Calculate entropy for dataset.
 For each attribute/feature
 Calculate entropy for all its categorical values.
 Calculate information gain for the feature.
 Find the feature with maximum information gain.
 Repeat it until we get the desired tree.
Example
 Decide whether the weather is amenable to playing golf. Over the course of two weeks, data is collected to help ID3 build a decision tree.
 The weather attributes are outlook, temperature,
humidity, and wind speed.
 They can have the following values:
outlook = { sunny, overcast, rain }
temperature = {hot, mild, cool }
humidity = { high, normal }
wind = {weak, strong }
Play Golf
 Consider the table below (14 days of weather observations). It represents factors that affect whether John would go out to play golf or not. Using the data in the table, build a decision tree model that can be used to predict whether John would play golf or not.
Algorithm for Building Decision Trees – The ID3 Algorithm
 Begin
Load the learning set and create the decision tree root node (rootNode); add learning set S into rootNode as its subset

For rootNode, compute Entropy(rootNode.subset) first
If Entropy(rootNode.subset) == 0 (subset is homogeneous)
    return a leaf node
If Entropy(rootNode.subset) != 0 (subset is not homogeneous)
    compute Information Gain for each attribute left (not yet used for splitting)
    Find attribute A with Maximum(Gain(S, A))
    Create child nodes for attribute A's values and add them to rootNode in the decision tree
    For each child of the rootNode
        Apply ID3(S, A, V) recursively
Continue until a node with entropy of 0 (a leaf node) is reached
End
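A compact Python sketch of this procedure; it is not the author's demo code, and the function names id3, entropy and information_gain and the dict-based tree representation are illustrative choices:

from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr, target):
    base = entropy([r[target] for r in rows])
    total = len(rows)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Homogeneous subset (entropy 0): return a leaf with that class
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: return the majority class
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with maximum information gain and split on it
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree

# Usage sketch: rows is a list of dicts such as
# {"Outlook": "sunny", "Temperature": "hot", "Humidity": "high", "Windy": "weak", "Play": "no"}
# tree = id3(rows, ["Outlook", "Temperature", "Humidity", "Windy"], "Play")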
Step by Step Procedure

Step 1: Determine the Root of the Tree
Step 2: Calculate Entropy for the Classes
Step 3: Calculate Entropy After Split for Each Attribute
Step 4: Calculate Information Gain for Each Split
Step 5: Perform the Split
Step 6: Perform Further Splits
Step 7: Complete the Decision Tree
Determine the Root of the Tree
What is a good Attribute?
 A good attribute splits the data so that each successor node is as pure as possible, i.e., has a high degree of "order".
 Purity is measured with:
 Entropy
 Information gain
Calculate Entropy for Other Attributes After Split
 E(PlayGolf, Outlook)
 E(PlayGolf, Temperature)
 E(PlayGolf, Humidity)
 E(PlayGolf, Windy)
 To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:
1. Entropy using the frequency table of one attribute
Contd..
2. Entropy using the frequency table of two attributes:
E(PlayGolf, Temperature) Calculation
E(PlayGolf, Humidity) Calculation
E(PlayGolf, Windy) Calculation
Calculating Information Gain for Each Split
 The information gain is calculated using the formula:
Gain(S, T) = Entropy(S) – Entropy(S, T)

1. Gain(PlayGolf, Outlook) = Entropy(PlayGolf) – Entropy(PlayGolf, Outlook)
   = 0.94 – 0.693 = 0.247
2. Gain(PlayGolf, Temperature) = Entropy(PlayGolf) – Entropy(PlayGolf, Temperature)
   = 0.94 – 0.911 = 0.029
3. Gain(PlayGolf, Humidity) = Entropy(PlayGolf) – Entropy(PlayGolf, Humidity)
   = 0.94 – 0.788 = 0.152
4. Gain(PlayGolf, Windy) = Entropy(PlayGolf) – Entropy(PlayGolf, Windy)
   = 0.94 – 0.892 = 0.048

Outlook has the highest information gain, so it becomes the root of the tree.
Perform the First Split
Initial Split using Outlook
Perform Further Splits
Complete the Decision Tree
Demo
 Scikit-learn is a free machine learning library for the Python programming language, designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
 Graphviz: open-source graph-drawing software used to render the tree.
 Pydotplus: Python interface to Graphviz's Dot language (used to draw the decision tree graph).
 Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
Install packages
 !apt install -y graphviz
 !pip install graphviz
## import dependencies
from sklearn import tree              # For our Decision Tree
import pandas as pd                   # For our DataFrame
import pydotplus                      # To convert the Dot graph for display
from IPython.display import Image     # To show the rendered tree inline
import matplotlib.pyplot as plt       # For plotting
import numpy as np                    # For numerical arrays
from graphviz import Digraph          # To draw the tree built by hand
from math import log                  # For entropy calculations
Create the dataset

# Create the dataset
# create an empty data frame
golf_df = pd.DataFrame()

# add outlook
golf_df['Outlook'] = ['rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'sunny',
                      'overcast', 'rainy', 'rainy', 'sunny', 'rainy', 'overcast',
                      'overcast', 'sunny']
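The following slides add the remaining columns in the same way, but they are not visible in the extracted text. A hedged completion is sketched below, assuming the standard 14-row Play Golf dataset (9 'yes', 5 'no'), which is consistent with the entropy of 0.94 and the information gains quoted earlier; the exact values on the original slides may differ.

# Assumed completion: the remaining columns of the standard 14-row Play Golf
# dataset, row-aligned with the Outlook column above (9 'yes' / 5 'no').
golf_df['Temperature'] = ['hot', 'hot', 'hot', 'mild', 'cool', 'cool', 'cool',
                          'mild', 'cool', 'mild', 'mild', 'mild', 'hot', 'mild']
golf_df['Humidity'] = ['high', 'high', 'high', 'high', 'normal', 'normal', 'normal',
                       'high', 'normal', 'normal', 'normal', 'high', 'normal', 'high']
golf_df['Windy'] = ['weak', 'weak', 'weak', 'weak', 'weak', 'strong', 'strong',
                    'weak', 'weak', 'weak', 'strong', 'strong', 'weak', 'strong']
golf_df['Play'] = ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes',
                   'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
print(golf_df)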
Dataset
Calculate information entropy
ROOT NODE
NEXT BRANCH
NEXT BRANCH
NEXT BRANCH
FINAL DECISION TREE
Decision tree implementation using Python Sklearn
 DecisionTreeClassifier is a class capable of performing multi-class classification on a
dataset.
 DecisionTreeClassifier parameters
 ccp_alpha
 class_weight
 criterion
 max_depth
 max_features
 max_leaf_nodes
 min_impurity_decrease
 min_impurity_split
 min_samples_leaf
 min_samples_split
 min_weight_fraction_leaf
 presort
 random_state
 splitter
 Iris dataset
Iris dataset
Decision tree for Iris dataset
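The tree for the iris dataset is shown on the slides as an image. A hedged sketch of how such a tree can be trained and rendered with the scikit-learn and graphviz libraries imported earlier; the criterion, random_state and output file name are illustrative choices:

from sklearn.datasets import load_iris
from sklearn import tree
import graphviz

iris = load_iris()
clf = tree.DecisionTreeClassifier(criterion='entropy', random_state=0)
clf = clf.fit(iris.data, iris.target)

# Export the fitted tree to Graphviz Dot format and render it
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=iris.feature_names,
                                class_names=iris.target_names,
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.render('iris_decision_tree')  # writes iris_decision_tree.pdf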
Accuracy
 True Positive: you predicted positive and it is true.
 True Negative: you predicted negative and it is true.
 False Positive: you predicted positive and it is false.
 False Negative: you predicted negative and it is false.
 Recall: out of all the actual positive classes, how many we predicted correctly. It should be as high as possible.
Recall = TP / (TP + FN)
 Precision: out of all the classes we predicted as positive, how many are actually positive.
Precision = TP / (TP + FP)
 F-measure = (2 * Recall * Precision) / (Recall + Precision)
(A scikit-learn sketch of these metrics follows.)
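A short sketch of computing these metrics with scikit-learn, assuming y_test and y_pred from the classification sketch earlier:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

# y_test / y_pred come from an earlier train/predict step (see the classification sketch)
print(confusion_matrix(y_test, y_pred))                   # TP / TN / FP / FN counts
print(accuracy_score(y_test, y_pred))                     # overall accuracy
print(precision_score(y_test, y_pred, average='macro'))   # TP / (TP + FP), averaged over classes
print(recall_score(y_test, y_pred, average='macro'))      # TP / (TP + FN), averaged over classes
print(f1_score(y_test, y_pred, average='macro'))          # harmonic mean of precision and recall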
DISADVANTAGES
 Overfitting.
 Decision trees can be unstable: small variations in the data can result in a very different tree.
 The problem of learning an optimal decision tree is known to be NP-complete.
 Some concepts are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems.
 Decision tree learners create biased trees if some classes dominate.
Overfitting
Two ways to reduce overfitting:
 Pruning decision trees
 Random Forest
APPLICATIONS OF DECISION TREES
 Business Management
 Customer Relationship Management
 Fraudulent Statement Detection
 Engineering
 Energy Consumption
Thank you
