
Decision Tree Classifiers

Tanmay Basu

Department of Data Science and Engineering


IISER Bhopal, India



Overview

▶ A decision tree builds classification models in the form of a tree structure
▶ A decision tree classifier is expressed as a recursive partition of the instance space
▶ The decision tree consists of nodes that form a rooted tree, i.e., a directed tree with a node called the 'root' that has no incoming edges
▶ Every other node in the tree has exactly one incoming edge and can have many outgoing edges
▶ Nodes with no outgoing edges are called leaf nodes, terminal nodes or decision nodes (see the sketch below)
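A minimal sketch of how such a rooted tree could be represented in code (the Node class and its field names are illustrative, not part of the slides):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """One node of a decision tree.

    An internal node tests a feature and has one child per feature value;
    a leaf (terminal/decision) node stores the predicted class label.
    """
    feature: Optional[str] = None                               # feature tested here (None for leaves)
    children: Dict[str, "Node"] = field(default_factory=dict)   # one outgoing edge per feature value
    label: Optional[str] = None                                 # class label if this is a leaf

    def is_leaf(self) -> bool:
        return not self.children


# The root has no incoming edge; every other node is reachable
# through exactly one edge from its parent.
root = Node(feature="Outlook", children={"Overcast": Node(label="Yes")})
```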



Example of a Decision Tree



Weather Dataset Developed by Quinlan

SL Outlook Temperature Humidity Wind Play Tennis?


1 Sunny Hot High False No
2 Sunny Hot High True No
3 Overcast Hot High False Yes
4 Rainy Mild High False Yes
5 Rainy Cool Normal False Yes
6 Rainy Cool Normal True No
7 Overcast Cool Normal True Yes
8 Sunny Mild High False No
9 Sunny Cool Normal False Yes
10 Rainy Mild Normal False Yes
11 Sunny Mild Normal True Yes
12 Overcast Mild High True Yes
13 Overcast Hot Normal False Yes
14 Rainy Mild High True No
How the Algorithm Works

Figure 1: Transformation of the weather dataset



How the Algorithm Works

Figure 2: Decision Subtree for "Outlook"



How the Algorithm Works

Figure 3: Decision Subtree for "Outlook" and "Temperature"



Objectives and Nature of Decision Tree Algorithms

▶ The goal of a decision tree classification algorithm is to find the optimal decision tree by minimizing the generalization error.

▶ The optimal decision tree can be obtained by minimizing the number of nodes or the average depth of the tree.

▶ Induction of an optimal decision tree from given data is considered to be a hard task.

▶ There are various decision tree algorithms, such as ID3, C4.5 and CART, which are greedy in nature and construct the decision tree in a top-down recursive manner.



Basic Idea of Decision Tree Algorithms

▶ In each iteration, these algorithms partition the training set using the outcome of a discrete function of the features.

▶ The most appropriate function is selected according to some splitting criterion.

▶ After an appropriate split is selected, each node further subdivides the training set into smaller subsets.

▶ This process continues until no further split is possible or a stopping criterion is satisfied, as sketched below.
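A compact sketch of this greedy top-down procedure (the names build_tree and majority_label and the simplified stopping rules are illustrative placeholders; the score argument stands for any splitting criterion, e.g. information gain or Gini gain):

```python
from collections import Counter


def majority_label(rows):
    # Most frequent class label among the remaining training samples.
    return Counter(r["label"] for r in rows).most_common(1)[0][0]


def build_tree(rows, features, score):
    """Recursively grow a decision tree.

    rows     : list of dicts, each with feature values and a "label" key
    features : features still available for splitting
    score    : splitting criterion taking (rows, feature) and returning a number
    """
    labels = {r["label"] for r in rows}
    if len(labels) == 1 or not features:              # pure node, or nothing left to split on
        return {"leaf": majority_label(rows)}

    best = max(features, key=lambda f: score(rows, f))    # greedy choice of split
    tree = {"feature": best, "children": {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]     # partition on the chosen feature
        rest = [f for f in features if f != best]
        tree["children"][value] = build_tree(subset, rest, score)
    return tree
```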



Splitting Measure: Information Gain
▶ The information gain for a feature over a set of training samples is the reduction in entropy caused by partitioning the training samples according to that feature.

▶ It is used as the splitting measure of the ID3 and C4.5 decision tree algorithms.

▶ The information gain of a particular feature (say f) over a set of training samples X can be defined as

$$IG(X, f) = E(X) - \sum_{v \in values(f)} \frac{|X : f = v|}{|X|}\, E(X : f = v) \qquad (1)$$

where values(f) refers to the set of all possible values of the feature f.
Splitting Measure: Information Gain

• E(X) denotes the entropy of X (sketched in code below) and is defined as

$$E(X) = -\sum_{i=1}^{c} p_i \log_2 p_i \qquad (2)$$

where c is the number of classes and p_i is the proportion of X belonging to the i-th class.
• E(X) ∈ [0, 1] when c = 2
• Entropy = 0 =⇒ all members of X belong to a single class
• Entropy = 1 =⇒ X contains an equal number of samples from the two different classes

Figure 4: Entropy Function for Binary Classification Problem
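A small sketch of the entropy computation in equation 2, assuming the class labels are given as a plain Python list:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """E(X) = -sum_i p_i * log2(p_i) over the classes present in the labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# The weather dataset has 9 "Yes" and 5 "No" examples:
play = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(play), 3))   # 0.94
```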



How Information Gain Works

SL Outlook Temperature Humidity Wind Play Tennis?


1 Sunny Hot High False No
2 Sunny Hot High True No
3 Overcast Hot High False Yes
4 Rainy Mild High False Yes
5 Rainy Cool Normal False Yes
6 Rainy Cool Normal True No
7 Overcast Cool Normal True Yes
8 Sunny Mild High False No
9 Sunny Cool Normal False Yes
10 Rainy Mild Normal False Yes
11 Sunny Mild Normal True Yes
12 Overcast Mild High True Yes
13 Overcast Hot Normal False Yes
14 Rainy Mild High True No

$$E(X) = \sum_{i=1}^{2} -p_i \log_2 p_i = -\frac{5}{14}\log_2\frac{5}{14} - \frac{9}{14}\log_2\frac{9}{14} = 0.94$$



How Information Gain Works

$$IG(X, outlook) = E(X) - \sum_{v \in values(outlook)} \frac{|X : outlook = v|}{|X|}\, E(X : outlook = v)$$

$$E(X \mid outlook = sunny) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.970$$

$$E(X \mid outlook = overcast) = -1\log_2 1 - 0\log_2 0 = 0$$

$$E(X \mid outlook = rainy) = -\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5} = 0.970$$

$$IG(X, outlook) = 0.94 - \left(\frac{5}{14} \times 0.970 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.970\right) = 0.94 - 0.692 = 0.248$$
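The same calculation can be reproduced with a short sketch (the helper functions are illustrative; the two lists encode the Outlook and Play Tennis columns of the table above):

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(X, f) = E(X) - sum over values v of |X:f=v|/|X| * E(X:f=v)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

# Exact arithmetic gives about 0.247; the slide's 0.248 comes from
# rounding the intermediate entropies before the final subtraction.
print(round(information_gain(outlook, play), 3))   # 0.247
```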
How Information Gain Works

▶ Similarly, IG(X, temperature) = 0.029, IG(X, humidity) = 0.152 and IG(X, wind) = 0.048
▶ Hence Outlook is selected as the root node, since it has the highest information gain
▶ The next step is to grow the subtrees further for outlook = sunny and for outlook = rainy
▶ For outlook = overcast, all entries belong to a single class, hence no further split is necessary



How Information Gain Works


$$IG(X, wind : outlook = sunny) = E(X : outlook = sunny) - \sum_{v \in values(wind)} \frac{|X : wind = v \,\&\, outlook = sunny|}{|X : outlook = sunny|}\, E(X : wind = v \,\&\, outlook = sunny)$$

$$E(X : outlook = sunny) = 0.970$$

$$E(X : wind = true \,\&\, outlook = sunny) = 1 \quad \text{(since there is one yes and one no)}$$

$$E(X : wind = false \,\&\, outlook = sunny) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.918$$
Final Decision Tree using Information Gain

$$\therefore\ IG(X, wind : outlook = sunny) = 0.97 - \left(\frac{2}{5} \times 1 + \frac{3}{5} \times 0.918\right) = 0.97 - 0.950 = 0.020$$

Similarly, IG(X, temperature : outlook = sunny) = 0.571 and IG(X, humidity : outlook = sunny) = 0.971.

Thus the left subtree of Outlook (the outlook = sunny branch) is split on Humidity, as it has the highest information gain. Eventually, the decision tree will look as follows:

Figure 5: Final Decision Tree
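For comparison, a sketch of fitting the same data with pandas and scikit-learn. Note that DecisionTreeClassifier implements CART-style binary splits, so the learned tree is not literally the multi-way ID3 tree of Figure 5, although criterion="entropy" uses the same information-gain idea; the column names simply mirror the table:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14 training examples of the weather dataset.
data = pd.DataFrame({
    "Outlook":     ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
                    "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":        [False, True, False, False, False, True, True,
                    False, False, False, True, True, False, True],
    "Play":        ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One-hot encode the categorical features; sklearn trees need numeric input.
X = pd.get_dummies(data.drop(columns="Play"))
y = data["Play"]

clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```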



Splitting Measure: Gini Index

• It is used as the splitting measure of the CART decision tree algorithm.
• The Gini index of a set of training samples X for a particular feature f is

$$Gini(X, f) = \sum_{j=1}^{k} \frac{|X_j|}{|X|}\, Gini(X_j) \qquad (3)$$

where k denotes the number of splits for the feature f, X_j denotes the set of training samples reaching the j-th split, and

$$Gini(X_j) = 1 - \sum_{i=1}^{c} p_{i,j}^{2} \qquad (4)$$

Here c is the number of classes in the data set and p_{i,j} is the proportion of X_j belonging to the i-th class (see the sketch below).
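A sketch of equations 3 and 4 in code, again using the Outlook column of the weather dataset (helper names are illustrative):

```python
from collections import Counter


def gini(labels):
    """Gini(Xj) = 1 - sum_i p_{i,j}^2 over the classes present in Xj."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_index(feature_values, labels):
    """Gini(X, f): weighted Gini impurity of the partition induced by feature f."""
    n = len(labels)
    total = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        total += len(subset) / n * gini(subset)
    return total


outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

print(round(gini(play), 3))                 # 0.459 -> Gini(X)
print(round(gini_index(outlook, play), 3))  # 0.343 -> Gini(X, Outlook)
```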



Splitting Measure: Gini Index

▶ The minimum value Gini(X_j) = 0 is attained when all samples of X_j belong to one particular class, which indicates the most interesting information.
▶ The Gini index is maximum when the training samples are equally distributed among all classes, implying the least interesting information.
▶ The maximum value of Gini(X_j) is (1 − 1/c), attained when each class makes up an equal proportion of X_j.
▶ The feature that produces the smallest Gini index following equation 3, equivalently the largest Gini gain as defined below, is chosen to split a node

$$Gini\,Gain(X, f) = Gini(X) - Gini(X, f) \qquad (5)$$

The Gini index tends to isolate the largest class from the rest of the data.



Splitting Measure: Gain Ratio

• It has been mentioned that the information gain measure tends to prefer features with a large number of values.
• The gain ratio is an extension of the information gain measure that reduces this bias towards features with many branches.
• The gain ratio is defined as follows for a set of training samples X and a particular feature f:

$$GainRatio(X, f) = \frac{IG(X, f)}{E(X, f)} \qquad (6)$$

where E(X, f) denotes the entropy of the distribution of the samples in X across the values of f (the split information).
• Note that the gain ratio normalizes the information gain; it is not defined when E(X, f) = 0.
• The feature with the maximum gain ratio is selected for splitting (see the sketch below).
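A sketch of equation 6, interpreting E(X, f) as the split information, i.e., the entropy of the distribution of samples across the values of f (the function names are illustrative):

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """GainRatio(X, f) = IG(X, f) / SplitInfo(X, f)."""
    n = len(labels)
    remainder, split_info = 0.0, 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        p = len(subset) / n
        remainder += p * entropy(subset)   # weighted entropy of the partition
        split_info += -p * log2(p)         # entropy of the feature's own value distribution
    info_gain = entropy(labels) - remainder
    return info_gain / split_info if split_info > 0 else float("nan")  # undefined when SplitInfo = 0


outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
           "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

print(round(gain_ratio(outlook, play), 3))   # about 0.156
```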



References
▶ Lior Rokach and Oded Z. Maimon. Data Mining with Decision Trees: Theory and Applications, vol. 69. World Scientific, 2007.

▶ Tom Mitchell. Machine Learning. McGraw Hill, ISBN 0070428077, 1997.

▶ L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Metrika, 33:128–128, 1986.

▶ J. Ross Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.

▶ K. Kowsari, K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown. Text Classification Algorithms: A Survey. Information, 10(4):150, 2019. https://github.com/kk7nc/Text_Classification
