Chapter 5. Decision Trees
DECISION TREES
INTERMEDIATE ECONOMETRICS & DATA ANALYSIS
CHAPTER 5. PLAN
C. WHEN TO USE DT? (1/3)
1. IN BUSINESS
C. WHEN TO USE DT? (2/3)
2. IN LAW
• In law, decision trees can be used to evaluate the various financial outcomes
that may arise from litigation.
C. WHEN TO USE DT? (3/3)
3. IN VIDEO GAMES
• In video games, decision trees enable players to shape their own story or
outcome by selecting the options they believe are best.
DT ALGORITHM
PART II
A. HOW DOES A DT WORK? (1/6)
• A decision tree is typically constructed with the root at the top or on the left
side, depending on the orientation. If the branches are not labeled, the left
branch is generally assumed to represent “true”, while the right branch
represents “false”.
[Figure: anatomy of a decision tree]
• ROOT NODE: the starting point of the tree (the tree's "roots").
• BRANCHES: the links connecting the nodes.
• INTERNAL NODES: outcomes of previous decisions or tests on features.
• LEAF NODES: the final decision or outcome; no more branches.
A. HOW DOES A DT WORK? (2/6)
• Finding the best split requires constructing and comparing multiple candidate
trees to determine the most effective one.
• To evaluate which tree is the best, we use the "Gini impurity", also known as
the "Gini index".
• This index measures the probability that a randomly selected observation
would be incorrectly classified if it were labeled at random according to the
distribution of classes in the node.
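
For reference, the standard formula behind this index: for a node whose cases fall into $K$ classes with proportions $p_1, \dots, p_K$, the Gini impurity is

$$\mathrm{GI} = 1 - \sum_{k=1}^{K} p_k^2,$$

so a pure node (a single class, one $p_k = 1$) gives $\mathrm{GI} = 0$, and the value grows as the classes become more evenly mixed.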
1. BEST SPLIT (2/8)
IMPURITY
• An impure node contains cases distributed across more than one branch (as
illustrated in the example on slide 24).
• The Gini impurity ranges from 0 to 1, where 0 indicates a perfect split (i.e.,
the best possible tree) and 1 indicates the worst.

PURITY
• A pure node has all cases classified into a single branch or all cases belonging
to a single class (as shown in the example on slide 24).
• A Gini impurity of 0 indicates a pure node, meaning either only one class is
present, or all cases belong to a single branch of the node.
1. QUALITATIVE FEATURE
• Use each qualitative feature to build a candidate split.

GENDER | GPA | LIKES MATH
M | 4.0 | YES
F | 3.2 | YES
F | 3.5 | NO
M | 3.8 | NO
F | 3.0 | NO
F | 3.1 | YES

[Figure: split on GENDER — Female branch: LIKES MATH = 2 YES / 2 NO; Male branch: LIKES MATH = 1 YES / 1 NO.]
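
As an illustration, here is a minimal Python sketch (not from the slides; the gini helper and variable names are my own) that reproduces the Gini computation for this GENDER split:

```python
# Gini impurity of a list of class labels: 1 - sum(p_k^2)
def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# LIKES MATH labels in each branch of the GENDER split (from the table above)
female = ["YES", "NO", "NO", "YES"]   # 2 YES / 2 NO
male = ["YES", "NO"]                  # 1 YES / 1 NO

# Gini of each branch, then the size-weighted Gini of the whole split
n = len(female) + len(male)
split_gini = (len(female) / n) * gini(female) + (len(male) / n) * gini(male)
print(gini(female), gini(male), split_gini)  # 0.5 0.5 0.5
```

Both branches are maximally impure here, so GENDER alone is a poor split for this toy dataset.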
1. BEST SPLIT (6/8)
2. QUANTITATIVE FEATURE
• 1. Sort the values from smallest to largest; 2. compute the mean of each pair
of adjacent values (here: 3.05, 3.15, 3.35, 3.65, 3.9); 3. build a candidate
split at each mean.

GENDER | GPA | LIKES MATH
F | 3.0 | NO
F | 3.1 | YES
F | 3.2 | YES
F | 3.5 | NO
M | 3.8 | NO
M | 4.0 | YES

[Figure: split "GPA < 3.05" — True branch: LIKES MATH = 0 YES / 1 NO (pure node); False branch: 3 YES / 2 NO.]

[Figure: split "GPA < 3.15" — True branch: LIKES MATH = 1 YES / 1 NO; False branch: 2 YES / 2 NO (impure nodes).]

• Repeat this process for all other calculated means.
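
To make the procedure concrete, a small Python sketch (my own illustration; the helper names are assumptions) that scans every candidate threshold on the toy data and prints the size-weighted Gini impurity of each split:

```python
# Sorted toy data from the slides: GPA values and LIKES MATH labels
gpas = [3.0, 3.1, 3.2, 3.5, 3.8, 4.0]
likes = ["NO", "YES", "YES", "NO", "NO", "YES"]

def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# Candidate thresholds: means of adjacent GPA values -> 3.05, 3.15, 3.35, 3.65, 3.9
thresholds = [(a + b) / 2 for a, b in zip(gpas, gpas[1:])]

for t in thresholds:
    left = [l for g, l in zip(gpas, likes) if g < t]    # "True" branch
    right = [l for g, l in zip(gpas, likes) if g >= t]  # "False" branch
    n = len(likes)
    w = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    print(f"GPA < {t:.2f}: weighted Gini = {w:.3f}")
```

The threshold with the lowest weighted impurity is the best quantitative split on GPA.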
1. BEST SPLIT (7/9)

GENDER | GPA | LIKES MATH
M | 3.8 | NO
M | 4.0 | YES
M | 3.7 | ?
F | 2.9 | ?
F | 3.3 | ?

[Figure: a candidate tree whose four leaves contain 0 YES / 1 NO, 0 YES / 0 NO, 2 YES / 1 NO, and 1 YES / 1 NO, with the Gini impurity of each leaf:]

$\mathrm{GI} = 1 - \left( \left(\tfrac{0}{1}\right)^{2} + \left(\tfrac{1}{1}\right)^{2} \right) = 0$

$\mathrm{GI} = 1 - \left( 0^{2} + 0^{2} \right) = 1$ (empty leaf)

$\mathrm{GI} = 1 - \left( \left(\tfrac{2}{3}\right)^{2} + \left(\tfrac{1}{3}\right)^{2} \right) \approx 0.44$

$\mathrm{GI} = 1 - \left( \left(\tfrac{1}{2}\right)^{2} + \left(\tfrac{1}{2}\right)^{2} \right) = 0.5$

[Figure: the "GPA < 3.05" tree, whose three leaves contain 0 YES / 1 NO, 2 YES / 1 NO, and 1 YES / 1 NO.]
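
To check these numbers, a tiny Python sketch (my own; gini_counts is an illustrative helper) that evaluates the leaf impurities above from their (YES, NO) counts, scoring an empty leaf as 1 as the slide does:

```python
# Gini impurity from (yes, no) counts; an empty leaf is scored 1 as on the slide
def gini_counts(yes, no):
    n = yes + no
    if n == 0:
        return 1.0
    return 1 - ((yes / n) ** 2 + (no / n) ** 2)

for yes, no in [(0, 1), (0, 0), (2, 1), (1, 1)]:
    print(f"{yes} YES / {no} NO -> GI = {gini_counts(yes, no):.2f}")
# Prints 0.00, 1.00, 0.44, 0.50 — matching the values above
```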
1. BEST SPLIT (9/9)

[Figure: the "GPA < 3.05" tree with leaf counts 0 YES / 1 NO, 2 YES / 1 NO, and 1 YES / 1 NO.]

GENDER | GPA | LIKES MATH
M | 4.0 | YES
M | 3.7 | ?
F | 2.9 | ?
F | 3.3 | ?

• With equal chances (50%) that the new student either likes or dislikes math, we cannot
make a clear classification. This illustrates what we referred to earlier as an imperfect
model, where impurities remain high.
1. BEST SPLIT (9/9)

[Figure: the "GPA < 3.05" tree with leaf counts 0 YES / 1 NO, 2 YES / 1 NO, and 1 YES / 1 NO.]

GENDER | GPA | LIKES MATH
M | 4.0 | YES
M | 3.7 | TIE
F | 3.9 | ?
F | 3.3 | ?

• Since most students in this node (2 out of 3) like math, we can infer that the new student
is likely to like it as well.
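
In code, this majority-vote rule for a leaf could look like the following sketch (my own illustration; the labels are those of the 2-out-of-3 node above):

```python
from collections import Counter

# LIKES MATH labels of the training cases that fall in this leaf (2 YES / 1 NO)
leaf_labels = ["YES", "YES", "NO"]

# Predict the majority class of the leaf for any new case that lands in it
prediction = Counter(leaf_labels).most_common(1)[0][0]
print(prediction)  # YES
```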
1. BEST SPLIT (9/9)

[Figure: the "GPA < 3.05" tree with leaf counts 0 YES / 1 NO, 2 YES / 1 NO, and 1 YES / 1 NO.]

GENDER | GPA | LIKES MATH
M | 4.0 | YES
M | 3.7 | TIE
F | 3.9 | YES
F | 2.9 | ?

• This new student does not like math (pure node).
• However, having only one case per leaf is not sufficient to generalize classification
predictions. Therefore, to avoid overfitting, we need to determine when to stop splitting
the tree.
2. STOPPING CRITERIA (1/4)
• We stop splitting when we achieve 100% purity with a sufficient number of
cases in each leaf.
• Not all leaves will achieve 100% purity.
• Therefore, we need to determine when to stop splitting the tree.
◦ Maximum depth: define the maximum length of the path from the root to any leaf,
thereby limiting the number of splits and features used in the decision tree.
◦ Minimum number of cases per leaf: define a threshold for the minimum number of
cases a leaf must contain for a split to be allowed.
• For the IEDA course, we will use Scikit-Learn's default criteria to construct
the best decision tree possible, with the exception of "max_depth", which we
will not leave at its default value of None:
• "splitter = best" to ensure that only the most important features are considered
for splitting;
• "max_depth = int" (e.g., 5) to limit the depth of the tree and prevent it from
growing until all nodes are pure, which helps avoid overfitting.
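
As a minimal sketch of these settings (my own, using the toy data from the earlier slides; encoding GENDER as F=0/M=1 is an illustrative assumption):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data from the slides; GENDER encoded as F=0, M=1 (illustrative choice)
X = [[0, 3.0], [0, 3.1], [0, 3.2], [0, 3.5], [1, 3.8], [1, 4.0]]
y = ["NO", "YES", "YES", "NO", "NO", "YES"]  # LIKES MATH

# Scikit-Learn defaults (criterion="gini", splitter="best"), except max_depth
clf = DecisionTreeClassifier(criterion="gini", splitter="best",
                             max_depth=5, random_state=0)
clf.fit(X, y)

# Inspect the learned splits and classify the new student M with GPA 3.7
print(export_text(clf, feature_names=["GENDER", "GPA"]))
print(clf.predict([[1, 3.7]]))
```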
BEFORE CHOOSING DT
PART III
BEFORE CHOOSING DT
ADVANTAGES
• Easy to understand, visualize and interpret.
• Can be used for both classification and regression tasks.

DISADVANTAGES
• Requires balanced data; can produce biased results if some classes dominate.