
Decision Tree

Sadia Islam
Assistant Professor
Department of Computer Science and Engineering
United International University
Decision Tree
Decision tree (DT) induction is the learning of decision trees from class-labeled
training instances, which is a top-down recursive divide and conquer algorithm.
Advantages:
● Simple to understand.
● Easy to implement.
● Requires little prior knowledge.
● Handles both numerical and categorical data.
● Robust.
● Deals with large and noisy datasets.
● Nonlinear relationships between features do not affect the tree performance.
Decision Tree
A decision tree works by breaking down a dataset into smaller subsets while at the same time an
associated decision tree is incrementally developed. The final result is a tree with decision nodes
and leaf nodes. Process:
● Root Node: This is the topmost node in a decision tree. It represents the entire dataset
● Splitting: This involves dividing a node into two or more sub-nodes. The split is based on
features in the data set. The aim is to ensure that the resulting sub-nodes are as pure
(homogeneous) as possible.
● Decision Nodes: These represent features in the dataset and the possible values they can
take; the data are divided further at these nodes based on certain conditions.
● Leaf/Terminal Nodes: These nodes represent the final output or decision. They are the
nodes where no further splitting is possible or necessary.
● Pruning: Done to reduce the size of the tree.
● Predicting: The final part, predicting outcomes for the test data (a minimal structure sketch
follows this list).
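As a rough illustration (not part of the original slides), the node types above can be represented
as a small tree structure with a prediction walk; the Node fields and the predict helper below are
assumed names chosen for this sketch.

```python
# Minimal sketch of a decision-tree structure and of predicting by walking it.
# Field and function names are illustrative assumptions, not from the slides.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None      # decision node: which feature is tested
    threshold: Optional[float] = None  # decision node: split condition (feature <= threshold)
    left: Optional["Node"] = None      # sub-node where the condition holds
    right: Optional["Node"] = None     # sub-node where it does not
    prediction: Optional[str] = None   # leaf node: the final output


def predict(node: Node, instance: dict) -> str:
    """Walk from the root node to a leaf node, following the split conditions."""
    while node.prediction is None:     # stop once a leaf is reached
        node = node.left if instance[node.feature] <= node.threshold else node.right
    return node.prediction


# A hand-built tree matching the CART example later in these slides; the impure
# left node is labeled with its majority class.
root = Node(feature="Study Hours", threshold=6.5,
            left=Node(prediction="Fail"),
            right=Node(prediction="Pass"))

print(predict(root, {"Study Hours": 4}))   # Fail
print(predict(root, {"Study Hours": 9}))   # Pass
```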
Decision Tree
Building a Decision Tree: Different algorithms can be used like ID3, C4.5, or
CART.
Iterative Dichotomiser 3 (ID3)
● Calculate Entropy of the Target Variable
○ Calculate the randomness
● Calculate Information Gain for Each Attribute
● Select the Best Attribute for the Root Node
○ Choose the attribute with the highest information gain as the root node.
● Split the Dataset
○ Split the dataset into subsets based on the values of the chosen attribute
● Recursively Build the Tree
○ Continue this process until all instances in a subset belong to the same class or no more
attributes are left to split on
● Classify New Instances (a compact code sketch of these steps follows this list)
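The steps above can be sketched compactly in Python. This is a hedged illustration, not the
slides' implementation: the tiny dataset, the nested-dictionary tree representation, and the helper
names are assumptions.

```python
# Minimal ID3 sketch: recursively pick the attribute with the highest
# information gain until a subset is pure or no attributes remain.
import math
from collections import Counter


def entropy(labels):
    """Entropy (randomness) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += (len(subset) / len(rows)) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                  # all instances in one class -> leaf
        return labels[0]
    if not attributes:                         # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    branches = {}
    for value in {r[best] for r in rows}:      # one branch per value of the chosen attribute
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best], target)
    return {best: branches}


# Tiny made-up dataset (not the dataset used in the slides).
data = [
    {"Outlook": "Sunny", "Wind": "Weak", "Play": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "Play": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Weak", "Play": "Yes"},
    {"Outlook": "Rain", "Wind": "Strong", "Play": "No"},
]
print(id3(data, ["Outlook", "Wind"], "Play"))
```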
Iterative Dichotomiser 3 (ID3)
● Information gain of an attribute = information of the output variable - information of that
attribute.
● Here, information is expressed in terms of entropy (randomness): the less entropy an attribute
leaves after splitting, the more effectively it moves the data towards a decisive output, and the
more gain it has.
Iterative Dichotomiser 3 (ID3)
● Entropy for any random variable having k values:
Entropy = -Σ p_i × log2(p_i), summed over i = 1 … k
● Binary random variable (a single probability p of the positive value):
Entropy = -p × log2(p) - (1 - p) × log2(1 - p)
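A small sketch of these two entropy formulas, assuming base-2 logarithms and probabilities
passed in directly:

```python
# Entropy of a discrete random variable with k values:
#   Entropy = -sum(p_i * log2(p_i)), with 0 * log2(0) treated as 0.
import math


def entropy(probabilities):
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))    # 1.0 -> maximum randomness for a binary variable
print(entropy([1.0, 0.0]))    # 0.0 -> no randomness at all
print(entropy([9/14, 5/14]))  # ~0.940
```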
Iterative Dichotomiser 3 (ID3)
Information Gain of an attribute A = Entropy of the output variable - Entropy of that attribute
IG(A) = Info_output - Info_A
Info_output = Entropy_output = -(p / (p + n)) × log2(p / (p + n)) - (n / (p + n)) × log2(n / (p + n))
Where p is the number of positive cases and n is the number of negative cases.
Info_A = Entropy_A = Σ ((p_k + n_k) / (p + n)) × Entropy(p_k, n_k), summed over k = 1 … d
Where p_k and n_k are the number of positive and negative instances respectively for attribute
value = k, d is the total number of values of the attribute, and Entropy(p_k, n_k) is the entropy of
the subset of instances with that value.
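A sketch of these formulas in code; the counts-based interface (a list of (p_k, n_k) pairs, one per
attribute value) is an assumption made for the illustration.

```python
# Information gain from positive/negative counts, following the formulas above:
#   Info_output = Entropy(p, n)
#   Info_A      = sum over values k of ((p_k + n_k) / (p + n)) * Entropy(p_k, n_k)
#   IG(A)       = Info_output - Info_A
import math


def entropy_pn(p, n):
    """Entropy of a set containing p positive and n negative instances."""
    total = p + n
    return sum(-(x / total) * math.log2(x / total) for x in (p, n) if x > 0)


def information_gain(p, n, value_counts):
    """value_counts: list of (p_k, n_k) pairs, one pair per value of the attribute."""
    info_output = entropy_pn(p, n)
    info_attr = sum((pk + nk) / (p + n) * entropy_pn(pk, nk) for pk, nk in value_counts)
    return info_output - info_attr


# Made-up counts: 6 positive / 4 negative overall, split into (4, 1) and (2, 3) by the attribute.
print(information_gain(6, 4, [(4, 1), (2, 3)]))
```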
Example - Dataset
ID3 - Information Gain calculation
Information (entropy) of the target variable: Info_output = 0.940

Information (entropy) of the attribute "Outlook": Info_Outlook = 0.694

Information Gain = Info_output - Info_Outlook = 0.940 - 0.694 = 0.246
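These figures can be reproduced assuming the class counts of the standard 14-instance weather
("play tennis") dataset, since the dataset table itself is not reproduced above: 9 positive and 5
negative instances overall, with Outlook splitting them into (2, 3), (4, 0) and (3, 2). Those counts
are an assumption that is consistent with the 0.940 and 0.694 stated here.

```python
# Reproducing the numbers above; the per-value counts are assumptions consistent
# with the usual 14-instance weather dataset (9 positive / 5 negative overall).
import math


def entropy_pn(p, n):
    total = p + n
    return sum(-(x / total) * math.log2(x / total) for x in (p, n) if x > 0)


info_output = entropy_pn(9, 5)                                   # ~0.940
outlook = [(2, 3), (4, 0), (3, 2)]                               # assumed per-value counts
info_outlook = sum((pk + nk) / 14 * entropy_pn(pk, nk) for pk, nk in outlook)  # ~0.694
print(round(info_output, 3), round(info_outlook, 3), round(info_output - info_outlook, 3))
# prints 0.94 0.694 0.247 (0.246 above comes from subtracting the already-rounded values)
```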


ID3
In the same way, the IG of Outlook = 0.246, Temperature = 0.086, Humidity = 0.154, and
Wind = 0.197.

Since Outlook has the highest information gain, we divide the dataset based on Outlook first.


ID3 - the resulting splits and final decision tree (diagram slides)
C4.5
C4.5 - example for the same dataset
Steps:

● Calculate Information Gain for each attribute.


● Calculate Split Information for each attribute.
● Calculate Gain Ratio and select the attribute with the highest Gain Ratio for
splitting.
C4.5 - example for the same dataset
● Step 1 is already complete: IG(S, Outlook) = 0.246 (see the previous slides).

C4.5 - example for the same dataset
● Step 2: Split Information for "Outlook"
SplitInfo(S, A) = -Σ (|S_k| / |S|) × log2(|S_k| / |S|), summed over the d values of attribute A,
where S_k is the subset of S that has the k-th value of A
C4.5 - example for the same dataset
● Step 3: Gain Ratio for "Outlook"
GainRatio(S, A) = IG(S, A) / SplitInfo(S, A)

Similarly, find GR for all the attributes and select the one with the highest value to split the data.
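A hedged sketch of steps 2 and 3; the Outlook branch sizes 5/4/5 used below are an assumption
consistent with the 14-instance dataset assumed earlier.

```python
# C4.5 penalizes attributes with many values using split information:
#   SplitInfo(S, A) = -sum_k (|S_k| / |S|) * log2(|S_k| / |S|)
#   GainRatio(S, A) = IG(S, A) / SplitInfo(S, A)
import math


def split_information(subset_sizes):
    total = sum(subset_sizes)
    return sum(-(s / total) * math.log2(s / total) for s in subset_sizes if s > 0)


def gain_ratio(information_gain, subset_sizes):
    return information_gain / split_information(subset_sizes)


# Assumed branch sizes for Outlook (Sunny / Overcast / Rain = 5 / 4 / 5 of 14 instances):
print(split_information([5, 4, 5]))   # ~1.577
print(gain_ratio(0.246, [5, 4, 5]))   # ~0.156
```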
CART (Classification and Regression Trees)
CART (Classification and Regression Trees) is a variation of the decision tree
algorithm. It can handle both classification and regression tasks.

Gini index / Gini impurity

The Gini index is the metric CART uses for classification tasks. It is computed from the sum of
squared class probabilities: Gini = 1 - Σ (p_i)^2. It measures the probability that a randomly
chosen instance would be misclassified if it were labeled at random according to the class
distribution of the subset, and it is a variation of the Gini coefficient.
CART (Classification and Regression Trees)

The Gini index takes values between 0 and 1 (see the sketch after this list):

● A value of 0 means that all the elements belong to a single class, i.e. only one class exists
there.
● A Gini index close to 1 means a high level of impurity, where many classes each contain only
a small fraction of the elements.
● The maximum value of 1 - 1/n occurs when the elements are uniformly distributed into n
classes and each class has an equal probability of 1/n. For example, with two classes, the
Gini impurity is 1 - 1/2 = 0.5.
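A small sketch of the Gini index and of the boundary values described in the list above:

```python
# Gini index: 1 minus the sum of squared class probabilities.
def gini(probabilities):
    return 1 - sum(p * p for p in probabilities)


print(gini([1.0]))             # 0.0    -> all elements belong to one class
print(gini([0.5, 0.5]))        # 0.5    -> two classes, uniform: 1 - 1/2
print(gini([1/3, 1/3, 1/3]))   # ~0.667 -> n classes, uniform: 1 - 1/n
```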
CART for Classification
Gini Impurity - Gini impurity measures the probability of misclassifying a randomly chosen
instance from a subset if it were labeled at random according to the subset's class distribution.
Lower Gini impurity means the subset is purer.

Splitting Criteria- The CART algorithm evaluates all potential splits at every node
and chooses the one that best decreases the Gini impurity of the resultant
subsets. This process continues until a stopping criterion is reached, like a
maximum tree depth or a minimum number of instances in a leaf node.
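A minimal sketch of this splitting criterion for a single numeric feature. The midpoint candidate
thresholds and the function names are assumptions, and the study-hours data from the example
below is reused for illustration.

```python
# CART classification split: try candidate thresholds and keep the one with the
# lowest weighted Gini impurity of the two resulting subsets.
from collections import Counter


def gini_of_labels(labels):
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    ordered = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2                               # midpoint between consecutive values
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini_of_labels(left)
                    + len(right) * gini_of_labels(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best


hours = [10, 8, 4, 6, 3]
grade = ["Pass", "Pass", "Fail", "Pass", "Fail"]
print(best_threshold(hours, grade))
# prints (5.0, 0.0): an exhaustive search finds a pure split here; the worked
# example below uses the threshold 6.5 for illustration.
```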
CART for Regression
Residual Reduction - Residual reduction measures how much the average squared difference
between the predicted values and the actual values of the target variable is reduced by splitting
the subset. The greater the residual reduction, the better the split fits the data.

Splitting Criteria- CART evaluates every possible split at each node and selects
the one that results in the greatest reduction of residual error in the resulting
subsets. This process is repeated until a stopping criterion is met, such as
reaching the maximum tree depth or having too few instances in a leaf node.
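A minimal sketch of the regression criterion; the toy target values are made up for illustration.

```python
# CART regression split: choose the threshold that most reduces the sum of
# squared differences from the subset means (residual reduction).
def sse(values):
    """Sum of squared errors around the mean of the subset."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    ordered = sorted(set(x))
    best = (None, -1.0)
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        reduction = sse(y) - (sse(left) + sse(right))   # larger reduction = better split
        if reduction > best[1]:
            best = (t, reduction)
    return best


# Made-up example: study hours vs. an exam score (regression target).
hours = [3, 4, 6, 8, 10]
score = [40, 50, 65, 78, 85]
print(best_regression_split(hours, score))   # the best threshold and its residual reduction
```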
CART Example

Study Hours | Attendance (%) | Final Grade (Pass/Fail)
10          | 85             | Pass
8           | 78             | Pass
4           | 50             | Fail
6           | 65             | Pass
3           | 40             | Fail

1. Choose the Splitting Attribute.

Consider Study Hours ≤ 6.5 as the threshold for the split.

● Left Node (Study Hours ≤ 6.5): (3, 40, Fail), (4, 50, Fail), (6, 65, Pass)
● Right Node (Study Hours > 6.5): (8, 78, Pass), (10, 85, Pass)
CART Example
2. Calculate Gini Impurity

Left Node:

Probability of Fail = ⅔

Probability of Pass = ⅓

Gini = 1 - (2/3)^2 - (1/3)^2 ≈ 0.44

Right Node:

Probability of Pass = 2/2 = 1

Gini = 1 - (1)^2 = 0
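These numbers can be checked directly with a short sketch; the weighted impurity of the whole
split is an extra quantity added here for completeness; it is not computed in the slides.

```python
# Reproducing the Gini calculations for the Study Hours <= 6.5 split.
def gini(probabilities):
    return 1 - sum(p * p for p in probabilities)


left_gini = gini([2/3, 1/3])    # left node: 2 Fail, 1 Pass -> ~0.44
right_gini = gini([1.0])        # right node: 2 Pass        -> 0.0
weighted = (3 * left_gini + 2 * right_gini) / 5   # impurity of the split as a whole
print(round(left_gini, 2), right_gini, round(weighted, 2))   # 0.44 0.0 0.27
```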
CART Example

Tree

        Study Hours ≤ 6.5
         /            \
     Yes               No
  (3, 4, 6)         (8, 10)
  Fail/Pass            Pass
Thank You
