10. Decision Tree
Sadia Islam
Assistant Professor
Department of Computer Science and Engineering
United International University
Decision Tree
Decision tree (DT) induction is the learning of decision trees from class-labeled
training instances, using a top-down, recursive, divide-and-conquer algorithm.
Advantages:
● Simple to understand.
● Easy to implement.
● Requires little prior knowledge.
● Able to handle both numerical and categorical data.
● Robust.
● Can deal with large and noisy datasets.
● Nonlinear relationships between features do not affect tree performance.
Decision Tree
A decision tree works by repeatedly breaking a dataset down into smaller subsets while, at the
same time, the associated tree is developed incrementally. The final result is a tree with decision
nodes and leaf nodes. Process (a short usage sketch follows this list):
● Root Node: This is the topmost node in a decision tree. It represents the entire dataset
● Splitting: This involves dividing a node into two or more sub-nodes. The split is based on
features in the data set. The aim is to ensure that the resulting sub-nodes are as pure
(homogeneous) as possible.
● Decision Nodes: These represent features in the dataset and the possible values they can
take; a decision node is split further based on certain conditions.
● Leaf/Terminal Nodes: These nodes represent the final output or decision. They are the
nodes where no further splitting is possible or necessary.
● Pruning: Done to reduce the size of the tree.
● Predicting: The final step, using the learned tree to predict labels for the test data.
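A minimal sketch of this build-prune-predict workflow, assuming scikit-learn is available; the toy data and the pruning controls (max_depth, ccp_alpha) below are illustrative choices, not part of these slides.

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset: two numeric features per instance, binary class labels.
X = [[3, 40], [4, 50], [6, 65], [8, 78], [10, 85]]
y = ["Fail", "Fail", "Pass", "Pass", "Pass"]

# Splitting happens inside fit(); max_depth and ccp_alpha act as pruning controls.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, ccp_alpha=0.0)
clf.fit(X, y)

# Predicting: classify unseen instances with the learned tree.
print(clf.predict([[5, 55], [9, 80]]))

# Inspect the decision nodes and leaf nodes of the induced tree.
print(export_text(clf, feature_names=["study_hours", "score"]))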
Decision Tree
Building a Decision Tree: Different algorithms can be used, such as ID3, C4.5, or
CART.
Iterative Dichotomiser 3 (ID3)
● Calculate Entropy of the Target Variable
○ Calculate the randomness
● Calculate Information Gain for Each Attribute
● Select the Best Attribute for the Root Node
○ Choose the attribute with the highest information gain as the root node.
● Split the Dataset
○ Split the dataset into subsets based on the values of the chosen attribute
● Recursively Build the Tree
○ Continue this process until all instances in a subset belong to the same class or no more
attributes are left to split on
● Classify New Instances (a minimal Python sketch of these steps follows)
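A compact sketch of these ID3 steps, assuming the training data is a list of dicts that each hold the categorical attributes plus a "class" key; all function and key names here are illustrative, not from the slides.

import math
from collections import Counter

def entropy(rows):
    # Step 1: randomness of the target variable within `rows`.
    counts = Counter(r["class"] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, attr):
    # Step 2: entropy of the parent minus the weighted entropy of the subsets.
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, attrs):
    # Steps 3-5: pick the best attribute, split, and recurse.
    classes = {r["class"] for r in rows}
    if len(classes) == 1:                     # all instances in the same class
        return classes.pop()
    if not attrs:                             # no attributes left: majority class
        return Counter(r["class"] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best])
    return tree

Classifying a new instance (step 6) then amounts to walking the nested dict from the root, following the branch that matches each attribute value until a class label is reached.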
Iterative Dichotomiser 3 (ID3)
● Information gain of an attribute A =
Entropy(p / (p + n)) - Σ over k = 1..d of [(pk + nk) / (p + n)] * Entropy(pk / (pk + nk))
Where pk and nk are the numbers of positive and negative instances for attribute value = k, p and n are the totals over the whole training set, d is the total number of values for the attribute, and Entropy(q) = -q log2(q) - (1 - q) log2(1 - q).
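A quick numeric check of this formula with purely hypothetical counts (not taken from the slides): an attribute with d = 2 values, (p1, n1) = (3, 1) and (p2, n2) = (1, 3), so p = n = 4.

import math

def entropy(q):
    # Entropy(q) = -q*log2(q) - (1-q)*log2(1-q), with Entropy(0) = Entropy(1) = 0
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

p, n = 4, 4                           # totals over the whole set
subsets = [(3, 1), (1, 3)]            # (pk, nk) for each of the d = 2 attribute values
remainder = sum((pk + nk) / (p + n) * entropy(pk / (pk + nk)) for pk, nk in subsets)
print(entropy(p / (p + n)) - remainder)   # 1.0 - 0.811... ≈ 0.189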
Example - Dataset
ID3 - Information Gain calculation
Info(output) = entropy of the target variable, computed from the class counts over the whole dataset.
Info(Outlook) = expected entropy after splitting on Outlook, i.e. the weighted average of the subset entropies.
Gain(Outlook) = Info(output) - Info(Outlook)
Similarly, find the information gain for all the attributes and select the one with the highest value to split the data.
CART (Classification and Regression Trees)
CART (Classification and Regression Trees) is a variation of the decision tree
algorithm that can handle both classification and regression tasks.
The Gini index is the splitting metric for classification tasks in CART. It is based on
the sum of squared class probabilities, Gini = 1 - Σ pi^2, and measures the probability
that a randomly chosen instance would be misclassified if it were labeled at random
according to the class distribution of the node; it is a variation of the Gini coefficient.
CART (Classification and Regression Trees)
Splitting Criteria: The CART algorithm evaluates all potential splits at every node
and chooses the one that most decreases the Gini impurity of the resulting
subsets. This process continues until a stopping criterion is reached, such as a
maximum tree depth or a minimum number of instances in a leaf node.
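A minimal sketch of this split search for a single numeric feature, using the common midpoint-threshold scheme; the function names are illustrative assumptions, not CART's actual implementation.

from collections import Counter

def gini(labels):
    # Gini impurity = 1 - sum of squared class probabilities.
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    # Try every midpoint between consecutive distinct values; keep the split
    # with the lowest weighted Gini impurity of the two resulting subsets.
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                          # no threshold between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

A full implementation would repeat this search over every feature and wrap it in the stopping criteria mentioned above (maximum depth, minimum number of instances per leaf).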
CART for Regression
Residual Reduction: Residual reduction measures how much the average squared
difference between the predicted values and the actual values of the target variable
is reduced by splitting the subset. The greater the reduction achieved by a split,
the better that split fits the data.
Splitting Criteria: CART evaluates every possible split at each node and selects
the one that results in the greatest reduction of residual error in the resulting
subsets. This process is repeated until a stopping criterion is met, such as
reaching the maximum tree depth or having too few instances in a leaf node.
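A matching sketch for the regression criterion, again for one numeric feature with midpoint thresholds; the names are illustrative assumptions.

def sse(ys):
    # Sum of squared residuals around the subset mean (the leaf's prediction).
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_regression_split(values, targets):
    # Return the (threshold, reduction) that most reduces the residual error
    # compared with keeping the node unsplit.
    pairs = sorted(zip(values, targets))
    parent = sse([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= threshold]
        right = [y for v, y in pairs if v > threshold]
        reduction = parent - (sse(left) + sse(right))
        if reduction > best[1]:
            best = (threshold, reduction)
    return best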
CART Example
1. Split the Data (candidate split: Study Hours ≤ 6.5)
● Left Node (Study Hours ≤ 6.5): (3, 40, Fail), (4, 50, Fail), (6, 65, Pass)
● Right Node (Study Hours > 6.5): (8, 78, Pass), (10, 85, Pass)
CART Example
2. Calculate Gini Impurity
Left Node:
Probability of Fail = 2/3
Probability of Pass = 1/3
Gini = 1 - (2/3)^2 - (1/3)^2 ≈ 0.444
Right Node:
Probability of Pass = 1
Gini = 1 - (1)^2 = 0
(These values are checked in the short sketch below.)
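A quick check of these numbers; the split and labels come directly from the example above.

left = ["Fail", "Fail", "Pass"]     # (3, 40), (4, 50), (6, 65)
right = ["Pass", "Pass"]            # (8, 78), (10, 85)

def gini(labels):
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

print(gini(left))    # 1 - (2/3)^2 - (1/3)^2 = 0.444...
print(gini(right))   # 1 - 1^2 = 0.0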
CART Example
The resulting decision tree, rooted at the split Study Hours ≤ 6.5.