Practical 9 Decision Tree Classification

A decision tree is a non-parametric supervised learning algorithm used for classification and regression, structured in a hierarchical format with nodes and branches. It works by recursively splitting datasets based on feature values using criteria like entropy and information gain to determine the best splits. The document also outlines the process of implementing a decision tree classifier using the Iris dataset in Google Colab, including data loading, training, and evaluation metrics.


Decision Trees

Decision Tree

A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes.
Decision Tree working
As an example, imagine you are trying to decide whether or not to go surfing; you might walk through a set of decision rules (shown as a tree on the slide) to make the choice.

The leaf nodes represent all the possible outcomes within the dataset.
Types of Decision Trees
1. ID3 (Iterative Dichotomiser 3)
2. C4.5 (the successor to ID3)
3. CART (Classification and Regression Trees)
Working Methodology of Decision Trees
• Decision Trees work by recursively splitting the dataset into subsets based on feature values, forming a tree-like structure.
• The splitting process is guided by criteria such as entropy and information gain.
Entropy example
Entropy for root node

As you can see, the entropy for the parent node is 1. Keep this value in mind; we'll use it in the next steps when calculating the information gain.
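The worked example itself appears only as a figure on the slides. As a reference, here is a minimal Python sketch of the entropy calculation; the function name and the sample labels are illustrative, not taken from the slides:

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions p
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

# A 50/50 binary split has the maximum entropy of 1, matching the
# parent-node value used on the slides:
print(entropy(["yes", "yes", "no", "no"]))  # 1.0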
Information Gain
• The next step is to find the information gain (IG); its value also lies within the range 0–1.
• Information gain helps the tree decide which feature to split on: the feature that gives the maximum information gain.
• We'll now calculate the information gain for every feature, one by one.
• For this iteration, we use the parent-node entropy of 1 that we calculated above.
Information gain for the feature Age
Information gain for the feature Mileage
Information gain for the feature Road Tested
Summary of information gain for each feature

The maximum information gain is for


the feature “Road Tested,” and
therefore we’ll select this to be our first
split feature.
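The per-feature calculations above are shown as figures on the slides. As a reference, a minimal sketch of the information-gain computation; the helper functions and the sample split are illustrative, not taken from the slides:

import numpy as np

def entropy(labels):
    # H = -sum(p * log2(p)) over the class proportions
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(parent, children):
    # IG = H(parent) - weighted average of H(child) over the child subsets
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical split: a feature that separates the two classes perfectly
# turns a parent entropy of 1 into child entropies of 0, so IG = 1.
parent = ["buy", "buy", "skip", "skip"]
print(information_gain(parent, [["buy", "buy"], ["skip", "skip"]]))  # 1.0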
Final classifier, decision tree

1. Since the entropy is zero at the leaf nodes, we'll stop splitting.
2. If this were not the case, we would continue to compute the information gain for the new parent and child nodes until one of the stopping criteria is met.
The Iris Dataset
• The Iris dataset consists of 3 different types of irises (Setosa, Versicolour, and Virginica)
• Four features: Sepal Length, Sepal Width, Petal Length, and Petal Width

Load the dataset by using the commands:

from sklearn import datasets
iris = datasets.load_iris()
Colab Iris Dataset classification using decision tree
1. Open Google Colab by visiting https://colab.research.google.com/
2. Load data
3. Split data into training and test parts
4. Load a decision tree classifier and classification metrics
5. Build model
6. Calculate training and test set accuracies
Step 2 Load Data

1. Dataset loaded
2. X and y variables extracted from the data (see the sketch below)
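The slide shows this step as a screenshot; a minimal equivalent sketch using the standard scikit-learn API:

from sklearn import datasets

# Load the Iris dataset and pull out the feature matrix and target labels
iris = datasets.load_iris()
X = iris.data    # shape (150, 4): sepal length/width and petal length/width
y = iris.target  # class labels: 0 = Setosa, 1 = Versicolour, 2 = Virginica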
Split data into training and test parts

1. The first command imports the train_test_split function from the sklearn library
2. X and y are passed to this function, which returns X_train, X_test, y_train, and y_test
3. The test size is 0.2, which means 80% of the data is used for training and 20% for testing
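A sketch of this step; no random seed is shown on the slides, so none is assumed here:

from sklearn.model_selection import train_test_split

# test_size=0.2 keeps 80% of the rows for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)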
Import classifier and metrics

1. Classification metrics imported in line 1
2. Decision tree classifier imported in line 4
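The exact import lines appear only as a screenshot; the following imports are an assumption consistent with the metrics used later in the practical:

from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
from sklearn.tree import DecisionTreeClassifier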
Fit/Train a decision tree classifier

1. Decision tree classifier initialised
2. Fitted with the training data, X_train and y_train
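A sketch of this step; the variable name clf is illustrative, and criterion="entropy" is an assumption chosen to match the theory slides, since scikit-learn defaults to the Gini criterion:

# Initialise the classifier, then fit it to the training split
clf = DecisionTreeClassifier(criterion="entropy")  # criterion is an assumption
clf.fit(X_train, y_train)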
Computed training accuracy, precision, etc.

1. Get the training metric values
2. Get the testing metric values
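A sketch of these steps; macro averaging is an assumption, needed because Iris is a multiclass problem:

# Predict on both splits, then compare predictions with the true labels
y_train_pred = clf.predict(X_train)
y_test_pred = clf.predict(X_test)

print("Training accuracy:", accuracy_score(y_train, y_train_pred))
print("Test accuracy:", accuracy_score(y_test, y_test_pred))
print("Test precision:", precision_score(y_test, y_test_pred, average="macro"))
print("Test recall:", recall_score(y_test, y_test_pred, average="macro"))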
Confusion matrix code
Confusion Matrix
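The slide shows the confusion-matrix code and the resulting matrix as images; an equivalent sketch:

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_test, y_test_pred)
print(cm)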
