A decision tree is a non-parametric supervised learning algorithm used for classification and regression, structured in a hierarchical format with nodes and branches. It works by recursively splitting datasets based on feature values using criteria like entropy and information gain to determine the best splits. The document also outlines the process of implementing a decision tree classifier using the Iris dataset in Google Colab, including data loading, training, and evaluation metrics.
Practical 9 Decision Tree Classification
A decision tree is a non-parametric supervised learning algorithm used for classification and regression, structured in a hierarchical format with nodes and branches. It works by recursively splitting datasets based on feature values using criteria like entropy and information gain to determine the best splits. The document also outlines the process of implementing a decision tree classifier using the Iris dataset in Google Colab, including data loading, training, and evaluation metrics.
Decision Trees
Decision Tree
A decision tree is a non-parametric supervised learning algorithm used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes and leaf nodes.

Decision Tree working
As an example, imagine you are trying to decide whether or not you should go surfing; you might use decision rules like the following to make a choice:
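The slide's rule diagram is not reproduced here; the minimal sketch below uses hypothetical conditions (wave quality, crowd size) purely to illustrate how such rules form a tree.

def should_go_surfing(waves_are_good, crowd_is_small):
    # Root node: wave quality.
    if not waves_are_good:
        return "Don't surf"   # leaf node
    # Internal node: crowd size.
    if crowd_is_small:
        return "Surf"         # leaf node
    return "Don't surf"       # leaf node

print(should_go_surfing(waves_are_good=True, crowd_is_small=True))  # Surf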
The leaf nodes represent all the possible outcomes within the dataset.

Types of Decision Trees
1. ID3
2. C4.5
3. CART

Working Methodology of Decision Trees
• Decision Trees work by recursively splitting the dataset into subsets based on feature values, forming a tree-like structure.
• The splitting process is guided by a criterion such as entropy and information gain. Entropy measures the impurity of a node: H(S) = −Σᵢ pᵢ log₂(pᵢ), where pᵢ is the proportion of class i at the node.

Entropy example
Entropy for the root node
As the calculation shows, the entropy of the parent node is 1: with the two classes split 50/50, H = −(0.5 log₂ 0.5 + 0.5 log₂ 0.5) = 1. Keep this value in mind; we'll use it in the next steps when calculating the information gain.

Information Gain
• The next step is to find the information gain (IG); its value also lies within the range 0–1.
• Information gain helps the tree decide which feature to split on: the feature that gives the maximum information gain is chosen.
• We'll now calculate the information gain for every feature, one by one (a Python sketch of the computation follows below).
• For this iteration, the entropy of the parent node is the value of 1 calculated above.

Information gain for the feature Age
Information gain for the feature Mileage
Information gain for the Road Tested feature
Summary of information gain for each feature
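The per-feature calculations above appear as figures on the slides and are not reproduced here. The sketch below shows the general computation in Python; the data (hypothetical values for the Road Tested feature with a balanced binary target, so the parent entropy is 1) is assumed for illustration only.

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits: H(S) = -sum(p_i * log2(p_i)).
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(feature_values, labels):
    # Parent entropy minus the weighted entropy of the child subsets.
    gain = entropy(labels)
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Hypothetical, balanced data: parent entropy is 1, as in the example above.
road_tested = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
label       = ["buy", "buy", "buy", "buy", "skip", "skip", "skip", "skip"]
print(entropy(label))                        # 1.0
print(information_gain(road_tested, label))  # 1.0 -> a perfect first split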
The maximum information gain is for the feature “Road Tested”, and therefore we'll select it as our first split feature.

Final classifier: decision tree
1. Since the entropy is zero at the leaf nodes, we'll stop splitting.
2. If this weren't the case, we would continue to compute the information gain for the subsequent parent and child nodes until one of the stopping criteria is met.

The Iris Dataset
• The Iris dataset consists of 3 different types of irises (Setosa, Versicolour, and Virginica).
• Each sample has four features: Sepal Length, Sepal Width, Petal Length and Petal Width.
Load the dataset using the commands:
from sklearn import datasets
iris = datasets.load_iris()

Colab: Iris dataset classification using a decision tree
1. Open Google Colab by visiting https://colab.research.google.com/
2. Load the data
3. Split the data into training and test parts
4. Import a decision tree classifier and classification metrics
5. Build the model
6. Calculate the training and test set accuracies

Step 2: Load Data
The dataset is loaded. Get the X and y variables from the data, as sketched below.
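The slides show this step as a Colab screenshot; a minimal sketch of the equivalent code:

from sklearn import datasets

# Load the Iris dataset and extract features X and labels y.
iris = datasets.load_iris()
X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # class labels: 0 = Setosa, 1 = Versicolour, 2 = Virginica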
Split data into training and test parts
1. The first command loads the train_test_split function from the sklearn library.
2. Passing X and y to this function returns X_train, X_test, y_train and y_test.
3. The test size is 0.2, which means 80% of the data is used for training and 20% for testing.
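A sketch of the split; random_state=42 is an assumption added here for reproducibility and does not appear in the original steps.

from sklearn.model_selection import train_test_split

# test_size=0.2 keeps 80% of the samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)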
Import classifier and metrics
1. The classification metrics are imported.
2. The decision tree classifier is imported.
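The exact metrics imported in the screenshot are not visible; accuracy_score and classification_report are assumed here as typical choices.

# Classification metrics (an assumed selection) and the classifier.
from sklearn.metrics import accuracy_score, classification_report
from sklearn.tree import DecisionTreeClassifier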
Fit/train a decision tree classifier
1. The decision tree classifier is initialised.
2. It is fitted with the training data, X_train and y_train.
3. The training accuracy, precision, etc. are then computed.
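A sketch of training and evaluation. criterion="entropy" is an assumption chosen to match the splitting criterion discussed earlier; scikit-learn's default is "gini".

# Initialise the classifier; criterion="entropy" is an assumed setting.
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_train, y_train)

# Training and test set accuracies.
print("Train accuracy:", accuracy_score(y_train, clf.predict(X_train)))
print("Test accuracy: ", accuracy_score(y_test, clf.predict(X_test)))

# Per-class precision, recall and F1 on the test set.
print(classification_report(y_test, clf.predict(X_test)))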