Expt7 ML2025 250306 143857

The document outlines a program to implement a decision tree classifier using the Iris dataset, displaying the decision tree and calculating its accuracy. It also includes a section on implementing a decision tree regression model based on gender and height data. Additionally, it suggests assignments for further tasks, including calculating a confusion matrix and using a digits dataset for classification.

Uploaded by

harshitha.hegde5

Expt:7

Aim: Program to implement a decision tree and display it

(a) Program to implement decision tree as a classifier

#PROGRAM To Implement a Decision TREE AND to DISPLAY IT


print('PROGRAM To Implement a Decision Tree AND to DISPLAY IT')
from sklearn import datasets
import pandas as pd
iris=datasets.load_iris()
# df will hold the dataset as a table
df=pd.DataFrame(
    iris.data,
    columns=iris.feature_names
)
# class labels are stored in the df['target'] column
df['target']=pd.Series(
    iris.target
)
from sklearn.model_selection import train_test_split
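# Map each numeric label to its species name before splitting, so both subsets carry the column
df['target_names']=df['target'].apply(lambda y:iris.target_names[y])
# Hold out 30% of the samples for testing; the remaining 70% are used for training
df_train,df_test=train_test_split(df,test_size=0.3)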
print('Number of Training samples')
print(df_train.shape[0])
print('Number of Testing samples')
print(df_test.shape[0])
#Importing Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
clf=DecisionTreeClassifier()
x_train=df_train[iris.feature_names]
x_test=df_test[iris.feature_names]
y_train=df_train['target']
y_test=df_test['target']
#Training Decision Tree Classifier
clf.fit(x_train,y_train)
#Testing the data
y_test_pred=clf.predict(x_test)
print('Class of Testing Samples')
print(y_test_pred)
#To display the decision tree in command shell
from sklearn import tree
from matplotlib import pyplot as plt
text_representation = tree.export_text(clf)
print(text_representation)
# Save the text form of the tree to a log file
with open("decision_tree.log", "w") as fout:
    fout.write(text_representation)
# Draw the tree and save it as an image
fig = plt.figure(figsize=(25,20))
# class_names is passed as a list because some scikit-learn versions reject a NumPy array here
_ = tree.plot_tree(clf,
                   feature_names=iris.feature_names,
                   class_names=list(iris.target_names),
                   filled=True)
fig.savefig("decision_tree.png")

OUTPUT
PROGRAM To Implement a Decision Tree AND to DISPLAY IT
Number of Training samples
105
Number of Testing samples
45
Class of Testing Samples
[1 0 2 2 0 2 0 0 0 2 0 2 1 1 2 0 0 2 1 0 1 0 1 1 2 1 2 2 2 0 1 1 1 1 0 2 2
1 1 0 1 0 2 2 0]
|--- feature_2 <= 2.45
|   |--- class: 0
|--- feature_2 > 2.45
|   |--- feature_3 <= 1.70
|   |   |--- feature_2 <= 4.95
|   |   |   |--- class: 1
|   |   |--- feature_2 > 4.95
|   |   |   |--- feature_3 <= 1.55
|   |   |   |   |--- class: 2
|   |   |   |--- feature_3 > 1.55
|   |   |   |   |--- class: 1
|   |--- feature_3 > 1.70
|   |   |--- class: 2
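
The splits above are labelled feature_2 and feature_3 because export_text was called without feature names; in the Iris dataset these are the petal length and petal width columns. A minimal sketch, assuming the same dataset, that passes feature_names so the printed tree uses the column names. The random_state=0 argument and the variable name named_tree are assumptions added here for repeatability and illustration; the program above does not fix a seed, and this sketch trains on all 150 samples rather than on a split.

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier, export_text

iris = datasets.load_iris()

# random_state=0 is an assumption for repeatable output; the program above uses no seed
clf = DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# feature_names replaces the generic feature_0..feature_3 labels with the column names
named_tree = export_text(clf, feature_names=list(iris.feature_names))
print(named_tree)
```

The tree printed by this sketch will not match the one above exactly, since the training samples differ.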

(b) Program to display decision tree and calculate its accuracy

#PROGRAM To Calculate Accuracy of Decision Tree


print('PROGRAM To Calculate Accuracy of Decision Tree')
from sklearn import datasets
import pandas as pd
iris=datasets.load_iris()
# df will hold the dataset as a table
df=pd.DataFrame( iris.data, columns=iris.feature_names )
# class labels are stored in the df['target'] column
df['target']=pd.Series( iris.target )
from sklearn.model_selection import train_test_split
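# Map each numeric label to its species name before splitting, so both subsets carry the column
df['target_names']=df['target'].apply(lambda y:iris.target_names[y])
# Hold out 30% of the samples for testing; the remaining 70% are used for training
df_train,df_test=train_test_split(df,test_size=0.3)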
print('Number of Training samples')
print(df_train.shape[0])
print('Number of Testing samples')
print(df_test.shape[0])
#Importing Decision Tree Classifier
from sklearn.tree import DecisionTreeClassifier
clf=DecisionTreeClassifier()
x_train=df_train[iris.feature_names]
x_test=df_test[iris.feature_names]
y_train=df_train['target']
y_test=df_test['target']
#Training Decision Tree Classifier
clf.fit(x_train,y_train)
#Testing the data
y_test_pred=clf.predict(x_test)
print('Class of Testing Samples')
print(y_test_pred)
from sklearn.metrics import accuracy_score
x=accuracy_score(y_test,y_test_pred)
print('Accuracy')
print(x)

OUTPUT

PROGRAM To Calculate Accuracy of Decision Tree


Number of Training samples
105
Number of Testing samples
45
Class of Testing Samples
[1 1 1 0 1 1 1 0 2 0 2 0 1 2 2 0 0 2 2 2 1 1 1 1 1 1 1 0 2 1 1 1 1 0 2 2 2
2 2 2 2 2 0 0 2]
Accuracy
0.9333333333333333
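
The accuracy is the fraction of test samples whose predicted class matches the true class. A value of 0.9333333333333333 on 45 test samples corresponds to 42 correct predictions (42/45). Because train_test_split is called without a seed, the split and therefore the accuracy can change from run to run. A minimal sketch of the calculation, using hypothetical labels (not the predictions printed above) with 3 errors in 45 samples:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration only: 45 test samples, 15 per class
y_true = np.array([0] * 15 + [1] * 15 + [2] * 15)
y_pred = y_true.copy()
# Introduce three wrong predictions (positions chosen arbitrarily)
y_pred[[16, 20, 40]] = [2, 2, 1]

# Accuracy computed by hand: number of correct predictions / number of samples
manual_accuracy = np.mean(y_true == y_pred)
# The same quantity from scikit-learn
sk_accuracy = accuracy_score(y_true, y_pred)
print(manual_accuracy, sk_accuracy)
```

Both values equal 42/45, the same ratio as the accuracy reported above.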

(c) Program to implement decision tree as a regression model

#Program to implement Decision Tree Regression


import numpy as np
n=200
# 200 samples per population and gender (800 samples in total)
height_pop1_f=np.random.normal(loc=155,scale=10,size=n)
height_pop1_m=np.random.normal(loc=175,scale=5,size=n)
height_pop2_f=np.random.normal(loc=165,scale=10,size=n)
height_pop2_m=np.random.normal(loc=185,scale=5,size=n)
height_f=np.concatenate([height_pop1_f,height_pop2_f])
height_m=np.concatenate([height_pop1_m,height_pop2_m])
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import export_text
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
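# Gender is coded 1 for female and 2 for male
df_height=pd.DataFrame( { 'Gender':[1 for i in range(height_f.size)]+
                                   [2 for i in range(height_m.size)],
                          'Height':np.concatenate((height_f,height_m))} )
# Mean and median height per gender; the result is displayed only in an interactive
# session, so wrap the expression in print() when running this as a script
df_height.groupby('Gender')[['Height']].agg(['mean','median']).round(1)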
df_train,df_test=train_test_split(df_height,test_size=0.3)
x_train,x_test=df_train[['Gender']],df_test[['Gender']]
y_train,y_test=df_train['Height'],df_test['Height']
print('Training Samples')
print(df_train)
print('Testing Samples')
print(df_test)
for criterion in ['squared_error','absolute_error']:
    rgrsr=DecisionTreeRegressor(criterion=criterion)
    rgrsr.fit(x_train,y_train)
    print(f'criterion={criterion}:\n')
    print(export_text(rgrsr,feature_names=['Gender'],spacing=3,decimals=1))
print('Program Executed successfully')

OUTPUT

Training Samples
Gender Height
585 2 183.343569
590 2 171.482107
740 2 188.868396
633 2 182.644465
515 2 178.841086
.. ... ...
336 1 136.035596
694 2 180.998702
449 2 176.484152
332 1 164.418790
299 1 186.458330

[560 rows x 2 columns]


Testing Samples
Gender Height
156 1 146.834942
216 1 156.431498
58 1 155.297595
649 2 188.886167
350 1 151.916364
.. ... ...
204 1 158.452720
654 2 180.298237
186 1 145.174562
140 1 174.128847
157 1 134.886183

[240 rows x 2 columns]


criterion=absolute_error:

|--- Gender <= 1.5
|   |--- value: [159.4]
|--- Gender > 1.5
|   |--- value: [179.9]

Assignment

1) Write a program to calculate the confusion matrix and classification report for the decision tree.
2) Implement a decision tree as a classifier using the digits dataset.
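
As a starting point for the first task, a minimal sketch of the two scikit-learn functions involved, run on hypothetical labels rather than on the decision tree's predictions. In the matrix returned by confusion_matrix, rows correspond to true classes and columns to predicted classes. The label lists and the name cm are assumptions made for this illustration.

```python
from sklearn.metrics import confusion_matrix, classification_report

# Hypothetical true and predicted labels, for illustration only
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall and F1 score
print(classification_report(y_true, y_pred))
```

For the assignment, the hypothetical lists would be replaced by y_test and y_test_pred from the classifier program.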
