ML Classification

The document discusses loading and preprocessing the Iris dataset for machine learning classification. It divides the data into training and test sets, performs feature scaling, and trains perceptron, logistic regression, and SVM classifiers. It evaluates the accuracy of each model on the test set and discusses concepts like the sigmoid function, Bayes' theorem, and impurity measures.


• 1. Load the Iris dataset

from sklearn import datasets
import numpy as np

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]  # only two features: petal length and petal width
y = iris.target
print(y)
print(X)
print(np.unique(y))  # unique class labels
• 2. Split the data into training and test sets

from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in modern scikit-learn

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(np.shape(X))        # (150, 2)
print(np.shape(X_train))  # (105, 2) -- 70% for training
print(np.shape(X_test))   # (45, 2)  -- 30% for testing
• 3. Feature scaling

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X_train)                      # learn mean and std from the training set only
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)    # apply the same transformation to the test set
• Standardization rescales each feature to zero mean and unit variance, as sketched below.
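A minimal sketch of what StandardScaler computes, using NumPy only (illustrative, not the library's internals):

import numpy as np

mu = X_train.mean(axis=0)     # per-feature mean, learned from the training set
sigma = X_train.std(axis=0)   # per-feature standard deviation
X_train_manual = (X_train - mu) / sigma   # matches sc.transform(X_train)
X_test_manual = (X_test - mu) / sigma     # test data reuses the training statistics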
• 4. Train a Perceptron model

from sklearn.linear_model import Perceptron

ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)
ppn.fit(X_train_std, y_train)     # fit the model
y_pred = ppn.predict(X_test_std)  # predict on the test set
print('Misclassified samples: %d' % (y_test != y_pred).sum())  # count prediction errors
• 5. Accuracy

from sklearn.metrics import accuracy_score

print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

• 6. Plot decision regions


import matplotlib.pyplot as plt

# stack training and test sets so both appear in the plot;
# plot_decision_regions is a custom helper (sketched below), not part of scikit-learn
X_combined_std = np.vstack((X_train_std, X_test_std))
y_combined = np.hstack((y_train, y_test))
plot_decision_regions(X=X_combined_std, y=y_combined,
                      classifier=ppn, test_idx=range(105, 150))
plt.xlabel('petal length [standardized]')
plt.ylabel('petal width [standardized]')
plt.legend(loc='upper left')
plt.show()
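Note that plot_decision_regions must be defined before use. A minimal sketch of such a helper (marker and color choices are illustrative):

from matplotlib.colors import ListedColormap
import matplotlib.pyplot as plt
import numpy as np

def plot_decision_regions(X, y, classifier, test_idx=None, resolution=0.02):
    markers = ('s', 'x', 'o')
    colors = ('red', 'blue', 'lightgreen')
    cmap = ListedColormap(colors[:len(np.unique(y))])
    # evaluate the classifier on a fine grid covering the feature space
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)  # colored decision regions
    # overlay the samples themselves
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(X[y == cl, 0], X[y == cl, 1], alpha=0.8,
                    c=colors[idx], marker=markers[idx], label=cl)
    # circle the test-set samples
    if test_idx is not None:
        X_test_pts = X[test_idx, :]
        plt.scatter(X_test_pts[:, 0], X_test_pts[:, 1], facecolors='none',
                    edgecolors='black', s=100, label='test set')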
Sigmoid function in Python
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.arange(-7, 7, 0.1)
phi_z = sigmoid(z)
plt.plot(z, phi_z)
plt.axvline(0.0, color='k')
plt.axhspan(0.0, 1.0, facecolor='1.0', alpha=1.0, ls='dotted')
plt.axhline(y=0.5, ls='dotted', color='k')
plt.yticks([0.0, 0.5, 1.0])
plt.ylim(-0.1, 1.1)
plt.xlabel('z')
plt.ylabel(r'$\phi(z)$')
plt.show()
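The plot shows why the decision threshold sits at z = 0: φ(z) crosses 0.5 exactly there, so predicting class 1 whenever φ(z) ≥ 0.5 is equivalent to predicting class 1 whenever the net input z = wᵀx + b is non-negative.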
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

lr = LogisticRegression(C=1000.0, random_state=0)
lr.fit(X_train_std, y_train)
y_pred = lr.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))

Predict the class probabilities of the first test sample


x = lr.predict_proba(X_test_std[0:1, :])
print("%f %f %f" % (x[0, 0], x[0, 1], x[0, 2]))
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

svm = SVC(kernel='linear', C=1, random_state=0)
svm.fit(X_train_std, y_train)
y_pred = svm.predict(X_test_std)
print('Misclassified samples: %d' % (y_test != y_pred).sum())
print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
• Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x), and P(x|c):

P(c|x) = P(x|c) · P(c) / P(x)

where:
P(c|x) is the posterior probability of class c (target) given predictor x (attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood: the probability of the predictor given the class.
P(x) is the prior probability of the predictor.
How does the Naive Bayes algorithm work?
• Step 1: Convert the data set into a frequency table.
• Step 2: Create a likelihood table by computing the probabilities.
• Step 3: Use the Naive Bayes equation to calculate the posterior probability for each class.
Example question: "Players will play if the weather is sunny." Is this statement correct? (A worked calculation follows below.)
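Worked calculation, assuming the frequency table of the standard weather/play example (14 observations, 9 of them "play"; 3 of those 9 play-days are sunny, and 5 of the 14 days are sunny overall):

P(Yes | Sunny) = P(Sunny | Yes) · P(Yes) / P(Sunny)
             = (3/9 · 9/14) / (5/14)
             ≈ 0.33 · 0.64 / 0.36 ≈ 0.60

Since the posterior probability of playing given sunny weather is 0.60, the statement is plausible for this data set.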
Implementation

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
Y = np.array([1, 1, 1, 2, 2, 2])
clf = GaussianNB()
clf.fit(X, Y)
print(clf.predict([[1, 1]]))    # -> [2]
print(clf.predict([[-1, -1]]))  # -> [1]
y = clf.predict(X)
print('Misclassified samples: %d' % (y != Y).sum())
print('Accuracy: %.2f' % accuracy_score(Y, y))
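GaussianNB models the likelihood P(x_i|c) as a normal distribution: fit estimates a per-class mean and variance for every feature, and predict evaluates Bayes' theorem above with those Gaussian likelihoods, assuming the features are independent within each class.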
• BEYOND SYLLABUS
Example: impurity measures
Example: information gain (a small worked sketch follows the plot below)
import matplotlib.pyplot as plt
import numpy as np

def gini(p):
    return p * (1 - p) + (1 - p) * (1 - (1 - p))

def entropy(p):
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def error(p):
    return 1 - np.max([p, 1 - p])

x = np.arange(0.0, 1.0, 0.01)
ent = [entropy(p) if p != 0 else None for p in x]
err = [error(i) for i in x]
fig = plt.figure()
ax = plt.subplot(111)
for i, lab, ls, c in zip([ent, gini(x), err],
                         ['Entropy', 'Gini Impurity', 'Misclassification Error'],
                         ['-', '--', '-.'],
                         ['black', 'red', 'green']):
    line = ax.plot(x, i, label=lab, linestyle=ls, lw=2, color=c)
ax.legend()
ax.axhline(y=0.5, linewidth=1, color='k', linestyle='--')
ax.axhline(y=1.0, linewidth=1, color='k', linestyle='--')
plt.ylim([0, 1.1])
plt.xlabel('p(i=1)')
plt.ylabel('Impurity Index')
plt.show()
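Worked sketch for information gain, using entropy as the impurity measure (the split data below is made up for illustration):

import numpy as np

def entropy_labels(y):
    # entropy of an array of class labels
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    # IG = I(parent) - sum_j (N_j / N) * I(child_j)
    n = len(parent)
    weighted = sum(len(c) / n * entropy_labels(c) for c in children)
    return entropy_labels(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 4 samples per class
left = np.array([0, 0, 0, 1])                # hypothetical split: left child
right = np.array([0, 1, 1, 1])               # right child
print('IG = %.3f' % information_gain(parent, [left, right]))  # IG = 0.189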
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# `tree` must be a fitted decision tree; for example (hyperparameters illustrative):
tree = DecisionTreeClassifier(criterion='entropy', max_depth=3, random_state=0)
tree.fit(X_train, y_train)
export_graphviz(tree,
                out_file='tree.dot',
                feature_names=['petal length', 'petal width'])

Convert tree.dot to an image, e.g. with https://www.coolutils.com/online/DOT-to-PNG
Thank You!!!
