ML Manual

The document outlines the implementation of various machine learning algorithms including Support Vector Machine (SVM), EM algorithm, k-Means, and k-Nearest Neighbors (KNN) for classification and regression tasks. It provides code examples for each algorithm using Python libraries such as sklearn and pandas, along with performance metrics like accuracy, confusion matrix, and error rates. The document emphasizes the effectiveness of these algorithms in different applications, such as image classification and medical data analysis.

Uploaded by

ypragathi-1
Copyright
© All Rights Reserved

6. Apply Support Vector Machine to classify the given data set.

Description:
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for
classification or regression tasks. It separates the classes with the hyperplane that leaves the
largest margin to the nearest training points, remains effective in high-dimensional spaces, and
is widely used in image classification, handwriting recognition, and bioinformatics.

Code:
import matplotlib.pyplot as plt
from sklearn import svm
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import confusion_matrix

# Load the data; the last column is used as the class label.
dataset = pd.read_csv("heart_cleveland_upload.csv")
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1].values

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train a linear-kernel SVM and predict the held-out labels.
classifier = svm.SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Rows of the confusion matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# 10-fold cross-validation on the training split.
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))

Output:

Confusion Matrix:
 [[20  9]
 [ 2 35]]

Accuracy: 84.67 %
Standard Deviation: 7.46 %
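
The confusion matrix in the output above can also be turned into test-set metrics by hand. The following is a minimal sketch, assuming scikit-learn's convention that rows are true classes and columns are predicted classes, and treating class 1 as the positive class; the helper name binary_metrics is illustrative and not part of the original program. The cross-validated accuracy printed above is measured on the training folds, so it need not match the test-set accuracy computed here.

```python
# Confusion matrix copied from the output above
# (assumed layout: rows = true class, columns = predicted class).
cm = [[20, 9],
      [2, 35]]


def binary_metrics(cm):
    """Return accuracy, precision and recall, treating class 1 as positive."""
    tn, fp = cm[0]
    fn, tp = cm[1]
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total   # share of all test samples classified correctly
    precision = tp / (tp + fp)     # share of predicted positives that are truly positive
    recall = tp / (tp + fn)        # share of true positives that were found
    return accuracy, precision, recall


accuracy, precision, recall = binary_metrics(cm)
print("Test accuracy: {:.2f} %".format(accuracy * 100))
print("Precision (class 1): {:.2f} %".format(precision * 100))
print("Recall (class 1): {:.2f} %".format(recall * 100))
```
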

7. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.

Description:

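EM Algorithm:
Expectation-Maximization (EM) is a probabilistic algorithm that gives each data point a
probability of belonging to every cluster instead of a single hard assignment.
It alternates an E-step, which computes these probabilities from the current cluster parameters,
with an M-step, which re-estimates the parameters from them, until the estimates stop changing.
It is well suited to datasets with hidden (latent) variables or missing information.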
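
The E-step and M-step described above can be sketched in plain Python for the simplest case, a mixture of two one-dimensional Gaussians. This is an illustrative sketch only: the function names, the toy data, the initialisation (means at the data extremes, equal weights) and the variance floor are all assumptions made for the example, not part of the original program.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, n_iter=50, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    # Assumed initialisation: means at the extremes, shared variance, equal weights.
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each component for every point (responsibilities).
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, var_floor)  # floor avoids a collapsed component
    return weights, means, variances, resp


# Hypothetical toy data with two well-separated groups.
data = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
weights, means, variances, resp = em_two_gaussians(data)
# Turn the soft assignments into hard labels by picking the likelier component.
labels = [0 if r[0] >= r[1] else 1 for r in resp]
print("weights:", weights)
print("means:", means)
print("labels:", labels)
```

For real multi-dimensional data, scikit-learn's sklearn.mixture.GaussianMixture provides a library implementation of the same idea.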

k-Means Algorithm:
k-Means is a partitioning algorithm that assigns each data point to the cluster with the nearest
centroid.
It aims to minimize the sum of squared distances between data points and their cluster
centroids by alternating between assigning points and recomputing each centroid as the mean of
its points.
It is sensitive to the initial centroid positions and may converge to a local optimum.
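
The assignment and update steps of k-Means can be sketched in plain Python as follows. This is a minimal illustration, not the original program: the function name kmeans, the toy points and the choice of initial centroids are assumptions for the example, and math.dist requires Python 3.8 or later.

```python
import math


def kmeans(points, init_centroids, max_iter=100):
    """Alternate nearest-centroid assignment and centroid update until stable."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    # Objective: sum of squared distances from each point to its centroid.
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
# Assumed initialisation: one starting centroid taken from each group.
centroids, labels, inertia = kmeans(points, init_centroids=[points[0], points[3]])
print("centroids:", centroids)
print("labels:", labels)
print("sum of squared distances:", inertia)
```

Because the result depends on the starting centroids, library implementations such as sklearn.cluster.KMeans typically run several initialisations and keep the best one. To compare k-Means with EM on the same data, one option is to fit both and compare a clustering score such as sklearn.metrics.silhouette_score.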

Code:
# k-NN regression on the scikit-learn diabetes data set.
from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn import metrics

dataset = datasets.load_diabetes()
X = dataset.data
y = dataset.target
print("dataset shape", X.shape)

# Hold out 10% of the samples for testing.
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10, random_state=42)

# Each prediction is the mean target of the 5 nearest training samples.
regressor = KNeighborsRegressor(n_neighbors=5).fit(Xtrain, ytrain)
ypred = regressor.predict(Xtest)

# Print every test target next to its prediction and the absolute error.
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Difference'))
print("-------------------------------------------------------------------------")
for label, pred in zip(ytest, ypred):
    print('%-25s %-25s %-25s' % (label, pred, abs(label - pred)))
print("-------------------------------------------------------------------------")
print("\nMean Absolute Error:", metrics.mean_absolute_error(ytest, ypred))
print("\nMean Squared Error:", metrics.mean_squared_error(ytest, ypred))
print("\nR2 Score:", metrics.r2_score(ytest, ypred))

Output:

-------------------------------------------------------------------------
Original Label            Predicted Label           Difference
-------------------------------------------------------------------------
155.0                     177.6                     22.6
109.0                     141.6                     32.6
214.0                     179.8                     34.2
101.0                     88.2                      12.8
259.0                     208.8                     50.2
... (more rows)
-------------------------------------------------------------------------

Mean Absolute Error: 40.36

Mean Squared Error: 2706.02

R2 Score: 0.23
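
The three error measures reported above can be computed directly from their definitions. The sketch below uses only the five rows visible in the output table, so its results differ from the figures for the full test set; the function name regression_metrics is an illustrative assumption.

```python
# The five (original, predicted) pairs visible in the output table above.
y_true = [155.0, 109.0, 214.0, 101.0, 259.0]
y_pred = [177.6, 141.6, 179.8, 88.2, 208.8]


def regression_metrics(y_true, y_pred):
    """Return mean absolute error, mean squared error and R2 score."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # R2 compares the squared error with that of always predicting the mean target.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return mae, mse, r2


mae, mse, r2 = regression_metrics(y_true, y_pred)
print("Mean Absolute Error:", round(mae, 2))
print("Mean Squared Error:", round(mse, 2))
print("R2 Score:", round(r2, 2))
```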

8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.

Description:

K-Nearest Neighbors (KNN):

KNN is a simple and effective algorithm used for both classification and regression tasks.
For classification (as in this case), it predicts the class of a sample by majority vote among
its k nearest neighbors in the training set.
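
The classification rule behind k-NN can be sketched in a few lines of plain Python. This is a minimal illustration, not the original program: the function name knn_predict and the toy data are hypothetical, k = 3 is an arbitrary choice, and math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features, two classes.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (5.0, 5.0), (5.1, 4.8), (4.7, 5.2)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (4.9, 5.1), (3.4, 3.4)]
test_y = [0, 1, 0]

predictions = [knn_predict(train_X, train_y, q) for q in test_X]
# Report each prediction as correct or wrong, as the exercise asks.
for label, pred in zip(test_y, predictions):
    verdict = "Correct" if label == pred else "Wrong"
    print('%-15s %-15s %-15s' % (label, pred, verdict))
```

The third test point lies closer to the class 1 samples than to its own class, so the toy run contains one wrong prediction as well as correct ones.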

Code:
from sklearn.datasets import load_iris
from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

iris = load_iris()
X = iris.data
y = iris.target
print("dataset shape", X.shape, y.shape)

# No random_state is set, so the split and the results change from run to run.
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)

# Print every test label next to its prediction and mark it correct or wrong.
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label, pred in zip(ytest, ypred):
    verdict = 'Correct' if label == pred else 'Wrong'
    print('%-25s %-25s %-25s' % (label, pred, verdict))
print("-------------------------------------------------------------------------")

print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")

Output:
dataset shape (150, 4) (150,)
-------------------------------------------------------------------------
Original Label            Predicted Label           Correct/Wrong
-------------------------------------------------------------------------
2                         2                         Correct
0                         0                         Correct
2                         2                         Correct
2                         2                         Correct
1                         1                         Correct
2                         2                         Correct
1                         1                         Correct
1                         1                         Correct
2                         2                         Correct
1                         1                         Correct
2                         2                         Correct
0                         0                         Correct
1                         2                         Wrong
2                         2                         Correct
2                         2                         Correct
2                         2                         Correct
1                         1                         Correct
0                         0                         Correct
2                         2                         Correct
2                         2                         Correct
1                         1                         Correct
2                         2                         Correct
1                         1                         Correct
0                         0                         Correct
-------------------------------------------------------------------------
Confusion Matrix:
 [[ 3  0  0]
 [ 0  9  1]
 [ 0  1 11]]
-------------------------------------------------------------------------
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       0.90      0.90      0.90        10
           2       0.92      0.92      0.92        12

    accuracy                           0.92        25
   macro avg       0.94      0.94      0.94        25
weighted avg       0.92      0.92      0.92        25
-------------------------------------------------------------------------

Accuracy of the classifier is 0.92
-------------------------------------------------------------------------
