6. Apply Support Vector Machine to classify the given data set.
Description:
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for
classification or regression tasks. For classification, it finds the hyperplane that separates the
classes with the largest margin. It is particularly effective in high-dimensional spaces and is
widely used in fields such as image classification, handwriting recognition, and bioinformatics.
Code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Load the data: every column except the last is a feature, the last is the label
dataset = pd.read_csv("heart_cleveland_upload.csv")
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train a linear-kernel SVM and predict the test labels
classifier = svm.SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Confusion matrix on the held-out test set
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# 10-fold cross-validation on the training set
accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train, cv=10)
print("Accuracy: {:.2f} %".format(accuracies.mean() * 100))
print("Standard Deviation: {:.2f} %".format(accuracies.std() * 100))
Output:
Confusion Matrix:
 [[20  9]
 [ 2 35]]
Accuracy: 84.67 %
Standard Deviation: 7.46 %
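The confusion matrix above can be reduced to summary metrics by hand. The following is a minimal sketch, assuming scikit-learn's layout (rows are true classes, columns are predicted classes) and assuming class 1 is the positive class; the variable names are illustrative and not part of the original program.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Assumed layout (scikit-learn convention): rows = true class, columns = predicted class.
cm = np.array([[20, 9],
               [2, 35]])

# For a 2x2 matrix, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all test samples predicted correctly
precision = tp / (tp + fp)        # share of predicted positives that are truly positive
recall = tp / (tp + fn)           # share of true positives that were found

print("Test accuracy: {:.2f} %".format(accuracy * 100))
print("Precision (class 1): {:.2f} %".format(precision * 100))
print("Recall (class 1): {:.2f} %".format(recall * 100))
```

The accuracy computed this way describes the held-out test set only. It is a different quantity from the cross-validated accuracy printed by the program, which is averaged over ten folds of the training set.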
7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.
Description:
EM Algorithm:
EM (Expectation-Maximization) is a probabilistic algorithm that iteratively assigns data points to
clusters based on probability estimates.
It models uncertainty in cluster assignments: each point receives a probability of belonging to
each cluster, and these estimates are refined at every iteration.
It is well suited to datasets with hidden (latent) or missing information.
k-Means Algorithm:
k-Means is a partitioning algorithm that assigns each data point to the cluster with the nearest
centroid.
It aims to minimize the sum of squared distances between data points and their cluster
centroids.
It is sensitive to the initial centroid positions and may converge to a local optimum.
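The comparison described above could be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn's built-in iris data in place of the unspecified .CSV file, uses GaussianMixture as the EM-based clusterer, standardizes the features first, and picks the adjusted Rand index and the silhouette score as quality measures. None of these choices come from the original exercise.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the data that would be read from a .CSV file.
iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
y = iris.target  # true labels, used only to score the clusterings

# k-Means: hard assignment of each point to the nearest centroid.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(X)

# EM: fit a Gaussian mixture, then take the most probable component per point.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm_labels = gmm.fit_predict(X)
gmm_probs = gmm.predict_proba(X)  # soft assignments, one row per sample

results = {}
for name, labels in (("k-Means", kmeans_labels), ("EM (GMM)", gmm_labels)):
    results[name] = {
        "ari": adjusted_rand_score(y, labels),    # agreement with the true labels
        "silhouette": silhouette_score(X, labels),  # cohesion vs. separation
    }
    print("%-10s ARI: %.3f  Silhouette: %.3f"
          % (name, results[name]["ari"], results[name]["silhouette"]))

print("Soft assignments of the first three samples:")
print(np.round(gmm_probs[:3], 3))
```

A higher adjusted Rand index means the clusters agree more closely with the known species labels, while the silhouette score needs no labels at all. The soft assignments are what distinguish EM from k-Means: each row gives the probability of the sample belonging to each of the three components.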
Code:
# k-nearest-neighbour regression on the scikit-learn diabetes data set
import numpy as np
import pandas as pd
from sklearn import datasets, metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Load the features and the continuous target
dataset = datasets.load_diabetes()
X = dataset.data
y = dataset.target
print("dataset shape", X.shape)

# Hold out 10% of the rows for testing
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10, random_state=42)

# Fit a 5-neighbour regressor and predict the test targets
regressor = KNeighborsRegressor(n_neighbors=5).fit(Xtrain, ytrain)
ypred = regressor.predict(Xtest)

# Print each true value next to its prediction and the absolute error
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Difference'))
print("-------------------------------------------------------------------------")
for label, pred in zip(ytest, ypred):
    print('%-25s %-25s %-25s' % (label, pred, abs(label - pred)))
print("-------------------------------------------------------------------------")

# Summary error metrics on the test set
print("\nMean Absolute Error:", metrics.mean_absolute_error(ytest, ypred))
print("\nMean Squared Error:", metrics.mean_squared_error(ytest, ypred))
print("\nR2 Score:", metrics.r2_score(ytest, ypred))
Output:
-------------------------------------------------------------------------
Original Label            Predicted Label           Difference
-------------------------------------------------------------------------
155.0                     177.6                     22.6
109.0                     141.6                     32.6
214.0                     179.8                     34.2
101.0                     88.2                      12.8
259.0                     208.8                     50.2
... (more rows)
-------------------------------------------------------------------------
Mean Absolute Error: 40.36
Mean Squared Error: 2706.02
R2 Score: 0.23
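To make the three metrics concrete, the sketch below recomputes them by hand and cross-checks the results against scikit-learn. It uses only the five rows shown in the table above, so its values will differ from the figures reported for the full test set; the variable names are illustrative.

```python
import numpy as np
from sklearn import metrics

# First five rows of the output table above (the full test set has more rows,
# so these values are not the ones reported for the whole test set).
y_true = np.array([155.0, 109.0, 214.0, 101.0, 259.0])
y_pred = np.array([177.6, 141.6, 179.8, 88.2, 208.8])

errors = y_true - y_pred

# Mean Absolute Error: average size of the error, in the units of the target.
mae = np.mean(np.abs(errors))

# Mean Squared Error: average squared error, which penalizes large misses more.
mse = np.mean(errors ** 2)

# R2: 1 minus the ratio of residual variation to total variation around the mean.
r2 = 1.0 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print("MAE: %.2f  MSE: %.2f  R2: %.2f" % (mae, mse, r2))

# Cross-check the hand-written formulas against scikit-learn.
assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_pred))
assert np.isclose(r2, metrics.r2_score(y_true, y_pred))
```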
8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.
Description:
K-Nearest Neighbors (KNN):
KNN is a simple and effective algorithm used for both classification and regression tasks.
For classification (as in this case), it predicts the class of a sample by majority vote among its
k nearest neighbors.
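The majority-vote rule can be written out in a few lines. The sketch below is an illustration only: the function knn_predict and the toy data are hypothetical, Euclidean distance is assumed, and the program in this exercise uses scikit-learn's KNeighborsClassifier instead.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Count the labels of those neighbours and return the most common one.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups in two dimensions.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0])))  # prints 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0])))  # prints 1
```

An odd value of k avoids tied votes when there are two classes. With more classes, ties are still possible and a library implementation applies its own tie-breaking rule.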
Code:
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris features and class labels
iris = load_iris()
X = iris.data
y = iris.target
print("dataset shape", X.shape, y.shape)

# Hold out part of the samples for testing (no random_state, so the split varies per run)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

# Fit a 5-neighbour classifier and predict the test labels
classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)
ypred = classifier.predict(Xtest)

# Print every prediction and mark it as correct or wrong
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")
for label, pred in zip(ytest, ypred):
    print('%-25s %-25s' % (label, pred), end="")
    if label == pred:
        print(' %-25s' % ('Correct'))
    else:
        print(' %-25s' % ('Wrong'))
print("-------------------------------------------------------------------------")

# Summary of the classifier's performance on the test set
print("\nConfusion Matrix:\n", metrics.confusion_matrix(ytest, ypred))
print("-------------------------------------------------------------------------")
print("\nClassification Report:\n", metrics.classification_report(ytest, ypred))
print("-------------------------------------------------------------------------")
print('Accuracy of the classifier is %0.2f' % metrics.accuracy_score(ytest, ypred))
print("-------------------------------------------------------------------------")
Output:
dataset shape (150, 4) (150,)
-------------------------------------------------------------------------
Original Label            Predicted Label           Correct/Wrong
-------------------------------------------------------------------------
2                         2                         Correct
0                         0                         Correct
2                         2                         Correct
2                         2                         Correct
1                         1                         Correct
2                         2                         Correct
1                         1                         Correct
1                         1                         Correct
2                         2                         Correct
1                         1                         Correct
2                         2                         Correct
0                         0                         Correct
1                         2                         Wrong
2                         2                         Correct
2                         2                         Correct
2                         2                         Correct
1                         1                         Correct
0                         0                         Correct
2                         2                         Correct
2                         2                         Correct
1                         1                         Correct
2                         2                         Correct
1                         1                         Correct
0                         0                         Correct
-------------------------------------------------------------------------
Confusion Matrix:
 [[ 3  0  0]
 [ 0  9  1]
 [ 0  1 11]]
-------------------------------------------------------------------------
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       0.90      0.90      0.90        10
           2       0.92      0.92      0.92        12

    accuracy                           0.93        25
   macro avg       0.94      0.94      0.94        25
weighted avg       0.93      0.93      0.93        25
-------------------------------------------------------------------------
Accuracy of the classifier is 0.92
-------------------------------------------------------------------------