
Python Implementation of Support Vector Machine

Now we will implement the SVM algorithm using Python. Here we will use
the same dataset, user_data, that we used for Logistic Regression
and KNN classification.

o Data Pre-processing step

Up to the data pre-processing step, the code remains the same. Below is
the code:

#Data Pre-processing Step

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

#importing datasets
data_set = pd.read_csv('user_data.csv')

#Extracting Independent and dependent Variable
x = data_set.iloc[:, [2, 3]].values
y = data_set.iloc[:, 4].values

# Splitting the dataset into training and test set.
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

#feature Scaling
from sklearn.preprocessing import StandardScaler
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

After executing the above code, the data will be pre-processed. The
output shows the loaded dataset and the scaled output for the test set.
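If you want to verify the scaling yourself, a quick check such as the following can be run (a minimal sketch; x_train and x_test are the scaled arrays from the code above):

# Inspect the first five scaled rows of the training and test sets
print(x_train[:5])
print(x_test[:5])

# After StandardScaler, each feature of the training set should have
# (approximately) zero mean and unit variance
print(x_train.mean(axis=0), x_train.std(axis=0))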
o Fitting the SVM classifier to the training set:

Now the training set will be fitted to the SVM classifier. To create the SVM
classifier, we will import the SVC class from the sklearn.svm library. Below is the
code for it:

from sklearn.svm import SVC  # "Support Vector Classifier"

classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)

In the above code, we have used kernel='linear', since we are creating an
SVM for linearly separable data; for non-linear data, a different kernel can
be used. We then fitted the classifier to the training data (x_train,
y_train).

Output:

Out[8]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
kernel='linear', max_iter=-1, probability=False, random_state=0,
shrinking=True, tol=0.001, verbose=False)

The model's performance can be altered by changing the values
of C (the regularization parameter), gamma, and the kernel.
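As an illustration (not part of the original tutorial), a small grid search could be used to try a non-linear RBF kernel and tune C and gamma on the same data; the parameter values below are arbitrary examples, not tuned recommendations:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search over a few illustrative values of C and gamma with an RBF kernel
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
grid = GridSearchCV(SVC(kernel='rbf', random_state=0), param_grid, cv=5)
grid.fit(x_train, y_train)
print(grid.best_params_, grid.best_score_)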

o Predicting the test set result:


Now, we will predict the output for the test set. For this, we will create a
new vector, y_pred. Below is the code for it:

#Predicting the test set result
y_pred = classifier.predict(x_test)

After getting the y_pred vector, we can compare y_pred with y_test to
check the difference between the actual and the predicted values.
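For a quick numeric check (a minimal sketch using the variables defined above), the predictions can be placed next to the true labels and the overall accuracy computed:

from sklearn.metrics import accuracy_score

# Show the first ten (predicted, actual) pairs side by side
print(nm.concatenate((y_pred.reshape(-1, 1), y_test.reshape(-1, 1)), axis=1)[:10])

# Fraction of test samples predicted correctly
print(accuracy_score(y_test, y_pred))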

Output: Below is the output for the prediction of the test set:
o Creating the confusion matrix:
Now we will check the performance of the SVM classifier: how many
incorrect predictions it makes compared to the Logistic Regression
classifier. To create the confusion matrix, we need to import
the confusion_matrix function from sklearn.metrics. After importing
the function, we will call it and store the result in a new variable, cm. The
function takes two main parameters: y_true (the actual values)
and y_pred (the values predicted by the classifier). Below is the
code for it:

#Creating the Confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

Output:
As we can see in the above output image, there are 66 + 24 = 90 correct
predictions and 8 + 2 = 10 incorrect predictions. Therefore, we can say that our
SVM model improved on the Logistic Regression model.
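The same counts can be read directly from cm; a minimal sketch, assuming binary labels and sklearn's default row/column ordering:

# For a binary problem, confusion_matrix returns [[tn, fp], [fn, tp]]
tn, fp, fn, tp = cm.ravel()
print('correct predictions:', tn + tp)    # 66 + 24 = 90 here
print('incorrect predictions:', fp + fn)  # 8 + 2 = 10 here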

o Visualizing the training set result:


Now we will visualize the training set result. Below is the code for it:

from matplotlib.colors import ListedColormap

x_set, y_set = x_train, y_train
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Training set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

By executing the above code, we will get the output as:

As we can see, the above output appears similar to the Logistic
Regression output. In the output, we got a straight line as the hyperplane
because we used a linear kernel in the classifier. And as discussed above,
in 2D space the SVM hyperplane is a straight line.
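Because the kernel is linear, the equation of that line can be read straight off the fitted model; a minimal sketch (axes are in the scaled feature units, since the classifier was trained on scaled data):

# For kernel='linear', coef_ holds the weight vector w and intercept_ holds b,
# so the decision boundary is w[0]*x1 + w[1]*x2 + b = 0
w = classifier.coef_[0]
b = classifier.intercept_[0]
xs = nm.linspace(x_train[:, 0].min(), x_train[:, 0].max(), 100)
mtp.plot(xs, -(w[0] * xs + b) / w[1], 'k--')  # solve the boundary for x2
mtp.xlabel('Age (scaled)')
mtp.ylabel('Estimated Salary (scaled)')
mtp.show()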

o Visualizing the test set result:

#Visualizing the test set result
from matplotlib.colors import ListedColormap

x_set, y_set = x_test, y_test
x1, x2 = nm.meshgrid(nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
                     nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
mtp.contourf(x1, x2, classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
mtp.xlim(x1.min(), x1.max())
mtp.ylim(x2.min(), x2.max())
for i, j in enumerate(nm.unique(y_set)):
    mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
mtp.title('SVM classifier (Test set)')
mtp.xlabel('Age')
mtp.ylabel('Estimated Salary')
mtp.legend()
mtp.show()

Output:

By executing the above code, we will get the output as:

As we can see in the above output image, the SVM classifier has divided
the users into two regions (purchased or not purchased). Users who
purchased the SUV are in the red region with red scatter points, and
users who did not purchase the SUV are in the green region with green
scatter points. The hyperplane has separated the two classes, Purchased
and Not Purchased.
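As a further check (not shown in the original tutorial), the support vectors that determine this hyperplane can be inspected and overlaid on the plot:

# Support vectors are the training points that define the margin
print(classifier.n_support_)      # number of support vectors per class
sv = classifier.support_vectors_  # their coordinates (in scaled units)
mtp.scatter(sv[:, 0], sv[:, 1], s=80, facecolors='none', edgecolors='black')
mtp.show()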
