Understanding Support Vector Machine Algorithm From Examples Along With Code
Sunil Ray, October 6, 2015
Introduction
Mastering machine learning algorithms isn’t a myth at all. Most beginners start by learning regression. It is simple to learn and use, but does that solve our purpose? Of course not! You can do so much more than just regression!
Think of machine learning algorithms as an armoury packed with axes, swords, blades, bows, daggers and so on. You have various tools, but you ought to learn to use them at the right time. As an analogy, think of ‘Regression’ as a sword capable of slicing and dicing data efficiently, but incapable of dealing with highly complex data. ‘Support Vector Machines’, on the contrary, is like a sharp knife: it works on smaller datasets, but on them it can be much stronger and more powerful in building models.
By now, I hope you’ve mastered Random Forest, the Naive Bayes algorithm and Ensemble Modeling. If not, I’d suggest you take a few minutes and read about them as well. In this article, I shall guide you from the basics to advanced knowledge of a crucial machine learning algorithm: support vector machines.
Table of Contents
1. What is Support Vector Machine?
2. How does it work?
3. How to implement SVM in Python?
4. How to tune Parameters of SVM?
5. Pros and Cons associated with SVM
What is Support Vector Machine?
“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges, though it is mostly used in classification problems. In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. We then perform classification by finding the hyper-plane that differentiates the two classes best.
How does it work?
The burning question is: “How can we identify the right hyper-plane?” Don’t worry, it’s not as hard as you think! Let’s understand:
Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify the stars and circles.
You need to remember a rule of thumb to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” does this job excellently.
Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and C) and all of them segregate the classes well. Now, how can we identify the right hyper-plane? Maximizing the distance between the nearest data point (of either class) and the hyper-plane helps us decide: this distance is called the margin. A hyper-plane with a higher margin is more robust, because a low margin increases the chance of misclassification.
Identify the right hyper-plane (Scenario-3): Hint: use the rules discussed in the previous scenarios to identify the right hyper-plane.
Some of you may have selected hyper-plane B, as it has a higher margin than A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately before maximizing the margin. Here, hyper-plane B has a classification error, while A has classified everything correctly. Therefore, the right hyper-plane is A.
Can we classify two classes (Scenario-4)?: Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
As I have already mentioned, the star at the other end is like an outlier for the star class. SVM has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM is robust to outliers.
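To make this concrete, here is a minimal sketch with hypothetical toy data (the points and C value below are illustrative assumptions, not taken from the original figure): a soft-margin linear SVM keeps the wide-margin boundary and simply accepts the outlier as a training error, instead of contorting the boundary around it.
import numpy as np
from sklearn.svm import SVC
# Hypothetical toy data: two clusters, plus one star-class outlier
# sitting inside circle territory (analogous to the figure above)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # circles (class 0)
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.0],   # stars (class 1)
              [2.0, 2.0]])                          # outlier labelled as a star
y = np.array([0, 0, 0, 1, 1, 1, 1])
# A soft-margin linear SVM tolerates the single outlier
clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.predict([[1.2, 1.2], [5.2, 5.2]]))  # expected: [0 1]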
Find the hyper-plane to segregate two classes (Scenario-5): In the scenario below, we can’t have a linear hyper-plane between the two classes, so how does SVM classify them? Till now, we have only looked at linear hyper-planes. SVM solves this problem easily, by introducing an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let’s plot the data points on the x and z axes:
All values of z will always be positive, because z is the squared sum of x and y.
In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars, lying relatively far from the origin, result in higher values of z.
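As a quick check of this idea, here is a minimal sketch with synthetic data (the cluster size and ring radius are assumptions for illustration): after adding z = x^2 + y^2, every circle ends up with a smaller z than every star, so a horizontal line in the (x, z) plane separates the classes.
import numpy as np
rng = np.random.default_rng(0)
# Synthetic stand-in for the figure: circles near the origin,
# stars on a ring of radius 3 farther out
circles = rng.normal(0.0, 0.5, size=(20, 2))
theta = rng.uniform(0.0, 2.0 * np.pi, size=20)
stars = np.c_[3.0 * np.cos(theta), 3.0 * np.sin(theta)]
# New feature z = x^2 + y^2 (squared distance from the origin)
z_circles = (circles ** 2).sum(axis=1)
z_stars = (stars ** 2).sum(axis=1)
# Every circle should have a smaller z than every star (which sits at z = 9)
print(z_circles.max(), z_stars.min())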
In SVM, it is now easy to have a linear hyper-plane between these two classes. But another burning question arises: do we need to add this feature manually to obtain a hyper-plane? No. SVM has a technique called the kernel trick. Kernels are functions which take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable one. They are mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations, then finds out the process to separate the data based on the labels or outputs you’ve defined.
When we look at the hyper-plane in the original input space, it looks like a circle:
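Here is a minimal sketch of the kernel trick in scikit-learn, using synthetic ring-shaped data as a stand-in for the circle/star scenario (the dataset parameters are illustrative assumptions): the RBF kernel performs the higher-dimensional mapping implicitly, so no manual z = x^2 + y^2 feature is required.
from sklearn.datasets import make_circles
from sklearn.svm import SVC
# Synthetic two-ring data, analogous to the circle/star scenario above
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
# The RBF kernel maps the inputs to a higher-dimensional space implicitly
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)
print(clf.score(X, y))  # should be close to 1.0 on this data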
Now, let’s look at how to apply the SVM algorithm in a data science challenge.
How to implement SVM in Python?
In Python, scikit-learn is a widely used library for implementing machine learning algorithms. SVM is also available in the scikit-learn library, and it follows the same structure (import library, object creation, model fitting and prediction). Let’s look at the code below:
#Import Library
from sklearn import svm
#Assumed you have X (predictor) and y (target) for the training data set
#and x_test (predictor) of the test data set
# Create SVM classification object
model = svm.SVC(kernel='linear', C=1, gamma=1)
# There are various options associated with it, like changing the kernel,
# gamma and the C value. We will discuss more about them in the next section.
# Train the model using the training sets and check the score
model.fit(X, y)
model.score(X, y)
#Predict output
predicted = model.predict(x_test)
How to tune Parameters of SVM?
I am going to discuss some important parameters that have a high impact on model performance: “kernel”, “gamma” and “C”.
kernel: We have already discussed it. There are various options available for the kernel, like “linear”, “rbf”, “poly” and others (the default value is “rbf”). “rbf” and “poly” are useful for a non-linear hyper-plane. Let’s look at an example where we use a linear kernel on two features of the iris data set to classify their class.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
# import the iris data set and keep only its first two features,
# so that the decision boundary can be plotted in 2-D
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)  # gamma is left at its default; it only matters for non-linear kernels
# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))
plt.subplot(1, 1, 1)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.show()
Try changing the kernel type to rbf in the svm.SVC(...) line above and look at the impact.
I would suggest you go for a linear kernel if you have a large number of features (>1000), because it is more likely that the data is linearly separable in a high-dimensional space. You can also use RBF, but do not forget to cross-validate its parameters to avoid over-fitting.
gamma: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. The higher the value of gamma, the harder the model tries to fit the training data set exactly, which hurts generalization and causes over-fitting.
Example: let’s see the difference if we use different gamma values, such as 0.01, 10 or 100.
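Since the original gamma plots are images, here is a minimal sketch you can run instead (the gamma values are illustrative assumptions): training accuracy tends to climb as gamma grows, which is exactly the ever-tighter fitting, and eventual over-fitting, described above.
from sklearn import svm, datasets
# Reload the two-feature iris data used in the kernel example
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
for g in [0.01, 1, 10, 100]:
    clf = svm.SVC(kernel='rbf', C=1.0, gamma=g).fit(X, y)
    print(f"gamma={g}: training accuracy {clf.score(X, y):.3f}")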
C: Penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
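A minimal sketch (with assumed C values) makes this trade-off visible: a smaller C produces a softer, wider margin and therefore leaves more points on or inside the margin as support vectors.
from sklearn import svm, datasets
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
# Smaller C -> softer margin -> more support vectors
for c in [0.1, 1, 100]:
    clf = svm.SVC(kernel='linear', C=c).fit(X, y)
    print(f"C={c}: {clf.n_support_.sum()} support vectors")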
We should always look at the cross-validation score to find an effective combination of these parameters and avoid over-fitting.
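In scikit-learn, one convenient way to do this is GridSearchCV; here is a minimal sketch, where the grid of C and gamma values is an illustrative assumption:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
# Cross-validated grid search over C and gamma for an RBF-kernel SVM
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)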
Pros and Cons associated with SVM
Pros:
It works really well when there is a clear margin of separation
It is effective in high-dimensional spaces
It is effective in cases where the number of dimensions is greater than the number of samples
It is memory efficient, because it uses only a subset of the training points (the support vectors) in the decision function
Cons:
It doesn’t perform well when we have a large data set, because the required training time is higher
It also doesn’t perform very well when the data set has more noise, i.e. when the target classes overlap
SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (enabled through the probability parameter of scikit-learn’s SVC, as sketched below)
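For completeness, here is a minimal sketch of how those probability estimates are requested in scikit-learn; note that probability=True triggers the extra internal cross-validation mentioned above, so fitting becomes noticeably slower.
from sklearn import svm, datasets
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
# probability=True enables predict_proba at the cost of an internal
# cross-validation during fit
clf = svm.SVC(kernel='rbf', probability=True).fit(X, y)
print(clf.predict_proba(X[:3]))  # per-class probability estimates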
Practice Problem
Find the right additional feature to obtain a hyper-plane that segregates the classes in the snapshot below:
Share the variable name in the comments section below; I’ll then reveal the answer.
End Notes
In this article, we looked at the machine learning algorithm Support Vector Machine in detail. I discussed its working concept, the process of implementing it in Python, the tricks to make the model efficient by tuning its parameters, its pros and cons, and finally a problem to solve. I would suggest you use SVM and analyse the power of this model by tuning its parameters. I also want to hear about your experience with SVM: how have you tuned its parameters to avoid over-fitting and reduce the training time?
Did you find this article helpful? Please share your opinions / thoughts in the comments section below.
If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.