DMML Unit4 - SVM

SVM

Support Vector Machine Algorithm


• Support Vector Machine, or SVM, is one of the
most popular Supervised Learning algorithms,
used for both Classification and Regression
problems.
• However, it is primarily used for Classification
problems in Machine Learning.
What is SVM?
• An SVM model is a representation of the examples as
points in space, mapped so that the examples of the
separate categories are divided by a clear gap that is as
wide as possible. New examples are then mapped into
that same space and predicted to belong to a category
based on which side of the gap they fall.
• In addition to performing linear classification, SVMs
can efficiently perform a nonlinear classification using
what is called the kernel trick, implicitly mapping their
inputs into high-dimensional feature spaces.
What is SVM?
A support vector machine constructs a hyperplane or a set of
hyperplanes in a high- or infinite-dimensional space, which can be
used for classification, regression, or other tasks such as outlier detection.
How does SVM work?
• Suppose we have a dataset with two tags (green and
blue) and two features, x1 and x2. We want a
classifier that can classify a pair (x1, x2) of coordinates
as either green or blue.
• Since this is a 2-D space, we can easily separate these
two classes using a straight line. But there can be
multiple lines that separate these classes.
• Identify the right hyper-plane (Scenario-1): Here, we have three
hyper-planes (A, B and C). Now, identify the right hyper-plane to
classify the stars and circles.

• A rule of thumb for identifying the right hyper-plane:
• “Select the hyper-plane which segregates the two classes better.”
• Identify the right hyper-plane (Scenario-2): Here, we
have three hyper-planes (A, B and C) and all
segregate the classes well. Now, how can we
identify the right hyper-plane?
• Here, maximizing the distance between the nearest
data point (of either class) and the hyper-plane will help us
decide the right hyper-plane. This distance is
called the Margin.
• Hence, the SVM algorithm helps to find the best line or
decision boundary; this best boundary is called
a hyperplane. The SVM algorithm finds the points of both
classes that lie closest to the boundary. These points are called support
vectors. The distance between these vectors and the hyperplane
is called the margin, and the goal of SVM is to maximize this
margin. The hyperplane with maximum margin is called
the optimal hyperplane.
• Above, you can see that the margin for hyper-plane C is
high compared to both A and B. Hence, we choose C as the right
hyper-plane. Another compelling reason for selecting the
hyper-plane with the higher margin is robustness: if we select a
hyper-plane with a low margin, there is a high chance of
misclassification.
• Hyperplane: There can be multiple lines/decision
boundaries that segregate the classes in
n-dimensional space, but we need to find the
best decision boundary for classifying the
data points. This best boundary is known as the
hyperplane of the SVM.
• Support Vectors: The data points or vectors that
are closest to the hyperplane and that affect
the position of the hyperplane are termed
Support Vectors. Since these vectors support the
hyperplane, they are called support vectors.
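
A minimal sketch, assuming scikit-learn and NumPy are available (the toy data is made up for illustration), showing how a fitted linear SVM exposes exactly these quantities: the support vectors, the hyperplane w · x + b = 0, and the margin 2/||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable point clouds (illustrative toy data).
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [6, 8]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The points closest to the separating hyperplane: the support vectors.
print("Support vectors:\n", clf.support_vectors_)
# Hyperplane parameters for w . x + b = 0.
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
# Full margin width is 2 / ||w||; SVM training maximizes this.
print("margin =", 2 / np.linalg.norm(clf.coef_[0]))
```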
Types of SVM
• SVM can be of two types:
• Linear SVM: Linear SVM is used for linearly separable
data. If a dataset can be classified into
two classes using a single straight line, the
data is termed linearly separable, and the classifier
used is called the Linear SVM classifier.
• Non-linear SVM: Non-linear SVM is used for
non-linearly separable data. If a dataset
cannot be classified using a straight line, the
data is termed non-linear, and the classifier used is
called the Non-linear SVM classifier. (A short sketch
contrasting the two types follows.)
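
A minimal sketch of the contrast, assuming scikit-learn (the datasets and parameters are illustrative): a linear SVM handles linearly separable blobs, while concentric circles defeat any straight line and need a non-linear (RBF) kernel.

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Linearly separable data: a Linear SVM suffices.
X_lin, y_lin = make_blobs(n_samples=100, centers=2, random_state=0)
print("linear SVM on blobs:  ", SVC(kernel="linear").fit(X_lin, y_lin).score(X_lin, y_lin))

# Concentric circles: no straight line separates them; the RBF kernel can.
X_c, y_c = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)
print("linear SVM on circles:", SVC(kernel="linear").fit(X_c, y_c).score(X_c, y_c))
print("RBF SVM on circles:   ", SVC(kernel="rbf").fit(X_c, y_c).score(X_c, y_c))
```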
Advantages of Support Vector
algorithm
• Support vector machines are very effective even
with high-dimensional data.
• Even when the number of features exceeds the
number of rows (samples) in the data,
SVM can still perform well.
• When the classes in the data are well
separated, SVM works really well.
• SVM can be used for both regression and
classification problems.
Disadvantages of Support Vector
Machine (SVM)

1. Choosing an appropriate kernel function is difficult: Choosing
an appropriate kernel function (to handle non-linear data) is not an easy task. It
can be tricky and complex. When using a high-dimensional kernel, you might
generate too many support vectors, which reduces the training speed drastically.
2. Extensive memory requirement: The algorithmic complexity and memory
requirements of SVM are very high. You need a lot of memory, since all the
support vectors must be stored, and their number grows quickly with the
training dataset size.
3. Requires feature scaling: One must scale the features before applying
SVM (a sketch addressing this follows the list).
4. Long training time: SVM takes a long time to train on large datasets.
5. Difficult to interpret: Unlike Decision Trees, SVM models are difficult for
humans to understand and interpret.
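
The feature-scaling requirement in point 3 can be handled inside the model itself. A minimal sketch, assuming scikit-learn (the iris dataset is used purely for illustration), wraps the SVM in a Pipeline with a StandardScaler so the same scaling is applied at both fit and predict time.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standardize every feature, then fit the SVM on the scaled data.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_tr, y_tr)  # the scaler is fit on training data only
print("test accuracy:", model.score(X_te, y_te))
```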
Applications of Support Vector
Machine
• Face detection – SVMs classify parts of an image as
face or non-face and draw a square boundary around
the face.
• Text and hypertext categorization – SVMs support text
and hypertext categorization for both inductive and
transductive models. They use training data to classify
documents into different categories; categorization is based
on the score generated, which is then compared with a
threshold value.
• Classification of images – SVMs provide better
search accuracy for image classification than
traditional query-based searching techniques.
Applications of Support Vector
Machine

• Bioinformatics – This includes protein
classification and cancer classification. We use
SVM to classify genes, to classify patients on the
basis of their genes, and for other biological
problems.
• Protein fold and remote homology detection
– SVM algorithms are applied to protein remote
homology detection.
• Handwriting recognition – SVMs are widely used
to recognize handwritten characters.
Exercise
SVM Kernel
• The kernel plays a vital role in classification and is used to
analyse patterns in the given dataset.
• Kernels are very helpful for solving a non-linear
problem with a linear classifier.
• Kernels help us deal with high-dimensional data in a
very efficient manner.
• Solving non-linear problems with the help of linear
classifiers in this way is known as
the kernel trick (a sketch of the trick follows below).
• A kernel function is a method that takes data as input
and transforms it into the required form for
processing.
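
To make the kernel trick concrete, here is a small NumPy sketch (the vectors are arbitrary illustrative values): for the degree-2 polynomial kernel k(x, z) = (x · z)^2, the kernel value equals the dot product of an explicit feature map phi, but the kernel computes it without ever building phi.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2).
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print("explicit map:", phi(x) @ phi(z))  # dot product in the feature space
print("kernel trick:", (x @ z) ** 2)     # same value, no mapping needed
```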
SVM Kernel
• Kernel functions are passed as parameters to
SVM code. They help determine the shape of
the hyperplane and the decision boundary.
• The value can be any type of kernel, from linear to
polynomial. If the kernel is linear, then
the decision boundary is linear (a straight line
for two-dimensional input).
• These kernel functions also help produce decision
boundaries in higher dimensions. (A sketch comparing
kernels as parameters follows.)
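
A hedged sketch of the kernel as a parameter, assuming scikit-learn (the dataset and settings are illustrative): the same SVC class with only the kernel argument changed produces differently shaped decision boundaries and, here, different training accuracies.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Swap only the kernel parameter; everything else stays the same.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    acc = SVC(kernel=kernel).fit(X, y).score(X, y)
    print(f"{kernel:8s} training accuracy: {acc:.2f}")
```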
Types of kernel
• These functions are of different kinds: for instance, linear,
nonlinear, polynomial, radial basis function (RBF), and sigmoid.
• The most preferred kind of kernel function is RBF, because it is
localized and has a finite response along the entire x-axis.
• Kernel functions return the scalar product between two points
in a suitable feature space.
SVM Kernel
Linear Kernel
• It is the most basic type of kernel. It often proves to be the best
function when there are many features.
• The linear kernel is mostly preferred for text-classification
problems, as most of these classification problems are
linearly separable.

• Linear kernel functions are faster than other kernel functions.

• Linear Kernel Formula

• K(xi, xj) = xi · xj (the dot product, i.e. the sum of the
element-wise products of the two vectors)

• Here, xi and xj represent the data points you are trying to classify.
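
A small NumPy sketch (the vectors are chosen arbitrarily) showing that the linear kernel is just the dot product, the sum of the element-wise products:

```python
import numpy as np

xi = np.array([1.0, 2.0, 3.0])
xj = np.array([4.0, 5.0, 6.0])

print(np.sum(xi * xj))  # sum of element-wise products: 1*4 + 2*5 + 3*6 = 32
print(xi @ xj)          # identical: the dot product
```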


SVM Kernel
• Polynomial Kernel
• It is a more generalized representation of the linear kernel. It is
less preferred than other kernel functions because it is usually less
efficient and less accurate.

• Polynomial Kernel Formula

• k(xi, xj) = (xi · xj + 1)^d

• Here '·' denotes the dot product of the two vectors, and d denotes
the degree.

• k(xi, xj) is the kernel value from which the decision boundary
separating the given classes is built.
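
A sketch of the formula in NumPy, checked against scikit-learn's polynomial_kernel (gamma = 1 and coef0 = 1 are set so it matches the (xi · xj + 1)^d form above; the vectors are illustrative):

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

xi = np.array([[1.0, 2.0]])
xj = np.array([[3.0, 4.0]])
d = 2

print((xi @ xj.T + 1) ** d)                                   # (xi . xj + 1)^d = 144
print(polynomial_kernel(xi, xj, degree=d, gamma=1, coef0=1))  # same value
```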
SVM Kernel
• Gaussian Radial Basis Function (RBF)
• It is one of the most preferred and widely used kernel functions in SVM.
• It is usually chosen for non-linear data. It helps to make a proper
separation when there is no prior knowledge of the data.

• Gaussian Radial Basis Formula


• k(xi, xj) = exp(-gamma * ||xi - xj||^2)

• Gamma is a positive parameter; in practice its value is commonly
taken between 0 and 1.

• You have to provide the value of gamma in the code manually.
A commonly preferred value for gamma is 0.1. (A sketch of
gamma's effect follows.)
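
A small NumPy sketch of the formula and of gamma's effect (the points and gamma values are illustrative): larger gamma makes the kernel value fall off faster with distance, which in turn makes the decision boundary more wiggly.

```python
import numpy as np

def rbf(xi, xj, gamma):
    # k(xi, xj) = exp(-gamma * ||xi - xj||^2)
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

xi = np.array([0.0, 0.0])
xj = np.array([1.0, 1.0])  # squared distance = 2

for gamma in (0.1, 0.5, 1.0):
    print(f"gamma={gamma}: k = {rbf(xi, xj, gamma):.4f}")
```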
SVM Kernel
• Sigmoid Kernel
• It is mostly preferred for neural networks. This
kernel function is similar to a two-layer
perceptron model of a neural network and
works like an activation function for neurons.

• It can be shown as:

• Sigmoid Kernel Function

• k(xi, xj) = tanh(α · (xi · xj) + c)
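
A small NumPy sketch of this formula; alpha and c are free parameters, and the values used here are purely illustrative:

```python
import numpy as np

def sigmoid_kernel(xi, xj, alpha=0.5, c=1.0):
    # k(xi, xj) = tanh(alpha * (xi . xj) + c)
    return np.tanh(alpha * np.dot(xi, xj) + c)

xi = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])
print(sigmoid_kernel(xi, xj))  # tanh(0.5 * (-1.5) + 1.0) = tanh(0.25)
```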
SVM Kernel
• Gaussian Kernel
• It is a commonly used kernel. It is used when
there is no prior knowledge of the given dataset.
• Gaussian Kernel Formula
• k(xi, xj) = exp(-||xi - xj||^2 / (2σ^2)), i.e. the RBF kernel
with gamma = 1 / (2σ^2)
SVM Kernel
• The linear kernel is mostly preferred for text
classification problems as it performs well for large
datasets.
• Gaussian kernels tend to give good results when no
additional prior information about the data is available.
• The RBF kernel is a kind of Gaussian kernel: it implicitly
projects the data into a high-dimensional space and then
searches for a linear separation there.
• Polynomial kernels give good results for problems
where all the training data is normalized.
