
Machine Learning

By: Ammara Yaseen


Agenda:
• What is a support vector machine?
• Logistic regression vs SVM
• Types of support vector machine algorithms
• Important terms
• How does it work?
• Mathematical intuition behind it
• Margins in SVM
• Kernels in SVM
• Why SVM?
Support Vector Machine
• Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used for classification problems, such as text classification.
Support Vector Machine
• It is a supervised machine learning method in which we try to find the hyperplane that best separates the two classes.

• Now the question is: which hyperplane does it select? There can be an infinite number of hyperplanes that classify the two classes perfectly. So, which one is the best?

• SVM selects the hyperplane with the maximum margin, that is, the greatest distance between the hyperplane and the nearest points of the two classes.
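
As a minimal sketch of this idea (using scikit-learn and an illustrative toy dataset, neither of which comes from the slides), we can fit a linear SVM and read off the maximum-margin hyperplane it finds:

```python
# Minimal sketch: fit a linear SVM on a tiny 2-class toy dataset and
# inspect the maximum-margin hyperplane w . x + b = 0 it finds.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3],    # class 0
              [6, 5], [7, 8], [8, 6]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print("w:", clf.coef_[0], " b:", clf.intercept_[0])
```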
Logistic Regression Vs SVM
• Both algorithms try to find the best hyperplane, but the main difference is that logistic regression is a probabilistic approach, whereas the support vector machine is a geometric, margin-based approach.

• Depending on the number of features you have, you can choose either logistic regression or SVM.

• SVM works best when the dataset is small and complex. It is usually advisable to first use logistic regression and see how it performs; if it fails to give good accuracy, you can go for SVM without any kernel (more on kernels in a later section). Logistic regression and SVM without a kernel have similar performance, but depending on your features, one may be more efficient than the other.
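
As a hedged illustration of this advice (the dataset and hyperparameters are illustrative, assuming scikit-learn), both models can be tried on the same split and compared:

```python
# Sketch: try logistic regression first, then a linear (no-kernel) SVM,
# and compare their test accuracy on the same split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
svm = SVC(kernel="linear").fit(X_tr, y_tr)

print("logistic regression accuracy:", lr.score(X_te, y_te))
print("linear SVM accuracy:        ", svm.score(X_te, y_te))
```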
Types of Support Vector Machine Algorithms

1. Linear SVM
When the data is perfectly linearly separable, we can use a linear SVM. Perfectly linearly separable means that the data points can be classified into two classes by a single straight line (in 2D).
2. Non-Linear SVM
When the data is not linearly separable, we can use a non-linear SVM: when the data points cannot be separated into two classes by a straight line (in 2D), we use more advanced techniques, such as the kernel trick, to classify them. Most real-world datasets are not linearly separable, hence we use the kernel trick to solve them.
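
A brief sketch of the contrast (toy data via scikit-learn's make_circles, which no straight line can separate):

```python
# Sketch: a linear SVM fails on concentric circles, while an RBF-kernel
# SVM separates them easily.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print("linear SVM training accuracy:", linear.score(X, y))  # near 0.5
print("RBF SVM training accuracy:   ", rbf.score(X, y))     # near 1.0
```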
Support Vector Machine

Support Vectors: These are the points closest to the hyperplane. The separating line is defined with the help of these data points.

Margin: the distance between the hyperplane and the observations closest to it (the support vectors). In SVM, a large margin is considered a good margin. There are two types of margins: hard margin and soft margin.
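
For reference, a fitted scikit-learn SVC exposes both terms directly (a small sketch on illustrative toy data):

```python
# Sketch: the support vectors are the training points closest to the
# hyperplane; scikit-learn stores them on the fitted model.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vectors:\n", clf.support_vectors_)
print("indices into X:  ", clf.support_)
print("count per class: ", clf.n_support_)
```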
Identify the right Hyper-plane

Remember this thumb rule for identifying the right hyperplane: "Select the hyperplane that segregates the two classes better." In this scenario, hyperplane "B" performs the job excellently.
Identify the right Hyper-plane

Here, maximizing the distance between the nearest data point (of either class) and the hyperplane will help us decide the right hyperplane. This distance is called the margin.
Identify the right Hyper-plane

Here you can see that the margin for hyperplane C is high compared to both A and B. Hence, we name C the right hyperplane. Another compelling reason for selecting the hyperplane with the higher margin is robustness: if we select a hyperplane with a low margin, there is a high chance of misclassification.
Identify the right Hyper-plane

Here we are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.

The star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyperplane with the maximum margin. Hence, we can say that SVM classification is robust to outliers.
Identify the right Hyper-plane

The vector points closest to the hyperplane are known as the support vector points, because only these points contribute to the result of the algorithm; the other points do not.

If a data point is not a support vector, removing it has no effect on the model.

On the other hand, deleting the support vectors will change the position of the hyperplane.
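
This can be checked empirically; here is a sketch (toy data, near-hard margin via a large C, all names illustrative):

```python
# Sketch: refitting after dropping a NON-support-vector point leaves the
# hyperplane (w, b) essentially unchanged.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

full = SVC(kernel="linear", C=1e6).fit(X, y)

# Pick one training point that is NOT a support vector, drop it, refit.
drop = np.setdiff1d(np.arange(len(X)), full.support_)[0]
mask = np.ones(len(X), dtype=bool)
mask[drop] = False
reduced = SVC(kernel="linear", C=1e6).fit(X[mask], y[mask])

print(np.allclose(full.coef_, reduced.coef_))  # True: same hyperplane
```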
Hyperplane

The dimension of the hyperplane depends upon the number of features.

If the number of input features is 2, then the hyperplane is just a line.

If the number of input features is 3, then the hyperplane becomes a two-dimensional plane.

It becomes difficult to imagine when the number of features exceeds 3.
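
One way to see this concretely (a sketch assuming scikit-learn): the weight vector w of a fitted linear SVC always has one component per feature.

```python
# Sketch: the hyperplane is parameterized by n weights (plus a bias),
# so w grows with the number of input features.
import numpy as np
from sklearn.svm import SVC

for n_features in (2, 3, 10):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, n_features))
    y = (X[:, 0] > 0).astype(int)  # label by the sign of feature 0
    clf = SVC(kernel="linear").fit(X, y)
    print(n_features, "features -> w has shape", clf.coef_.shape)
```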


MARGIN

The distance of the vectors from the hyperplane is called the margin, which is the separation between the line and the closest class points.

We would like to choose a hyperplane that maximises the margin between the classes.

The graph on the next slide shows what a good margin and a bad margin look like.
MARGIN

Hard Margin

If the training data is linearly separable, we can select two parallel hyperplanes that separate the two classes of data so that the distance between them is as large as possible.

Soft Margin

As most real-world data are not fully linearly separable, we allow some margin violations to occur; this is called soft margin classification. It is better to have a large margin, even though some constraints are violated. A margin violation means choosing a hyperplane that allows some data points to stay either on the incorrect side of the hyperplane or between the margin and the correct side of the hyperplane.
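
In scikit-learn, hard versus soft margins are approximated through the C parameter; here is a sketch on illustrative overlapping blobs:

```python
# Sketch: a huge C approximates a hard margin (few violations allowed),
# a small C gives a soft margin (many points may sit inside the margin).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

hard = SVC(kernel="linear", C=1e6).fit(X, y)   # near-hard margin
soft = SVC(kernel="linear", C=0.01).fit(X, y)  # soft margin

print("near-hard margin, support vectors:", hard.n_support_.sum())
print("soft margin,      support vectors:", soft.n_support_.sum())
```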
Mathematics behind SVM

The equation of a line is y = ax + b.

The equation of a hyperplane is defined by:

𝐰ᵀ𝐱 + 𝑏 = 0
Mathematics behind SVM

The two equations are just two different ways of expressing the same thing.

For the Support Vector Classifier (SVC), we use 𝐰ᵀ𝐱 + 𝑏, where 𝐰 is the weight vector and 𝑏 is the bias.
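
A quick numerical check of this equation (a sketch with illustrative toy data): for a fitted linear SVC, decision_function is exactly 𝐰ᵀ𝐱 + 𝑏.

```python
# Sketch: compute w . x + b by hand and confirm it matches
# scikit-learn's decision_function for a linear SVC.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel="linear").fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
x_new = np.array([4.0, 4.0])
print(w @ x_new + b)                      # manual w . x + b
print(clf.decision_function([x_new])[0])  # same value
```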
Mathematics behind SVM

The variables in the hyperplane equation are named 𝐰 and 𝐱, which means they are vectors!

A vector has magnitude (size) and direction, and this works equally well in 3 or more dimensions.

That is why the notion of a "vector" is central to the SVM algorithm.


Mathematics behind SVM

The equation for calculating the margin:

margin = 2 / ‖𝐰‖

The two margin boundaries are 𝐰ᵀ𝐱 + 𝑏 = 1 and 𝐰ᵀ𝐱 + 𝑏 = −1, and the distance between them is 2/‖𝐰‖.
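
The width can be read off a fitted model; here is a sketch (same illustrative toy data as before, near-hard margin):

```python
# Sketch: margin width 2 / ||w|| computed from a fitted linear SVC.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)  # near-hard margin

w = clf.coef_[0]
print("margin width:", 2.0 / np.linalg.norm(w))
```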
Classifying Non-linear Data

What about data points that are not linearly separable?


Classifying Non-linear Data

SVM has a technique called the kernel trick.

Kernels are functions that take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they convert a non-separable problem into a separable problem.

This is mostly useful in non-linear separation problems.
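
The idea can be made explicit with a hand-built feature map (a sketch; the lift z = x₁² + x₂² is an illustrative choice, not taken from the slides):

```python
# Sketch of the kernel-trick idea: concentric circles are not linearly
# separable in 2D, but adding the feature z = x1^2 + x2^2 makes them
# separable by a plane in 3D.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Explicit lift to 3D: (x1, x2) -> (x1, x2, x1^2 + x2^2)
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])

print("linear SVM, 2D:", SVC(kernel="linear").fit(X, y).score(X, y))
print("linear SVM, 3D:", SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y))
```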


Kernel

In practice, the SVM algorithm is implemented with a kernel that transforms the input data space into the required form.

SVM uses a technique called the kernel trick, in which the kernel takes a low-dimensional input space and transforms it into a higher-dimensional space.

In simple words, the kernel converts non-separable problems into separable problems by adding more dimensions.

This makes SVM more powerful, flexible and accurate.


Types of Kernels
Linear Kernel

It can be used as a dot product between any two observations. The formula of the linear kernel is:

K(𝐱, 𝐱ᵢ) = 𝐱 · 𝐱ᵢ = Σⱼ xⱼ xᵢⱼ

From this formula, we can see that the product between two vectors 𝐱 and 𝐱ᵢ is the sum of the products of each pair of input values.
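
A one-line numerical check (a sketch with illustrative vectors, using scikit-learn's pairwise helpers):

```python
# Sketch: the linear kernel is just the dot product.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel

x = np.array([[1.0, 2.0, 3.0]])
xi = np.array([[4.0, 5.0, 6.0]])

print(np.dot(x[0], xi[0]))         # 1*4 + 2*5 + 3*6 = 32
print(linear_kernel(x, xi)[0, 0])  # 32.0
```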
Polynomial Kernel

It is a more generalized form of the linear kernel and can distinguish curved or nonlinear input spaces.

The formula for the polynomial kernel is:

K(𝐱, 𝐱ᵢ) = (𝐱 · 𝐱ᵢ + 1)ᵈ

Here d is the degree of the polynomial, which we need to specify manually in the learning algorithm.
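
A numerical check of this form (a sketch; gamma=1 and coef0=1 are chosen so scikit-learn's polynomial_kernel matches (𝐱 · 𝐱ᵢ + 1)ᵈ):

```python
# Sketch: degree-2 polynomial kernel by hand vs. scikit-learn.
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
xi = np.array([[3.0, 4.0]])
d = 2

print((np.dot(x[0], xi[0]) + 1) ** d)                              # (11+1)^2 = 144
print(polynomial_kernel(x, xi, degree=d, gamma=1, coef0=1)[0, 0])  # 144.0
```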
Radial Basis Function (RBF) Kernel

The RBF kernel, mostly used in SVM classification, maps the input space into an infinite-dimensional space.

The following formula expresses it mathematically:

K(𝐱, 𝐱ᵢ) = exp(−γ ‖𝐱 − 𝐱ᵢ‖²)

Here, gamma (γ) ranges from 0 to 1 and must be specified manually in the learning algorithm. A good default value of gamma is 0.1.

Just as we implemented SVM for linearly separable data, we can implement it in Python for data that is not linearly separable; this is done by using kernels.
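
A numerical check using the slide's suggested gamma (a sketch with illustrative vectors):

```python
# Sketch: RBF kernel exp(-gamma * ||x - xi||^2) by hand vs. scikit-learn.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
xi = np.array([[2.0, 4.0]])
gamma = 0.1

print(np.exp(-gamma * np.sum((x[0] - xi[0]) ** 2)))  # exp(-0.1 * 5)
print(rbf_kernel(x, xi, gamma=gamma)[0, 0])          # same value
```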
Tuning parameters: Kernel,
Regularization, Gamma and Margin.

These are the tuning parameters of the SVM classifier.

By varying them, we can achieve a considerably non-linear classification line with higher accuracy in a reasonable amount of time.
Regularization

The regularization parameter (often called the C parameter) tells the SVM optimization how much you want to avoid misclassifying each training example.

For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly.

Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
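
A sketch of this trade-off (overlapping blobs and C values chosen purely for illustration):

```python
# Sketch: sweeping C on overlapping data; larger C chases training
# accuracy at the cost of a smaller margin.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=2, cluster_std=3.0, random_state=1)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:<6} training accuracy={clf.score(X, y):.2f}")
```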
Regularization

These two settings produce visibly different decision boundaries: a lower regularization value tolerates some misclassification of training points, while a higher value pushes the model to classify them all correctly.


Gamma

The gamma parameter defines how far the influence of a single training example reaches, with low values meaning 'far' and high values meaning 'close'.

In other words, with low gamma, points far away from the plausible separating line are considered when calculating the line, whereas with high gamma, only the points close to the plausible line are considered.
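
A sketch of gamma's effect (toy two-moons data; the values are illustrative):

```python
# Sketch: low gamma -> smooth, far-reaching influence; high gamma ->
# only nearby points shape the boundary (risk of overfitting).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.25, random_state=0)

for gamma in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    print(f"gamma={gamma:<6} support vectors={clf.n_support_.sum()} "
          f"training accuracy={clf.score(X, y):.2f}")
```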
Why SVM?

Advantages of SVM
• SVM works better when the data is linear
• It is more effective in high dimensions
• With the help of the kernel trick, we can solve complex non-linear problems
• SVM is robust to outliers
• It can help us with image classification

Disadvantages of SVM
• Choosing a good kernel is not easy
• It doesn't show good results on big datasets
• The SVM hyperparameters are the cost (C) and gamma. It is not easy to fine-tune these hyperparameters, and it is hard to visualize their impact
