SVM Updated

The document explains the mathematical principles behind Support Vector Machines (SVM), focusing on the concept of hyperplanes that separate data points in a high-dimensional space. It details the equations governing hyperplanes, the use of dot products, and the steps involved in the Linear SVM algorithm, including maximizing the margin between classes. Additionally, it outlines the classification conditions and optimization techniques necessary for effective SVM implementation.


Math behind SVM

The main objective of SVM is to find the optimal hyperplane that
linearly separates the data points into two classes while maximizing the
margin.
All data points are treated as vectors here.
• SVM is a supervised learning model with associated learning
algorithms that analyse data and recognize patterns.
• It is used for both regression and classification.
• An SVM model represents the examples as points in space, mapped so
that the examples of the separate categories are divided by a clear gap
that is as wide as possible.
• New examples are mapped into the same space and predicted to belong
to a category based on which side of the gap they fall on (a minimal
usage sketch follows below).
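As a quick illustration, the sketch below fits a linear SVM with
scikit-learn on a tiny hand-made 2D dataset; the dataset and the use of
scikit-learn are assumptions for illustration, not part of the original
slides.

# A minimal sketch (assumed example): fit a linear SVM on a toy 2D dataset
# with scikit-learn and inspect the learned hyperplane w.x + b = 0.
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable point clouds (hypothetical data).
X = np.array([[1.0, 2.0], [2.0, 3.0], [2.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates a hard margin
clf.fit(X, y)

print("w =", clf.coef_[0])           # normal vector of the separating hyperplane
print("b =", clf.intercept_[0])      # bias / offset
print("support vectors:", clf.support_vectors_)
print("prediction for (3, 3):", clf.predict([[3.0, 3.0]]))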
Basic Linear Algebra

Vectors
A vector is a mathematical quantity that has both magnitude and
direction. A point in the 2D plane can be represented as a vector from
the origin to that point.
Dot Product
The dot product of two vectors is a scalar quantity. It tells how two
vectors are related.
Hyperplane
A hyperplane is a plane that linearly divides the n-dimensional data
points into two components. In 2D a hyperplane is a line; in 3D it is a
plane. It is also called an n-dimensional line. The figure shows a blue
line (hyperplane) that linearly separates the data points into two
components.
Equations of hyperplanes for algorithms like linear regression and SVM
For 2-dimensional data (2 input features) we need the equation of a
line, for 3-dimensional data we need the equation of a plane, and for
data whose dimension is greater than 3 we need the equation of a
hyperplane.

A hyperplane is nothing but the generalization of the plane equation to
4D, 5D, ..., n-dimensional data.
Consider 2D data.
The general equation of a line is y = mx + c,
where m is the slope and c is the intercept.
Another general form of the equation of a line is

ax + by + c = 0        ... eq (1)

which can again be written as
y = -(a/b)x - (c/b).
Comparing this with y = mx + c, we get slope m = -a/b and intercept
-c/b.
For higher-dimensional data the axes are usually named x1, x2, ..., xn
rather than X and Y. Replacing x and y with x1 and x2 in equation (1)
gives

ax1 + bx2 + c = 0        ... eq (2)

Now replace the coefficients: a with w1, b with w2 and c with w0, since
for higher dimensions we use w1, w2, w3, ..., wn instead of a, b, c.

w1x1 + w2x2 + w0 = 0
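A quick numeric check of this rewriting (the numbers are an assumed
example, not from the slides): the line y = 2x + 1 becomes
2x1 - x2 + 1 = 0, i.e. w1 = 2, w2 = -1, w0 = 1.

# Assumed example: y = m*x + c and w1*x1 + w2*x2 + w0 = 0 describe the
# same line when w1 = m, w2 = -1, w0 = c.
m, c = 2.0, 1.0            # slope-intercept form: y = 2x + 1
w1, w2, w0 = m, -1.0, c    # hyperplane form: w1*x1 + w2*x2 + w0 = 0

for x1 in [-1.0, 0.0, 3.5]:
    x2 = m * x1 + c                  # a point on the line
    print(w1 * x1 + w2 * x2 + w0)    # prints 0.0 for every such point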

The same type of concept can be extended to 3D data


w1x1 + w2x2 + w3x3 + w0 = 0

So for n-dimensional data the equation of the hyperplane will be


w1x1 + w2x2 + w3x3 + w4x4 + …… + wnxn + w0 = 0
In vector form, w1x1 + w2x2 + ... + wnxn is nothing but the dot product
w · x.
For 4-dimensional data, the equation of the hyperplane is

w1x1 + w2x2 + w3x3 + w4x4 + w0 = 0

To generalize this, assume a vector w with components w1, w2, w3, ..., wn
and a vector x with components x1, x2, x3, ..., xn.
The equation can then be rewritten as
w · x + w0 = 0
where
w is the normal vector to the hyperplane,
x is a point on the hyperplane, and
w0 is the bias or offset.
w0 = 0 whenever the hyperplane passes through the origin, so in that
case the equation reduces to
w · x = 0
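As a small sanity check (the numbers are an assumed example), the vector
form w · x + w0 gives the same value as the expanded sum
w1x1 + w2x2 + ... + wnxn + w0:

# Assumed example: the expanded sum and the dot-product form of the
# hyperplane equation agree.
import numpy as np

w = np.array([2.0, -1.0, 0.5])   # w1, w2, w3
w0 = 1.0                         # bias / offset
x = np.array([1.0, 3.0, 2.0])    # a point (x1, x2, x3)

expanded = w[0] * x[0] + w[1] * x[1] + w[2] * x[2] + w0
vector_form = np.dot(w, x) + w0
print(expanded, vector_form)     # both print 1.0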
Dot Product of 2 Vectors
A · B = |A| cos θ * |B|

where
|A| cos θ is the projection of A onto B, and
|B| is the magnitude of vector B.
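A small numeric illustration (assumed values) of the projection reading
of the dot product:

# Assumed example: A . B = |A| cos(theta) * |B|, so the projection of A
# onto B is (A . B) / |B|.
import numpy as np

A = np.array([3.0, 4.0])
B = np.array([1.0, 0.0])

dot = np.dot(A, B)                                # 3.0
projection_of_A_on_B = dot / np.linalg.norm(B)    # |A| cos(theta) = 3.0
print(dot, projection_of_A_on_B)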
Use of Dot Product in SVM
Consider a random point X. We want to know whether it lies on the
positive side or the negative side of the hyperplane.
Treat this point as a vector X, and take a vector w that is
perpendicular to the hyperplane.
Let the distance from the origin to the decision boundary along w be 'c'.
Now take the projection of the vector X onto w.
We take the dot product of X and w, but we only need the projection of
X, not the magnitude of w.
To get just the projection, use the unit vector of w: it points in the
direction of w but has magnitude 1. The equation then becomes

X · (w/|w|) = |X| cos θ

We then project all other data points onto this perpendicular vector in
the same way and compare their distances with c.
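A minimal sketch of this projection test (w, c and the sample points are
assumed values for illustration):

# Assumed example: decide which side of the decision boundary a point lies
# on by projecting it onto the unit vector of w and comparing with c.
import numpy as np

w = np.array([1.0, 1.0])        # vector perpendicular to the hyperplane
c = 3.0                         # distance from origin to the boundary along w
w_hat = w / np.linalg.norm(w)   # unit vector of w

for X in [np.array([4.0, 4.0]), np.array([1.0, 1.0])]:
    projection = np.dot(X, w_hat)   # |X| cos(theta)
    side = "positive" if projection > c else "negative"
    print(X, "->", side)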
Linear SVM Algorithm
Step 1:
Given training data:

{(x1, y1), (x2, y2), ..., (xn, yn)}, where yi ∈ {−1, +1}

Step 2:
Define a hyperplane:
wᵀx + b = 0
Where:
• w is the weight vector (normal to the hyperplane)
• b is the bias term (offset)

This separates the data into:
Class +1 if w · x + b > 0
Class −1 if w · x + b < 0
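The decision rule of Step 2 in code (w, b and the points are assumed
values):

# Assumed example: classify points by the sign of w.x + b.
import numpy as np

w = np.array([2.0, -1.0])    # weight vector (normal to the hyperplane)
b = -1.0                     # bias term

X = np.array([[3.0, 1.0],    # w.x + b = 4  -> class +1
              [0.0, 2.0]])   # w.x + b = -3 -> class -1
print(np.sign(X @ w + b))    # [ 1. -1.]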
Step 3: Maximize the Margin

The margin is the distance between the hyperplane and the nearest data
points from each class. The goal is to maximize this margin.
The distance from a point x to the hyperplane is

|w · x + b| / ||w||

With the convention that the nearest points satisfy |w · x + b| = 1, the
margin between the two classes is 2/||w||. To maximize the margin that
separates the classes, we therefore minimize ||w|| (equivalently, we
minimize (1/2)||w||²).
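A small numeric illustration of these formulas (w, b and the point are
assumed values):

# Assumed example: distance of a point to the hyperplane and margin width.
import numpy as np

w = np.array([3.0, 4.0])    # ||w|| = 5
b = -5.0
x = np.array([2.0, 1.0])

distance = abs(np.dot(w, x) + b) / np.linalg.norm(w)   # |6 + 4 - 5| / 5 = 1.0
margin = 2.0 / np.linalg.norm(w)                       # 2 / 5 = 0.4
print(distance, margin)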


Classification Condition
For every point to be correctly classified, the following condition must
always hold; it enforces these constraints on the data:

yi (w · xi + b) ≥ 1   for all i

This defines the two margin boundaries:

+1 Margin (w·x + b = +1)


---------------------------
Support Vectors (class +1)

Decision Boundary (w·x + b = 0)


---------------------------

Support Vectors (class -1)


---------------------------
-1 Margin (w·x + b = -1)
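A minimal check of this condition on a toy dataset (the data and the
choice of w and b are assumed for illustration): points with
yi (w · xi + b) exactly equal to 1 sit on the margin boundaries and are
the support vectors.

# Assumed example: verify the constraint y_i (w.x_i + b) >= 1 and flag
# the points that lie exactly on the margin boundaries (support vectors).
import numpy as np

w = np.array([1.0, 0.0])
b = -3.0
X = np.array([[2.0, 1.0],    # class -1, on the -1 margin
              [1.0, 2.0],    # class -1, well inside its side
              [4.0, 1.0],    # class +1, on the +1 margin
              [6.0, 3.0]])   # class +1, well inside its side
y = np.array([-1, -1, +1, +1])

margins = y * (X @ w + b)
print("all constraints satisfied:", np.all(margins >= 1))
print("support vectors:", X[np.isclose(margins, 1.0)])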
Step 4: Margin and Optimization
Objective Function
Hard-Margin SVM (Linearly Separable Case)
• We select 2 support vectors:
  • 1 from the negative class, and
  • 1 from the positive class.
• The separation between these two vectors, x1 and x2, is represented
by the vector (x2 − x1).
• Our goal is to find the shortest distance between these two points.
• This can be achieved using a trick from the dot product (worked out
below):
  • Take a vector w that is perpendicular to the hyperplane.
  • Find the projection of the vector (x2 − x1) onto w (this
perpendicular vector must be a unit vector for the dot product to give
the projection directly).
We already know how to find the
projection of a vector on another
vector.
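A worked version of this projection trick, using the standard convention
(assumed here) that the two support vectors lie on the margin boundaries:

w · x2 + b = +1
w · x1 + b = −1
Subtracting: w · (x2 − x1) = 2
Projecting (x2 − x1) onto the unit vector w/||w||:
(x2 − x1) · w/||w|| = 2/||w||

So the margin width is 2/||w||, the quantity Step 3 maximizes.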
Optimization problems are usually posed as minimizing a cost function,
so we invert: maximizing 2/||w|| is equivalent to minimizing
(1/2)||w||².
Cost Function

minimize (1/2)||w||²   subject to   yi (w · xi + b) ≥ 1 for all i
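To make the whole hard-margin formulation concrete, here is a minimal
sketch that solves it numerically with scipy's general-purpose SLSQP
solver on a tiny assumed dataset (scipy and the data are assumptions; a
real implementation would typically use the dual form or a library such
as scikit-learn):

# A minimal sketch (assumed example): solve the hard-margin SVM
#   minimize (1/2)||w||^2  subject to  y_i (w.x_i + b) >= 1
# with a general-purpose constrained optimizer.
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable dataset (hypothetical).
X = np.array([[1.0, 1.0], [2.0, 0.5], [4.0, 4.0], [5.0, 3.0]])
y = np.array([-1, -1, +1, +1])

def objective(params):
    w = params[:-1]
    return 0.5 * np.dot(w, w)        # (1/2)||w||^2

def constraint(params):
    w, b = params[:-1], params[-1]
    return y * (X @ w + b) - 1.0     # must be >= 0 for every point

result = minimize(objective,
                  x0=np.zeros(X.shape[1] + 1),
                  method="SLSQP",
                  constraints=[{"type": "ineq", "fun": constraint}])

w, b = result.x[:-1], result.x[-1]
print("w =", w, "b =", b)
print("margin width =", 2.0 / np.linalg.norm(w))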
