Math behind SVM
The main objective of SVM is to find the optimal hyperplane that linearly
separates the data points into two classes while maximizing the margin.
All data points are treated as vectors here.
• SVM is a supervised learning model with associated learning
algorithms that analyse data and recognize patterns.
• It is used for both regression and classification.
• An SVM model is a representation of the examples as points in space,
mapped so that the examples of the separate categories are divided by
a clear gap that is as wide as possible.
• New examples are mapped into the same space and predicted to
belong to a category based on which side of the gap they fall on.
Basic Linear Algebra
Vectors
A vector is a mathematical quantity that has both magnitude and
direction. A point in the 2D plane can be represented as the vector
from the origin to that point.
Dot Product
The dot product of two vectors is a scalar quantity. It tells how two
vectors are related, i.e. how much one points in the direction of the other.
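As a small illustration (my own sketch using NumPy; the numbers are arbitrary), a 2D point can be stored as a vector and the dot product computed directly:

import numpy as np

a = np.array([3.0, 4.0])   # the point (3, 4) viewed as a vector from the origin
b = np.array([2.0, 1.0])

magnitude_a = np.linalg.norm(a)   # |a| = 5.0
dot_ab = np.dot(a, b)             # 3*2 + 4*1 = 10.0, a scalar

print(magnitude_a, dot_ab)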
Hyper-plane
A hyperplane is a plane that linearly divides n-dimensional data points
into two components. In the 2D case the hyperplane is a line; in the 3D
case it is a plane. It is sometimes described informally as an
"n-dimensional line". In the figure, a blue line (the hyperplane)
linearly separates the data points into two components.
Equations of hyperplanes for algorithms like linear regression and SVM
For 2-dimensional data (2 input features) we need the equation of a
line, for 3-dimensional data we need the equation of a plane, and for
data whose dimension is greater than 3 we need the equation of a
hyperplane.
A hyperplane is nothing but the generalization of the plane equation to
4D data, 5D data, ..., n-dimensional data.
Consider 2D data.
The general equation of a line is y = mx + c,
where m is the slope and c is the intercept.
Another general form of the equation of a line is
ax + by + c = 0 ... (eq. 1)
which can again be written as
y = -(a/b)x - (c/b).
Comparing this with y = mx + c, we get slope m = -a/b and
intercept = -c/b.
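A quick numeric check of this conversion (the coefficients a = 2, b = 3, c = 6 below are made up purely for illustration):

# line: 2x + 3y + 6 = 0
a, b, c = 2.0, 3.0, 6.0
m = -a / b                 # slope = -2/3
intercept = -c / b         # intercept = -2

x = 1.5                    # any x value
y = m * x + intercept      # y from the slope-intercept form
print(a * x + b * y + c)   # ~0.0, so the point lies on the original line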
Generally, for higher dimensional data the axes are named x1,
x2, ..., xn rather than the X axis and Y axis. Replacing x and y with x1 and
x2 in eq. 1, we get
ax1 + bx2 + c = 0 ... (eq. 2)
Now replace the coefficients: a with w1, b with w2, and c with w0,
since for higher dimensions we use w1, w2, w3, ..., wn instead of a, b, c:
w1x1 + w2x2 + w0 = 0
The same type of concept can be extended to 3D data
w1x1 + w2x2 + w3x3 + w0 = 0
So for n-dimensional data, the equation of the hyperplane will be
w1x1 + w2x2 + w3x3 + w4x4 + …… + wnxn + w0 = 0
The sum w1x1 + w2x2 + ... + wnxn is nothing but the
dot product w . x of the vectors w and x.
For 4-dimensional data, the equation of the hyperplane is
w1x1 + w2x2 + w3x3 + w4x4 + w0 = 0
To generalize this, we can define a vector w with components w1,
w2, w3, ..., wn and a vector x with components x1, x2, x3, ..., xn.
So the equation can be rewritten as
w.x + w0 = 0
Here:
w is the normal vector to the hyperplane
x is a point on the hyperplane
w0 is the bias or offset
Here w0 = 0 whenever the hyperplane passes through the origin,
so in that case the equation simplifies to:
w . x = 0
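A minimal sketch of evaluating w . x + w0 to see which side of the hyperplane a point falls on (w, w0, and the test point are placeholder values, not from the text):

import numpy as np

w = np.array([1.0, -2.0, 0.5])   # normal vector for 3D data
w0 = 0.3                         # bias/offset; w0 = 0 would mean the hyperplane passes through the origin

def side_of_hyperplane(x):
    """Return the sign of w.x + w0: +1.0, -1.0, or 0.0 if x lies on the hyperplane."""
    return np.sign(np.dot(w, x) + w0)

print(side_of_hyperplane(np.array([2.0, 1.0, 1.0])))   # 2 - 2 + 0.5 + 0.3 = 0.8, so the sign is +1.0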
Dot Product of 2 Vectors
A . B = |A| cosθ * |B|
Where:
|A| cosθ is the projection of A onto B
|B| is the magnitude of vector B
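A small numeric check (with vectors chosen by me) that the component-wise dot product agrees with |A| |B| cosθ:

import numpy as np

A = np.array([3.0, 4.0])
B = np.array([4.0, 3.0])

dot_direct = np.dot(A, B)                     # 3*4 + 4*3 = 24.0

# Compute the angle between A and B from their directions, then apply |A||B|cos(theta)
theta = np.arctan2(A[1], A[0]) - np.arctan2(B[1], B[0])
dot_from_angle = np.linalg.norm(A) * np.linalg.norm(B) * np.cos(theta)

print(dot_direct, dot_from_angle)             # both 24.0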
Use of Dot Product in SVM
Consider a random point X. We want to know whether it lies on the
right side of the plane or the left side of the plane
(positive or negative).
Assume this point is a vector X, and construct a vector w that is
perpendicular to the hyperplane.
Let the distance from the origin to the decision boundary, measured along w, be c.
Now take the projection of the vector X onto w.
To do this we take the dot product of X and w, but we only need the
projection of X, not the magnitude of w.
To get just the projection, simply take the unit vector of w (written ŵ):
it points in the direction of w but its magnitude is 1. The equation becomes:
X . ŵ = |X| cosθ, the projection of X onto the direction of w.
We then project all other data points onto this perpendicular vector and
compare their projections with c: if X . ŵ > c the point lies on the
positive side of the hyperplane, and if X . ŵ < c it lies on the negative side.
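A minimal sketch of this projection test, assuming placeholder values for w, c, and the points:

import numpy as np

w = np.array([1.0, 1.0])          # vector perpendicular to the decision boundary
w_hat = w / np.linalg.norm(w)     # unit vector in the direction of w
c = 2.0                           # distance from the origin to the boundary along w

points = np.array([[3.0, 2.0],    # should land on the positive side
                   [0.5, 0.5]])   # should land on the negative side

for X in points:
    projection = np.dot(X, w_hat)           # |X| cos(theta), the projection of X onto w
    side = "positive" if projection > c else "negative"
    print(X, round(projection, 3), side)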
Linear SVM Algorithm
Step 1:
Given training data: {(x1, y1), (x2, y2), ..., (xn, yn)}, where yi ∈ {−1, +1}
Step 2:
Define a hyperplane:
wᵀx + b = 0
Where:
• w is the weight vector (normal to the hyperplane)
• b is the bias term (offset)
This separates the data into:
Class +1 if w⋅x + b > 0
Class −1 if w⋅x + b < 0
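A sketch of this decision rule, assuming w and b have already been learned (the values below are placeholders):

import numpy as np

w = np.array([0.8, -0.4])   # weight vector (normal to the hyperplane)
b = -0.2                    # bias term

def predict(x):
    """Return +1 if w.x + b > 0, else -1."""
    return 1 if np.dot(w, x) + b > 0 else -1

print(predict(np.array([2.0, 1.0])))   # 0.8*2 - 0.4*1 - 0.2 = 1.0  -> +1
print(predict(np.array([0.0, 2.0])))   # -0.8 - 0.2        = -1.0 -> -1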
Step 3: Maximize the Margin
The margin is the distance between the hyperplane and the nearest data
points from each class. The goal is to maximize this margin.
The distance from a point x to the hyperplane is:
|w⋅x + b| / ||w||
The margin between the two classes works out to 2 / ||w||, so to maximize
the margin that separates the classes we minimize ||w|| (equivalently, we
minimize (1/2)||w||²).
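A sketch of the distance formula and the resulting margin width, again with placeholder values for w and b:

import numpy as np

w = np.array([2.0, 1.0])
b = -3.0

def distance_to_hyperplane(x):
    """Distance |w.x + b| / ||w|| from point x to the hyperplane w.x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

print(distance_to_hyperplane(np.array([3.0, 2.0])))   # |6 + 2 - 3| / sqrt(5) ≈ 2.236
print(2.0 / np.linalg.norm(w))                        # margin width 2 / ||w|| ≈ 0.894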
Classification Condition
For every point to be correctly classified, the following condition
should always hold; it enforces this constraint on the data:
yi (w⋅xi + b) ≥ 1 for all i
(a quick numerical check of this constraint is sketched after the diagram below)
It defines the two margin boundaries:
+1 Margin (w·x + b = +1)
---------------------------
Support Vectors (class +1)
Decision Boundary (w·x + b = 0)
---------------------------
Support Vectors (class -1)
---------------------------
-1 Margin (w·x + b = -1)
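A quick check of the constraint yi (w⋅xi + b) ≥ 1 on a tiny, made-up dataset (as referenced above):

import numpy as np

w, b = np.array([1.0, 1.0]), -3.0
X = np.array([[3.0, 2.0],     # labelled +1
              [1.0, 0.5]])    # labelled -1
y = np.array([1, -1])

margins = y * (X @ w + b)     # y_i (w.x_i + b) for every point
print(margins)                # [2.  1.5]
print(np.all(margins >= 1))   # True: every point is correctly classified with margin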
Step 4: Margin and Optimization
Objective Function
Hard-Margin SVM (Linearly Separable Case)
• We will select 2 support vectors:
• 1 from the negative class.
• 1 from the positive class.
• The distance between these two vectors, x1 and x2, will be
represented as the vector (x2 − x1).
• Our goal is to find the shortest (perpendicular) distance between these
two points, which is the width of the margin.
• This can be achieved using a trick from the dot product:
• Take a vector w that is perpendicular to the hyperplane.
• Find the projection of the vector (x2 − x1) onto w (this perpendicular
vector should be a unit vector; only then does the dot product give the distance directly).
We already know how to find the
projection of a vector on another
vector.
This projection gives the margin width as 2 / ||w||. We minimize a cost
function for optimization problems; therefore, we invert the function:
instead of maximizing 2 / ||w||, we minimize ||w|| / 2 (or (1/2)||w||²).
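To make the optimization concrete, here is a minimal hard-margin sketch that minimizes (1/2)||w||² subject to yi (w⋅xi + b) ≥ 1 using scipy.optimize (SLSQP). The toy data and starting point are made up; this illustrates the formulation, it is not a production solver:

import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0],      # class +1
              [0.0, 0.0], [1.0, 0.0]])     # class -1
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(params):                     # params = [w1, w2, b]
    w = params[:2]
    return 0.5 * np.dot(w, w)              # (1/2) ||w||^2

# One inequality constraint per point: y_i (w.x_i + b) - 1 >= 0
constraints = [{"type": "ineq",
                "fun": lambda p, i=i: y[i] * (np.dot(p[:2], X[i]) + p[2]) - 1.0}
               for i in range(len(y))]

x0 = np.array([1.0, 1.0, -2.0])            # a feasible starting guess for this toy data
result = minimize(objective, x0, constraints=constraints, method="SLSQP")
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
# For this toy data the optimum is roughly w ≈ [0.4, 0.8], b ≈ -1.4, margin ≈ 2.24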
Cost Function