SVM Student
Margins: Intuition
Linear Separators
• Binary classification can be viewed as the task of
separating classes in feature space:
The separating hyperplane: wᵀx + b = 0
One side of it: wᵀx + b > 0; the other side: wᵀx + b < 0
The classifier: f(x) = sign(wᵀx + b)
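As a small illustration (not from the slides; the w, b, and test points below are made-up values), the classifier simply checks which side of the hyperplane a point falls on:

```python
import numpy as np

# Hypothetical separator: w and b define the hyperplane w.x + b = 0
w = np.array([2.0, -1.0])
b = -0.5

def classify(x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return np.sign(w @ x + b)

print(classify(np.array([1.0, 0.0])))   # w.x + b =  1.5 > 0  -> +1
print(classify(np.array([0.0, 1.0])))   # w.x + b = -1.5 < 0  -> -1
```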
Linear Separators
• Which of the linear separators is optimal?
Functional Margin
• Given a training example (x(i), y(i)), the functional margin is
M(i) = y(i) (wᵀx(i) + b)
• Dividing by the length of w gives the geometric margin, the signed distance of x(i) from the hyperplane:
γ(i) = (wᵀx(i) + b) / ||w||
• By rescaling w and b so that the closest training points have functional margin 1, we can require y (wᵀx + b) ≥ 1 for every training example.
• For support vectors, the inequality becomes an equality.
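A small sketch of these two quantities, assuming the made-up w, b, and training points below (none of them are from the slides):

```python
import numpy as np

w = np.array([2.0, -1.0])            # hypothetical weight vector
b = -0.5                             # hypothetical bias
X = np.array([[1.0, 0.0],            # hypothetical training points
              [0.0, 1.0],
              [2.0, 1.0]])
y = np.array([1, -1, 1])             # their labels

functional = y * (X @ w + b)                     # M(i) = y(i) (w.x(i) + b)
distance   = (X @ w + b) / np.linalg.norm(w)     # gamma(i): signed distance from the hyperplane

print(functional)   # [1.5 1.5 2.5] -- all positive, so every point is on the correct side
print(distance)     # distances from the hyperplane; the sign tells which side
```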
Maximum Margin: Formalization
ρ = margin width, the distance between the margin boundaries on either side of the separating hyperplane
SVM - An Optimization Problem
• The margin depends only on the position of the nearest feature vectors (the support vectors).
• We want to maximize the geometric margin γ = min over i of y(i) (wᵀx(i) + b) / ||w||.
• If we scale w and b so that the nearest points satisfy y(i) (wᵀx(i) + b) = 1, with all other points ≥ 1, then γ = 1/||w||, so maximizing the margin is the same as minimizing ||w||.
Solving the Optimization Problem
An optimization problem can be formulated in two ways: as a primal problem and as a dual problem. We first write the primal formulation; for the SVM, which is a convex quadratic program, the dual formulation is often easier to solve and attains the same optimum (strong duality).
Primal Optimization Problem
• Find w and b such that:
Φ(w) = ½ wᵀw = ½ w·w is minimized;
such that for all {(xi, yi)}: yi (wᵀxi + b) ≥ 1
https://www.youtube.com/watch?v=YOsrYl1JRrc&t=362s
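A minimal sketch of solving this primal problem numerically, assuming the cvxpy library and made-up, linearly separable toy data (none of which appear in the slides):

```python
import numpy as np
import cvxpy as cp

# Hypothetical, linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Primal: minimize 1/2 w.w  subject to  y_i (w.x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)   # maximum-margin separator for the toy data
```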
Primal & Dual Concept (cont.)
https://www.youtube.com/watch?v=YOsrYl1JRrc&t=362s
KKT Conditions (Karush-Kuhn-Tucker Conditions)
https://www.youtube.com/watch?v=YOsrYl1JRrc&t=362s
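For reference (a summary, not shown on this slide): applied to the SVM primal with multipliers αi, the KKT conditions are stationarity (∂L/∂w = 0 and ∂L/∂b = 0), primal feasibility (yi (wᵀxi + b) ≥ 1), dual feasibility (αi ≥ 0), and complementary slackness (αi [ yi (wᵀxi + b) − 1 ] = 0). Complementary slackness is why only the support vectors, the points with yi (wᵀxi + b) = 1, can have αi > 0.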
Primal Optimization Problem (cont)
Lp = ½ (w·w) − Σi αi [ yi (w·xi + b) − 1 ]
∂L/∂w = 0  ⇒  w = Σi αi yi xi
∂L/∂b = 0  ⇒  Σi αi yi = 0
Primal to Dual Optimization Problem (cont.)
By substituting w = Σi αi yi xi into Lp, and using the constraint Σi αi yi = 0, we obtain the dual problem:
maximize LD = Σi αi − ½ Σi Σj αi αj yi yj (xiᵀxj)
subject to αi ≥ 0 and Σi αi yi = 0
The αi are the Lagrangian multipliers. To classify a new point z, compute sgn(wᵀz + b):
if sgn = +ve, z ∈ C1
if sgn = −ve, z ∈ C2
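A sketch of solving this dual numerically, under the same assumptions as the primal sketch above (cvxpy and made-up toy data); the resulting α recover w, b, and the sign-based classification rule:

```python
import numpy as np
import cvxpy as cp

# Same hypothetical toy data as in the primal sketch
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

alpha = cp.Variable(n)

# Dual: maximize sum(alpha) - 1/2 || sum_i alpha_i y_i x_i ||^2
#       subject to alpha_i >= 0 and sum_i alpha_i y_i = 0
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [alpha >= 0, y @ alpha == 0]
cp.Problem(objective, constraints).solve()

a = alpha.value
w = (a * y) @ X                   # w = sum_i alpha_i y_i x_i
sv = np.argmax(a)                 # index of one support vector (alpha > 0)
b = y[sv] - w @ X[sv]             # from y_sv (w.x_sv + b) = 1, using y_sv = +/-1

z = np.array([1.0, 1.5])          # new point to classify
print(np.sign(w @ z + b))         # +1 -> class C1, -1 -> class C2
```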
Types of SVM
1. Linear SVM
2. Non-linear SVM
Hard Margin vs. Soft Margin
The old (hard margin) formulation:
Find w and b such that
Φ(w) = ½ wᵀw is minimized and for all {(xi, yi)}:
yi (wᵀxi + b) ≥ 1
The new (soft margin) formulation allows some points to violate the margin via slack variables ξi:
Find w, b, and ξi ≥ 0 such that
Φ(w, ξ) = ½ wᵀw + C Σi ξi is minimized and for all {(xi, yi)}:
yi (wᵀxi + b) ≥ 1 − ξi
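A short sketch of the soft-margin tradeoff, assuming scikit-learn and made-up toy data containing one mislabeled point; the regularization parameter C controls how heavily the slack ξi is penalized:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: linearly separable except for one noisy point
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 2.0],
              [-1.0, -1.0], [-2.0, -1.0], [2.2, 2.2]])
y = np.array([1, 1, 1, -1, -1, -1])     # the last point sits on the "wrong" side

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C tolerates slack (wider margin, some violations);
    # large C penalizes slack heavily (narrower margin, fewer violations).
    print(C, clf.coef_[0], clf.intercept_[0], clf.n_support_)
```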
• Neither slack variables ξi nor their Lagrange multipliers appear in the dual problem!
• Again, xi with non-zero αi will be support vectors.
• Solution to the dual problem is:
w = Σ αi yi xi
b = yk (1 − ξk) − wᵀxk, where k = argmaxk' αk'
f(x) = Σ αi yi xiᵀx + b
• w is not needed explicitly for classification!
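A sketch (assuming scikit-learn and made-up toy data) checking that f(x) = Σ αi yi xiᵀx + b, computed from the stored support vectors and dual coefficients alone, matches the fitted classifier's decision function, so w never has to be formed explicitly:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)

x_new = np.array([0.5, 0.5])
# dual_coef_[0] holds alpha_i * y_i for the support vectors only
f = clf.dual_coef_[0] @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]

print(f, clf.decision_function(x_new.reshape(1, -1))[0])   # the two values agree
```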
Multiclass Classification
Non-Linearly Separable Data
Non-linear SVMs
When the dataset cannot be separated linearly:
(Figure: one-dimensional data along the x axis, and the same data after adding an x² dimension, where it becomes separable.)
Non-linear SVMs: Feature spaces
General idea: the original input space can always be mapped to some
higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
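A sketch of this idea on made-up one-dimensional data (echoing the x / x² figure above), assuming scikit-learn: the points are not separable on the line, but after mapping x → (x, x²) a linear separator exists in the new feature space:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D data: negatives in the middle, positives on both sides
x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])

# Explicit feature map phi(x) = (x, x^2)
phi = np.column_stack([x, x ** 2])

clf = SVC(kernel="linear", C=1e6).fit(phi, y)
print(clf.predict(phi))      # reproduces y: the mapped data is linearly separable
print(clf.score(phi, y))     # 1.0
```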
Kernel Trick
• With the feature mapping, the discriminant function becomes:
g(x) = wᵀφ(x) + b = Σi∈SV αi yi φ(xi)ᵀφ(x) + b
• A kernel function is defined as a function that corresponds to a dot product of two feature vectors:
K(xa, xb) = φ(xa)·φ(xb)
• Often K(xa, xb) is very inexpensive to compute even when φ(xa) is extremely high dimensional.
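A small check of this identity for the quadratic kernel K(xa, xb) = (xaᵀxb)² in two dimensions, whose explicit feature map is φ(a) = (a1², √2·a1·a2, a2²); the vectors used here are made up:

```python
import numpy as np

def phi(a):
    """Explicit feature map whose dot product equals (a.b)**2 for 2-D inputs."""
    return np.array([a[0] ** 2, np.sqrt(2) * a[0] * a[1], a[1] ** 2])

def K(a, b):
    """Quadratic kernel: computed directly from the original 2-D vectors."""
    return (a @ b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])

print(K(a, b))             # 16.0
print(phi(a) @ phi(b))     # 16.0 as well: same value, without forming phi explicitly
```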
Commonly used Kernel Functions
Linear: K(xi, xj) = xiᵀxj