SVM

This document provides an overview of support vector machines (SVM). It begins with a review of previous machine learning concepts. It then introduces SVM, noting that it is a classifier derived from statistical learning theory that is widely used for tasks like image recognition. The document outlines the linear SVM, explaining how it finds the optimal hyperplane that maximizes the margin between classes to minimize error. It describes how nonlinear SVM addresses non-linearly separable data using kernels to map data into higher dimensions where a linear separator may be found. In closing, it provides examples of commonly used kernel functions and outlines the optimization technique.

Uploaded by

Himanshu Bhatt

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

604 views

SVM

Uploaded by

Himanshu Bhatt

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 36

An Introduction of

Support Vector Machine

Jinwei Gu
2008/10/16
Review: What Weve Learned So Far
Bayesian Decision Theory
Maximum-Likelihood & Bayesian Parameter Estimation
Nonparametric Density Estimation
Parzen-Window, k
n
-Nearest-Neighbor

K-Nearest Neighbor Classifier
Decision Tree Classifier
Today: Support Vector Machine (SVM)
A classifier derived from statistical learning theory by Vapnik, et
al. in 1992
SVM became famous when, using images as input, it gave
accuracy comparable to neural-network with hand-designed
features in a handwriting recognition task
Currently, SVM is widely used in object detection & recognition,
content-based image retrieval, text recognition, biometrics,
speech recognition, etc.
Also used for regression (will not cover today)

Chapter 5.1, 5.2, 5.3, 5.11 (5.4*) in textbook
V. Vapnik
Outline
Linear Discriminant Function
Large Margin Linear Classifier
Nonlinear SVM: The Kernel Trick
Demo of SVM

Discriminant Function
Chapter 2.4: the classifier is said to assign a feature vector x to
class w
i
if
( ) ( ) for all
i j
g g j i > = x x
An example weve learned before:
Minimum-Error-Rate Classifier
For two-category case,
1 2
( ) ( ) ( ) g g g x x x
1 2
Decide if ( ) 0; otherwise decide g e e > x
1 2
( ) ( | ) ( | ) g p p e e x x x
Discriminant Function
It can be arbitrary functions of x, such as:
Nearest
Neighbor
Decision
Tree
Linear
Functions
( )
T
g b = + x w x
Nonlinear
Functions
Linear Discriminant Function
g(x) is a linear function:
( )
T
g b = + x w x
x
1
x
2
w
T
x +b <0
w
T
x +b >0
A hyper-plane in the
feature space
(Unit-length) normal vector
of the hyper-plane:
=
w
n
w
n
How would you classify
these points using a linear
discriminant function in order
to minimize the error rate?
Linear Discriminant Function
denotes +1
denotes -1
x
1
x
2
Infinite number of answers!
How would you classify
these points using a linear
discriminant function in order
to minimize the error rate?
Linear Discriminant Function
denotes +1
denotes -1
x
1
x
2
Infinite number of answers!
How would you classify
these points using a linear
discriminant function in order
to minimize the error rate?
Linear Discriminant Function
denotes +1
denotes -1
x
1
x
2
Infinite number of answers!
x
1
x
2 How would you classify
these points using a linear
discriminant function in order
to minimize the error rate?
Linear Discriminant Function
denotes +1
denotes -1
Infinite number of answers!
Which one is the best?
Large Margin Linear Classifier
safe zone
The linear discriminant
function (classifier) with the
maximum margin is the best
Margin is defined as the
width that the boundary
could be increased by before
hitting a data point
Why it is the best?
Robust to outliners and thus
strong generalization ability
Margin
x
1
x
2
denotes +1
denotes -1
Large Margin Linear Classifier
Given a set of data points:
With a scale transformation
on both w and b, the above
is equivalent to
x
1
x
2
denotes +1
denotes -1
For 1, 0
For 1, 0
T
i i
T
i i
y b
y b
= + + >
= + <
w x
w x
{( , )}, 1, 2, ,
i i
y i n = x
, where
For 1, 1
For 1, 1
T
i i
T
i i
y b
y b
= + + >
= + s
w x
w x
Large Margin Linear Classifier
We know that
The margin width is:
x
1
x
2
denotes +1
denotes -1
1
1
T
T
b
b
+

+ =
+ =
w x
w x
Margin
x
+
x
+
x
-
( )
2
( )
M
+
+
=
= =
x x n
w
x x
w w
n
Support Vectors
Large Margin Linear Classifier
Formulation:
x
1
x
2
denotes +1
denotes -1
Margin
x
+
x
+
x
-
n
such that
2
maximize
w
For 1, 1
For 1, 1
T
i i
T
i i
y b
y b
= + + >
= + s
w x
w x
Large Margin Linear Classifier
Formulation:
x
1
x
2
denotes +1
denotes -1
Margin
x
+
x
+
x
-
n
2 1
minimize
2
w
such that
For 1, 1
For 1, 1
T
i i
T
i i
y b
y b
= + + >
= + s
w x
w x
Large Margin Linear Classifier
Formulation:
x
1
x
2
denotes +1
denotes -1
Margin
x
+
x
+
x
-
n
( ) 1
T
i i
y b + > w x
2 1
minimize
2
w
such that
Solving the Optimization Problem
( ) 1
T
i i
y b + > w x
2 1
minimize
2
w
s.t.
Quadratic
programming
with linear
constraints
( )
2
1
1
minimize ( , , ) ( ) 1
2
n
T
p i i i i
i
L b y b o o
=
= +

w w w x
s.t.
Lagrangian
Function
0
i
o >
Solving the Optimization Problem
( )
2
1
1
minimize ( , , ) ( ) 1
2
n
T
p i i i i
i
L b y b o o
=
= +

w w w x
s.t.
0
i
o >
0
p
L
b
c
=
c
0
p
L c
=
cw
1
n
i i i
i
y o
=
=

w x
1
0
n
i i
i
y o
=
=

Solving the Optimization Problem

( )
2
1
1
minimize ( , , ) ( ) 1
2
n
T
p i i i i
i
L b y b o o
=
= +

w w w x
s.t.
0
i
o >
1 1 1
1
maximize
2
n n n
T
i i j i j i j
i i j
y y o o o
= = =

x x
s.t. 0
i
o >
1
0
n
i i
i
y o
=
=

, and
Lagrangian Dual
Problem
Solving the Optimization Problem
The solution has the form:
( )
( ) 1 0
T
i i i
y b o + = w x
From KKT condition, we know:
Thus, only support vectors have 0
i
o =
1 SV
n
i i i i i i
i i
y y o o
= e
= =

w x x
get from ( ) 1 0,
where is support vector
T
i i
i
b y b + = w x
x
x
1
x
2
x
+
x
+
x
-
Support Vectors
Solving the Optimization Problem
SV
( )
T T
i i
i
g b b o
e
= + = +

x w x x x
The linear discriminant function is:
Notice it relies on a dot product between the test point x
and the support vectors x
i
Also keep in mind that solving the optimization problem
involved computing the dot products x
i
T
x
j
between all pairs
of training points
Large Margin Linear Classifier
What if data is not linear
separable? (noisy data,
outliers, etc.)
Slack variables
i
can be
added to allow mis-
classification of difficult
or noisy data points
x
1
x
2
denotes +1
denotes -1
1

Large Margin Linear Classifier

Formulation:
( ) 1
T
i i i
y b + > w x
2
1
1
minimize
2
n
i
i
C
=
+

w
such that
0
i
>
Parameter C can be viewed as a way to control over-fitting.
Large Margin Linear Classifier
Formulation: (Lagrangian Dual Problem)
1 1 1
1
maximize
2
n n n
T
i i j i j i j
i i j
y y o o o
= = =

x x
such that
0
i
C o s s
1
0
n
i i
i
y o
=
=

Non-linear SVMs
Datasets that are linearly separable with noise work out great:
0
x

0
x

x
2
0
x

But what are we going to do if the dataset is just too hard?
How about mapping data to a higher-dimensional space:
This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
Non-linear SVMs: Feature Space
General idea: the original input space can be mapped to
some higher-dimensional feature space where the
training set is separable:
: x

(x)
This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
Nonlinear SVMs: The Kernel Trick
With this mapping, our discriminant function is now:
SV
( ) ( ) ( ) ( )
T T
i i
i
g b b | o | |
e
= + = +

x w x x x
No need to know this mapping explicitly, because we only use
the dot product of feature vectors in both the training and test.
A kernel function is defined as a function that corresponds to
a dot product of two feature vectors in some expanded feature
space:
( , ) ( ) ( )
T
i j i j
K | | x x x x
Nonlinear SVMs: The Kernel Trick
2-dimensional vectors x=[x
1
x
2
];

let K(x
i
,x
j
)=(1 + x
i
T
x
j
)
2
,

Need to show that K(x
i
,x
j
) = (x
i
)

T
(x
j
):

K(x
i
,x
j
)=(1 + x
i
T
x
j
)
2
,

= 1+ x
i1
2
x
j1
2
+2 x
i1
x
j1

x
i2
x
j2
+x
i2
2
x
j2
2
+ 2x
i1
x
j1
+2x
i2
x
j2

=[1 x
i1
2
2 x
i1
x
i2
x
i2
2
2x
i1
2x
i2
]
T
[1 x
j1
2
2 x
j1
x
j2
x
j2
2
2x
j1
2x
j2
]
= (x
i
)

T
(x
j
), where (x) =

[1 x
1
2
2 x
1
x
2
x
2
2
2x
1
2x
2
]

An example:
This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
Nonlinear SVMs: The Kernel Trick
Linear kernel:
2
2
( , ) exp( )
2
i j
i j
K
o

=
x x
x x
( , )
T
i j i j
K = x x x x
( , ) (1 )
T p
i j i j
K = + x x x x
0 1
( , ) tanh( )
T
i j i j
K | | = + x x x x
Examples of commonly-used kernel functions:
Polynomial kernel:
Gaussian (Radial-Basis Function (RBF) ) kernel:
Sigmoid:
In general, functions that satisfy Mercers condition can be
kernel functions.
Nonlinear SVM: Optimization
Formulation: (Lagrangian Dual Problem)
1 1 1
1
maximize ( , )
2
n n n
i i j i j i j
i i j
y y K o o o
= = =

x x
such that
0
i
C o s s
1
0
n
i i
i
y o
=
=

The solution of the discriminant function is

SV
( ) ( , )
i i
i
g K b o
e
= +

x x x
The optimization technique is the same.
Support Vector Machine: Algorithm
1. Choose a kernel function

2. Choose a value for C

3. Solve the quadratic programming problem
(many software packages available)

4. Construct the discriminant function from the
support vectors
Some Issues
Choice of kernel
- Gaussian or polynomial kernel is default
- if ineffective, more elaborate kernels are needed
- domain experts can give assistance in formulating appropriate
similarity measures

Choice of kernel parameters
- e.g. in Gaussian kernel
- is the distance between closest points with different classifications
- In the absence of reliable criteria, applications rely on the use of a
validation set or cross-validation to set such parameters.

Optimization criterion Hard margin v.s. Soft margin
- a lengthy series of experiments in which various parameters are
tested
This slide is courtesy of www.iro.umontreal.ca/~pift6080/documents/papers/svm_tutorial.ppt
Summary: Support Vector Machine
1. Large Margin Classifier
Better generalization ability & less over-fitting

2. The Kernel Trick
Map data points to higher dimensional space in
order to make them linearly separable.
Since only dot product is used, we do not need to
represent the mapping explicitly.
Additional Resource
http://www.kernel-machines.org/
Demo of LibSVM
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Module 1 Quiz - Coursera166
No ratings yet
Module 1 Quiz - Coursera166
1 page
SVM
No ratings yet
SVM
21 pages
College Documentation - Automated Image Captioning
No ratings yet
College Documentation - Automated Image Captioning
26 pages
svm
No ratings yet
svm
36 pages
An Introduction Of: Support Vector Machine
No ratings yet
An Introduction Of: Support Vector Machine
36 pages
Support Vector Machines: Vibhav Gogate The University of Texas at Dallas
No ratings yet
Support Vector Machines: Vibhav Gogate The University of Texas at Dallas
36 pages
CS-13410 Introduction To Machine Learning
No ratings yet
CS-13410 Introduction To Machine Learning
33 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
SVM Tutorial
No ratings yet
SVM Tutorial
31 pages
MergedPDF Iml
No ratings yet
MergedPDF Iml
114 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
Support Vector Machine
No ratings yet
Support Vector Machine
45 pages
SVM PRESENTATION
No ratings yet
SVM PRESENTATION
34 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
Machine Learning - Open Elective - Part III
No ratings yet
Machine Learning - Open Elective - Part III
90 pages
Machine learning Lecture 03
No ratings yet
Machine learning Lecture 03
20 pages
22-Kernel Tricks Shit
No ratings yet
22-Kernel Tricks Shit
43 pages
Time Series Forecasting by Using Wavelet Kernel SVM
No ratings yet
Time Series Forecasting by Using Wavelet Kernel SVM
52 pages
Introduction To: Support Vector Machines
No ratings yet
Introduction To: Support Vector Machines
53 pages
SVM
No ratings yet
SVM
40 pages
Svm Student
No ratings yet
Svm Student
40 pages
cs221-lecture11
No ratings yet
cs221-lecture11
71 pages
SVM
No ratings yet
SVM
44 pages
Lecture 18 - SVM
No ratings yet
Lecture 18 - SVM
54 pages
Support Vector Machine
No ratings yet
Support Vector Machine
55 pages
SVM Class
No ratings yet
SVM Class
33 pages
Lec06 SVM
No ratings yet
Lec06 SVM
25 pages
Support Vector Machine
No ratings yet
Support Vector Machine
50 pages
Support Vector Machine
No ratings yet
Support Vector Machine
35 pages
SVM-CDing2024 11 15
No ratings yet
SVM-CDing2024 11 15
54 pages
Support Vector Machines: Jeff Wu
No ratings yet
Support Vector Machines: Jeff Wu
35 pages
Support Vector Machine
No ratings yet
Support Vector Machine
38 pages
20 SVM
No ratings yet
20 SVM
35 pages
Introduction To Support Vector Machines
No ratings yet
Introduction To Support Vector Machines
23 pages
Another Introduction SVM
No ratings yet
Another Introduction SVM
4 pages
Lec5 Support vector machine
No ratings yet
Lec5 Support vector machine
28 pages
10 SVM
No ratings yet
10 SVM
23 pages
An Introduction To Support Vector Machines
No ratings yet
An Introduction To Support Vector Machines
13 pages
Support Vector Machines (SVMS) : Cs479/679 Pattern Recognition Dr. George Bebis
No ratings yet
Support Vector Machines (SVMS) : Cs479/679 Pattern Recognition Dr. George Bebis
37 pages
W12 SVM
No ratings yet
W12 SVM
52 pages
Support Vector Machines: Logisic Regression
No ratings yet
Support Vector Machines: Logisic Regression
10 pages
ML Lec SVM Linear
No ratings yet
ML Lec SVM Linear
19 pages
L5-Support Vector Machine
No ratings yet
L5-Support Vector Machine
61 pages
CMPE 442 Introduction To Machine Learning: Support Vector Machines
No ratings yet
CMPE 442 Introduction To Machine Learning: Support Vector Machines
64 pages
27-Module 4 - Support Vector Machine and Naïve Bayes-20-09-2024
No ratings yet
27-Module 4 - Support Vector Machine and Naïve Bayes-20-09-2024
31 pages
Svm
No ratings yet
Svm
40 pages
Support Vector Machine
No ratings yet
Support Vector Machine
52 pages
Support Vector Machine
No ratings yet
Support Vector Machine
31 pages
Support Vector Machines For Classification and Regression
No ratings yet
Support Vector Machines For Classification and Regression
8 pages
Final - Support Vector Machine - Class - Modifie
No ratings yet
Final - Support Vector Machine - Class - Modifie
69 pages
6 Lec SVM Kernel
No ratings yet
6 Lec SVM Kernel
36 pages
Lecture Notes SVM
No ratings yet
Lecture Notes SVM
4 pages
Lecture Notes SVM
No ratings yet
Lecture Notes SVM
4 pages
Support Vector Machines: (Vapnik, 1979)
No ratings yet
Support Vector Machines: (Vapnik, 1979)
34 pages
9 Svm-Handout PDF
No ratings yet
9 Svm-Handout PDF
21 pages
Supervised Learning - Support Vector Machines and Feature Reduction
No ratings yet
Supervised Learning - Support Vector Machines and Feature Reduction
11 pages
Kernel Method and Support Vector Machines: Nguyen Duc Dung, Ph.D. Ioit, Vast
No ratings yet
Kernel Method and Support Vector Machines: Nguyen Duc Dung, Ph.D. Ioit, Vast
34 pages
SVM
No ratings yet
SVM
28 pages
SVM
No ratings yet
SVM
11 pages
A09-Support-Vector-Machines-2up (3)
No ratings yet
A09-Support-Vector-Machines-2up (3)
15 pages
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
From Everand
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
Jeffrey M. Wooldridge
No ratings yet
A Brief Introduction to MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
From Everand
A Brief Introduction to MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
Peter Kattan
2.5/5 (2)
MIT AI Ethics Education Curriculum
100% (1)
MIT AI Ethics Education Curriculum
94 pages
Iniya
No ratings yet
Iniya
21 pages
My First Article
No ratings yet
My First Article
10 pages
Parametric and Nonparametric Machine Learning Algorithms
No ratings yet
Parametric and Nonparametric Machine Learning Algorithms
16 pages
EXP-2-To Implement Logistic Regression
No ratings yet
EXP-2-To Implement Logistic Regression
5 pages
Studio 9 Questions
No ratings yet
Studio 9 Questions
6 pages
Impact_of_AI_Automation_on_Job_Market
No ratings yet
Impact_of_AI_Automation_on_Job_Market
5 pages
PGP in Data Science and Machine Learning Job Guarantee Program
No ratings yet
PGP in Data Science and Machine Learning Job Guarantee Program
15 pages
Agricultural 4.0 Leveraging On Technological Solutions: Study For Smart Farming Sector
No ratings yet
Agricultural 4.0 Leveraging On Technological Solutions: Study For Smart Farming Sector
9 pages
Machine Learning For Precision Medicine
No ratings yet
Machine Learning For Precision Medicine
10 pages
Blackberry IVY in Depth
No ratings yet
Blackberry IVY in Depth
12 pages
093
No ratings yet
093
9 pages
CS60010: Deep Learning CNN - Part 3: Sudeshna Sarkar
No ratings yet
CS60010: Deep Learning CNN - Part 3: Sudeshna Sarkar
167 pages
DialogueGCN A Graph Convolutional Neural Network for Emotion Recognition in Conversation
No ratings yet
DialogueGCN A Graph Convolutional Neural Network for Emotion Recognition in Conversation
11 pages
The Illustrated Transformer - Jay Alammar - Visualizing Machine Learning One Concept at A Time
No ratings yet
The Illustrated Transformer - Jay Alammar - Visualizing Machine Learning One Concept at A Time
22 pages
Machine Learning: Linear Models For Classification 1
No ratings yet
Machine Learning: Linear Models For Classification 1
30 pages
PHD Syllabus
No ratings yet
PHD Syllabus
119 pages
MLP Week 6 NaiveBayesImplementation - Ipynb - Colaboratory
No ratings yet
MLP Week 6 NaiveBayesImplementation - Ipynb - Colaboratory
5 pages
Brain Tumor Segmentation Thesis
100% (3)
Brain Tumor Segmentation Thesis
7 pages
JOCC Volume 2 Issue 1 Page 9 19
No ratings yet
JOCC Volume 2 Issue 1 Page 9 19
11 pages
20dit073 Jay Prajapati ML
No ratings yet
20dit073 Jay Prajapati ML
68 pages
ERP Features with AI
No ratings yet
ERP Features with AI
25 pages
IT8601 IQ Computational Intelligence
No ratings yet
IT8601 IQ Computational Intelligence
10 pages
AI and The Role of The Board of Directors
No ratings yet
AI and The Role of The Board of Directors
8 pages
RWKV-TS: Beyond Traditional Recurrent Neural Network For Time Series Tasks
No ratings yet
RWKV-TS: Beyond Traditional Recurrent Neural Network For Time Series Tasks
13 pages
A Practical Approach To Linear Regression in Machine Learning - by Ashwin Raj - Towards Data Science
No ratings yet
A Practical Approach To Linear Regression in Machine Learning - by Ashwin Raj - Towards Data Science
20 pages
Assignment 2, Machine Learning
No ratings yet
Assignment 2, Machine Learning
5 pages
Hattarki Project Report CSE 572
No ratings yet
Hattarki Project Report CSE 572
5 pages

SVM

Uploaded by

SVM

Uploaded by

An Introduction of

Support Vector Machine

Solving the Optimization Problem

Large Margin Linear Classifier

The solution of the discriminant function is

You might also like