Support Vector Machine & Its Applications
Mingyue Tan
The University of British Columbia
Nov 26, 2004
A portion (1/3) of the slides are taken from Prof. Andrew Moore's SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials
Overview
Intro. to Support Vector Machines (SVM)
Properties of SVM
Applications
Gene Expression Data Classification
Text Categorization (if time permits)
Discussion
Linear Classifiers
Input x → classifier f(x, w, b) → estimated label y_est
f(x, w, b) = sign(w · x + b)
[Figure: 2-D training data; one marker denotes class +1, the other denotes class -1. The region where w · x + b > 0 is predicted +1 and the region where w · x + b < 0 is predicted -1.]
How would you classify this data?
Linear Classifiers
f(x, w, b) = sign(w · x + b)
[Figure: the same data with a different candidate separating line]
How would you classify this data?

Linear Classifiers
f(x, w, b) = sign(w · x + b)
[Figure: the same data with yet another candidate separating line]
How would you classify this data?
Linear Classifiers
f(x, w, b) = sign(w · x + b)
[Figure: several candidate separating lines drawn through the same data]
Any of these would be fine...
...but which is best?
Linear Classifiers
f(x, w, b) = sign(w · x + b)
[Figure: a candidate line that misclassifies one point into the +1 class]
How would you classify this data?
Classifier Margin
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
[Figure: the separating line widened into a band that just touches the nearest points of each class]
Maximum Margin
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM): a Linear SVM.
Support Vectors are those datapoints that the margin pushes up against.
1. Maximizing the margin is good according to intuition and PAC theory.
2. It implies that only the support vectors are important; the other training examples are ignorable.
3. Empirically it works very, very well.
[Figure: the maximum-margin separating line with its margin band; the points lying on the band are the support vectors]
Linear SVM Mathematically
[Figure: the "Predict Class = +1" zone lies above the plane w · x + b = +1, the "Predict Class = -1" zone lies below the plane w · x + b = -1, and the separating plane is w · x + b = 0; M = margin width is the distance between the two outer planes, with x+ and x- points on them.]
What we know:
  w · x+ + b = +1
  w · x- + b = -1
so
  w · (x+ - x-) = 2
and therefore the margin width is
  M = (x+ - x-) · w / |w| = 2 / |w|
Linear SVM Mathematically
Goal: 1) Correctly classify all training data:
  w · xi + b ≥ +1  if yi = +1
  w · xi + b ≤ -1  if yi = -1
i.e.
  yi (w · xi + b) ≥ 1 for all i
2) Maximize the margin M = 2 / |w|, which is the same as minimizing ½ wTw.
We can formulate a Quadratic Optimization Problem and solve for w and b:
  Minimize Φ(w) = ½ wTw
  subject to yi (w · xi + b) ≥ 1 for all i
Solving the Optimization Problem
Find w and b such that
Φ(w) = ½ wTw is minimized,
and for all {(xi, yi)}: yi (wTxi + b) ≥ 1
We need to optimize a quadratic function subject to linear constraints.
Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them.
The solution involves constructing a dual problem in which a Lagrange multiplier αi is associated with every constraint in the primal problem:
Find α1…αN such that
Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
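To make the dual concrete, here is a minimal sketch (not part of the original slides) that solves it for a small, linearly separable toy dataset with SciPy's SLSQP solver and then recovers w and b as on the next slide; the dataset and the solver choice are illustrative assumptions.

```python
# Hard-margin SVM dual solved numerically on a toy 2-D dataset (illustrative sketch).
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -0.5]])  # training points
y = np.array([1.0, 1.0, -1.0, -1.0])                                # labels yi in {+1, -1}

# Matrix G_ij = yi * yj * (xi . xj), the quadratic part of Q(alpha).
G = (y[:, None] * X) @ (y[:, None] * X).T

def neg_dual(a):
    # We minimize the negative of Q(a) = sum_i a_i - 1/2 sum_ij a_i a_j yi yj xi.xj
    return 0.5 * a @ G @ a - a.sum()

constraints = {"type": "eq", "fun": lambda a: a @ y}   # (1) sum_i a_i yi = 0
bounds = [(0.0, None)] * len(y)                        # (2) a_i >= 0

res = minimize(neg_dual, np.zeros(len(y)), method="SLSQP",
               bounds=bounds, constraints=constraints)
alpha = res.x

w = (alpha * y) @ X                 # w = sum_i a_i yi xi
k = int(np.argmax(alpha))           # any index with a_k > 0 (a support vector)
b = y[k] - w @ X[k]                 # b = yk - w . xk
print("alpha =", alpha.round(3), " w =", w.round(3), " b =", round(b, 3))
```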
The Optimization Problem Solution
The solution has the form:
w = Σαiyixi    b = yk - wTxk for any xk such that αk ≠ 0
Each non-zero αi indicates that the corresponding xi is a support vector.
The classifying function then has the form:
f(x) = ΣαiyixiTx + b
Notice that it relies on an inner product between the test point x and the support vectors xi; we will return to this later.
Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all pairs of training points.
Dataset with noise
denotes +1 Hard Margin: So far we require
all data points be classified correctly
denotes -1
- No training error
What if the training set is
noisy?
- Solution 1: use very powerful
kernels
OVERFITTING!
Soft Margin Classification
Slack variables ξi can be added to allow misclassification of difficult or noisy examples.
What should our quadratic optimization criterion be?
Minimize
  ½ w·w + C Σk=1..R ξk
[Figure: points that violate the margin lines w·x + b = +1 and w·x + b = -1 are shown with their slack distances ξ2, ξ7, ξ11]
Hard Margin vs. Soft Margin
The old formulation:
Find w and b such that
Φ(w) = ½ wTw is minimized and for all {(xi, yi)}
yi (wTxi + b) ≥ 1
The new formulation incorporating slack variables:
Find w and b such that
Φ(w) = ½ wTw + CΣξi is minimized and for all {(xi, yi)}
yi (wTxi + b) ≥ 1 - ξi and ξi ≥ 0 for all i
The parameter C can be viewed as a way to control overfitting.
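To illustrate how C trades margin width against slack, here is a hedged sketch (not from the slides; the synthetic overlapping data and the scikit-learn usage are assumptions) fitting a linear soft-margin SVM at two C values and comparing the number of support vectors.

```python
# Effect of the soft-margin parameter C on a noisy 2-D problem (illustrative sketch).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(50, 2))])
y = np.array([1] * 50 + [-1] * 50)          # overlapping classes, so some slack is needed

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: support vectors = {len(clf.support_)}, "
          f"w = {clf.coef_[0].round(2)}, b = {clf.intercept_[0].round(2)}")
# A small C tolerates margin violations (many support vectors, wide margin);
# a large C penalizes slack heavily and fits the training data more tightly.
```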
Linear SVMs: Overview
The classifier is a separating hyperplane.
The most "important" training points are the support vectors; they define the hyperplane.
Quadratic optimization algorithms can identify which training points xi are support vectors with non-zero Lagrange multipliers αi.
Both in the dual formulation of the problem and in the solution, training points appear only inside dot products:
Find α1…αN such that
Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) 0 ≤ αi ≤ C for all αi
f(x) = ΣαiyixiTx + b
Non-linear SVMs
Datasets that are linearly separable with some noise work out great.
[Figure: 1-D data on the x-axis that a single threshold separates]
But what are we going to do if the dataset is just too hard?
[Figure: 1-D data on the x-axis where no single threshold separates the classes]
How about mapping the data to a higher-dimensional space?
[Figure: the same 1-D data mapped to (x, x²); in this 2-D space a line separates the classes]
Non-linear SVMs: Feature spaces
General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
[Figure: a non-linear boundary in the input space becomes a linear boundary in the feature space]
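A tiny sketch of the 1-D case pictured earlier (an added, assumed example rather than slide content): points that no single threshold can separate become linearly separable after the explicit map φ(x) = (x, x²).

```python
# Explicit feature map phi(x) = (x, x^2) makes a non-separable 1-D problem
# linearly separable in 2-D (illustrative sketch).
import numpy as np
from sklearn.svm import SVC

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([ 1,    1,   -1,  -1,  -1,   1,   1 ])   # -1 in the middle, +1 outside

phi = np.column_stack([x, x ** 2])        # map each point to (x, x^2)
clf = SVC(kernel="linear").fit(phi, y)    # a line in feature space now separates the classes
print("training accuracy:", clf.score(phi, y))   # expected: 1.0
```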
The "Kernel Trick"
The linear classifier relies on the dot product between vectors: K(xi, xj) = xiTxj.
If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes:
K(xi, xj) = φ(xi)Tφ(xj)
A kernel function is a function that corresponds to an inner product in some expanded feature space.
Example:
2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiTxj)²
We need to show that K(xi, xj) = φ(xi)Tφ(xj):
K(xi, xj) = (1 + xiTxj)²
= 1 + xi1²xj1² + 2 xi1xj1 xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
= [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
= φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
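The algebra above can also be checked numerically; the following short sketch (an added illustration, not from the slides) confirms that (1 + xiTxj)² equals φ(xi)Tφ(xj) for the stated φ on two arbitrary vectors.

```python
# Numerical check of the polynomial kernel identity K(xi, xj) = phi(xi) . phi(xj).
import numpy as np

def phi(x):
    # phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]
    x1, x2 = x
    return np.array([1.0, x1**2, np.sqrt(2) * x1 * x2, x2**2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([0.7, -1.2])
xj = np.array([2.0, 0.3])

K_direct = (1.0 + xi @ xj) ** 2        # kernel evaluated in the input space
K_mapped = phi(xi) @ phi(xj)           # inner product in the expanded feature space
print(K_direct, K_mapped)              # the two values agree (up to float rounding)
```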
What Functions are Kernels?
For some functions K(xi, xj), checking that K(xi, xj) = φ(xi)Tφ(xj) can be cumbersome.
Mercer's theorem:
Every positive semi-definite symmetric function is a kernel.
Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:
K = [ K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xN)
      K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xN)
      …         …         …         …  …
      K(xN,x1)  K(xN,x2)  K(xN,x3)  …  K(xN,xN) ]
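In practice one can sanity-check Mercer's condition on a finite sample by building the Gram matrix and inspecting its eigenvalues; the sketch below (an added illustration with assumed random data) does this for the quadratic kernel from the previous slide.

```python
# Checking that a kernel's Gram matrix is (numerically) positive semi-definite.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))                      # sample points x1..xN

def kernel(a, b):
    return (1.0 + a @ b) ** 2                     # the quadratic kernel from the example

K = np.array([[kernel(a, b) for b in X] for a in X])   # Gram matrix K_ij = K(xi, xj)
eigvals = np.linalg.eigvalsh(K)                        # symmetric matrix -> real eigenvalues
print("smallest eigenvalue:", eigvals.min())           # >= 0 (up to rounding) for a valid kernel
```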
Examples of Kernel Functions
Linear: K(xi, xj) = xiTxj
Polynomial of power p: K(xi, xj) = (1 + xiTxj)^p
Gaussian (radial-basis function network): K(xi, xj) = exp(-‖xi - xj‖² / (2σ²))
Sigmoid: K(xi, xj) = tanh(β0 xiTxj + β1)
(each of these is written out in the short code sketch below)
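For reference, the four kernels above take only a few lines each; this is an added sketch (the parameter defaults and test vectors are illustrative), not part of the original slides.

```python
# The four example kernels written out with NumPy (illustrative sketch).
import numpy as np

def linear(xi, xj):
    return xi @ xj

def polynomial(xi, xj, p=3):
    return (1.0 + xi @ xj) ** p

def gaussian(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def sigmoid(xi, xj, beta0=1.0, beta1=-1.0):
    return np.tanh(beta0 * (xi @ xj) + beta1)

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
for k in (linear, polynomial, gaussian, sigmoid):
    print(k.__name__, k(xi, xj))
```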
Non-linear SVMs Mathematically
Dual problem formulation:
Find α1…αN such that
Q(α) = Σαi - ½ΣΣαiαjyiyjK(xi, xj) is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
The solution is:
f(x) = ΣαiyiK(xi, x) + b
Optimization techniques for finding the αi's remain the same!
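Because only kernel values enter the dual, an SVM can be trained from a precomputed Gram matrix alone; here is a hedged sketch using scikit-learn's kernel="precomputed" option (the RBF width and the synthetic data are assumptions).

```python
# Training on kernel values only: SVC with a precomputed Gram matrix (illustrative sketch).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal([0.0, 0.0], 0.5, (30, 2)),
                     rng.normal([3.0, 3.0], 0.5, (30, 2))])
y_train = np.array([0] * 30 + [1] * 30)
X_test = np.array([[0.2, -0.1], [2.8, 3.1]])

def rbf(A, B, sigma=1.0):
    # Gram matrix of the Gaussian kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

clf = SVC(kernel="precomputed").fit(rbf(X_train, X_train), y_train)
print(clf.predict(rbf(X_test, X_train)))   # test rows = kernel values against training points
```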
Nonlinear SVM - Overview
An SVM locates a separating hyperplane in the feature space and classifies points in that space.
It does not need to represent the feature space explicitly; it simply defines a kernel function.
The kernel function plays the role of the dot product in the feature space.
Properties of SVM
Flexibility in choosing a similarity function
Sparseness of the solution when dealing with large datasets
- only support vectors are used to specify the separating hyperplane
Ability to handle large feature spaces
- complexity does not depend on the dimensionality of the feature space
Overfitting can be controlled by the soft margin approach
Nice math property: a simple convex optimization problem which is guaranteed to converge to a single global solution
Feature selection
SVM Applications
SVM has been used successfully in many
real-world problems
- text (and hypertext) categorization
- image classification
- bioinformatics (protein classification, cancer classification)
- hand-written character recognition
Application 1: Cancer Classification
High dimensional: p > 1000 genes, n < 100 patients
[Table: patients p-1 … p-n (rows) by gene expression values g-1, g-2, …, g-p (columns)]
Imbalanced: fewer positive samples
FEATURE SELECTION
- many irrelevant features, noisy data
- in the linear case, wi² gives the ranking of dimension i
SVM is sensitive to noisy (mis-labeled) data
Weakness of SVM
It is sensitive to noise
- a relatively small number of mislabeled examples can dramatically decrease the performance
It only considers two classes
- how to do multi-class classification with SVM?
- Answer:
1) with output arity m, learn m SVMs
   SVM 1 learns "Output == 1" vs "Output != 1"
   SVM 2 learns "Output == 2" vs "Output != 2"
   :
   SVM m learns "Output == m" vs "Output != m"
2) to predict the output for a new input, just predict with each SVM and find out which one puts the prediction the furthest into the positive region (see the sketch after this slide).
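A minimal sketch of strategies 1) and 2) above (added for illustration; the Iris data and the scikit-learn calls are assumptions): train one binary SVM per class and pick the class whose SVM pushes the point furthest into its positive region.

```python
# One-vs-rest multi-class SVM: m binary SVMs, predict by the largest decision value.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)           # 3 classes -> m = 3 binary problems
classes = np.unique(y)

# SVM k learns "Output == k" vs "Output != k".
machines = [SVC(kernel="linear").fit(X, np.where(y == k, 1, -1)) for k in classes]

def predict(x):
    # Decision values w.x + b of each binary SVM; the most positive one wins.
    scores = [m.decision_function(x.reshape(1, -1))[0] for m in machines]
    return classes[int(np.argmax(scores))]

print("predicted:", predict(X[0]), " true:", y[0])
```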
Application 2: Text Categorization
Task: the classification of natural text (or hypertext) documents into a fixed number of predefined categories based on their content.
- email filtering, web searching, sorting documents by topic, etc.
A document can be assigned to more than
one category, so this can be viewed as a
series of binary classification problems, one
for each category
Representation of Text
IR's vector space model (aka bag-of-words representation)
A document is represented by a vector indexed by a fixed set (dictionary) of terms
The value of an entry can be binary or a weight
Normalization, stop words, word stems
Doc x => φ(x)
Text Categorization using SVM
The distance between two documents is φ(x)·φ(z)
K(x, z) = ⟨φ(x), φ(z)⟩ is a valid kernel, so an SVM can be used with K(x, z) for discrimination (a small pipeline sketch follows this slide).
Why SVM?
- high dimensional input space
- few irrelevant features (dense concept)
- sparse document vectors (sparse instances)
- text categorization problems are linearly separable
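A small sketch of this pipeline (an added illustration; the toy corpus and the scikit-learn components are assumptions): bag-of-words vectors fed to a linear SVM, with one binary problem per category in the multi-label case.

```python
# Bag-of-words text categorization with a linear SVM (illustrative sketch).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

docs = ["the match ended in a late goal",          # sports
        "the striker scored twice in the final",   # sports
        "the central bank raised interest rates",  # finance
        "markets fell after the earnings report"]  # finance
labels = ["sports", "sports", "finance", "finance"]

vectorizer = TfidfVectorizer(stop_words="english")   # phi(x): sparse weighted term vector
X = vectorizer.fit_transform(docs)

clf = LinearSVC().fit(X, labels)                     # linear SVM on the document vectors
print(clf.predict(vectorizer.transform(["the goalkeeper saved the penalty"])))
```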
Some Issues
Choice of kernel
- Gaussian or polynomial kernel is the default
- if ineffective, more elaborate kernels are needed
- domain experts can give assistance in formulating appropriate similarity measures
Choice of kernel parameters
- e.g. σ in the Gaussian kernel
- σ is the distance between the closest points with different classifications
- in the absence of reliable criteria, applications rely on the use of a validation set or cross-validation to set such parameters (sketched after this slide)
Optimization criterion: hard margin vs. soft margin
- a lengthy series of experiments in which various parameters are tested
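In line with the cross-validation remark above, a common recipe (sketched here with assumed parameter grids and an assumed benchmark dataset; not prescribed by the slides) is a grid search over C and the Gaussian kernel width.

```python
# Choosing C and the Gaussian kernel width by cross-validation (illustrative sketch).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100],
              "gamma": [1e-4, 1e-3, 1e-2, 1e-1]}     # gamma corresponds to 1 / (2 sigma^2)

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```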
Additional Resources
An excellent tutorial on VC-dimension and Support Vector Machines:
C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
The VC/SRM/SVM Bible:
Statistical Learning Theory by Vladimir Vapnik, Wiley-Interscience, 1998.
http://www.kernel-machines.org/
Reference
Support Vector Machine Classification of Microarray Gene Expression Data. Michael P. S. Brown, William Noble Grundy, David Lin, Nello Cristianini, Charles Sugnet, Manuel Ares, Jr., David Haussler.
Text categorization with Support Vector Machines: learning with many relevant features. T. Joachims, ECML-98.
www.cs.utexas.edu/users/mooney/cs391L/svm.ppt