Topic 4: SUPPORT VECTOR MACHINES
STAT 37710/CAAM 37710/CMSC 35400 Machine Learning
Risi Kondor, The University of Chicago
Regularized Risk Minimization (RRM)
Find the hypothesis f̂ by solving a problem of the form

    f̂ = argmin_{f ∈ F}  (1/m) Σ_{i=1}^m ℓ(f(x_i), y_i)  +  λ Ω[f],

where the first term is the training error and λ Ω[f] is the regularizer.
• F can be quite a rich hypothesis space.
• The purpose of the regularizer is to avoid overfitting.
• λ is a tunable parameter.
• ℓ(ŷ, y) : the loss function
• ℓ might or might not be the same loss as in E_true.
[Tykhonov regularization] [Vapnik 1970’s–]
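To make the template concrete, here is a minimal numerical sketch (added for illustration, not from the slides): RRM with squared loss and Ω[f] = ∥w∥² over linear functions f(x) = w · x, i.e. ridge regression, minimized by plain gradient descent. The name rrm_ridge and all parameter values are made up for this example.

    import numpy as np

    # RRM with squared loss and L2 regularizer over f(x) = w . x (ridge regression),
    # minimized by gradient descent on (1/m) sum_i (w.x_i - y_i)^2 + lam * ||w||^2.
    def rrm_ridge(X, y, lam=0.1, lr=0.01, steps=1000):
        m, n = X.shape
        w = np.zeros(n)
        for _ in range(steps):
            residual = X @ w - y                           # f(x_i) - y_i
            grad = (2.0 / m) * X.T @ residual + 2.0 * lam * w
            w -= lr * grad                                 # step on training error + lam * ||w||^2
        return w

    # Toy usage
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
    w_hat = rrm_ridge(X, y)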
Optimization: equality constraints
Problem:
    minimize_{x ∈ R^n}  f(x)    subject to    g(x) = c.
1. Form the Lagrangian L(x, λ) = f (x) − λ (g(x) − c) .
2. The solution must be at a critical point of L. → Setting

       ∂L(x, λ)/∂x_i = 0,    i = 1, 2, . . . , n,

   yields a curve of solutions x = γ(λ).
3. Reintroducing the constraint g(γ(λ))=c gives λ , hence the optimal x .
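A quick worked example (added for illustration): minimize f(x) = x_1² + x_2² subject to x_1 + x_2 = 1. The Lagrangian is L(x, λ) = x_1² + x_2² − λ(x_1 + x_2 − 1); setting ∂L/∂x_1 = 2x_1 − λ = 0 and ∂L/∂x_2 = 2x_2 − λ = 0 gives the curve x = γ(λ) = (λ/2, λ/2). Reintroducing the constraint, λ/2 + λ/2 = 1, so λ = 1 and the optimal point is x = (1/2, 1/2).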
Optimization: inequality constraints
Problem:
    minimize_{x ∈ R^n}  f(x)    subject to    g(x) ≥ c.
1. Form the Lagrangian L(x, λ) = f (x) − λ (g(x) − c) .
2. Introduce the dual function

       h(λ) = inf_x L(x, λ).
3. Solve the dual problem

       λ∗ = argmax_λ h(λ)    subject to    λ ≥ 0.
4. The optimal x is argmin_x L(x, λ∗) (assuming strong duality).
When f is a convex function and g(x) ≥ c defines a convex region of
space, this gives the global optimum.
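A quick worked example (added for illustration): minimize f(x) = x² subject to x ≥ 1, i.e. g(x) = x and c = 1. Then L(x, λ) = x² − λ(x − 1); the infimum over x is attained at x = λ/2, so h(λ) = λ − λ²/4. Maximizing over λ ≥ 0 gives λ∗ = 2, and the minimizer of L(x, λ∗) is x = λ∗/2 = 1, which lies exactly on the constraint boundary.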
Karush–Kuhn–Tucker conditions
At the optimal solution x∗ of
    minimize_{x ∈ R^n}  f(x)    subject to    g(x) ≥ c,
either
1. we are at the boundary → g(x∗) = c, or
2. we are at an interior point → λ∗ = 0.
→ Complementary slackness: λ∗ (g(x∗ ) − c) = 0.
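Continuing the illustrative example above: with the constraint x ≥ 1, the optimum sits on the boundary, so g(x∗) = c and λ∗ = 2 > 0. Had the constraint been x ≥ −1 instead, the unconstrained minimizer x = 0 would already satisfy it, so λ∗ = 0. In both cases λ∗ (g(x∗) − c) = 0.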
Support Vector Machines
Linear classifiers
To apply RRM, go back to binary classification in R^n with a linear (affine) hyperplane:
Input space: X = R^n
Output space: Y = {−1, +1}
Hypothesis:

    f(x) = w · x + b,        h(x) = sgn(f(x)).

(Note the sneaky difference between f and h.)
Question: Of all possible hyperplanes that separate the data, which one do we choose?
The margin
Recall, the margin of a point (x, y) to the hyperplane f(x) = w · x + b = 0 (with ∥w∥ = 1) is

    y (w · x + b).

The margin of a dataset S = {(x_1, y_1), . . . , (x_m, y_m)} to f is

    min_i y_i (w · x_i + b).
In the case of the perceptron we saw that having a large margin is desirable.
IDEA: Choose w and b explicitly to maximize the margin! → Support
Vector Machines (SVM)
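A small numerical sketch of these definitions (added for illustration; dataset_margin is a made-up name):

    import numpy as np

    # Margin of each point and of the dataset S to the hyperplane w . x + b = 0,
    # after rescaling so that ||w|| = 1 (same hyperplane, unit normal).
    def dataset_margin(X, y, w, b):
        norm = np.linalg.norm(w)
        w, b = w / norm, b / norm
        point_margins = y * (X @ w + b)       # y_i (w . x_i + b)
        return point_margins.min()            # min_i y_i (w . x_i + b)

    # Toy usage
    X = np.array([[2.0, 1.0], [1.0, -1.0], [-1.0, -2.0], [-2.0, 1.0]])
    y = np.array([+1, +1, -1, -1])
    print(dataset_margin(X, y, w=np.array([1.0, 0.0]), b=0.0))   # -> 1.0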
Maximizing the margin
Choose the hyperplane that has the largest margin!
Hard Margin Support Vector Machine
Given a dataset S = {(x_1, y_1), . . . , (x_m, y_m)},

    maximize_{∥w∥=1, b}  δ    s.t.  y_i (w · x_i + b) ≥ δ  ∀i.
Equivalent formulation: drop the ∥w∥ = 1 constraint, and instead rescale (w, b) so that the smallest margin equals 1; maximizing the margin δ = 1/∥w∥ then amounts to solving

    minimize_{w, b}  (1/2) ∥w∥²    s.t.  y_i (w · x_i + b) ≥ 1  ∀i.
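A minimal numerical sketch of this QP (added for illustration, not part of the slides), assuming the cvxpy package is available; the name hard_margin_svm is made up for this example:

    import cvxpy as cp
    import numpy as np

    # Hard margin SVM primal QP (illustrative sketch).
    def hard_margin_svm(X, y):
        m, n = X.shape
        w = cp.Variable(n)
        b = cp.Variable()
        objective = cp.Minimize(0.5 * cp.sum_squares(w))     # (1/2) ||w||^2
        constraints = [cp.multiply(y, X @ w + b) >= 1]       # y_i (w . x_i + b) >= 1
        cp.Problem(objective, constraints).solve()
        return w.value, b.value

    # Toy usage on linearly separable data
    X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])
    w_hat, b_hat = hard_margin_svm(X, y)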
The primal problem
The primal SVM optimization problem
    minimize_{w,b}  (1/2) ∥w∥²    s.t.  y_i (w · x_i + b) ≥ 1  ∀i
This is a nice convex optimization problem (a QP) with a unique minimum.
→ Introduce a Lagrangian.
From primal to dual
    minimize_{w,b}  (1/2) ∥w∥²    s.t.  y_i (w · x_i + b) ≥ 1  ∀i
Lagrangian:
    L(w, b, α) = (1/2) ∥w∥² − Σ_i α_i (y_i (w · x_i + b) − 1)

    ∂L(w, b, α)/∂w_j = 0    ⇒    w − Σ_i α_i y_i x_i = 0
    ∂L(w, b, α)/∂b = 0      ⇒    Σ_i α_i y_i = 0
Dual function (obtained by substituting w = Σ_i α_i y_i x_i back into L):

    L(α) = Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)
The dual problem
The dual SVM optimization problem
    maximize_{α_1,...,α_m}  L(α) = Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)

    subject to  Σ_i y_i α_i = 0   and   α_i ≥ 0  ∀i
Still a QP, but now in the m dual variables α_i rather than in (w, b), and often easier to solve. In particular,

    h(x) = sgn[ Σ_i α_i y_i (x · x_i) + b ] = sgn[ Σ_i γ_i (x · x_i) + b ],

where γ_i = y_i α_i. → The solution lies in the span of the data,

    w = Σ_i γ_i x_i.
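A numerical sketch of the dual (again assuming cvxpy; added for illustration, with hard_margin_svm_dual a made-up name). It also recovers w from the span of the data and b from any example with α_i > 0:

    import cvxpy as cp
    import numpy as np

    # Hard margin SVM dual QP (illustrative sketch).
    def hard_margin_svm_dual(X, y):
        m = X.shape[0]
        alpha = cp.Variable(m)
        # sum_i alpha_i - (1/2) || sum_i alpha_i y_i x_i ||^2, where the quadratic
        # term equals (1/2) sum_{i,j} alpha_i alpha_j y_i y_j (x_i . x_j)
        objective = cp.Maximize(cp.sum(alpha)
                                - 0.5 * cp.sum_squares(X.T @ cp.multiply(y, alpha)))
        constraints = [alpha >= 0, cp.sum(cp.multiply(y, alpha)) == 0]
        cp.Problem(objective, constraints).solve()

        a = alpha.value
        w = X.T @ (a * y)                      # w = sum_i alpha_i y_i x_i
        sv = np.where(a > 1e-6)[0]             # support vectors: alpha_i > 0
        b = y[sv[0]] - X[sv[0]] @ w            # from y_k (w . x_k + b) = 1, y_k in {-1,+1}
        return w, b, sv

The indices in sv are exactly the "support vectors" discussed below; all other α_i come out (numerically) zero.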
Support vector machine  [figure]
Sparsity of support vectors
The KKT conditions prescribe that

    α_i (y_i (x_i · w + b) − 1) = 0    ∀i.

So α_i ≠ 0 only for those examples that lie exactly on the margin, and therefore only these "support vectors" influence the solution

    h(x) = sgn[ Σ_i α_i y_i (x · x_i) + b ].
→ Sparsity is a precious thing.
Question: But what about non-separable data? → Soft margin SVMs
The Soft Margin SVM
The primal SVM optimization problem

    minimize_{w, b, ξ_1,...,ξ_m}  (1/2) ∥w∥² + (C/m) Σ_i ξ_i    s.t.  y_i (w · x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0  ∀i

The ξ_i's are called slack variables and C is a "softness parameter".
[Cortes & Vapnik, 1995]
From primal to dual
    minimize_{w, b, ξ_1,...,ξ_m}  (1/2) ∥w∥² + (C/m) Σ_i ξ_i    s.t.  y_i (w · x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0  ∀i
Lagrangian:

    L(w, b, ξ, α, β) = (1/2) ∥w∥² + (C/m) Σ_i ξ_i − Σ_i α_i (y_i (w · x_i + b) − 1 + ξ_i) − Σ_i β_i ξ_i

    ∂L/∂w_j = 0    ⇒    w − Σ_i α_i y_i x_i = 0
    ∂L/∂b = 0      ⇒    Σ_i α_i y_i = 0
    ∂L/∂ξ_i = 0    ⇒    α_i + β_i = C/m
Soft margin SVM dual
The dual SVM optimization problem
    maximize_{α_1,...,α_m}  L(α) = Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)

    subject to  Σ_i y_i α_i = 0   and   0 ≤ α_i ≤ C/m  ∀i
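The only change from the hard margin dual is the upper bound C/m on each α_i. In the illustrative dual sketch given earlier, this corresponds to replacing its nonnegativity constraint with a box constraint (fragment reusing that sketch's variable names):

    # Soft margin version of the dual sketch above: box-constrain alpha.
    constraints = [alpha >= 0, alpha <= C / m, cp.sum(cp.multiply(y, alpha)) == 0]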
SVM is just a form of RRM
At the optimum of the primal problem the slacks are as small as possible:

    ξ_i = max{0, 1 − y_i (w · x_i + b)} = (1 − y_i (w · x_i + b))_{≥0} = ℓ_hinge(f(x_i), y_i),

where (z)_{≥0} = max(0, z).
The soft-margin SVM finds

    f̂ = argmin_{f ∈ F}  (1/m) Σ_{i=1}^m ℓ_hinge(f(x_i), y_i)  +  (1/(2C)) ∥w∥²,

where the first term is the empirical loss, the second term is the regularizer, and F is the hypothesis space of linear functions f(x) = w · x + b.
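A minimal sketch of minimizing this objective directly by subgradient descent (added for illustration, not from the slides; svm_rrm_subgradient, the step size, and the iteration count are arbitrary choices, with lam playing the role of 1/(2C)):

    import numpy as np

    # Subgradient descent on (1/m) sum_i hinge(f(x_i), y_i) + lam * ||w||^2,
    # where f(x) = w . x + b and hinge(yhat, y) = max(0, 1 - y * yhat).
    def svm_rrm_subgradient(X, y, lam=0.01, lr=0.1, steps=2000):
        m, n = X.shape
        w, b = np.zeros(n), 0.0
        for _ in range(steps):
            margins = y * (X @ w + b)
            active = margins < 1                               # points with nonzero hinge loss
            grad_w = -(y[active, None] * X[active]).sum(0) / m + 2 * lam * w
            grad_b = -y[active].sum() / m
            w -= lr * grad_w
            b -= lr * grad_b
        return w, b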
Loss functions for classification  [figure]
Loss functions for regression  [figure]