Machine Learning Lecture 03
Linear Discriminant Function

[Figure: labeled points in the (x1, x2) plane; one marker denotes +1, the other denotes -1; a linear boundary splits the plane into the regions w^T x + b > 0 and w^T x + b < 0.]

• How would you classify these points using a linear discriminant function in order to minimize the error rate?
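As an illustrative sketch (not from the slides), a linear discriminant classifies a point by the sign of w^T x + b; the weight vector, bias, and toy points below are hypothetical:

```python
import numpy as np

# Hypothetical weight vector and bias for a 2-D linear discriminant.
w = np.array([1.0, -1.0])
b = 0.5

def classify(X):
    """Return +1 where w^T x + b >= 0, and -1 otherwise."""
    return np.where(X @ w + b >= 0, 1, -1)

# Toy points in the (x1, x2) plane.
X = np.array([[2.0, 0.5], [0.0, 3.0]])
print(classify(X))  # [ 1 -1 ]
```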
Large Margin Linear Classifier

[Figure: the two classes (denoted +1 and -1) in the (x1, x2) plane, showing the margin and the closest points x+ and x- on either side of the separating hyperplane.]

• Aim: Learn a large margin classifier.
• Mathematical Formulation:

\[
\max_{w,\,b} \; \frac{2}{\|w\|}
\quad \text{such that} \quad
\begin{cases}
w^T x_i + b \ge +1, & \text{for } y_i = +1,\\
w^T x_i + b \le -1, & \text{for } y_i = -1,
\end{cases}
\qquad i = 1, \dots, n.
\]
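A one-line justification of the objective (standard SVM geometry, added for completeness): the two margin hyperplanes are w^T x + b = +1 and w^T x + b = -1, and the distance between them is

\[
\text{margin} = \frac{(+1) - (-1)}{\|w\|} = \frac{2}{\|w\|},
\]

so maximizing the margin amounts to maximizing 2/||w||.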
Large Margin Linear Classifier

• Formulation: maximizing 2/||w|| is equivalent to minimizing (1/2)||w||², which gives a convex quadratic objective:

\[
\min_{w,\,b} \; \frac{1}{2}\|w\|^2
\quad \text{such that} \quad
\begin{cases}
w^T x_i + b \ge +1, & \text{for } y_i = +1,\\
w^T x_i + b \le -1, & \text{for } y_i = -1,
\end{cases}
\qquad i = 1, \dots, n.
\]
Large Margin Linear Classifier

• Formulation: the two class-wise constraints combine into a single inequality:

\[
\min_{w,\,b} \; \frac{1}{2}\|w\|^2
\quad \text{such that} \quad
y_i (w^T x_i + b) \ge 1, \qquad i = 1, \dots, n.
\]
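To make the quadratic program concrete, here is a minimal sketch solving the hard-margin primal with the cvxpy library; the toy data is hypothetical and assumed linearly separable:

```python
import cvxpy as cp
import numpy as np

# Hypothetical, linearly separable toy data: rows of X are x_i, y_i in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# minimize (1/2)||w||^2  subject to  y_i (w^T x_i + b) >= 1 for all i
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```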
Solving the Optimization Problem

• Quadratic programming with linear constraints:

\[
\min_{w,\,b} \; \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad
y_i (w^T x_i + b) \ge 1.
\]

• Lagrangian function:

\[
\min_{w,\,b} \; L_p(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]
\quad \text{s.t.} \quad \alpha_i \ge 0.
\]
Solving the Optimization Problem

\[
\min_{w,\,b} \; L_p(w, b, \alpha_i) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]
\quad \text{s.t.} \quad \alpha_i \ge 0.
\]

Setting the partial derivatives of L_p to zero:

\[
\frac{\partial L_p}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L_p}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0.
\]
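Substituting these stationarity conditions back into L_p eliminates w and b and yields the standard Wolfe dual (a step the slides imply but do not show):

\[
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, x_i^T x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0.
\]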
Solving the Optimization Problem

• From these equations we can prove the following (the KKT conditions):

\[
\alpha_i \left[ y_i (w^T x_i + b) - 1 \right] = 0, \qquad \alpha_i \ge 0.
\]

[Figure: the margin points x+ in the (x1, x2) plane.]
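By complementarity, α_i > 0 is possible only for points with y_i (w^T x_i + b) = 1, i.e. the points lying exactly on the margin: the support vectors. As an illustrative sketch (not part of the slides), scikit-learn exposes these points directly; the very large C approximates a hard margin, and the toy data is hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_vectors_)  # the points with alpha_i > 0
print(clf.dual_coef_)        # the corresponding alpha_i * y_i
```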
Large Margin Linear Classifier

• Formulation (with slack variables ξ_i that allow margin violations for non-separable data):

\[
\min_{w,\,b,\,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{such that} \quad
y_i (w^T x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0.
\]
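As an illustrative sketch of how the trade-off parameter C behaves in practice (scikit-learn; the data and the C values are hypothetical): a small C tolerates margin violations, while a large C penalizes them heavily.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, slightly overlapping toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (20, 2)), rng.normal(-1.0, 1.0, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Smaller C tends to allow more slack, hence more support vectors.
    print(C, len(clf.support_vectors_))
```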
• Non-linear feature mapping: Φ: x → φ(x).
• Sigmoid kernel:

\[
K(x_i, x_j) = \tanh\!\left( \beta_0 \, x_i^T x_j + \beta_1 \right)
\]

In general, functions that satisfy Mercer's condition can serve as kernel functions.
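For illustration (not part of the slides), scikit-learn's sigmoid kernel has exactly this form, tanh(γ x_i^T x_j + r), with γ and r playing the roles of β_0 and β_1; the data and parameter values below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data with a non-linear labeling.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

# gamma and coef0 correspond to beta_0 and beta_1 in the slide's notation.
clf = SVC(kernel="sigmoid", gamma=0.5, coef0=0.0).fit(X, y)
print(clf.score(X, y))
```

Note that the sigmoid kernel satisfies Mercer's condition only for certain choices of β_0 and β_1.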