ML Lecture #3
Single & Multilayer Perceptron
Artificial Neural Network - Perceptron
A (linear) decision boundary represented by one artificial neuron, called a "Perceptron".
If sufficient signals accumulate, the neuron fires a signal.
a = x1w1 + x2w2 + x3w3 + ... + xnwn
Perceptron
• input signals 'x' and weights 'w' are multiplied
• weights correspond to connection strengths
• signals are added up – if they are enough, FIRE!

[Diagram: inputs x1, x2, x3 arrive over incoming connections of strengths w1, w2, w3; an adder computes the activation level a = Σ_{i=1..M} x_i w_i; if (a ≥ t) the output signal is 1, else the output signal is 0.]
Calculation…

a = Σ_{i=1..M} x_i w_i    (sum notation: just like a loop from 1 to M)

Multiply corresponding elements and add them up to get the activation a:

double[] x =
double[] w =

if (activation > threshold) FIRE !
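The calculation above can be sketched in a few lines of Java; the class name, array values, and threshold below are illustrative, not from the slides:

```java
public class Activation {
    // Weighted sum: a = x1*w1 + x2*w2 + ... + xM*wM
    public static double activation(double[] x, double[] w) {
        double a = 0.0;
        for (int i = 0; i < x.length; i++) {  // the "loop from 1 to M" in the slide's notation
            a += x[i] * w[i];
        }
        return a;
    }

    // FIRE (output 1) when the activation exceeds the threshold t
    public static int output(double[] x, double[] w, double t) {
        return activation(x, w) > t ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.5, 0.2};   // example inputs (made up)
        double[] w = {0.4, -0.1, 0.3};  // example weights (made up)
        System.out.println(activation(x, w));    // ≈ 0.41
        System.out.println(output(x, w, 0.05));  // 0.41 > 0.05, so the neuron fires: 1
    }
}
```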
Perceptron Decision Rule

if Σ_{i=1..M} x_i w_i ≥ t then output 1, else output 0

[Plot: the rule splits the input space with a linear boundary; output = 0 on one side, output = 1 on the other.]
Is this a good decision boundary?

if Σ_{i=1..M} x_i w_i ≥ t then output 1, else output 0

[Plots: four weight settings, each giving a different linear boundary over the same data:]
  w1 = 1.0,  w2 = 0.2,  t = 0.05
  w1 = 2.1,  w2 = 0.2,  t = 0.05
  w1 = 1.9,  w2 = 0.02, t = 0.05
  w1 = -0.8, w2 = 0.03, t = 0.05
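To see how different weight settings carve the plane differently, one can evaluate the decision rule at a sample point under each of the four settings above; the test point (0.1, 0.2) and the class name are made up for illustration:

```java
public class BoundaryCheck {
    // Decision rule from the slide: output 1 iff w1*x1 + w2*x2 >= t
    public static int classify(double x1, double x2, double w1, double w2, double t) {
        return (w1 * x1 + w2 * x2 >= t) ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] settings = {   // the four {w1, w2, t} settings from the slides
            {1.0, 0.2, 0.05},
            {2.1, 0.2, 0.05},
            {1.9, 0.02, 0.05},
            {-0.8, 0.03, 0.05},
        };
        double x1 = 0.1, x2 = 0.2;  // arbitrary test point
        for (double[] s : settings) {
            System.out.println("w1=" + s[0] + ", w2=" + s[1]
                    + " -> output " + classify(x1, x2, s[0], s[1], s[2]));
        }
    }
}
```

The first three settings put this point on the "output 1" side; the last setting (with its negative w1) puts it on the "output 0" side, showing that the weights determine which side of the boundary a point falls on.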
The Bias

if Σ_{i=1..M} x_i w_i ≥ t then output 1, else output 0
if Σ_{i=1..M} x_i w_i - t ≥ 0 then output 1, else output 0
if Σ_{i=1..M} x_i w_i + (-1)·t ≥ 0 then output 1, else output 0
if Σ_{i=0..M} x_i w_i ≥ 0 then output 1, else output 0   (where x_0 = -1 and w_0 = t)

We now treat the threshold like any other weight, with a permanent input of -1: the bias, a "false input".
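The bias trick can be checked in Java: folding the threshold t into the weight vector as w_0 = t with a permanent input x_0 = -1 gives exactly the same outputs as comparing against t directly. The class name and sample values are illustrative:

```java
public class BiasTrick {
    // Original form: fire when sum(x_i * w_i) >= t
    public static int thresholdForm(double[] x, double[] w, double t) {
        double a = 0.0;
        for (int i = 0; i < x.length; i++) a += x[i] * w[i];
        return a >= t ? 1 : 0;
    }

    // Bias form: prepend the "false input" x0 = -1 with weight w0 = t,
    // then fire when the whole sum >= 0
    public static int biasForm(double[] x, double[] w, double t) {
        double a = (-1.0) * t;  // the false input times its weight
        for (int i = 0; i < x.length; i++) a += x[i] * w[i];
        return a >= 0.0 ? 1 : 0;
    }
}
```

Both forms agree because a ≥ t is the same condition as a - t ≥ 0.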
Perceptron Learning Algorithm

initialise weights (w)
repeat until all points are correctly classified
    repeat for each point i
        calculate the margin y_i (w·X_i) for point i
        if margin > 0, point i is correctly classified
        else change the weights to increase the margin, such that
            Δw = η y_i X_i and w_new = w_old + Δw
    end
end
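The steps above can be sketched as follows. Everything beyond the slide is an assumption for illustration: labels y ∈ {-1, +1}, learning rate η = 1, the bias folded in as a permanent -1 input, and an AND-gate dataset (which is linearly separable, so the loop is guaranteed to terminate):

```java
public class PerceptronTraining {
    // Perceptron learning: loop over points until every margin y_i * (w . x_i) > 0
    public static double[] train(double[][] X, int[] y, double eta, int maxEpochs) {
        double[] w = new double[X[0].length];  // initialise weights (here: to zero)
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            boolean allCorrect = true;
            for (int i = 0; i < X.length; i++) {
                double margin = y[i] * dot(w, X[i]);
                if (margin <= 0) {                 // misclassified: increase the margin
                    for (int j = 0; j < w.length; j++) {
                        w[j] += eta * y[i] * X[i][j];  // w_new = w_old + eta * y_i * X_i
                    }
                    allCorrect = false;
                }
            }
            if (allCorrect) break;  // all points correctly classified
        }
        return w;
    }

    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    public static int predict(double[] w, double[] x) {
        return dot(w, x) > 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        // AND gate; the first component of each point is the permanent bias input -1
        double[][] X = {{-1, 0, 0}, {-1, 0, 1}, {-1, 1, 0}, {-1, 1, 1}};
        int[] y = {-1, -1, -1, 1};  // only (1, 1) belongs to the positive class
        double[] w = train(X, y, 1.0, 100);
        for (double[] x : X) System.out.println(predict(w, x));
    }
}
```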
Linearly Separable Problems

With a perceptron, the decision boundary is LINEAR.

[Plot: classes A and B on the unit square, separated by a straight line.]
Overview of SVM w.r.t. Perceptron

Perceptron VS SVM
• The Perceptron does not try to optimize the separation distance: as long as it finds a hyperplane that separates the two sets, it is done. The SVM, on the other hand, maximizes the margin, i.e., the distance from the hyperplane to the closest sample points on either side (the support vectors).
• The SVM typically uses a "kernel function" to project the sample points into a high-dimensional space to make them linearly separable, while the Perceptron assumes the sample points are already linearly separable.
• SVM requires more parameters as compared to the Perceptron:
– choice of kernel
– selection of kernel parameters
– selection of the value of the margin parameter
SVM and Margins
SVM for Nonlinear Data
Acknowledgements

Material in these slides has been taken from the following resources:
• Introduction to Machine Learning, E. Alpaydin
• Statistical Pattern Recognition: A Review, A.K. Jain et al., PAMI (22), 2000
• Pattern Recognition and Analysis Course, A.K. Jain, MSU
• "Pattern Classification", Duda et al., John Wiley & Sons