Lec3ML - Perceptron - Updated v4
Perceptron Algorithm
Ghada Khoriba
[email protected]
Linear classifiers
[Figure: a linear classifier, with its inputs and parameters labeled]
Evaluation criteria
• The quality of predictions from a learned model is often expressed in terms
of a loss function. A loss function 𝑳(𝒈, 𝒂) tells you how much you will be
penalized for making a guess 𝑔 when the answer is actually 𝑎.
• There are many possible loss functions. Here are some frequently used
examples:
• 0 − 1 Loss applies to predictions drawn from finite domains.
Evaluation criteria
• Squared loss. Why does gradient descent work with sum-of-squared errors? Gradient descent uses the derivative of the function to be minimized; squaring the differences makes the error function differentiable, so its derivative is easy to compute.
• Linear loss
• Asymmetric loss. Consider a situation in which you are trying to predict whether someone is having a heart attack. It might be much worse to predict "no" when the answer is really "yes" than the other way around.
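To make these concrete, here is a minimal Python sketch of the four losses mentioned above; the function names and the example costs in asymmetric_loss are illustrative choices, not taken from the slides:

def zero_one_loss(g, a):
    # 0-1 loss: penalty of 1 for a wrong guess, 0 for a correct one (finite domains)
    return 0 if g == a else 1

def squared_loss(g, a):
    # Squared loss: differentiable everywhere, which is why it pairs well with gradient descent
    return (g - a) ** 2

def linear_loss(g, a):
    # Linear (absolute) loss: penalty grows linearly with the size of the error
    return abs(g - a)

def asymmetric_loss(g, a, miss_cost=10.0, false_alarm_cost=1.0):
    # Asymmetric loss: predicting "no heart attack" (0) when the answer is "yes" (1)
    # costs more than the reverse; the two costs here are made up for illustration
    if g == a:
        return 0.0
    return miss_cost if a == 1 else false_alarm_cost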
Evaluation Metrics: How good is our model?
[Figure: an example 2x2 confusion matrix with entries 6, 1 / 2, 5]
Ref: udacity/machine-learning
Accuracy: out of all the data, how many points did you classify correctly?

Accuracy = (correctly classified points) / (all points)

                     Predicted Sick    Predicted Healthy
  Actually Sick           1000                200
  Actually Healthy         800               8000

Accuracy = (1000 + 8000) / 10000 = 0.9 = 90%
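A minimal sketch of this accuracy computation, assuming the layout above (rows are actual classes, columns are predicted classes):

# Confusion matrix from the slide: rows = actual (sick, healthy), columns = predicted (sick, healthy)
confusion = [
    [1000, 200],    # actually sick:    1000 predicted sick,  200 predicted healthy
    [800, 8000],    # actually healthy:  800 predicted sick, 8000 predicted healthy
]

correct = sum(confusion[i][i] for i in range(len(confusion)))   # diagonal = correct predictions
total = sum(sum(row) for row in confusion)                      # all points

accuracy = correct / total
print(accuracy)   # 0.9, i.e. 90%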
Ref: udacity/machine-learning
When will Accuracy not work?

Suppose the data set contains 199,567 images of cats and 433 images of dogs (200,000 points in total).

If the model classifies everything as a cat:
Accuracy = 199,567 / 200,000 = 0.9978 = 99.8%

If the model classifies everything as a dog:
Accuracy = 433 / 200,000 = 0.00216 = 0.2%

With classes this imbalanced, a useless classifier that always predicts the majority class still achieves a very high accuracy, so accuracy alone is not a good measure of quality.
False Positive, or False Negative?
• Sick vs. healthy: in the medical example, what is worse, a False Positive or a False Negative?
• Spam vs. not spam: in the spam detector example, what is worse, a False Positive or a False Negative?
Evaluating a learning algorithm
• How should we evaluate the performance of a classifier ℎ?
• The best method is to measure test error on data that was not used to train
it.
• How should we evaluate the performance of a learning algorithm?
This is trickier. There are many potential sources of variability in the
possible result of computing test error on a learned hypothesis ℎ:
• Which particular training examples occurred in 𝐷𝑛
• Which particular testing examples occurred in 𝐷𝑛′
• Randomization inside the learning algorithm itself
• Generally, we would like to execute the following process multiple
times:
• Train on a new training set
• Evaluate resulting ℎ on a testing set that does not overlap the training set
Evaluating a learning algorithm
• Doing this multiple times controls for possible poor choices of training
set or unfortunate randomization inside the algorithm itself.
• However, in many applications, data is expensive or difficult to acquire.
• We can re-use data with cross-validation.
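A minimal sketch of k-fold cross-validation, assuming a generic train/evaluate interface; the callables learn and eval_error are placeholders for whatever learning algorithm and error measure are being evaluated, not names from the slides:

import numpy as np

def cross_validate(X, y, learn, eval_error, k=5, seed=0):
    # learn(X_train, y_train) -> hypothesis h
    # eval_error(h, X_test, y_test) -> scalar test error
    n = len(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)                 # shuffle once so folds are random
    folds = np.array_split(idx, k)           # k (nearly) equal chunks
    errors = []
    for j in range(k):
        test_idx = folds[j]
        train_idx = np.concatenate([folds[m] for m in range(k) if m != j])
        h = learn(X[train_idx], y[train_idx])
        errors.append(eval_error(h, X[test_idx], y[test_idx]))
    return float(np.mean(errors))            # average test error over the k folds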
Recall: Classifiers
• A linear classifier:
  h(x; θ, θ0) = sign(θᵀx + θ0) = { +1 if θᵀx + θ0 > 0;  −1 otherwise }
[Figure: the separator θᵀx + θ0 = 0 in the (x1, x2) plane, with θᵀx + θ0 > 0 on one side and θᵀx + θ0 < 0 on the other]
• Hypothesis class H of all linear classifiers
• 0−1 loss:
  L(g, a) = { 0 if g = a;  1 otherwise }
• Training error:
  E_n(h) = (1/n) Σ_{i=1}^{n} L(h(x^(i)), y^(i))
• Example learning algorithm Ex_learning_alg(D_n; k), given hypotheses h^(j)
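A minimal Python sketch of these definitions (the boundary case θᵀx + θ0 = 0 is mapped to −1, matching the sign convention above):

import numpy as np

def predict(x, theta, theta_0):
    # Linear classifier h(x; theta, theta_0) = sign(theta^T x + theta_0), with -1 on the boundary
    return 1 if theta @ x + theta_0 > 0 else -1

def zero_one_loss(g, a):
    return 0 if g == a else 1

def training_error(X, y, theta, theta_0):
    # E_n(h) = (1/n) * sum_i L(h(x^(i)), y^(i)) with the 0-1 loss
    n = len(y)
    return sum(zero_one_loss(predict(X[i], theta, theta_0), y[i]) for i in range(n)) / n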
Perceptron Learning Rule
• The idea is that, for each observation passed through the classifier, we check whether the current prediction is correct; on a mistake we adjust θ and θ0 toward that example.

Perceptron(D_n):
  changed = False
  for i = 1 to n:
    if y^(i)(θᵀx^(i) + θ0) ≤ 0:
      θ = θ + y^(i) x^(i)
      θ0 = θ0 + y^(i)
      changed = True
  return θ, θ0

Quiz: y^(i)(θᵀx^(i) + θ0) ≤ 0 when:
  A. the point is not on the line and the prediction is wrong
  B. the point is on the line

Why does the update help? After an update on example i,
  y^(i)((θ + y^(i)x^(i))ᵀx^(i) + θ0 + y^(i))
    = y^(i)(θᵀx^(i) + θ0) + (y^(i))²(x^(i)ᵀx^(i) + 1)
    = y^(i)(θᵀx^(i) + θ0) + (‖x^(i)‖² + 1),
so each mistake-driven update strictly increases y^(i)(θᵀx^(i) + θ0), moving the classifier toward classifying x^(i) correctly.
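A minimal runnable sketch of this update rule, assuming labels y in {−1, +1} and a fixed maximum number of passes T over the data (T is a choice made here, not specified on the slide):

import numpy as np

def perceptron(X, y, T=100):
    # X is an (n, d) array of points, y holds labels in {-1, +1}
    n, d = X.shape
    theta = np.zeros(d)
    theta_0 = 0.0
    for _ in range(T):                                     # repeated passes over the data
        changed = False
        for i in range(n):
            if y[i] * (theta @ X[i] + theta_0) <= 0:       # mistake on example i
                theta = theta + y[i] * X[i]                # nudge theta toward the example
                theta_0 = theta_0 + y[i]
                changed = True
        if not changed:                                    # a full pass with no mistakes: done
            break
    return theta, theta_0

# Tiny usage example with linearly separable data
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
theta, theta_0 = perceptron(X, y)
print(theta, theta_0)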
Example: Perceptron
• Current weight vectors: 𝜃1 = (1, 2, −2), 𝜃2 = (3, −2, −1), 𝜃3 = (−1, 2, 4)
• Next training data point: 𝑓(𝑥) = (1, −0.5, 3), 𝑦 = 2
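The slide does not show the worked solution; below is a hedged sketch assuming the standard multiclass perceptron rule (predict the class whose weight vector gives the largest score; on a mistake, add x to the true class's weights and subtract it from the predicted class's weights). That update rule is an assumption here, not stated on the slide:

import numpy as np

# Weight vectors and the next training point from the slide
thetas = {1: np.array([1.0, 2.0, -2.0]),
          2: np.array([3.0, -2.0, -1.0]),
          3: np.array([-1.0, 2.0, 4.0])}
x = np.array([1.0, -0.5, 3.0])
y = 2  # true class

# Scores: theta_1 . x = -6.0, theta_2 . x = 1.0, theta_3 . x = 10.0
scores = {k: float(theta @ x) for k, theta in thetas.items()}
y_hat = max(scores, key=scores.get)    # predicted class = 3

if y_hat != y:                         # mistake, so apply the assumed multiclass update
    thetas[y] = thetas[y] + x          # theta_2 becomes (4.0, -2.5, 2.0)
    thetas[y_hat] = thetas[y_hat] - x  # theta_3 becomes (-2.0, 2.5, 1.0)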
Classifier Quality
Theorem: Perceptron Performance
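A standard form of this result, stated here for the through-origin perceptron (the slide's exact formulation may differ): if there exist 𝜃* and a margin 𝛾 > 0 such that 𝑦^(i)(𝜃*ᵀ𝑥^(i)) / ‖𝜃*‖ ≥ 𝛾 for every training example, and ‖𝑥^(i)‖ ≤ 𝑅 for all i, then the perceptron algorithm makes at most (𝑅/𝛾)² mistakes on 𝐷𝑛.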