
EE2211 Introduction to Machine Learning
Lecture 7

Thomas Yeo
[email protected]

Electrical and Computer Engineering Department


National University of Singapore

Acknowledgement: EE2211 development team


Thomas Yeo, Kar-Ann Toh, Chen Khong Tham, Helen Zhou, Robby Tan & Haizhou Li

© Copyright EE, NUS. All Rights Reserved.


Course Contents
• Introduction and Preliminaries (Haizhou)
– Introduction
– Data Engineering
– Introduction to Probability and Statistics
• Fundamental Machine Learning Algorithms I (Kar-Ann / Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Haizhou)
– Performance Issues
– K-means Clustering
– Neural Networks
2
© Copyright EE, NUS. All Rights Reserved.
Fundamental ML Algorithms:
Overfitting, Bias-Variance Tradeoff

Module III Contents


• Overfitting, underfitting & model complexity
• Regularization
• Bias-variance trade-off
• Loss function
• Optimization
• Gradient descent
• Decision trees
• Random forest

3
© Copyright EE, NUS. All Rights Reserved.
Regression Review
• Goal: Given feature(s) x, we want to predict target y
– x can be 1-D or more than 1-D
– y is 1-D
• Two types of input data
– Training set {(x_i, y_i)}, for i = 1, …, m
– Test set {(x_j, y_j)}, for j = 1, …, n
• Learning/Training
– Training set used to estimate regression coefficients w
• Prediction/Testing/Evaluation
– Prediction performed on test set to evaluate performance

4
© Copyright EE, NUS. All Rights Reserved.
Regression Review: Linear Case
• x is 1-D & y is 1-D
• Linear relationship between x & y
• Illustration (4 training samples)

• Training/Learning (primal) on training set:

    ŵ = (XᵀX)⁻¹ Xᵀ y

• Prediction/Testing/Evaluation on test set:

    ŷ_test = X_test ŵ
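As a concrete aside (not from the slides), a minimal numpy sketch of this primal solution; the toy training data are invented for illustration:

    import numpy as np

    # 4 training samples, 1-D feature; a column of ones adds the bias term
    x_train = np.array([0.0, 1.0, 2.0, 3.0])
    y_train = np.array([0.1, 0.9, 2.1, 2.9])
    X = np.column_stack([np.ones_like(x_train), x_train])  # shape (4, 2)

    # Primal least-squares solution: w = (X^T X)^{-1} X^T y
    w = np.linalg.inv(X.T @ X) @ X.T @ y_train

    # Prediction on unseen test inputs
    x_test = np.array([4.0, 5.0])
    X_test = np.column_stack([np.ones_like(x_test), x_test])
    y_pred = X_test @ w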

5
© Copyright EE, NUS. All Rights Reserved.
Regression Review: Polynomial
• x is 1-D (or more than 1-D) & y is 1-D
• Polynomial relationship between x & y
• Quadratic illustration (4 training samples, x is 1-D)

• Training/Learning (primal) on training set:

    ŵ = (PᵀP)⁻¹ Pᵀ y

• Prediction/Testing/Evaluation on test set:

    ŷ_test = P_test ŵ
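A minimal sketch of the quadratic case, assuming P stacks the powers [1, x, x²]; the toy data are invented for illustration:

    import numpy as np

    x_train = np.array([0.0, 1.0, 2.0, 3.0])
    y_train = np.array([0.2, 1.1, 4.2, 8.8])

    # Quadratic feature matrix with columns [1, x, x^2]
    P = np.vander(x_train, N=3, increasing=True)  # shape (4, 3)

    # Primal solution on the training set
    w = np.linalg.inv(P.T @ P) @ P.T @ y_train

    # The test set must go through the same feature construction
    x_test = np.array([4.0])
    P_test = np.vander(x_test, N=3, increasing=True)
    y_pred = P_test @ w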

6
© Copyright EE, NUS. All Rights Reserved.
Note on Training & Test Sets
• Linear is a special case of polynomial => use “P” instead of “X” from now on
• Training/Learning (primal) on training set:

    ŵ = (PᵀP)⁻¹ Pᵀ y

• Prediction/Testing/Evaluation on test set:

    ŷ_test = P_test ŵ

• There should be zero overlap between training & test sets
• Important goal of regression: prediction on new, unseen data, i.e., the test set
• Why is the test set important for evaluation?
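A minimal sketch of keeping the two sets disjoint, assuming scikit-learn is available; the 70/30 ratio and the random toy data are arbitrary illustrative choices:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(100, 2)  # toy data: 100 samples, 2 features
    y = np.random.rand(100)

    # Random split with zero overlap between the two sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)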
8
© Copyright EE, NUS. All Rights Reserved.
Overfitting Example

                   Training Set Fit    Test Set Fit
    Order 9        Good                Bad
    Order 1        Bad                 Bad
    Order 2        Good                Good

10
© Copyright EE, NUS. All Rights Reserved.
Overfitting Example
(Figure: the order-9 fit makes big, even very big, prediction errors at the red test crosses; blue lines mark the errors)

• If we take one of the blue lines and compute the square of its length, this is called the “squared error” for that particular data point
• If we average the squared errors across all the red crosses, it’s called the mean squared error (MSE) on the test set
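A one-line numpy version of the MSE just described, assuming y_test holds the targets (red crosses) and y_pred the model’s predictions at the same inputs; the numbers are toy values:

    import numpy as np

    y_test = np.array([1.0, 2.0, 3.0])  # toy targets
    y_pred = np.array([1.2, 1.8, 3.5])  # toy predictions

    # Average of the squared errors across all test points
    mse = np.mean((y_test - y_pred) ** 2)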

12
© Copyright EE, NUS. All Rights Reserved.
Underfitting Example

                   Training Set Fit    Test Set Fit
    Order 9        Good                Bad
    Order 1        Bad                 Bad
    Order 2        Good                Good

13
© Copyright EE, NUS. All Rights Reserved.
“Just Nice”

                   Training Set Fit    Test Set Fit
    Order 9        Good                Bad
    Order 1        Bad                 Bad
    Order 2        Good                Good

17
© Copyright EE, NUS. All Rights Reserved.
Overfitting & Underfitting

                                  Training Set Fit    Test Set Fit
    Overfitting     Order 9       Good                Bad
    Underfitting    Order 1       Bad                 Bad
    “Just nice”     Order 2       Good                Good

19
© Copyright EE, NUS. All Rights Reserved.
Overfitting & Underfitting
• Overfitting occurs when the model predicts the training data well, but predicts new data (e.g., from the test set) poorly
• Reason 1
– Model is too complex for the data
– Previous example: Fit order 9 polynomial to 10 data points
• Reason 2
– Too many features but too few training samples
– Even a linear model can overfit, e.g., a linear model with 9 input features (i.e., w is 10-D, including the bias term) and 10 data points in the training set => the data might not be enough to estimate the 10 unknowns well
• Solutions
– Use simpler models (e.g., lower order polynomial)
– Use regularization (see next part of lecture)
20
© Copyright EE, NUS. All Rights Reserved.
Overfitting & Underfitting
• Underfitting is the inability of the trained model to predict the targets in the training set
• Reason 1
– Model is too simple for the data
– Previous example: Fit order 1 polynomial to 10 data points
that came from an order 2 polynomial
– Solution: Try more complex model
• Reason 2
– Features are not informative enough
– Solution: Try to develop more informative features

21
© Copyright EE, NUS. All Rights Reserved.
Overfitting / Underfitting Schematic
(Figure: error vs. model complexity, with an underfitting regime on the left and an overfitting regime on the right)

22
© Copyright EE, NUS. All Rights Reserved.
Regularization
• Regularization is an umbrella term for methods that force the learning algorithm to build less complex models
• Motivation 1: Solve an ill-posed problem
– For example, estimating a 10th-order polynomial with just 5 data points
• Motivation 2: Reduce overfitting
• For example, in the previous lecture we added the term λ‖w‖² (ridge regression):

    J(w) = ‖y − Pw‖² + λ‖w‖²

• Minimizing with respect to w, the primal solution is

    ŵ = (PᵀP + λI)⁻¹ Pᵀ y

• For λ > 0, the matrix PᵀP + λI becomes invertible (Motivation 1)
• λ > 0 might also perform better in the test set, i.e., reduce overfitting (Motivation 2); an example follows later
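A minimal sketch of this regularized solution; the helper name ridge_fit and the toy data are illustrative choices. Note that it stays well-posed even with more unknowns than data points:

    import numpy as np

    def ridge_fit(P, y, lam):
        """Primal ridge solution w = (P^T P + lam*I)^{-1} P^T y."""
        d = P.shape[1]
        return np.linalg.inv(P.T @ P + lam * np.eye(d)) @ P.T @ y

    # 10th-order polynomial features from only 5 data points:
    # P^T P alone is singular, but P^T P + lam*I is invertible for lam > 0
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([0.1, 0.8, 2.2, 2.9, 4.1])
    P = np.vander(x, N=11, increasing=True)
    w = ridge_fit(P, y, lam=1.0)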

23
© Copyright EE, NUS. All Rights Reserved.
Regularization
• Consider the minimization from the previous slide:

    J(w) = ‖y − Pw‖² + λ‖w‖²

– The first term is the cost function quantifying the data-fitting error in the training set
– The second term is the regularization

25
© Copyright EE, NUS. All Rights Reserved.
Regularization
• The λ‖w‖² term is the L2-regularization
• It encourages w to be small (also called shrinkage or weight-decay) => constrains model complexity
• More generally, most machine learning algorithms can be formulated as the following optimization problem:

    min_w  Data-Loss(w) + λ · Regularization(w)

• Data-Loss(w) quantifies the fitting error on the training set given parameters w: smaller error => better fit to training data
• Regularization(w) penalizes more complex models
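A sketch of evaluating this generic objective in the L2 case, assuming the data-loss is the squared error used earlier:

    import numpy as np

    def objective(w, P, y, lam):
        data_loss = np.sum((y - P @ w) ** 2)   # fitting error on training set
        regularization = lam * np.sum(w ** 2)  # penalizes large (complex) w
        return data_loss + regularization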
26
© Copyright EE, NUS. All Rights Reserved.
Regularization Example

                        Training Set Fit    Test Set Fit
    Order 9             Good                Bad
    Order 9, λ = 1      Good                Good

27
© Copyright EE, NUS. All Rights Reserved.
Bias versus Variance
• Suppose we are trying to predict the red target below (figure: blue points are predictions, arranged in a 2x2 grid of cases):
– Low Bias, Low Variance: blue predictions on average close to the red target; low variability among predictions
– Low Bias, High Variance: blue predictions on average close to the red target, but large variability among predictions
– High Bias, Low Variance: blue predictions on average not close to the red target; low variability among predictions
– High Bias, High Variance: blue predictions on average not close to the red target; high variability among predictions

30
© Copyright EE, NUS. All Rights Reserved.
Bias + Variance Trade-off
• Test error = Bias Squared + Variance + Irreducible Noise
(Figure: test error vs. model complexity; high bias / low variance on the left, low bias / high variance on the right)

31
© Copyright EE, NUS. All Rights Reserved.
Bias + Variance Example
• Simulate data from an order-2 polynomial (+ noise)
• Randomly sample 10 training samples each time
• Fit with an order-2 polynomial: low variance, low bias
• Fit with an order-4 polynomial: high variance, low bias
• Order 2 achieves lower test error
(Figures: 4th-order polynomial fits vs. 2nd-order polynomial fits across repeated random training samples)
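A minimal sketch of this experiment; the ground-truth coefficients, noise level, and query point are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([1.0, -2.0, 0.5])  # order-2 ground truth

    def sample_training_set(n=10):
        x = rng.uniform(-3, 3, size=n)
        y = np.vander(x, N=3, increasing=True) @ true_w + rng.normal(0, 1, size=n)
        return x, y

    def fit_poly(x, y, order):
        P = np.vander(x, N=order + 1, increasing=True)
        return np.linalg.inv(P.T @ P) @ P.T @ y

    # Across repeated training sets, order-4 predictions at a fixed point
    # vary much more (high variance) than order-2 predictions
    x0 = np.array([2.5])
    for order in (2, 4):
        preds = [np.vander(x0, N=order + 1, increasing=True)
                 @ fit_poly(*sample_training_set(), order)
                 for _ in range(100)]
        print(order, np.var(preds))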

32
© Copyright EE, NUS. All Rights Reserved.
Bias + Variance Trade-off Theorem

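The slides state this decomposition graphically; a standard way to write it, assuming targets y = f(x) + ε with zero-mean noise of variance σ², where the expectation is taken over random training sets and the noise:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
      + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
      + \underbrace{\sigma^2}_{\text{Irreducible Noise}}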
40
© Copyright EE, NUS. All Rights Reserved.
Summary
• Overfitting, underfitting & model complexity
– Overfitting: low error in training set, high error in test set
– Underfitting: high error in both training & test sets
– Overly complex models can overfit; overly simple models can underfit
• Regularization (e.g., L2 regularization)
– Solve “ill-posed” problem (e.g., more unknowns than data points)
– Reduce overfitting
• Bias-Variance Tradeoff
– Test error = Bias Squared + Variance + Irreducible Noise
– Interpretation:
• Overly complex models can have high variance, low bias
• Overly simple models can have low variance, high bias
• Interpretation is not always true (see tutorial)

43
© Copyright EE, NUS. All Rights Reserved.
