
EE2211 Introduction to Machine Learning
Lecture 7

Thomas Yeo
[email protected]

Electrical and Computer Engineering Department


National University of Singapore

Acknowledgement: EE2211 development team


Thomas Yeo, Kar-Ann Toh, Chen Khong Tham, Helen Zhou, Robby Tan & Haizhou Li

© Copyright EE, NUS. All Rights Reserved.


Course Contents
• Introduction and Preliminaries (Haizhou)
– Introduction
– Data Engineering
– Introduction to Probability and Statistics
• Fundamental Machine Learning Algorithms I (Kar-Ann / Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Haizhou)
– Performance Issues
– K-means Clustering
– Neural Networks
2
© Copyright EE, NUS. All Rights Reserved.
Fundamental ML Algorithms:
Overfitting, Bias-Variance Tradeoff

Module III Contents


• Overfitting, underfitting & model complexity
• Regularization
• Bias-variance trade-off
• Loss function
• Optimization
• Gradient descent
• Decision trees
• Random forest

3
© Copyright EE, NUS. All Rights Reserved.
Regression Review
• Goal: Given feature(s) x, we want to predict target y
– x can be 1-D or more than 1-D
– y is 1-D
• Two types of input data
– Training set {(x_i, y_i)}, for i = 1, …, m
– Test set {(x_j, y_j)}, for j = 1, …, n
• Learning/Training
– Training set used to estimate regression coefficients w
• Prediction/Testing/Evaluation
– Prediction performed on test set to evaluate performance

4
© Copyright EE, NUS. All Rights Reserved.
Regression Review: Linear Case
• x is 1-D & y is 1-D
• Linear relationship between x & y
• Illustration (4 training samples)

• Training/Learning (primal) on training set:

    ŵ = (XᵀX)⁻¹ Xᵀ y

• Prediction/Testing/Evaluation on test set:

    ŷ_test = X_test ŵ
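As a concrete aside (not from the slides), a minimal numpy sketch of this primal solution; the toy training data are invented for illustration:

    import numpy as np

    # 4 training samples, 1-D feature; a column of ones adds the bias term
    x_train = np.array([0.0, 1.0, 2.0, 3.0])
    y_train = np.array([0.1, 0.9, 2.1, 2.9])
    X = np.column_stack([np.ones_like(x_train), x_train])  # shape (4, 2)

    # Primal least-squares solution: w = (X^T X)^{-1} X^T y
    w = np.linalg.inv(X.T @ X) @ X.T @ y_train

    # Prediction on unseen test inputs
    x_test = np.array([4.0, 5.0])
    X_test = np.column_stack([np.ones_like(x_test), x_test])
    y_pred = X_test @ w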

5
© Copyright EE, NUS. All Rights Reserved.
Regression Review: Polynomial
• x is 1-D (or more than 1-D) & y is 1-D
• Polynomial relationship between x & y
• Quadratic illustration (4 training samples, x is 1-D)

• Training/Learning (primal) on training set:

    ŵ = (PᵀP)⁻¹ Pᵀ y

• Prediction/Testing/Evaluation on test set:

    ŷ_test = P_test ŵ
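A minimal sketch of the quadratic case, assuming P stacks the powers [1, x, x²]; the toy data are invented for illustration:

    import numpy as np

    x_train = np.array([0.0, 1.0, 2.0, 3.0])
    y_train = np.array([0.2, 1.1, 4.2, 8.8])

    # Quadratic feature matrix with columns [1, x, x^2]
    P = np.vander(x_train, N=3, increasing=True)  # shape (4, 3)

    # Primal solution on the training set
    w = np.linalg.inv(P.T @ P) @ P.T @ y_train

    # The test set must go through the same feature construction
    x_test = np.array([4.0])
    P_test = np.vander(x_test, N=3, increasing=True)
    y_pred = P_test @ w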

6
© Copyright EE, NUS. All Rights Reserved.
Note on Training & Test Sets
• Linear is a special case of polynomial => use “P” instead of “X” from now on
• Training/Learning (primal) on training set:

    ŵ = (PᵀP)⁻¹ Pᵀ y

• Prediction/Testing/Evaluation on test set:

    ŷ_test = P_test ŵ

• There should be zero overlap between training & test sets
• Important goal of regression: prediction on new, unseen data, i.e., the test set
• Why is the test set important for evaluation?
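A minimal sketch of keeping the two sets disjoint, assuming scikit-learn is available; the 70/30 ratio and the random toy data are arbitrary illustrative choices:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(100, 2)  # toy data: 100 samples, 2 features
    y = np.random.rand(100)

    # Random split with zero overlap between the two sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)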
8
© Copyright EE, NUS. All Rights Reserved.
Overfitting Example

                   Training Set Fit    Test Set Fit
    Order 9        Good                Bad
    Order 1        Bad                 Bad
    Order 2        Good                Good

10
© Copyright EE, NUS. All Rights Reserved.
Overfitting Example
(Figure: the order-9 fit makes big, even very big, prediction errors at the red test crosses; blue lines mark the errors)

• If we take one of the blue lines and compute the square of its length, this is called the “squared error” for that particular data point
• If we average the squared errors across all the red crosses, it’s called the mean squared error (MSE) on the test set
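A one-line numpy version of the MSE just described, assuming y_test holds the targets (red crosses) and y_pred the model’s predictions at the same inputs; the numbers are toy values:

    import numpy as np

    y_test = np.array([1.0, 2.0, 3.0])  # toy targets
    y_pred = np.array([1.2, 1.8, 3.5])  # toy predictions

    # Average of the squared errors across all test points
    mse = np.mean((y_test - y_pred) ** 2)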

12
© Copyright EE, NUS. All Rights Reserved.
Underfitting Example

                   Training Set Fit    Test Set Fit
    Order 9        Good                Bad
    Order 1        Bad                 Bad
    Order 2        Good                Good

13
© Copyright EE, NUS. All Rights Reserved.
“Just Nice”

                   Training Set Fit    Test Set Fit
    Order 9        Good                Bad
    Order 1        Bad                 Bad
    Order 2        Good                Good

17
© Copyright EE, NUS. All Rights Reserved.
Overfitting & Underfitting

                                  Training Set Fit    Test Set Fit
    Overfitting     Order 9       Good                Bad
    Underfitting    Order 1       Bad                 Bad
    “Just nice”     Order 2       Good                Good

19
© Copyright EE, NUS. All Rights Reserved.
Overfitting & Underfitting
• Overfitting occurs when the model predicts the training data well, but predicts new data (e.g., from the test set) poorly
• Reason 1
– Model is too complex for the data
– Previous example: Fit order 9 polynomial to 10 data points
• Reason 2
– Too many features but too few training samples
– Even a linear model can overfit, e.g., a linear model with 9 input features (i.e., w is 10-D, including the bias term) and 10 data points in the training set => the data might not be enough to estimate the 10 unknowns well
• Solutions
– Use simpler models (e.g., lower order polynomial)
– Use regularization (see next part of lecture)
20
© Copyright EE, NUS. All Rights Reserved.
Overfitting & Underfitting
• Underfitting is the inability of the trained model to predict the targets in the training set
• Reason 1
– Model is too simple for the data
– Previous example: Fit order 1 polynomial to 10 data points
that came from an order 2 polynomial
– Solution: Try more complex model
• Reason 2
– Features are not informative enough
– Solution: Try to develop more informative features

21
© Copyright EE, NUS. All Rights Reserved.
Overfitting / Underfitting Schematic
(Figure: error vs. model complexity, with an underfitting regime on the left and an overfitting regime on the right)

22
© Copyright EE, NUS. All Rights Reserved.
Regularization
• Regularization is an umbrella term for methods that force the learning algorithm to build less complex models
• Motivation 1: Solve an ill-posed problem
– For example, estimating a 10th-order polynomial with just 5 data points
• Motivation 2: Reduce overfitting
• For example, in the previous lecture we added the term λ‖w‖² (ridge regression):

    J(w) = ‖y − Pw‖² + λ‖w‖²

• Minimizing with respect to w, the primal solution is

    ŵ = (PᵀP + λI)⁻¹ Pᵀ y

• For λ > 0, the matrix PᵀP + λI becomes invertible (Motivation 1)
• λ > 0 might also perform better in the test set, i.e., reduce overfitting (Motivation 2); an example follows later
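A minimal sketch of this regularized solution; the helper name ridge_fit and the toy data are illustrative choices. Note that it stays well-posed even with more unknowns than data points:

    import numpy as np

    def ridge_fit(P, y, lam):
        """Primal ridge solution w = (P^T P + lam*I)^{-1} P^T y."""
        d = P.shape[1]
        return np.linalg.inv(P.T @ P + lam * np.eye(d)) @ P.T @ y

    # 10th-order polynomial features from only 5 data points:
    # P^T P alone is singular, but P^T P + lam*I is invertible for lam > 0
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([0.1, 0.8, 2.2, 2.9, 4.1])
    P = np.vander(x, N=11, increasing=True)
    w = ridge_fit(P, y, lam=1.0)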

23
© Copyright EE, NUS. All Rights Reserved.
Regularization
• Consider the minimization from the previous slide:

    J(w) = ‖y − Pw‖² + λ‖w‖²

– The first term is the cost function quantifying the data-fitting error in the training set
– The second term is the regularization

25
© Copyright EE, NUS. All Rights Reserved.
Regularization
• The λ‖w‖² term is the L2-regularization
• It encourages w to be small (also called shrinkage or weight-decay) => constrains model complexity
• More generally, most machine learning algorithms can be formulated as the following optimization problem:

    min_w  Data-Loss(w) + λ · Regularization(w)

• Data-Loss(w) quantifies the fitting error on the training set given parameters w: smaller error => better fit to training data
• Regularization(w) penalizes more complex models
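A sketch of evaluating this generic objective in the L2 case, assuming the data-loss is the squared error used earlier:

    import numpy as np

    def objective(w, P, y, lam):
        data_loss = np.sum((y - P @ w) ** 2)   # fitting error on training set
        regularization = lam * np.sum(w ** 2)  # penalizes large (complex) w
        return data_loss + regularization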
26
© Copyright EE, NUS. All Rights Reserved.
Regularization Example

                        Training Set Fit    Test Set Fit
    Order 9             Good                Bad
    Order 9, λ = 1      Good                Good

27
© Copyright EE, NUS. All Rights Reserved.
Bias versus Variance
• Suppose we are trying to predict the red target below (figure: blue points are predictions, arranged in a 2x2 grid of cases):
– Low Bias, Low Variance: blue predictions on average close to the red target; low variability among predictions
– Low Bias, High Variance: blue predictions on average close to the red target, but large variability among predictions
– High Bias, Low Variance: blue predictions on average not close to the red target; low variability among predictions
– High Bias, High Variance: blue predictions on average not close to the red target; high variability among predictions

30
© Copyright EE, NUS. All Rights Reserved.
Bias + Variance Trade-off
• Test error = Bias Squared + Variance + Irreducible Noise
(Figure: test error vs. model complexity; high bias / low variance on the left, low bias / high variance on the right)

31
© Copyright EE, NUS. All Rights Reserved.
Bias + Variance Example
• Simulate data from an order-2 polynomial (+ noise)
• Randomly sample 10 training samples each time
• Fit with an order-2 polynomial: low variance, low bias
• Fit with an order-4 polynomial: high variance, low bias
• Order 2 achieves lower test error
(Figures: 4th-order polynomial fits vs. 2nd-order polynomial fits across repeated random training samples)
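A minimal sketch of this experiment; the ground-truth coefficients, noise level, and query point are arbitrary illustrative choices:

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([1.0, -2.0, 0.5])  # order-2 ground truth

    def sample_training_set(n=10):
        x = rng.uniform(-3, 3, size=n)
        y = np.vander(x, N=3, increasing=True) @ true_w + rng.normal(0, 1, size=n)
        return x, y

    def fit_poly(x, y, order):
        P = np.vander(x, N=order + 1, increasing=True)
        return np.linalg.inv(P.T @ P) @ P.T @ y

    # Across repeated training sets, order-4 predictions at a fixed point
    # vary much more (high variance) than order-2 predictions
    x0 = np.array([2.5])
    for order in (2, 4):
        preds = [np.vander(x0, N=order + 1, increasing=True)
                 @ fit_poly(*sample_training_set(), order)
                 for _ in range(100)]
        print(order, np.var(preds))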

32
© Copyright EE, NUS. All Rights Reserved.
Bias + Variance Trade-off Theorem

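The slides state this decomposition graphically; a standard way to write it, assuming targets y = f(x) + ε with zero-mean noise of variance σ², where the expectation is taken over random training sets and the noise:

    \mathbb{E}\big[(y - \hat{f}(x))^2\big]
      = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
      + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
      + \underbrace{\sigma^2}_{\text{Irreducible Noise}}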
40
© Copyright EE, NUS. All Rights Reserved.
Summary
• Overfitting, underfitting & model complexity
– Overfitting: low error in training set, high error in test set
– Underfitting: high error in both training & test sets
– Overly complex models can overfit; overly simple models can underfit
• Regularization (e.g., L2 regularization)
– Solve “ill-posed” problem (e.g., more unknowns than data points)
– Reduce overfitting
• Bias-Variance Tradeoff
– Test error = Bias Squared + Variance + Irreducible Noise
– Interpretation:
• Overly complex models can have high variance, low bias
• Overly simple models can have low variance, high bias
• Interpretation is not always true (see tutorial)

43
© Copyright EE, NUS. All Rights Reserved.
