Lecture 4 - More On Linear Regression and Polynomial Regression
Notation:
n = number of features
m = number of training examples
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Multivariate linear regression
For convenience of notation, define x_0 = 1.
Hypothesis: h_θ(x) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_n x_n
Parameters: θ_0, θ_1, ..., θ_n (an (n+1)-dimensional vector θ)
Cost function: J(θ) = (1/2) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Gradient descent:
Repeat {
    θ_j := θ_j − α ∂J(θ)/∂θ_j = θ_j + α Σ_{i=1}^{m} (y^(i) − h_θ(x^(i))) x_j^(i)
    (simultaneously update θ_j for every j = 0, ..., n)
}
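To make this update concrete, here is a minimal NumPy sketch of batch gradient descent for multivariate linear regression; the function name, learning rate, and iteration count are placeholder choices for illustration, not values from the lecture.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, num_iters=1000):
    """Multivariate linear regression fit with batch gradient descent.

    X: (m, n) matrix of features, y: (m,) vector of targets.
    """
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]            # prepend x_0 = 1 to every example
    theta = np.zeros(Xb.shape[1])        # parameters theta_0 ... theta_n
    for _ in range(num_iters):
        errors = y - Xb @ theta          # y^(i) - h_theta(x^(i)) for all i at once
        theta = theta + alpha * (Xb.T @ errors)   # simultaneous update of all theta_j
    return theta
```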
Stochastic Gradient Descent (SGD)
• So far, what we have used is called batch gradient descent, since it uses all of the training data (i.e., we look at the whole batch) in every iteration.
• There are other versions of gradient descent, such as SGD:
Batch Gradient Descent:
Repeat until convergence {
    θ_j := θ_j + α Σ_{i=1}^{m} (y^(i) − h_θ(x^(i))) x_j^(i)   (for every j)
}

Stochastic Gradient Descent:
Loop {
    for i = 1 to m {
        θ_j := θ_j + α (y^(i) − h_θ(x^(i))) x_j^(i)   (for every j)
    }
}
SGD versus Batch GD
• Batch gradient descent has to process the entire training set at every
iteration of making an update to the parameters.
• SGD parameter updates are conducted with one training sample at
every step.
• Often, SGD gets θ "close" to the minimum much faster than batch gradient descent. Note, however, that it may never "converge" to the minimum.
• The parameters θ will keep oscillating around the minimum of J(θ);
but in practice most of the values near the minimum will be
reasonably good approximations to the true minimum.
• When the training set is large, SGD is preferred.
SGD (for linear regression) updates in vector form
θ := θ + α (y^(i) − h_θ(x^(i))) x^(i)
Note here:
• θ is a vector in this equation, with dimension = (number of features + 1).
• x^(i) is the vector of feature values for the i-th (single) training sample.
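A minimal NumPy sketch of this vector-form SGD update follows; the epoch count, learning rate, and random shuffling order are illustrative assumptions.

```python
import numpy as np

def sgd_linear_regression(X, y, alpha=0.01, num_epochs=5):
    """Stochastic gradient descent for linear regression: one sample per update."""
    m = X.shape[0]
    Xb = np.c_[np.ones(m), X]                     # x_0 = 1 for every example
    theta = np.zeros(Xb.shape[1])
    for _ in range(num_epochs):
        for i in np.random.permutation(m):        # visit training samples in random order
            error = y[i] - Xb[i] @ theta          # scalar: y^(i) - h_theta(x^(i))
            theta = theta + alpha * error * Xb[i] # update the whole vector theta at once
    return theta
```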
Practical Tricks to make gradient descent work well – Feature Scaling
Feature Scaling
• If one feature takes values that are much larger than another's, the contour plot of the cost function will likely be elongated in the direction orthogonal to the parameter of the larger-valued feature, since any change in that parameter changes the cost dramatically.
• Accordingly, gradient descent may take a long time to converge; it may oscillate back and forth on its way to the minimum.
• To address this problem, we scale all features by their maximum values to bring them all to a similar range (or close to the same range).
• It can be shown that convergence then takes a more direct path and is faster.
Example
Idea: Make sure features are on a similar scale.
E.g. x_1 = size (0–2000 feet²), x_2 = number of bedrooms (1–5).
Scaling by the maximum values gives x_1 = size (feet²) / 2000 and x_2 = (number of bedrooms) / 5, so that
0 ≤ x_1 ≤ 1, 0 ≤ x_2 ≤ 1
Scaling and Mean Normalization
Scaling: Get every feature into approximately a −1 ≤ x_i ≤ 1 range.
Mean normalization: Replace x_i with x_i − μ_i so that each feature has approximately zero mean (do not apply this to x_0 = 1).
E.g. replace the size feature by (size − average size) / (maximum size).
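A short NumPy sketch of both ideas, assuming a feature matrix `X` with one column per feature (the helper names are hypothetical):

```python
import numpy as np

def scale_by_max(X):
    """Divide each feature (column) by its maximum absolute value."""
    return X / np.max(np.abs(X), axis=0)

def mean_normalize(X):
    """Subtract each feature's mean, then divide by its range (max - min)."""
    mu = X.mean(axis=0)
    spread = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / spread
```

Either transform would be applied to the training features before running gradient descent, but not to the constant x_0 = 1 column.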
Example automatic convergence test:
Declare convergence if J(θ) decreases by less than some small threshold ε in one iteration.
[Plot: J(θ) versus the number of iterations, decreasing as gradient descent runs.]
• The problem with this approach is that it is very hard to decide what threshold to choose. Checking the plot of J(θ) against the number of iterations is a better approach.
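As a sketch, the automatic test can be implemented by recording J(θ) after every iteration and stopping once the decrease falls below a threshold; the threshold `eps` and learning rate here are placeholders, and plotting the recorded history is the check recommended above.

```python
import numpy as np

def cost(theta, Xb, y):
    """J(theta) = 1/2 * sum of squared prediction errors."""
    errors = Xb @ theta - y
    return 0.5 * float(errors @ errors)

def gradient_descent_with_history(Xb, y, alpha=0.01, eps=1e-3, max_iters=10000):
    """Run batch gradient descent, recording J(theta) so it can be plotted."""
    theta = np.zeros(Xb.shape[1])
    history = [cost(theta, Xb, y)]
    for _ in range(max_iters):
        theta = theta + alpha * (Xb.T @ (y - Xb @ theta))
        history.append(cost(theta, Xb, y))
        if history[-2] - history[-1] < eps:    # automatic convergence test
            break
    return theta, history                      # plot history vs. iteration number
```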
Choice of Learning Rate
• If gradient descent is not converging, a good approach is to choose a
smaller learning rate (alpha)
• Example scenarios with gradient descent not converging:
• Cost function increasing with iterations
• Cost function oscillating with iterations
• It can be shown that if the learning rate is chosen small enough, the cost J(θ) will decrease on every iteration of gradient descent.
• However, it may take a while for the algorithm to converge when the
learning rate is too small.
Advice to choose α: try ..., 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, ...
• Then choose the one that gives you the fastest rate of decrease while still converging
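A sketch of that procedure, reusing the hypothetical `gradient_descent_with_history` helper from the earlier sketch and placeholder data `Xb`, `y`:

```python
# Suggested grid of learning rates from the slide.
candidate_alphas = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]

final_costs = {}
for a in candidate_alphas:
    theta, history = gradient_descent_with_history(Xb, y, alpha=a, max_iters=200)
    # Reject runs where the cost ever increased (diverging or oscillating).
    increased = any(curr > prev for prev, curr in zip(history, history[1:]))
    if not increased:
        final_costs[a] = history[-1]          # cost reached within the fixed budget

best_alpha = min(final_costs, key=final_costs.get)   # fastest decrease among converging runs
```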
Choice of features and polynomial regression
Choice of features
• Sometimes you may be given a set of features, but decide that you need other features that can be derived from the existing ones.
• Example:
• Consider the case where you are given frontage of house and depth of house,
and you want to compute house prices.
• You may choose to compute the area, which is the product of frontage and
depth.
• In some cases, we may want new features as squares or cubes of
original features.
• This would lead to polynomial regression.
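A small sketch of constructing such derived features, with made-up frontage/depth values purely for illustration:

```python
import numpy as np

# Hypothetical raw features for a few houses (illustrative values only).
frontage = np.array([50.0, 40.0, 60.0])
depth = np.array([30.0, 45.0, 20.0])

area = frontage * depth                              # derived feature: frontage x depth

# Polynomial features built from the single "area" feature.
X_poly = np.column_stack([area, area**2, area**3])   # x, x^2, x^3

# Feature scaling matters even more here: area^3 is vastly larger than area.
X_poly = X_poly / np.max(np.abs(X_poly), axis=0)
```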
Polynomial regression
For the given data (price y versus size x), one possible option is a quadratic model:
h_θ(x) = θ_0 + θ_1 x + θ_2 x²
[Plot: price (y) versus size (x) with a quadratic fit]
But this may cause a drop in the predicted price for larger values of size, since a quadratic eventually turns back down.
Another alternative, by having insight into the fact that the square root has a saturating pattern, may be:
h_θ(x) = θ_0 + θ_1 x + θ_2 √x
[Plot: price (y) versus size (x) with a square-root fit]
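As a sketch, the square-root model is still linear in the parameters, so it can be fit by least squares once √(size) is added as a feature; the size/price arrays below are made-up illustrative values, not data from the lecture.

```python
import numpy as np

# Hypothetical training data (size in square feet, price in $1000s).
size = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
price = np.array([200.0, 280.0, 340.0, 390.0, 430.0])

# Design matrix for h(x) = theta_0 + theta_1 * size + theta_2 * sqrt(size).
Xb = np.column_stack([np.ones_like(size), size, np.sqrt(size)])

theta, *_ = np.linalg.lstsq(Xb, price, rcond=None)   # least-squares fit
predicted = Xb @ theta                                # predictions from the square-root model
```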