Lecture Slides - Linear Regression
Regression
Varol Kayhan, PhD
Agenda
• Regression recap
• Gradient Descent
• Stochastic gradient descent
• Batch/mini-batch gradient descent
• Polynomial regression
• Regularization
• L1
• L2
• ElasticNet
• Logistic Regression
Recall: Regression
y = β₀ + β₁x₁ + … + βₙxₙ
β₀: Intercept
β₁, …, βₙ: Beta coefficients
x₁, …, xₙ: dimensions, features, variables, predictors
Recall: Regression
• Example: House prices with one variable
• x: Sqft
• y: Price

Sqft      Price
1,000     110,000
1,500     150,000
…         …

[Figure: Price vs. Sqft scatter plot with a fitted line; slope = β₁ (rise a over run b), intercept = β₀]
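Below is a minimal Python sketch of this one-variable example. The first two rows come from the table above; the remaining rows (and therefore the printed numbers) are made-up illustrations.

import numpy as np

# Toy data: first two rows from the slide, last two rows invented for illustration
sqft = np.array([1000, 1500, 2000, 2500], dtype=float)
price = np.array([110000, 150000, 195000, 240000], dtype=float)

# np.polyfit with degree 1 returns [slope, intercept]
beta1, beta0 = np.polyfit(sqft, price, deg=1)
print(f"Intercept (beta0): {beta0:,.0f}")
print(f"Slope (beta1): {beta1:,.2f}")

# Prediction: Price = beta0 + beta1 * Sqft
print(f"Predicted price for 1,200 sqft: {beta0 + beta1 * 1200:,.0f}")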
Recall: Regression
• Unfortunately, life is not that simple
• We usually have more than one variable
• It is a "multi-dimensional" space
• (Impossible to visualize it)
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + … + βₙxₙ
Regression
X: Training set
y: vector of output values
1) Exact solution using the "Normal Equation": β = (XᵀX)⁻¹Xᵀy
• Computationally very costly (if there are lots of features, or lots of data)
2) Approximate solution using an "optimizer": Gradient Descent
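A short Python sketch of the exact (closed-form) solution, assuming made-up toy data; it also hints at why the optimizer route is preferred when the problem gets large.

import numpy as np

# Toy training set (hypothetical data): 100 observations, 2 features
rng = np.random.default_rng(42)
X = rng.random((100, 2))
y = 4 + 3 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.1, 100)

# Exact solution with the Normal Equation: beta = (X'X)^-1 X'y
X_b = np.c_[np.ones((100, 1)), X]              # add a column of 1s for the intercept
beta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(beta)                                    # roughly [4, 3, 5] -> beta0, beta1, beta2

# Inverting X'X gets expensive as the number of features grows,
# which is why an iterative optimizer (Gradient Descent) is used instead.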
Gradient Descent
• A generic optimization algorithm
• Tweak the parameters iteratively to minimize the cost function (SSE, MSE, RMSE, etc.)
• Analogy:
• Lost in a mountain
• How do you get to the bottom? (i.e., "minimize")
• Feel the slope below your feet!
• Then, go in the direction of steepest (descending) slope
Gradient Descent
[Figure: cost plotted against a beta value, with the minimum at the bottom of the curve]
Gradient Descent - Mechanics
• Initialize the beta coefficients with random values
• Calculate the "gradient" of each beta coefficient
• Based on the gradient and "learning rate", change the beta coefficients.
• New value = Previous value – learning rate x "gradient value"
• If gradient is zero, minimum achieved!
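A minimal batch gradient descent sketch for linear regression with an MSE cost, using toy data and a hypothetical learning rate; it implements the update rule above.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, 100)

X_b = np.c_[np.ones((100, 1)), X]      # add intercept column
m = len(X_b)

learning_rate = 0.1                    # hypothetical value
n_iterations = 1000

beta = rng.random(2)                   # step 1: random initial coefficients
for _ in range(n_iterations):
    # step 2: gradient of the MSE cost with respect to each beta
    gradients = (2 / m) * X_b.T @ (X_b @ beta - y)
    # step 3: new value = previous value - learning rate * gradient
    beta = beta - learning_rate * gradients

print(beta)                            # roughly [4, 3]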
Gradient Descent - Example
Iteration #1: Assign random values for each beta coefficient, calculate the "cost" of each
Iteration #2: Adjust the coefficients, recalculate the cost
Iteration #3 and so forth: Repeat
Learning Rate
• Determines how fast to move in each iteration
• If too small: too many steps to converge
• Might not converge or take too long to converge
• If too large: jumps around and may never reach the minimum
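A self-contained illustration of both failure modes, using a hypothetical one-dimensional cost J(β) = β² whose gradient is 2β.

# Minimize J(beta) = beta**2 (gradient = 2 * beta) with different learning rates
def run(learning_rate, steps=20):
    beta = 5.0
    for _ in range(steps):
        beta = beta - learning_rate * 2 * beta   # new value = old value - lr * gradient
    return beta

print(run(0.01))   # too small: after 20 steps beta is still far from the minimum at 0
print(run(1.10))   # too large: each update overshoots and |beta| keeps growing (diverges)
print(run(0.30))   # reasonable: beta ends up very close to 0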
Gradient Descent
• Two challenges:
• Every step uses the entire training set, which is slow when there is a lot of data
• The cost function may not be a regular "bowl", so the algorithm can get stuck in a local minimum
Stochastic Gradient Descent - Example
Iteration #1: Assign random values for each beta coefficient, calculate the "cost" using ONE random observation
Iteration #2: Adjust the coefficients, recalculate the cost for another random observation
Iteration #3 and so forth: Repeat
• Note: an individual step sometimes is going in the wrong direction (the cost temporarily increases), because each update is based on a single random observation
Stochastic Gradient Descent (SGD)
• Works even if the cost function is not like a “bowl”
• It can get out of local minima
• Uses a "learning schedule" to gradually decrease the learning rate
• Increases the chances of converging on the optimal solution
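A sketch of stochastic gradient descent on toy data; the learning-schedule constants t0 and t1 are hypothetical choices, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.1, 100)
X_b = np.c_[np.ones((100, 1)), X]      # add intercept column
m = len(X_b)

n_epochs = 50
t0, t1 = 5, 50                          # hypothetical learning-schedule constants

def learning_schedule(t):
    # gradually decrease the learning rate as training progresses
    return t0 / (t + t1)

beta = rng.random(2)                    # random initial coefficients
for epoch in range(n_epochs):
    for i in range(m):
        idx = rng.integers(m)           # ONE random observation
        xi = X_b[idx:idx + 1]
        yi = y[idx:idx + 1]
        gradients = 2 * xi.T @ (xi @ beta - yi)
        eta = learning_schedule(epoch * m + i)
        beta = beta - eta * gradients

print(beta)                             # roughly [4, 3]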
Regularization
[Figure: effect of the alpha value; a very low alpha gives a complex model (overfitting), a very high alpha gives a very simple model (underfitting)]
L1 Regularization (Lasso Regression)
• Goal: model sparsity
• Cost function = MSE + α Σ |βᵢ|
• It eliminates the least important features/variables (by setting their betas to zero)
• Automatically performs feature selection
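A brief scikit-learn sketch on made-up data (the alpha value is a hypothetical choice) showing Lasso zeroing out coefficients of unhelpful features.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.random((200, 5))
# only the first two features actually matter
y = 4 + 3 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.1, 200)

lasso = Lasso(alpha=0.1)        # alpha controls the regularization strength
lasso.fit(X, y)
# the three irrelevant features are typically driven to exactly 0;
# the useful ones are kept, though shrunk toward 0
print(lasso.coef_)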
Elastic Net
• Mix of L2 and L1
• Cost function = MSE + r α Σ |βᵢ| + (1 − r) α Σ βᵢ²
• Control the mix ratio using the term "r" in the cost function:
• 0 <= r <= 1
• r = 0 , then L2
• r = 1 , then L1
• Same as before: α (alpha) controls the magnitude of "regularization"
• α = 0, then no regularization (i.e., a regular regression model)
• Higher values mean more regularization
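A similar sketch with scikit-learn's ElasticNet; note that scikit-learn calls the mix ratio r "l1_ratio" (the alpha and l1_ratio values here are hypothetical).

import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 4 + 3 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.1, 200)

# l1_ratio is the mix ratio r: 0 -> pure L2 (Ridge), 1 -> pure L1 (Lasso)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)               # irrelevant features are shrunk heavily, typically to 0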
Early Stopping
• Another regularization technique: stop training when the validation error reaches its minimum
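One possible sketch of early stopping, assuming a scikit-learn SGDRegressor trained one epoch at a time with warm_start and a made-up train/validation split; the copy of the model with the lowest validation error is kept.

import numpy as np
from copy import deepcopy
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((200, 1))
y = 4 + 3 * X[:, 0] + rng.normal(0, 0.5, 200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# warm_start=True: each call to fit() continues from the previous coefficients
sgd = SGDRegressor(max_iter=1, warm_start=True,
                   learning_rate="constant", eta0=0.01, tol=None)

best_val_error = float("inf")
best_model = None
for epoch in range(500):
    sgd.fit(X_train, y_train)                         # train one more epoch
    val_error = mean_squared_error(y_val, sgd.predict(X_val))
    if val_error < best_val_error:                    # validation error still decreasing
        best_val_error = val_error
        best_model = deepcopy(sgd)                    # remember the best model so far

print(best_val_error)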