2-LR_Optim
Zijun Yao
Assistant Professor, EECS Department
The University of Kansas
Agenda
• Loss function
• Optimizing parameters
Supervised learning setup
• Given a collection of records (training set)
• Each record is characterized by a pair (x, y)
* x is bolded because it represents a set of features; y is not because it is just a value.
House price prediction - regression
[Diagram: features (Size of House, # of Bedrooms, …) → f → Price of House]
Linear regression
*A set of functions means the same model but with different values of the parameters.
Step 1: Model definition
f(x_{size}, x_{bath}) = \hat{y}, the predicted value of the price y, where x_{size} is the size of the house and x_{bath} is the # of baths
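Not part of the original slides: a minimal Python sketch of this model definition, assuming the linear form f(x) = b + w_size·x_size + w_bath·x_bath that the lecture builds toward; the weight and bias values are made up purely for illustration.

```python
# A hypothetical linear house-price model; the parameter values are invented.
def f(x_size, x_bath, w_size=150.0, w_bath=20_000.0, b=50_000.0):
    """Return the predicted house price y_hat for one house."""
    return b + w_size * x_size + w_bath * x_bath

# Example: a 2,000 sqft house with 2 baths (made-up inputs).
y_hat = f(x_size=2000, x_bath=2)
print(y_hat)  # 390000.0
```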
Step 1: Define set of functions
f(x_{size}, x_{bath}) = \hat{y} (predicted price), where x_{size} is the size of the house and x_{bath} is the # of baths; each choice of parameter values gives one function in the set
Step 1: A variant form of linear model
Equivalence: f(x) = b + \sum_{j=1}^{d} w_j x_j can be written as f(x) = \sum_{j=0}^{d} w_j x_j by defining x_0 = 1 and w_0 = b
Step 1: A variant form of linear model
\hat{y} (prediction) = f(x) (a function of x), where the w_j are the parameters and the x_j are the features
Agenda
• Loss function
• Optimizing parameters
Step 2: Goodness of function
How good is a function? Measure the difference between the predicted and true y.
[Diagram: Training Data → Model (a set of functions f1, f2, …)]
Step 2: Goodness of function
How good is a function? Measure the difference between the predicted and true y.
Suppose we have two house features in this data: size and age (the superscript is the data index).
• House 1: x_{size}^{(1)} = 4,043 sqft, x_{age}^{(1)} = 26 years → \hat{y}^{(1)} = f(x^{(1)}) vs. y^{(1)} = 784,000
• House 2: x_{size}^{(2)} = 4,976 sqft, x_{age}^{(2)} = 8 years → \hat{y}^{(2)} = f(x^{(2)}) vs. y^{(2)} = 724,900
Step 2: Measure error
How good is a function? - use a loss function L
Input: a function f and the data. Output: the loss - how far the predictions are from the true values.
Sum of squared errors (SSE): L(f) = \sum_{i=1}^{n} (y^{(i)} - f(x^{(i)}))^2, summed over the examples, where f(x^{(i)}) is the estimated y based on the input function.
Averaged by n, you have the mean squared error (MSE) loss: \frac{1}{n} \sum_{i=1}^{n} (y^{(i)} - f(x^{(i)}))^2
[Diagram: Training Data → Model (a set of functions f1, f2, …) → Goodness of function f (loss L)]
[Plots: a simple case where only one feature is used to predict y; the case where 2 features are used to predict y]
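As a small sketch (not from the slides), the SSE and MSE losses can be computed directly; the predicted values below are invented, and only the two true prices come from the earlier example.

```python
import numpy as np

y_true = np.array([784_000.0, 724_900.0])   # true prices from the example houses
y_pred = np.array([800_000.0, 700_000.0])   # hypothetical model predictions

# Sum of squared errors: L(f) = sum_i (y^(i) - f(x^(i)))^2
sse = np.sum((y_true - y_pred) ** 2)

# Mean squared error: the SSE averaged over the n examples
mse = sse / len(y_true)

print(sse, mse)
```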
Step 2: Intuition of loss function
• Let's use a simple case with only one feature
L(f) = \sum_{i=1}^{n} (y^{(i)} - w \cdot x^{(i)})^2
With w = 1, b = 0: L(f) = (1-1)^2 + (2-2)^2 + (3-3)^2 = 0
[Plots: the data points (1, 1), (2, 2), (3, 3) and the fitted line f(x) = w·x (left); L(f) vs. w (right)]
Step 2: Intuition of loss function
• Let's use a simple case with only one feature
L(f) = \sum_{i=1}^{n} (y^{(i)} - w \cdot x^{(i)})^2
With w = 0.5, b = 0: L(f) = (1-0.5)^2 + (2-1)^2 + (3-1.5)^2 = 3.5
[Plots: f(x) vs. x (left); L(f) vs. w (right)]
Step 2: Intuition of loss function
• Let's use a simple case with only one feature
L(f) = \sum_{i=1}^{n} (y^{(i)} - w \cdot x^{(i)})^2
With w = 0, b = 0: L(f) = (1-0)^2 + (2-0)^2 + (3-0)^2 = 14
[Plots: f(x) vs. x (left); L(f) vs. w (right)]
Step 2: Intuition of loss function
• Let's use a simple case with only one feature
L(f) = \sum_{i=1}^{n} (y^{(i)} - w \cdot x^{(i)})^2
With w = 0, b = 0: L(f) = (1-0)^2 + (2-0)^2 + (3-0)^2 = 14
The loss function L is convex (bowl-shaped) in w.
[Plots: f(x) vs. x (left); L(f) vs. w (right)]
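A short sketch that reproduces the loss values on these slides for the toy data (1, 1), (2, 2), (3, 3); sweeping w traces out the convex curve shown in the right-hand plots.

```python
import numpy as np

# Toy data used in the intuition slides: y = x for x = 1, 2, 3.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def loss(w):
    """SSE loss L(f) = sum_i (y^(i) - w * x^(i))^2 for the one-feature model (b = 0)."""
    return np.sum((y - w * x) ** 2)

for w in [1.0, 0.5, 0.0]:
    print(w, loss(w))   # 0.0, 3.5, 14.0 as on the slides

# Sweep w to trace the loss curve L(f) vs. w.
ws = np.linspace(0.0, 1.5, 31)
curve = [loss(w) for w in ws]
```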
Step 2: Intuition of loss function
[Plot: L(f) vs. w, comparing the loss at two candidate parameter values w_1 and w_2]
Agenda
• Loss function
• Optimizing parameters
Step 3: Find the best function
L(w, b) = \sum_{i=1}^{n} \left( y^{(i)} - (b + w_{size} \cdot x_{size}^{(i)} + w_{age} \cdot x_{age}^{(i)}) \right)^2
[Diagram: Training Data → Model (a set of functions f1, f2, …) → Goodness of function f (loss L)]
Step 3: Find the best function
• The derivative of the loss function L(w) is the slope of the tangent line to L(w) at a given point.
[Plot: L(w) with the tangent line at w = 1]
• Gradients: the gradient is a vector consisting of the partial derivatives of L with respect to each parameter, e.g., \nabla L(w, b) = (\partial L/\partial w, \partial L/\partial b)
• If the derivative of L at the current point w_0 is positive, decrease w; if it is negative, increase w.
[Plot: L(w) with the tangent line at w_0]
Step 3: Gradient descent
• Consider a loss function L(w) with one parameter w: w^* = \arg\min_w L(w)
• Update: w^1 \leftarrow w^0 - \alpha \, (dL/dw)|_{w=w^0}
• \alpha is called the "learning rate" - usually small, like 0.05
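A minimal sketch of this one-parameter update rule, reusing the toy data from the intuition slides; the starting point and learning rate below are arbitrary illustrative choices.

```python
import numpy as np

# Toy data from the intuition slides: y = x for x = 1, 2, 3.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def dL_dw(w):
    """Derivative of L(w) = sum_i (y^(i) - w * x^(i))^2 with respect to w."""
    return -2.0 * np.sum(x * (y - w * x))

alpha = 0.01   # learning rate (illustrative choice)
w0 = 0.0       # arbitrary starting point

# One gradient descent step: w1 = w0 - alpha * dL/dw |_{w=w0}
w1 = w0 - alpha * dL_dw(w0)
print(w1)  # 0.28, one step from w0 = 0 toward the optimum w* = 1
```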
Step 3: Gradient descent
• Consider a loss function L(w, b) with two parameters w and b: w^*, b^* = \arg\min_{w,b} L(w, b)
➢ w^1 \leftarrow w^0 - \alpha \, (\partial L/\partial w)|_{w=w^0, b=b^0},  b^1 \leftarrow b^0 - \alpha \, (\partial L/\partial b)|_{w=w^0, b=b^0}
➢ Compute (\partial L/\partial w)|_{w=w^1, b=b^1} and (\partial L/\partial b)|_{w=w^1, b=b^1}
➢ w^2 \leftarrow w^1 - \alpha \, (\partial L/\partial w)|_{w=w^1, b=b^1},  b^2 \leftarrow b^1 - \alpha \, (\partial L/\partial b)|_{w=w^1, b=b^1}
• Repeat iterations until convergence
Step 3: Gradient descent
• Gradient of Linear Regression
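The formula on this slide is not reproduced in the export, so as a hedged reconstruction: for the one-feature SSE loss L(w, b) = \sum_i (y^{(i)} - (w x^{(i)} + b))^2, the partial derivatives are \partial L/\partial w = -2 \sum_i x^{(i)} (y^{(i)} - (w x^{(i)} + b)) and \partial L/\partial b = -2 \sum_i (y^{(i)} - (w x^{(i)} + b)). The sketch below runs the full gradient descent loop with these gradients on made-up toy data.

```python
import numpy as np

# Made-up one-feature data for illustration (generated from y = 2x + 1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # initial parameters w0, b0
alpha = 0.01      # learning rate

for _ in range(5000):
    residual = y - (w * x + b)               # y^(i) - (w x^(i) + b)
    grad_w = -2.0 * np.sum(x * residual)     # dL/dw for the SSE loss
    grad_b = -2.0 * np.sum(residual)         # dL/db for the SSE loss
    w, b = w - alpha * grad_w, b - alpha * grad_b

print(w, b)   # approaches w = 2, b = 1
```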
Step 3: Gradient descent
[Figure: the fitted line f(x) (left) and the loss surface L(w, b) (right); each gradient descent update moves (w, b) from high L toward low L, stopping where \partial L/\partial w \approx 0 or = 0]
http://www.bdhammel.com/learning-rates/
Linear algebra review
• A vector in ℝ^d is an ordered set of d real values
• A matrix in ℝ^{n×m} is an n-by-m object with n rows and m columns
• Transpose
• Matrix product: e.g., a 4×2 matrix times a 2×3 matrix gives a 4×3 matrix
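A quick sketch (not from the slides) checking these shape rules with NumPy.

```python
import numpy as np

A = np.ones((4, 2))   # a 4x2 matrix
B = np.ones((2, 3))   # a 2x3 matrix

C = A @ B             # matrix product: (4x2) @ (2x3) -> (4x3)
print(C.shape)        # (4, 3)
print(A.T.shape)      # transpose of A: (2, 4)
```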
Vectorization of linear regression
• Benefits of vectorization
  • More compact equations
  • Faster code (using optimized matrix libraries)
• Linear regression model: f(x) = \sum_{j=0}^{d} w_j x_j, with x_0 = 1 and w_0 = b
• Let \mathbf{w} = [w_0, w_1, …, w_d]^T and \mathbf{x} = [1, x_1, …, x_d]^T, so that f(\mathbf{x}) = \mathbf{w}^T \mathbf{x}
Vectorization of linear regression
• Consider the model for n instances
• Let \mathbf{w} ∈ ℝ^{(d+1)×1} and \mathbf{X} ∈ ℝ^{n×(d+1)} (one row per instance, with a leading 1 in each row)
• In vectorized form, the linear regression model is \hat{\mathbf{y}} = \mathbf{X}\mathbf{w}
Vectorization of linear regression
• For the loss function: L(\mathbf{w}) = (\mathbf{y} - \mathbf{X}\mathbf{w})^T (\mathbf{y} - \mathbf{X}\mathbf{w})
https://towardsdatascience.com/gradient-descent-algorithm-and-its-variants-10f652806a3
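A hedged sketch of the vectorized model and loss, assuming the standard design-matrix construction with a leading column of 1s; the feature rows beyond the two example houses, the last two prices, and the parameter values are made up.

```python
import numpy as np

# Features (size in sqft, age in years): the first two rows are the example houses,
# the rest are invented to pad out the toy data set.
X_raw = np.array([[4043.0, 26.0],
                  [4976.0,  8.0],
                  [3200.0, 15.0],
                  [2800.0, 40.0]])
y = np.array([784_000.0, 724_900.0, 650_000.0, 500_000.0])

# Prepend a column of 1s so the bias is absorbed into w: X is n x (d+1).
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

w = np.zeros(X.shape[1])            # parameter vector in R^(d+1)

y_hat = X @ w                       # vectorized predictions: y_hat = X w
loss = (y - y_hat) @ (y - y_hat)    # vectorized SSE: (y - Xw)^T (y - Xw)
grad = -2.0 * X.T @ (y - y_hat)     # gradient of the SSE loss in vector form

print(loss, grad)
```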
Feature standardization
• Rescale features to have zero mean and unit variance: x_j \leftarrow (x_j - \mu_j)/\sigma_j for j = 1 … d (not x_0), where \mu_j and \sigma_j are the mean and standard deviation of feature j on the training set
• Must apply the same transformation for both training and testing instances
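A small sketch of feature standardization with made-up training and test feature matrices; the key point from the slide is that the training-set mean and standard deviation are reused on the test set.

```python
import numpy as np

# Made-up feature matrices (rows = instances, columns = features).
X_train = np.array([[4043.0, 26.0],
                    [4976.0,  8.0],
                    [2800.0, 40.0]])
X_test = np.array([[3500.0, 15.0]])

mu = X_train.mean(axis=0)      # per-feature mean, from the training set only
sigma = X_train.std(axis=0)    # per-feature standard deviation, training set only

# Apply the SAME transformation to training and testing instances.
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma

print(X_train_std.mean(axis=0))  # ~0 for each feature
print(X_train_std.std(axis=0))   # ~1 for each feature
```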
Regularization
• A method to control the complexity of the model and avoid overfitting
• Why - address overfitting issues by keeping w small
• How - penalize large values of w_j
• Can be incorporated into the loss function
• Works well when we have a lot of features
L(f) = \sum_{i=1}^{n} (y^{(i)} - f(x^{(i)}))^2 + \lambda \sum_{j=1}^{d} w_j^2
The penalty term \sum_j w_j^2 is the squared L_2-norm of w, so this is also called L_2 regularization.
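A sketch of the regularized (L2 / ridge) loss and its gradient, assuming the bias term w_0 is excluded from the penalty as in the j = 1 … d sum above; the example reuses the two houses from earlier plus an invented third row, and \lambda is an arbitrary choice.

```python
import numpy as np

# Design matrix with a leading column of 1s (bias column) and prices; third row invented.
X = np.array([[1.0, 4043.0, 26.0],
              [1.0, 4976.0,  8.0],
              [1.0, 2800.0, 40.0]])
y = np.array([784_000.0, 724_900.0, 500_000.0])
lam = 0.1   # regularization strength lambda (illustrative)

def ridge_loss(w):
    """SSE loss plus the L2 penalty; w[0] is the bias and is not penalized."""
    residual = y - X @ w
    return residual @ residual + lam * np.sum(w[1:] ** 2)

def ridge_grad(w):
    """Gradient of the regularized loss with respect to w."""
    residual = y - X @ w
    grad = -2.0 * X.T @ residual
    grad[1:] += 2.0 * lam * w[1:]   # penalty gradient, bias excluded
    return grad

w = np.zeros(X.shape[1])
print(ridge_loss(w), ridge_grad(w))
```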