Wk05 machine learning
Sherry Thomas
21f3001449
Contents
Introduction to Supervised Learning
Linear Regression
Gradient Descent
    Stochastic Gradient Descent
Kernel Regression
Acknowledgments
Abstract
The week commences with an exploration of Supervised Learning,
specifically focusing on the topic of Regression. The aim is to provide
a comprehensive understanding of the underlying mechanism of this pop-
ular machine learning technique, and its various applications. Addition-
ally, this study delves into the variants of Regression, including kernel
regression, and examines the probabilistic aspects of the technique.
Linear Regression
Linear regression is a supervised learning algorithm employed to predict a con-
tinuous output variable based on one or more input features, assuming a linear
relationship between the input and output variables. The primary objective of
linear regression is to determine the line of best fit that minimizes the sum of
squared errors between the predicted and actual output values.
Given a dataset $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ where each $\mathbf{x}_i \in \mathbb{R}^d$, and the corresponding labels $y_1, \ldots, y_n \in \mathbb{R}$, the goal of linear regression is to find a mapping between the input and output variables, represented as follows:
$$h: \mathbb{R}^d \rightarrow \mathbb{R}$$
$$\text{error}(h) = \sum_{i=1}^{n} \left(h(\mathbf{x}_i) - y_i\right)^2$$
Ideally, this error should be minimized, which occurs when ℎ(x𝑖 ) = y𝑖 for all 𝑖.
However, achieving this may only result in memorizing the data and its outputs,
which is not a desired outcome.
To mitigate this memorization problem, we impose structure on the mapping. The simplest and most commonly used structure is linear, which we adopt as the underlying structure for our data.
Let $\mathcal{H}_{\text{linear}}$ denote the solution space for the mapping in the linear domain:
$$\min_{h \in \mathcal{H}_{\text{linear}}} \sum_{i=1}^{n} \left(h(\mathbf{x}_i) - y_i\right)^2$$
Equivalently,
$$\min_{\mathbf{w} \in \mathbb{R}^d} \sum_{i=1}^{n} \left(\mathbf{w}^T\mathbf{x}_i - y_i\right)^2$$
Optimizing the above objective is the main aim of the linear regression algo-
rithm.
Optimizing the Error Function
With $X \in \mathbb{R}^{d \times n}$ denoting the matrix whose columns are the data points, the minimization problem can be expressed in vectorized form; setting its gradient with respect to $\mathbf{w}$ to zero yields the normal equations:
$$(XX^T)\mathbf{w}^* = Xy$$
$$\therefore \mathbf{w}^* = (XX^T)^+ Xy$$
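As a concrete illustration, here is a minimal NumPy sketch of this closed-form solution; the synthetic data, variable names, and the $d \times n$ column layout of $X$ are assumptions made for the example.

```python
import numpy as np

# Synthetic data: columns of X are the data points, so X has shape (d, n).
rng = np.random.default_rng(0)
d, n = 3, 100
X = rng.normal(size=(d, n))
w_true = np.array([1.5, -2.0, 0.5])
y = X.T @ w_true + 0.1 * rng.normal(size=n)       # noisy labels

# w* = (X X^T)^+ X y  -- the normal-equation solution via the pseudo-inverse
w_star = np.linalg.pinv(X @ X.T) @ X @ y
print(w_star)                                     # close to w_true
```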
Gradient Descent
The normal equation for linear regression, as shown above, involves calculating $(XX^T)^+$, which can be computationally expensive, with a complexity of $O(d^3)$.
Since $\mathbf{w}^*$ is the solution of an unconstrained optimization problem, it can also be found using gradient descent. The iterative formula for gradient descent is:
$$\mathbf{w}_{t+1} = \mathbf{w}_t - \eta_t \nabla f(\mathbf{w}_t)$$
$$\therefore \mathbf{w}_{t+1} = \mathbf{w}_t - \eta_t \left[2(XX^T)\mathbf{w}_t - 2(Xy)\right]$$
Here, $\eta_t$ is a scalar that controls the step size of the descent, and $t$ denotes the current iteration.
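Below is a short sketch of this update rule in NumPy, keeping the $d \times n$ layout of $X$; the step-size heuristic and iteration count are illustrative choices, not part of the lecture material.

```python
import numpy as np

def linreg_gradient_descent(X, y, T=500, eta=None):
    """Gradient descent on sum_i (w^T x_i - y_i)^2, with X of shape (d, n)."""
    d = X.shape[0]
    w = np.zeros(d)
    if eta is None:
        # A conservative constant step size: 1 / (2 * largest eigenvalue of X X^T).
        eta = 1.0 / (2.0 * np.linalg.norm(X @ X.T, 2))
    for _ in range(T):
        grad = 2 * (X @ X.T @ w - X @ y)   # gradient of the squared-error objective
        w = w - eta * grad
    return w

# Usage: w_gd = linreg_gradient_descent(X, y) approaches the normal-equation solution.
```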
Even in the above update, the product $XX^T$ must still be computed, which remains computationally expensive. Is there a way to make this step cheaper?
Stochastic Gradient Descent
Stochastic gradient descent (SGD) is an optimization algorithm widely employed
in machine learning to minimize the loss function of a model by determining
the optimal parameters. Unlike traditional (batch) gradient descent, which updates the model parameters using the entire dataset, SGD updates the parameters using a randomly selected subset of the data, known as a mini-batch. This approach makes each update much cheaper and makes SGD particularly suitable for handling large datasets.
Instead of updating w using the entire dataset at each step 𝑡, SGD leverages a
small randomly selected subset of 𝑘 data points to update w. Consequently, the
new gradient becomes $2(\tilde{X}\tilde{X}^T \mathbf{w}_t - \tilde{X}\tilde{y})$, where $\tilde{X}$ and $\tilde{y}$ represent small samples randomly chosen from the dataset. This strategy is feasible since $\tilde{X} \in \mathbb{R}^{d \times k}$, which is considerably smaller than $X$.
After 𝑇 rounds of training, the final estimate is obtained as follows:
$$\mathbf{w}^{T}_{\text{SGD}} = \frac{1}{T} \sum_{i=1}^{T} \mathbf{w}_i$$
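A minimal sketch of mini-batch SGD with averaged iterates, under the same assumptions as before; the batch size $k$, step size, and number of rounds are arbitrary illustrative values.

```python
import numpy as np

def linreg_sgd(X, y, k=10, T=2000, eta=1e-3, seed=0):
    """Mini-batch SGD for linear regression; X has shape (d, n).
    Returns the average of the iterates w_1, ..., w_T."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    w = np.zeros(d)
    w_sum = np.zeros(d)
    for _ in range(T):
        idx = rng.choice(n, size=k, replace=False)  # random batch of k points
        Xb, yb = X[:, idx], y[idx]                  # X_tilde, y_tilde
        grad = 2 * (Xb @ Xb.T @ w - Xb @ yb)        # gradient on the batch only
        w = w - eta * grad                          # eta may need tuning per dataset
        w_sum += w
    return w_sum / T                                # averaged iterate, w_SGD
```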
Kernel Regression
What if the data points reside in a non-linear subspace? Similar to dealing with
non-linear data clustering, kernel functions are employed in this scenario as well.
Let $\mathbf{w}^* = X\alpha^*$, where $\alpha^* \in \mathbb{R}^n$. Then:
$$
\begin{aligned}
X\alpha^* &= \mathbf{w}^* \\
\therefore X\alpha^* &= (XX^T)^+ Xy \\
(XX^T)X\alpha^* &= (XX^T)(XX^T)^+ Xy \\
(XX^T)X\alpha^* &= Xy \\
X^T(XX^T)X\alpha^* &= X^T Xy \\
(X^T X)^2 \alpha^* &= X^T X y \\
K^2 \alpha^* &= K y \\
\therefore \alpha^* &= K^{-1} y
\end{aligned}
$$
Here, $K \in \mathbb{R}^{n \times n}$ is the Gram matrix with entries $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, and it can be obtained using a kernel function such as the polynomial kernel or the RBF kernel.
To predict using $\alpha^*$ and the kernel function, let $X_{\text{test}} \in \mathbb{R}^{d \times m}$ represent the test dataset. The prediction for a test point $\mathbf{x}_{\text{test}}$ is made as follows:
$$\mathbf{w}^{*T}\phi(\mathbf{x}_{\text{test}}) = \sum_{i=1}^{n} \alpha_i^* \, k(\mathbf{x}_i, \mathbf{x}_{\text{test}})$$
Here, $\alpha_i^*$ denotes the importance of the $i$-th training point in relation to $\mathbf{w}^*$, and $k(\mathbf{x}_i, \mathbf{x}_{\text{test}})$ measures the similarity between $\mathbf{x}_{\text{test}}$ and $\mathbf{x}_i$.
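The following sketch puts these pieces together for kernel regression with an RBF kernel; the kernel choice, the gamma value, and the use of a pseudo-inverse in place of $K^{-1}$ (in case $K$ is ill-conditioned) are assumptions of this example, not part of the lecture material.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel between the columns of A (d x n) and the columns of B (d x m)."""
    sq_dists = (np.sum(A**2, axis=0)[:, None]
                + np.sum(B**2, axis=0)[None, :]
                - 2.0 * A.T @ B)
    return np.exp(-gamma * sq_dists)

def kernel_regression_fit(X, y, gamma=1.0):
    K = rbf_kernel(X, X, gamma)            # Gram matrix, K in R^{n x n}
    return np.linalg.pinv(K) @ y           # alpha* = K^{-1} y (pinv in case K is singular)

def kernel_regression_predict(X, alpha, X_test, gamma=1.0):
    K_test = rbf_kernel(X, X_test, gamma)  # entries k(x_i, x_test_j), shape (n, m)
    return K_test.T @ alpha                # one prediction per test column

# Usage: alpha = kernel_regression_fit(X, y); y_hat = kernel_regression_predict(X, alpha, X_test)
```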
Linear regression can also be viewed probabilistically. Suppose each label is generated as
$$y_i = \mathbf{w}^T\mathbf{x}_i + \epsilon_i$$
where $\mathbf{w} \in \mathbb{R}^d$ is the weight vector that captures the relationship between the inputs and the target variable, and $\epsilon_i$ is a noise term.
To estimate the weight vector w that best fits the data, we can apply the
principle of Maximum Likelihood (ML). The ML estimation seeks to find the
parameter values that maximize the likelihood of observing the given data.
Assuming that the noise term $\epsilon_i$ follows a zero-mean Gaussian distribution with variance $\sigma^2$, we can express the likelihood function as:
$$
\begin{aligned}
\mathcal{L}(\mathbf{w}; X, y) &= P(y \mid X; \mathbf{w}) \\
&= \prod_{i=1}^{n} P(y_i \mid \mathbf{x}_i; \mathbf{w}) \\
&= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(\mathbf{w}^T\mathbf{x}_i - y_i)^2}{2\sigma^2}\right)
\end{aligned}
$$
Taking the logarithm,
$$
\begin{aligned}
\log \mathcal{L}(\mathbf{w}; X, y) &= \sum_{i=1}^{n} \left[ \log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right) - \frac{(\mathbf{w}^T\mathbf{x}_i - y_i)^2}{2\sigma^2} \right] \\
&= -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(\mathbf{w}^T\mathbf{x}_i - y_i)^2
\end{aligned}
$$
Dropping the term that does not depend on $\mathbf{w}$, the negative log-likelihood is
$$-\log \mathcal{L}(\mathbf{w}; X, y) = \frac{1}{2\sigma^2}\sum_{i=1}^{n}(\mathbf{w}^T\mathbf{x}_i - y_i)^2 + \text{const.}$$
Up to a positive scaling factor, this expression is the sum-of-squared-errors objective used in linear regression. Therefore, finding the maximum likelihood estimate $\mathbf{w}_{\text{ML}}$ is equivalent to solving the linear regression problem with the squared-error loss.
To obtain the closed-form solution for $\mathbf{w}_{\text{ML}}$, we differentiate the negative log-likelihood with respect to $\mathbf{w}$ and set the derivative to zero:
$$\nabla_{\mathbf{w}}\left(-\log \mathcal{L}(\mathbf{w}; X, y)\right) = \frac{1}{\sigma^2}\sum_{i=1}^{n}(\mathbf{w}^T\mathbf{x}_i - y_i)\,\mathbf{x}_i = 0$$
$$\frac{1}{\sigma^2}\left(XX^T\mathbf{w} - Xy\right) = 0$$
where $X$ is the matrix whose columns are the input vectors $\mathbf{x}_i$ and $y$ is the vector of labels. Rearranging the equation, we have:
$$XX^T\mathbf{w} = Xy$$
To obtain the closed-form solution for $\mathbf{w}_{\text{ML}}$, we multiply both sides by the inverse of $XX^T$, denoted as $(XX^T)^{-1}$:
$$\mathbf{w}_{\text{ML}} = (XX^T)^{-1}Xy$$
Thus, the closed-form solution for the maximum likelihood estimate $\mathbf{w}_{\text{ML}}$ is given by the product of $(XX^T)^{-1}$ and $Xy$.
This closed-form solution shows that $\mathbf{w}_{\text{ML}}$ is obtained by applying the inverse of $XX^T$ to the vector $Xy$, and (when $XX^T$ is invertible) it coincides with the least-squares solution $(XX^T)^+Xy$ derived earlier. It provides a direct way to estimate the weight vector $\mathbf{w}$ from the given data, although, as noted above, computing the inverse costs $O(d^3)$.
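As a quick numerical check (on synthetic, illustrative data), the ML closed form $(XX^T)^{-1}Xy$ matches NumPy's least-squares solver applied to the same data; all names and values below are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, sigma = 4, 200, 0.3
X = rng.normal(size=(d, n))                      # columns are the inputs x_i
w_true = rng.normal(size=d)
y = X.T @ w_true + sigma * rng.normal(size=n)    # y_i = w^T x_i + eps_i

w_ml = np.linalg.inv(X @ X.T) @ X @ y            # (X X^T)^{-1} X y
w_ls, *_ = np.linalg.lstsq(X.T, y, rcond=None)   # ordinary least squares on the same data

print(np.allclose(w_ml, w_ls))                   # True: both recover the same estimate
```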
Acknowledgments
Professor Arun Rajkumar: The content, including the concepts and nota-
tions presented in this document, has been sourced from his slides and lectures.
His expertise and educational materials have greatly contributed to the devel-
opment of this document.