
CS435

Chapter 6: Linear Regression


Problem for Today

• Suppose we have a dataset giving the living areas and prices of 47 houses from Riyadh.
• Given data like this, how can we learn to predict the prices of other houses in Riyadh, as a function of the size of their living areas?

Credit: CS229, Lecture Notes, Stanford University
Some Notations

• We use x(i) to denote the "input" variables (living area in this example), also called input features, and
• y(i) to denote the "output" or target variable that we are trying to predict (price).
• A pair (x(i), y(i)) is called a training example, and the dataset that we'll be using to learn, a list of m training examples {(x(i), y(i)); i = 1, . . . , m}, is called a training set.
• We will also use X to denote the space of input values, and Y the space of output values. In this example, X = Y = ℝ.
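To make the notation concrete, here is a minimal Python sketch of a training set; the numbers are made up for illustration and are not the actual Riyadh data:

```python
import numpy as np

# m = 4 hypothetical training examples: x^(i) = living area, y^(i) = price.
x = np.array([150.0, 200.0, 250.0, 300.0])   # inputs x^(1), ..., x^(m)
y = np.array([450.0, 560.0, 640.0, 760.0])   # targets y^(1), ..., y^(m)

m = len(x)                                   # number of training examples
training_set = list(zip(x, y))               # each pair (x^(i), y^(i)) is one training example
```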

• To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X ↦ Y so that h(x) is a "good" predictor for the corresponding value of y.
• For historical reasons, this function h is called a hypothesis.
• The process is therefore like this: the training set is fed to a learning algorithm, which outputs a hypothesis h; given the living area of a new house as input x, h outputs the predicted price y.

• When the target variable that we're trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem.
• When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict if a dwelling is a house or an apartment, say), we call it a classification problem.

Linear Regression

• Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning. Does it belong to statistics or machine learning?
• Machine learning, more specifically the field of predictive modeling, is primarily concerned with minimizing the error of a model, or making the most accurate predictions possible.
• Linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but it has been borrowed by machine learning.
• It is both a statistical algorithm and a machine learning algorithm.
• Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y).
• More specifically, it assumes that y can be calculated from a linear combination of the input variables (x).
• To make our housing example more interesting, let's consider a slightly richer dataset in which we also know the number of bedrooms in each house.
• Here, the x's are two-dimensional vectors in ℝ².
• For instance, x1(i) is the living area of the i-th house in the training set, and x2(i) is its number of bedrooms.
• To perform supervised learning, we must decide how we're going to represent functions/hypotheses h in a computer.
• As an initial choice, let's say we decide to approximate y as a linear function of x:
  hθ(x) = θ0 + θ1x1 + θ2x2
• Here, the θi's are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y.
• If we adopt the convention of letting x0 = 1 (the intercept term), this can be written more compactly as
  hθ(x) = θ0x0 + θ1x1 + · · · + θnxn = θᵀx
  where on the right-hand side we are viewing θ and x both as vectors, and n is the number of input variables (not counting x0). A small code sketch of this hypothesis is given below.
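A minimal sketch of this hypothesis in Python, assuming NumPy; the parameter and feature values are made up for illustration:

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = theta0*x0 + theta1*x1 + ... + thetan*xn = theta^T x.
    By convention, x[0] is the intercept term x0 = 1."""
    return np.dot(theta, x)

# Illustrative values only: theta = [theta0, theta1, theta2], x = [1, living area, bedrooms].
theta = np.array([80.0, 0.15, 25.0])
x = np.array([1.0, 250.0, 4.0])
print(h(theta, x))   # predicted price for a 250 m^2, 4-bedroom house
```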
• Now, given a training set, how do we pick, or learn, the parameters θ?
• One reasonable method seems to be to make h(x) close to y, at least for the training examples we have.
• To formalize this, we will define a function that measures, for each value of the θ's, how close the h(x(i))'s are to the corresponding y(i)'s. We define the cost function:
  J(θ) = (1/2) Σi=1..m (hθ(x(i)) − y(i))²
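Under the same conventions (one training example per row of a matrix X, with a leading column of ones for x0), a sketch of J(θ) in Python might look like this; the sample numbers are illustrative only:

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/2) * sum over i of (h_theta(x^(i)) - y^(i))^2."""
    residuals = X @ theta - y              # h_theta(x^(i)) - y^(i) for every training example
    return 0.5 * np.dot(residuals, residuals)

# Two illustrative training examples with two input features (plus the intercept column).
X = np.array([[1.0, 210.0, 3.0],
              [1.0, 160.0, 3.0]])
y = np.array([400.0, 330.0])
print(cost(np.zeros(3), X, y))             # cost of the all-zero parameter vector
```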
Least Mean Square (LMS) algorithm

• We want to choose θ so as to minimize J(θ).
• To do so, let's use a search algorithm that starts with some "initial guess" for θ, and that repeatedly changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ).
• Specifically, let's consider the gradient descent algorithm, which starts with some initial θ, and repeatedly performs the update:
  θj := θj − α ∂J(θ)/∂θj
• (This update is simultaneously performed for all values of j = 0, . . . , n.) Here, α is called the learning rate.
• This is a very natural algorithm that repeatedly takes a step in the direction of steepest decrease of J, as sketched in the code below.
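As a sketch, the update above can be written as a generic loop that works for any differentiable cost; grad_J below is a placeholder name for whatever function computes the vector of partial derivatives ∂J/∂θj:

```python
import numpy as np

def gradient_descent(grad_J, theta0, alpha, num_iters):
    """Repeatedly apply theta_j := theta_j - alpha * dJ/dtheta_j, for all j simultaneously."""
    theta = np.array(theta0, dtype=float)
    for _ in range(num_iters):
        theta -= alpha * grad_J(theta)   # step in the direction of steepest decrease of J
    return theta

# Toy usage: minimize J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
print(gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=[0.0], alpha=0.1, num_iters=100))
```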
• In order to implement this algorithm, we have to work out what the partial derivative term on the right-hand side is.
• Let's first work it out for the case where we have only one training example (x, y), so that we can neglect the sum in the definition of J. We have (the intermediate steps are worked out below):
  ∂J(θ)/∂θj = (hθ(x) − y) xj
• For a single training example, this gives the update rule:
  θj := θj + α (y(i) − hθ(x(i))) xj(i)
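For reference, the intermediate steps behind that derivative, the standard chain-rule computation for a single training example, can be written out as:

```latex
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
  &= \frac{\partial}{\partial \theta_j}\, \tfrac{1}{2}\bigl(h_\theta(x) - y\bigr)^2 \\
  &= \bigl(h_\theta(x) - y\bigr) \cdot \frac{\partial}{\partial \theta_j}\Bigl(\sum_{i=0}^{n} \theta_i x_i - y\Bigr) \\
  &= \bigl(h_\theta(x) - y\bigr)\, x_j
\end{aligned}
```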

When we run batch gradient descent to fit θ on our previous dataset, to learn to predict housing price as a function of living area, we obtain θ0 = 71.27, θ1 = 0.1345. If we plot hθ(x) as a function of x (area), along with the training data, we obtain the following figure:

[Figure: the fitted line hθ(x) plotted over the training data, price vs. living area.]

Credit: CS229, Lecture Notes, Stanford University
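As an illustration only, the following sketch runs batch gradient descent on a small made-up dataset (not the 47-house Riyadh data, whose numbers are not reproduced in these notes) and plots the fitted line against the points, assuming NumPy and Matplotlib are available. The area feature is rescaled so that a simple fixed learning rate converges:

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in data: living area (m^2) and price (thousands).
areas = np.array([150.0, 200.0, 250.0, 300.0, 350.0])
prices = np.array([450.0, 560.0, 640.0, 760.0, 850.0])

# Rescale the area feature; unscaled areas would need a far smaller learning rate.
X = np.column_stack([np.ones_like(areas), areas / 100.0])   # leading ones = intercept term x0

theta = np.zeros(2)
alpha = 0.01
for _ in range(20000):
    theta -= alpha * (X.T @ (X @ theta - prices))   # batch update: uses all m examples per step

plt.scatter(areas, prices, label="training data")
plt.plot(areas, X @ theta, label="h_theta(x)")
plt.xlabel("living area (m^2)")
plt.ylabel("price")
plt.legend()
plt.show()
```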
