CS435 Ch6
• Given data like this, how can we learn to predict the prices of other houses in Riyadh, as a function of the size of their living areas?
Credit:
CS229, Lecture Notes, Stanford University
Some Notations
• x^(i) denotes the “input” variables (living area in this example), also called input features.
• y^(i) denotes the “output” or target variable that we are trying to predict (price).
• A pair (x^(i), y^(i)) is called a training example, and the dataset that we’ll be using to learn, a list of m training examples {(x^(i), y^(i)); i = 1, . . . , m}, is called a training set.
• We will also use X to denote the space of input values, and Y the space of output values. In this example, X = Y = ℝ.
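This notation can be made concrete with a small sketch. The living areas and prices below are illustrative placeholders, not the dataset from the slides:

```python
# Hypothetical training set: living area (x) -> price (y).
# The numbers are made up for illustration only.
xs = [2104.0, 1600.0, 2400.0, 1416.0]   # x^(i): input features
ys = [400.0, 330.0, 369.0, 232.0]       # y^(i): target values

m = len(xs)                              # m: number of training examples
training_set = list(zip(xs, ys))         # list of pairs (x^(i), y^(i))

print(m)                  # 4
print(training_set[0])    # the first training example (x^(1), y^(1))
```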
• To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y.
• For historical reasons, this function h is called a hypothesis.
• The process is therefore like this:
• When the target variable that we’re trying to predict is continuous, such as in our housing example, we call the learning problem a regression problem.
• When y can take on only a small number of discrete values (such as if, given the living area, we wanted to predict whether a dwelling is a house or an apartment), we call it a classification problem.
Linear Regression
• Linear regression is one of the best-known and best-understood algorithms in statistics and machine learning. Does it belong to statistics or machine learning?
• Machine learning, more specifically the field of predictive modeling, is primarily concerned with minimizing the error of a model, or making the most accurate predictions possible.
• Linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but it has been borrowed by machine learning.
• It is therefore both a statistical algorithm and a machine learning algorithm.
• Linear regression is a linear model, i.e. a model that assumes a linear relationship between the input variables (x) and the single output variable (y).
• More specifically, it assumes that y can be calculated from a linear combination of the input variables (x).
• To make our housing example more interesting, let’s consider a slightly richer dataset in which we also know the number of bedrooms in each house:
• Here, the x’s are two-dimensional vectors in ℝ².
• For instance, x₁^(i) is the living area of the i-th house in the training set, and x₂^(i) is its number of bedrooms.
• To perform supervised learning, we must decide how we’re going to represent functions/hypotheses h in a computer.
• As an initial choice, let’s say we decide to approximate y as a linear function of x:
h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂
• Here, the θᵢ’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y.
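The linear hypothesis above translates directly into code. A minimal sketch, where the parameter values and the input house are arbitrary placeholders chosen for illustration:

```python
def h(theta, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1*x1 + theta2*x2.

    theta: [theta0, theta1, theta2]; x: [x1, x2] = (living area, bedrooms).
    """
    return theta[0] + theta[1] * x[0] + theta[2] * x[1]

# Arbitrary illustrative parameters and input (not fitted to any data).
theta = [50.0, 0.1, 20.0]
x = [2104.0, 3.0]      # a 2104 sq ft house with 3 bedrooms
print(h(theta, x))     # 50 + 0.1*2104 + 20*3 = 320.4 (up to rounding)
```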
Least Mean Square (LMS) algorithm
• Define the cost function J(θ) = ½ Σᵢ (h_θ(x^(i)) − y^(i))², the sum of squared prediction errors over the m training examples.
• We want to choose θ so as to minimize J(θ).
• To do so, let’s use a search algorithm that starts with some “initial guess” for θ, and that repeatedly changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ).
• Specifically, let’s consider the gradient descent algorithm, which starts with some initial θ, and repeatedly performs the update (simultaneously for all j, with learning rate α):
θⱼ := θⱼ − α ∂J(θ)/∂θⱼ
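Gradient descent on this cost function can be sketched as follows. To keep the sketch short it assumes a one-feature hypothesis h_θ(x) = θ₀ + θ₁x, and the toy data and learning rate α are illustrative choices, not from the slides:

```python
def gradient_descent(xs, ys, alpha=0.01, iters=5000):
    """Minimize J(theta) = 0.5 * sum_i (h_theta(x_i) - y_i)^2 by batch gradient descent."""
    t0, t1 = 0.0, 0.0                                       # initial guess for theta
    for _ in range(iters):
        # Partial derivatives of J with respect to theta0 and theta1,
        # summed over the whole training set (hence "batch").
        g0 = sum((t0 + t1 * x) - y for x, y in zip(xs, ys))
        g1 = sum(((t0 + t1 * x) - y) * x for x, y in zip(xs, ys))
        t0 -= alpha * g0        # theta_j := theta_j - alpha * dJ/dtheta_j
        t1 -= alpha * g1
    return t0, t1

# Toy data lying exactly on y = 1 + 2x; descent should recover theta = (1, 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
t0, t1 = gradient_descent(xs, ys)
print(round(t0, 3), round(t1, 3))
```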
• In order to implement this algorithm, we have to work out the partial derivative term on the right-hand side.
• Let’s first work it out for the case where we have only one training example (x, y), so that we can neglect the sum in the definition of J. We have:
∂J(θ)/∂θⱼ = ∂/∂θⱼ ½(h_θ(x) − y)² = (h_θ(x) − y) · ∂/∂θⱼ(h_θ(x) − y) = (h_θ(x) − y) xⱼ
• For a single training example, this gives the LMS update rule:
θⱼ := θⱼ + α (y − h_θ(x)) xⱼ
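The single-example rule is the building block of stochastic (incremental) gradient descent: apply it one example at a time instead of summing over the whole set. A sketch assuming the same one-feature hypothesis, with illustrative toy data and learning rate:

```python
def lms_update(theta, x, y, alpha):
    """One LMS step: theta_j := theta_j + alpha * (y - h_theta(x)) * x_j.

    theta: [theta0, theta1]; the intercept term uses x_0 = 1.
    """
    h = theta[0] + theta[1] * x          # current prediction h_theta(x)
    err = y - h                          # prediction error on this example
    return [theta[0] + alpha * err * 1.0,   # x_0 = 1 for the intercept
            theta[1] + alpha * err * x]

# Repeated sweeps over toy data on the line y = 1 + 2x.
theta = [0.0, 0.0]
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
for _ in range(2000):
    for x, y in data:
        theta = lms_update(theta, x, y, alpha=0.05)
print(round(theta[0], 2), round(theta[1], 2))
```

Because the toy data is exactly linear, the per-example updates shrink to zero as θ approaches (1, 2); on noisy data a fixed α would instead oscillate around the minimum.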