CS229 Lecture 2
Chris Ré
April 2, 2023
Disclaimers
▶ Definitions
▶ Linear Regression
▶ Batch and Stochastic Gradient
▶ Normal Equations
Supervised Learning
        size (ft²)      Price ($1000s)
x^(1)   2104            y^(1)   400
x^(2)   2500            y^(2)   900
How do we represent h? (One popular choice)
        size (ft²)      Price ($1000s)
x^(1)   2104            y^(1)   400
x^(2)   2500            y^(2)   900
An example prediction?
h(x) = θ_0 + θ_1 x_1.
Slightly More Interesting Data
We add features (bedrooms and lot size) to incorporate more
information about houses.
        size (ft²)   bedrooms   lot size   Price ($1000s)
x^(1)   2104         4          45k        y^(1)   400
x^(2)   2500         3          30k        y^(2)   900
What’s a prediction here?
h(x) = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_3 x_3.
Let h_θ(x) = Σ_{j=0}^{d} θ_j x_j, with the convention x_0 = 1 so that θ_0 acts as an intercept term; we want to choose θ so that h_θ(x) ≈ y.
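As a concrete sketch of this hypothesis (assuming NumPy; the θ values below are made up for illustration, not fitted to anything):

```python
import numpy as np

# Features of the first house from the table, with the x_0 = 1 intercept
# term prepended: [1, size, bedrooms, lot size (in 1000s of ft²)].
x = np.array([1.0, 2104.0, 4.0, 45.0])

# Illustrative (not fitted) parameters theta_0, ..., theta_3.
theta = np.array([50.0, 0.1, 20.0, 1.0])

# h_theta(x) = sum_j theta_j * x_j is just a dot product.
prediction = theta @ x
print(prediction)  # predicted price in $1000s under these made-up parameters
```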
Visual version of linear regression
Choose

θ = argmin_θ J(θ),

where J(θ) = (1/2) Σ_{i=1}^{n} (h_θ(x^(i)) − y^(i))² is the least-squares cost.
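A minimal sketch of evaluating this cost on the two-row table above (NumPy assumed, with the x_0 = 1 column included):

```python
import numpy as np

# Design matrix: one row per house, [1, size, bedrooms, lot size].
X = np.array([[1.0, 2104.0, 4.0, 45.0],
              [1.0, 2500.0, 3.0, 30.0]])
y = np.array([400.0, 900.0])  # prices in $1000s

def J(theta):
    """Least-squares cost: 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
    r = X @ theta - y          # residuals h_theta(x^(i)) - y^(i)
    return 0.5 * (r @ r)

print(J(np.zeros(4)))  # at theta = 0: 1/2 * (400^2 + 900^2) = 485000.0
```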
Linear Regression Summary
θ^(0) = 0

θ_j^(t+1) = θ_j^(t) − α (∂/∂θ_j) J(θ^(t))   for j = 0, …, d.
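For least squares the partials are ∂J/∂θ_j = Σ_i (h_θ(x^(i)) − y^(i)) x_j^(i), so the whole gradient at once is X^T(Xθ − y). A minimal batch sketch under that assumption (NumPy; X and y as in the cost example above; the function name is mine):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha, iters):
    """Run the update above: every coordinate of theta moves at once,
    using all n examples per step ("batch")."""
    theta = np.zeros(X.shape[1])        # theta^(0) = 0
    for _ in range(iters):
        grad = X.T @ (X @ theta - y)    # full gradient of J at theta^(t)
        theta = theta - alpha * grad    # theta^(t+1) = theta^(t) - alpha * grad
    return theta
```

Note that with unscaled features like the square footages above, X^T X is badly conditioned, so α must be very small (or the features standardized first) for the iteration to converge.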
Gradient Descent Computation
θ_j^(t+1) = θ_j^(t) − α (∂/∂θ_j) J(θ^(t))   for j = 0, …, d.
Note that α is called the learning rate or step size.
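The stochastic variant named in the outline replaces the full gradient with the contribution of a single example per update; a sketch under the same least-squares setup (function name and shuffling scheme are mine):

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha, epochs, seed=0):
    """SGD for least squares: update theta from one example at a time."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):               # shuffle order each epoch
            grad_i = (X[i] @ theta - y[i]) * X[i]  # gradient of example i's term
            theta = theta - alpha * grad_i
    return theta
```

Each step costs O(d) rather than O(nd), at the price of noisier progress toward the minimum.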
Vector Derivatives
Here A ∈ R^{n×d}. For a function f : R^{n×d} → R, the gradient ∇_A f(A) ∈ R^{n×d} collects the partial derivatives: (∇_A f(A))_{ij} = ∂f/∂A_{ij}.
▶ With this notation, to find the minimum of J(θ) we set ∇θ J(θ) = 0 and solve for θ.
▶ Note that ∇θ J(θ) ∈ R^{d+1}, since θ ∈ R^{d+1}.
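Both bullets can be sanity-checked numerically: the sketch below (NumPy, with small synthetic data used only for the check) compares the closed-form least-squares gradient X^T X θ − X^T y, derived on the next slide, against central finite differences, and confirms the gradient has the same shape as θ.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 4                        # p = d + 1 parameters, n examples
X = rng.normal(size=(n, p))        # synthetic data, just for the check
y = rng.normal(size=n)

def J(theta):
    r = X @ theta - y
    return 0.5 * (r @ r)

theta = rng.normal(size=p)
analytic = X.T @ X @ theta - X.T @ y   # closed-form gradient of J

eps = 1e-6                             # central differences, one coordinate at a time
numeric = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                    for e in np.eye(p)])

print(analytic.shape)                  # (4,): gradient lives in R^{d+1}, like theta
print(np.allclose(analytic, numeric))  # True
```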
The normal equation
∇θ J(θ) = X^T X θ − X^T y,

where X ∈ R^{n×(d+1)} stacks the inputs (x^(i))^T as rows and y ∈ R^n stacks the targets. Setting the gradient to zero and solving gives

θ = (X^T X)^{−1} X^T y.
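A sketch of using this in practice (NumPy; restricted to the single size feature so that the two-example table alone makes X^T X invertible; np.linalg.solve is preferred over forming the inverse explicitly):

```python
import numpy as np

# One-feature version of the table: x_0 = 1 intercept column plus size.
X = np.array([[1.0, 2104.0],
              [1.0, 2500.0]])
y = np.array([400.0, 900.0])

# Solve the normal equations X^T X theta = X^T y directly,
# rather than computing (X^T X)^{-1}.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)      # [theta_0, theta_1]
print(X @ theta)  # with 2 points and 2 parameters, reproduces y exactly
```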