A derivation of the OLS estimator ($\hat{\boldsymbol{\beta}}$)
Definitions:
$$
\boldsymbol{y} \equiv \underbrace{\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}}_{n \times 1}
\qquad
\boldsymbol{X} \equiv \underbrace{\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}}_{n \times (k+1)}
= \begin{bmatrix} \boldsymbol{1} & \boldsymbol{x}_1 & \boldsymbol{x}_2 & \cdots & \boldsymbol{x}_k \end{bmatrix}
\quad \text{where } \boldsymbol{x}_j \equiv \underbrace{\begin{bmatrix} x_{1j} \\ \vdots \\ x_{nj} \end{bmatrix}}_{n \times 1}
$$

$$
\boldsymbol{\beta} \equiv \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}}_{(k+1) \times 1}
\qquad
\boldsymbol{u} \equiv \underbrace{\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}}_{n \times 1}
$$
Let the population model be $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{u}$. The estimated equation is $\hat{\boldsymbol{y}} = \boldsymbol{X}\hat{\boldsymbol{\beta}}$.
We can thus write the residual vector as $\hat{\boldsymbol{u}} = \boldsymbol{y} - \hat{\boldsymbol{y}} = \boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}$.
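For concreteness (a hypothetical special case, not part of the original notes), with $n = 3$ observations and a single regressor ($k = 1$) the population model stacks as

$$
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}}_{\boldsymbol{y}}
=
\underbrace{\begin{bmatrix} 1 & x_{11} \\ 1 & x_{21} \\ 1 & x_{31} \end{bmatrix}}_{\boldsymbol{X}}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}}_{\boldsymbol{\beta}}
+
\underbrace{\begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}}_{\boldsymbol{u}},
\qquad\text{i.e.,}\quad y_i = \beta_0 + \beta_1 x_{i1} + u_i \ \text{for each observation } i.
$$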
The sum of squared residuals is
$$
S(\hat{\boldsymbol{\beta}}) \equiv \sum_{i=1}^{n} \hat{u}_i^2
= \hat{\boldsymbol{u}}'\hat{\boldsymbol{u}}
= \left(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}\right)'\left(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}\right)
= \left(\boldsymbol{y}' - \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\right)\left(\boldsymbol{y} - \boldsymbol{X}\hat{\boldsymbol{\beta}}\right)
= \boldsymbol{y}'\boldsymbol{y} - \boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{y} + \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}}.
$$
The two middle terms, $\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{y}$, are equal scalars: each is $1 \times 1$, and one is the transpose of the other. We can thus write them together as $-2\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}}$ and re-express $S(\hat{\boldsymbol{\beta}})$:
$$
S(\hat{\boldsymbol{\beta}}) = \boldsymbol{y}'\boldsymbol{y} - 2\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}}.
$$
We want to choose the $\hat{\boldsymbol{\beta}}$ that minimizes $S(\hat{\boldsymbol{\beta}})$. The first order condition is
$$
\frac{\partial S(\hat{\boldsymbol{\beta}})}{\partial \hat{\boldsymbol{\beta}}} = \boldsymbol{0}_{1 \times (k+1)}
\;\Longrightarrow\;
-2\boldsymbol{y}'\boldsymbol{X} + 2\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X} = \boldsymbol{0}_{1 \times (k+1)}.
$$
[Note: By the rules of matrix differentiation, $\partial(-2\boldsymbol{y}'\boldsymbol{X}\hat{\boldsymbol{\beta}})/\partial\hat{\boldsymbol{\beta}} = -2\boldsymbol{y}'\boldsymbol{X}$ and $\partial(\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}})/\partial\hat{\boldsymbol{\beta}} = 2\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}$.]
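The second rule is the standard derivative of a quadratic form, written here with the derivative of a scalar with respect to a column vector laid out as a row vector. As a quick sketch of why it holds: for any square matrix $\boldsymbol{A}$,

$$
\frac{\partial(\hat{\boldsymbol{\beta}}'\boldsymbol{A}\hat{\boldsymbol{\beta}})}{\partial\hat{\boldsymbol{\beta}}} = \hat{\boldsymbol{\beta}}'(\boldsymbol{A} + \boldsymbol{A}'),
\qquad\text{and with } \boldsymbol{A} = \boldsymbol{X}'\boldsymbol{X} \text{ symmetric,}\qquad
\frac{\partial(\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}})}{\partial\hat{\boldsymbol{\beta}}} = 2\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X}.
$$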
Manipulating the first order condition above, we get
$$
\hat{\boldsymbol{\beta}}'\boldsymbol{X}'\boldsymbol{X} = \boldsymbol{y}'\boldsymbol{X}.
$$
Taking the transpose of both sides of the equation above yields
$$
\boldsymbol{X}'\boldsymbol{X}\hat{\boldsymbol{\beta}} = \boldsymbol{X}'\boldsymbol{y}.
$$
Pre-multiplying both sides by $(\boldsymbol{X}'\boldsymbol{X})^{-1}$, which exists provided $\boldsymbol{X}$ has full column rank, we get
$$
\hat{\boldsymbol{\beta}} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}. \qquad \blacksquare
$$
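As a minimal numerical sketch of the final formula (assuming NumPy is available; the simulated data and variable names below are illustrative, not from the original notes), one can check $\hat{\boldsymbol{\beta}} = (\boldsymbol{X}'\boldsymbol{X})^{-1}\boldsymbol{X}'\boldsymbol{y}$ against a standard least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 2                          # n observations, k regressors (plus an intercept)
X = np.column_stack([np.ones(n),       # column of ones for the intercept
                     rng.normal(size=(n, k))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS estimator: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Cross-check against NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)
print(np.allclose(beta_hat, beta_lstsq))   # True
```

Forming $(\boldsymbol{X}'\boldsymbol{X})^{-1}$ explicitly is fine for a check like this, but in practice a dedicated least-squares solver is numerically preferable to the explicit inverse.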