
Gauss-Markov theorem

Model: Assume the true population regression function (PRF) is given as:

Y = β0 + β1 X + u

Model Assumptions

SLR.1: Linear in Parameters


In the population, Y is related to X and the error u by

Y = β0 + β1 X + u,

where β0 , β1 are the population intercept and slope parameters.


u is the additive error term.

SLR.2: Random Sampling

We observe a random sample of size N , {(Xi , Yi ) : i = 1, 2, . . . , N }, that follows


the population model above.

SLR.3: Sample Variation in the Explanatory Variable

The sample outcomes on X, namely {Xi : i = 1, 2, . . . , N}, are not all equal.

=⇒ Var(X) > 0

SLR.4: Zero Conditional Mean

The error has zero mean given X:

E(u | X) = 0 (and for a random sample: E(ui | Xi ) = 0 for all i).

These, together with random sampling, allow derivation of OLS properties


conditional on the observed Xi ’s.

Gauss–Markov Theorem (Statement)
• Under these assumptions, the OLS estimator β̂j is the Best Linear Unbiased
Estimator (BLUE) of the population parameter βj in the linear regression model.
• The term BLUE has 3 components:
1. Best: least variance amongst all linear unbiased estimators
2. Linear: β̂j can be expressed as a linear function of the observed Yi
3. Unbiased: the mean value of β̂j obtained through repeated sampling
equals the true βj

• An unbiased estimator with the least variance is known as an efficient
estimator.
• OLS estimators have the least variance amongst the set of all linear and
unbiased estimators.

Given model assumptions hold, we can derive the OLS estimators of β0 and
β1 as:

β̂1 = Cov(X,Y) / Var(X)        (1)

β̂0 = Ȳ − β̂1 X̄        (2)
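
As a quick numerical illustration (not part of the original notes), the following Python sketch computes β̂1 and β̂0 directly from equations (1) and (2) on one simulated sample; the parameter values, sample size, and variable names are assumptions of the example.

```python
import numpy as np

# Simulate one sample from the population model Y = beta0 + beta1*X + u
rng = np.random.default_rng(0)
beta0, beta1 = 2.0, 0.5                    # assumed true parameter values
X = rng.uniform(0, 10, size=100)           # explanatory variable with sample variation (SLR.3)
u = rng.normal(0, 1, size=100)             # error term, mean zero independent of X (SLR.4)
Y = beta0 + beta1 * X + u

# OLS estimators from equations (1) and (2)
beta1_hat = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)   # Cov(X, Y) / Var(X)
beta0_hat = Y.mean() - beta1_hat * X.mean()                  # Ybar - beta1_hat * Xbar

print(beta1_hat, beta0_hat)   # should be close to the true values 0.5 and 2.0
```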

Linearity of β̂1

β̂1 = Cov(X,Y) / Var(X) = Σ (Xi − X̄)(Yi − Ȳ) / Σ (Xi − X̄)²

Writing xi = Xi − X̄ and yi = Yi − Ȳ for deviations from the sample means:

=⇒ β̂1 = Σ xi yi / Σ xi² = Σ xi Yi / Σ xi²        [since Σ xi Ȳ = Ȳ Σ xi = 0]

=⇒ β̂1 = Σ ki Yi ,  where ki = xi / Σ xi²        (3)

Thus β̂1 is a linear function of the observed Yi.

Linearity of β̂0

We know that:

β̂0 = Ȳ − β̂1 X̄

Expanding further:

β̂0 = (Σ Yi)/N − X̄ Σ ki Yi

=⇒ β̂0 = (1/N) Σ Yi − X̄ Σ ki Yi = Σ (1/N − X̄ ki) Yi

Thus, β̂0 is also a linear function of the observed Yi.

Notes:

Note the following properties of the weights ki:

i) Σ ki = Σ (xi / Σ xi²) = (Σ xi) / Σ xi² = 0        [since Σ xi = Σ (Xi − X̄) = 0]

ii) Σ ki Xi = Σ xi Xi / Σ xi² = Σ (Xi − X̄)Xi / Σ (Xi − X̄)² = 1

[ Σ (Xi − X̄)² = Σ (Xi − X̄)(Xi − X̄) = Σ (Xi − X̄)Xi − X̄ Σ (Xi − X̄) = Σ (Xi − X̄)Xi

=⇒ Σ (Xi − X̄)² = Σ (Xi − X̄)Xi ]
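
A minimal numerical check of properties (i) and (ii) of the weights ki, using the same kind of simulated data as above (the particular sample is again an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=50)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, size=50)

x = X - X.mean()                # deviations x_i = X_i - Xbar
k = x / np.sum(x**2)            # OLS weights k_i = x_i / sum(x_i^2)

print(np.isclose(k.sum(), 0.0))           # (i)  sum of k_i equals 0
print(np.isclose(np.sum(k * X), 1.0))     # (ii) sum of k_i * X_i equals 1
print(np.sum(k * Y))                      # beta1_hat = sum(k_i * Y_i): linear in the Y_i
```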

Unbiasedness of β̂1

To prove unbiasedness of β̂1, let us use equation (3):

β̂1 = Σ ki (β0 + β1 Xi + ui) = β0 Σ ki + β1 Σ ki Xi + Σ ki ui

Given Σ ki = 0 and Σ ki Xi = 1,

=⇒ β̂1 = β1 + Σ ki ui        (4)

To establish unbiasedness of β̂1, take expectations on both sides:

E(β̂1) = E( β1 + Σ ki ui ) = β1 + E( Σ ki ui )        (5)

Note that we have assumed that Xi (the regressor) is fixed in repeated sampling.
(Repeated sampling is needed because unbiasedness is a repeated-sampling property.)
Therefore, Xi is non-random (non-stochastic). Since Xi is non-stochastic, ki is also
non-stochastic and can be treated as a constant. The only random term in Σ ki ui is
ui, so the expectation is taken only over the ui term.

Also, everything we do proceeds under the assumption that Xi is given and fixed.
Hence, all the OLS properties in the Gauss-Markov theorem are derived under the
assumption of fixed Xi in repeated sampling. Likewise, when we write E(u), it is
actually E(u | Xi), as everything is conditional on the fixed values of Xi.

Now, (5) becomes:

E(β̂1) = β1 + Σ ki E(ui) = β1        [since E(ui | Xi) = 0 by SLR.4]

Therefore,

E(β̂1) = β1

Hence β̂1 is an unbiased estimator of the true population parameter β1.

Unbiasedness of β̂0

We start with:
β̂0 = Ȳ − β̂1 X̄

Since
Y = β0 + β1 X + u,
we have
Ȳ = β0 + β1 X̄ + ū.
Therefore:
β̂0 = β0 + β1 X̄ + ū − β̂1 X̄

=⇒ β̂0 = β0 − X̄(β̂1 − β1 ) + ū
Taking expectations:

E(β̂0 ) = β0 − X̄ E(β̂1 − β1 ) + E(ū)

Since E(ū) = 0 and β̂1 is unbiased (E(β̂1) = β1), it follows that:

E(β̂0 ) = β0

Thus, β̂0 is an unbiased estimator of β0 .
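
The unbiasedness results above are repeated-sampling statements, so a small Monte Carlo sketch can illustrate them: X is held fixed and only the errors are redrawn across replications. Everything below (seed, sample size, number of replications) is an assumption of the example, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50
X = rng.uniform(0, 10, size=n)          # fixed design, reused in every replication
x = X - X.mean()
k = x / np.sum(x**2)

b1_draws, b0_draws = [], []
for _ in range(10_000):
    u = rng.normal(0, sigma, size=n)    # only the errors change across samples
    Y = beta0 + beta1 * X + u
    b1 = np.sum(k * Y)                  # beta1_hat = sum(k_i * Y_i)
    b0 = Y.mean() - b1 * X.mean()       # beta0_hat = Ybar - beta1_hat * Xbar
    b1_draws.append(b1)
    b0_draws.append(b0)

# Averages over repeated samples should be close to the true beta1 = 0.5 and beta0 = 2.0
print(np.mean(b1_draws), np.mean(b0_draws))
```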

Variance and Standard Error of β̂1

The true PRF is given as:

Y = β0 + β1 X + u

Since the OLS estimators β̂1 and β̂0 change from sample to sample, they have a
sampling distribution. The means of these distributions are β1 and β0, because
E(β̂1) = β1 and E(β̂0) = β0. What, then, is the variance of the sampling
distribution of β̂1 and β̂0?

The variance of the OLS estimator β̂1 is given by:

Var(β̂1) = σ² / Σ xi²

Assumption SLR.5: Homoscedasticity
The error term u has the same variance given any value of the explanatory
variable:
Var(u|X) = σ²

• “Homo” means equal; “scedasticity” means spread.

• Homoscedasticity means equal variance.


• Homoscedasticity implies that conditional on Xi , the variance of u is con-
stant.
• The variance of u does not change with Xi .

Proof:

From equation (4),

β̂1 = β1 + Σ ki ui

Var(β̂1) = Var( Σ ki ui ) = Σ ki² Var(ui) + 2 Σ_{i<j} ki kj Cov(ui, uj)

[Since ki is non-stochastic, only ui is random. Also, β1 is a constant population parameter, so Var(β1) = 0.]

From the assumption of homoscedasticity, Var(u|X) = σ².

=⇒ Var(β̂1) = σ² Σ ki²        [since under random sampling the ui are independent draws, Cov(ui, uj) = 0 for i ≠ j]

= σ² Σ ( xi / Σ xi² )² = σ² Σ xi² / ( Σ xi² )² = σ² / Σ xi²

Therefore:

Var(β̂1) = σ² / Σ xi²
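
As a sanity check (again with fixed X and simulated errors, and with all numerical choices assumed purely for illustration), the empirical variance of β̂1 over repeated samples can be compared with the formula σ² / Σ xi²:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50
X = rng.uniform(0, 10, size=n)          # fixed design across replications
x = X - X.mean()
k = x / np.sum(x**2)

# beta1_hat for many independent error draws, with X held fixed
b1_draws = [np.sum(k * (beta0 + beta1 * X + rng.normal(0, sigma, size=n)))
            for _ in range(20_000)]

print(np.var(b1_draws))                 # empirical sampling variance of beta1_hat
print(sigma**2 / np.sum(x**2))          # theoretical Var(beta1_hat) = sigma^2 / sum(x_i^2)
```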

Standard Deviation:

s.d.(β̂1) = √( σ² / Σ xi² ) = σ / √( Σ xi² )

Since σ² is not known, we use an estimator:

σ̂² = Σ ûi² / (n − 2)

Because σ is replaced by its estimate σ̂, the resulting quantity is called the
standard error (rather than the standard deviation) of β̂1, and is given as:

se(β̂1) = σ̂ / √( Σ xi² )

Estimator of σ²:

σ̂² = Σ ûi² / (n − 2)

where (n − 2) is the degrees of freedom.

[Intuitively, estimator of variance = sum of squared deviations of the residuals from
their mean, divided by the degrees of freedom = Σ ûi² / (n − 2), since the OLS
residuals sum to zero.]

Note that we could have alternatively estimated σ² using:

σ̃² = Σ ûi² / n

However, it is σ̂², not σ̃², that is an unbiased estimator of σ². Hence, we will use
σ̂² as the estimator of σ².
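
A small simulation sketch (assuming normal errors, with the sample size and seed chosen arbitrarily) can illustrate why the denominator n − 2 is used: dividing the residual sum of squares by n − 2 gives an approximately unbiased estimate of σ², whereas dividing by n underestimates it on average.

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 30
X = rng.uniform(0, 10, size=n)
x = X - X.mean()

s2_unbiased, s2_biased = [], []
for _ in range(20_000):
    u = rng.normal(0, sigma, size=n)
    Y = beta0 + beta1 * X + u
    b1 = np.sum(x * Y) / np.sum(x**2)               # OLS slope
    b0 = Y.mean() - b1 * X.mean()                   # OLS intercept
    resid = Y - (b0 + b1 * X)                       # OLS residuals u_hat
    s2_unbiased.append(np.sum(resid**2) / (n - 2))  # sigma_hat^2 with df correction
    s2_biased.append(np.sum(resid**2) / n)          # alternative estimator without correction

# With sigma^2 = 1, the first average is ~1.0 while the second is ~(n-2)/n ≈ 0.93
print(np.mean(s2_unbiased), np.mean(s2_biased))

# Standard error of beta1_hat for the last simulated sample
print(np.sqrt(s2_unbiased[-1] / np.sum(x**2)))
```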

Minimum Variance of OLS Estimator

Introduction

We consider the linear regression model:

Y = β0 + β1 X + u

The OLS estimator β̂1 is the least squares estimator of β1. It is also the estimator
with minimum variance in the class of all linear unbiased estimators.
Hence, it is also called an efficient estimator. (Efficiency reflects the
precision/standard errors of the estimates: efficiency implies the smallest standard
errors/highest precision.)

Proof Setup

The OLS estimator is given by:

β̂1 = Cov(X, Y) / Var(X) = Σ (Xi − X̄)Yi / Σ (Xi − X̄)² = Σ xi Yi / Σ xi²

Let ki = xi / Σ xi². Then

β̂1 = Σ ki Yi

Thus, β̂1 is a linear combination of the Yi ⇒ β̂1 is linear and unbiased (already proved).

Alternative Linear Estimator

Let us assume there exists another linear and unbiased estimator β1∗ defined as:
β1∗ = Σ wi Yi

where wi is not necessarily equal to ki .

Note: Recall that in the simple linear regression (SLR) model

Yi = β0 + β1 Xi + ui ,

with the assumption E(ui |Xi ) = 0, we have

E[Yi |Xi ] = β0 + β1 Xi .

Thus, the conditional expectation of Yi is the population regression line,


which is non-random given Xi . Consequently,

Var(E[Yi |Xi ]) = 0.

The variance of Yi arises only from the error term ui :

Var(Yi | Xi) = Var(ui | Xi) = σ².

Expectation of β1∗

E(β1∗) = E( Σ wi Yi ) = Σ wi E(Yi) = Σ wi (β0 + β1 Xi)

E(β1∗) = β0 Σ wi + β1 Σ wi Xi

For β1∗ to be unbiased:

E(β1∗) = β1  =⇒  Σ wi = 0 ,  Σ wi Xi = 1

Variance of β1∗

Var(β1∗) = Var( Σ wi Yi ) = Σ wi² Var(Yi) + 2 Σ_{i<j} wi wj Cov(Yi, Yj)

= Σ wi² σ²        (since Var(Yi) = σ² and Cov(Yi, Yj) = 0 for i ≠ j)

Cov(Yi, Yj) = Cov(β0 + β1 Xi + ui , β0 + β1 Xj + uj)

Since β0, β1, Xi, and Xj are non-random constants,

Cov(Yi, Yj) = Cov(ui, uj).

By the classical assumption of no serial correlation (autocorrelation) of errors:

Cov(ui, uj) = 0 for i ≠ j.

Therefore,

Cov(Yi, Yj) = 0 for i ≠ j.

=⇒ Var(β1∗) = σ² Σ wi²

Adding and subtracting xi / Σ xi² inside the square and expanding, we get:

Var(β1∗) = σ² Σ ( wi − xi/Σ xi² + xi/Σ xi² )²

= σ² Σ [ ( wi − xi/Σ xi² )² + ( xi/Σ xi² )² + 2 ( wi − xi/Σ xi² )( xi/Σ xi² ) ]

= σ² Σ ( wi − xi/Σ xi² )² + σ²/Σ xi² + 2σ² Σ ( wi − xi/Σ xi² )( xi/Σ xi² )

[using σ² Σ ( xi/Σ xi² )² = σ² Σ xi² / ( Σ xi² )² = σ²/Σ xi²]

We know that 1/Σ xi² is constant (non-random), therefore:

Σ ( wi − xi/Σ xi² )( xi/Σ xi² ) = (1/Σ xi²) Σ ( wi − xi/Σ xi² ) xi .

So, it suffices to show:

Σ ( wi − xi/Σ xi² ) xi = 0.

Now,

Σ ( wi − xi/Σ xi² ) xi = Σ wi xi − (1/Σ xi²) Σ xi² .

Simplifying the second term:

(1/Σ xi²) Σ xi² = 1.

Thus, the expression becomes:

Σ wi xi − 1.

For β1∗ = Σ wi Yi to be unbiased, we require Σ wi = 0 and Σ wi Xi = 1. Since
xi = Xi − X̄, this implies:

Σ wi xi = Σ wi Xi − X̄ Σ wi = 1 − X̄ · 0 = 1.

Therefore,

Σ wi xi − 1 = 1 − 1 = 0.

∴ Σ ( wi − xi/Σ xi² )( xi/Σ xi² ) = 0.

Thus:

Var(β1∗) = σ² Σ ( wi − xi/Σ xi² )² + σ² / Σ xi²

Minimum Variance Condition

The first term, σ² Σ ( wi − xi/Σ xi² )², is a sum of squares and hence non-negative;
it equals zero only when wi = xi/Σ xi² for every i. The variance is therefore
minimized if:

wi = xi / Σ xi²

in which case:

Var(β1∗) = σ² / Σ xi² = Var(β̂1)

Hence, the variance of any linear unbiased estimator is minimized when wi = xi/Σ xi²,
which is exactly the weight used in the OLS estimator, i.e., ki. The OLS estimator
β̂1 therefore has the smallest variance in the class of linear unbiased estimators,
which completes the proof of the Gauss-Markov theorem.
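
To make the comparison concrete, the following Python sketch contrasts the OLS weights ki with one alternative choice of weights wi that also satisfies Σ wi = 0 and Σ wi Xi = 1 (the particular construction using sign(xi) is just an illustrative assumption): both estimators come out unbiased in simulation, but the OLS estimator has the smaller variance.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 50
X = rng.uniform(0, 10, size=n)
x = X - X.mean()
k = x / np.sum(x**2)                    # OLS weights k_i

# One alternative set of linear-unbiased weights: start from sign(x_i), centre it so the
# weights sum to 0, then rescale so that sum(w_i * X_i) = 1.
d = np.sign(x) - np.sign(x).mean()
w = d / np.sum(d * X)

ols_draws, alt_draws = [], []
for _ in range(20_000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=n)
    ols_draws.append(np.sum(k * Y))     # OLS estimator beta1_hat
    alt_draws.append(np.sum(w * Y))     # alternative linear unbiased estimator beta1_star

print(np.mean(ols_draws), np.mean(alt_draws))   # both close to beta1 = 0.5 (unbiased)
print(np.var(ols_draws), np.var(alt_draws))     # Var(beta1_hat) < Var(beta1_star): OLS is BLUE
```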
