Gauss–Markov Theorem
Model: Assume the true population regression function (PRF) is given as:
$$
Y = \beta_0 + \beta_1 X + u
$$
Model Assumptions
SLR.1: Linear in Parameters
In the population, Y is related to X and the error u by
Y = β0 + β1 X + u,
where β0 , β1 are the population intercept and slope parameters.
u is the additive error term.
SLR.2: Random Sampling
We observe a random sample of size N , {(Xi , Yi ) : i = 1, 2, . . . , N }, that follows
the population model above.
SLR.3: Sample Variation in the Explanatory Variable
The sample outcomes on X, namely {Xi : i = 1, 2, . . . , N}, are not all equal,
so that the sample variance of X is strictly positive: Var(X) > 0.
SLR.4: Zero Conditional Mean
The error has zero mean given X:
E(u | X) = 0 (and for a random sample: E(ui | Xi ) = 0 for all i).
These, together with random sampling, allow derivation of OLS properties
conditional on the observed Xi ’s.
Gauss–Markov Theorem (Statement)
• Under these assumptions, the OLS estimator β̂j is the Best Linear Unbiased
Estimator (BLUE) of the population parameter βj in a linear regression model.
• The term BLUE has 3 components:
1. Best: least variance among all linear unbiased estimators
2. Linear: the estimator can be expressed as a linear function of the sample
observations Yi
3. Unbiased: the mean value of β̂j obtained through repeated sampling
equals the true βj
• An unbiased estimator with the least variance is known as an efficient
estimator
• OLS estimators have the least variance amongst the set of all linear and
unbiased estimators.
Given that the model assumptions hold, we can derive the OLS estimators of β0 and
β1 as:
$$
\hat{\beta}_1 = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} \quad (1)
$$
$$
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \quad (2)
$$
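As a quick numerical illustration (a sketch added alongside these notes, not part of the derivation), the snippet below computes β̂1 and β̂0 directly from equations (1) and (2) on a simulated sample. The true parameter values, the sample size, and the distributions used to generate X and u are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate one sample from the population model Y = beta0 + beta1*X + u
beta0_true, beta1_true, n = 2.0, 0.5, 200
X = rng.normal(10, 3, size=n)          # explanatory variable with Var(X) > 0 (SLR.3)
u = rng.normal(0, 1, size=n)           # error with E(u|X) = 0 (SLR.4)
Y = beta0_true + beta1_true * X + u

# OLS estimators from equations (1) and (2)
beta1_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0_hat = Y.mean() - beta1_hat * X.mean()
print(beta1_hat, beta0_hat)

# Cross-check against numpy's least-squares fit (returns [slope, intercept] for deg=1)
print(np.polyfit(X, Y, deg=1))
```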
Linearity of β̂1
$$
\hat{\beta}_1 = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
$$
Writing $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ for deviations from the sample means, and using $\sum x_i \bar{Y} = \bar{Y} \sum x_i = 0$:
$$
\Rightarrow \hat{\beta}_1 = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum x_i Y_i}{\sum x_i^2}
$$
$$
\Rightarrow \hat{\beta}_1 = \sum k_i Y_i, \quad \text{where } k_i = \frac{x_i}{\sum x_i^2} \quad (3)
$$
Thus β̂1 is a linear function of the observed Yi.
Linearity of β̂0
We know that:
$$
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
$$
Expanding further:
$$
\hat{\beta}_0 = \frac{\sum_{i=1}^{N} Y_i}{N} - \bar{X} \sum_{i=1}^{N} k_i Y_i
$$
$$
\Rightarrow \hat{\beta}_0 = \frac{1}{N}\sum_{i=1}^{N} Y_i - \bar{X} \sum_{i=1}^{N} k_i Y_i = \sum_{i=1}^{N} \left( \frac{1}{N} - \bar{X} k_i \right) Y_i
$$
Thus, β̂0 is also a linear function of the observed Yi.
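A minimal numerical check of this linearity claim (an added illustration with simulated data and arbitrary parameter values): the weighted sum Σ(1/N − X̄ ki)Yi should reproduce β̂0 = Ȳ − β̂1 X̄.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 60
X = rng.uniform(0, 5, size=N)
Y = 3.0 - 1.5 * X + rng.normal(0, 1, size=N)

x = X - X.mean()               # deviations x_i = X_i - X̄
k = x / np.sum(x ** 2)         # OLS weights k_i

beta1_hat = np.sum(k * Y)
beta0_direct = Y.mean() - beta1_hat * X.mean()        # β̂0 = Ȳ − β̂1 X̄
beta0_linear = np.sum((1.0 / N - X.mean() * k) * Y)   # β̂0 as Σ (1/N − X̄ k_i) Y_i

print(np.isclose(beta0_direct, beta0_linear))         # True: the two forms agree
```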
Notes:
$$
\text{i)} \quad \sum_i k_i = \sum_i \frac{x_i}{\sum_i x_i^2} = \frac{\sum_i x_i}{\sum_i x_i^2} = 0
$$
$$
\text{ii)} \quad \sum_i k_i X_i = \frac{\sum_i x_i X_i}{\sum_i x_i^2} = \frac{\sum_i (X_i - \bar{X}) X_i}{\sum_i x_i^2} = \frac{\sum_i (X_i - \bar{X}) X_i}{\sum_i (X_i - \bar{X})^2} = 1
$$
[The last equality holds because
$$
\sum_i (X_i - \bar{X})^2 = \sum_i (X_i - \bar{X})(X_i - \bar{X}) = \sum_i (X_i - \bar{X}) X_i - \bar{X} \sum_i (X_i - \bar{X}) = \sum_i (X_i - \bar{X}) X_i,
$$
using $\sum_i (X_i - \bar{X}) = 0$.]
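The two properties of the weights ki can also be checked numerically. The sketch below (simulated data, arbitrary parameter values) verifies Σ ki = 0, Σ ki Xi = 1, and that Σ ki Yi coincides with the Cov/Var formula for β̂1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=50)
Y = 1.0 + 2.0 * X + rng.normal(0, 1, size=50)

x = X - X.mean()                 # deviations x_i = X_i - X̄
k = x / np.sum(x ** 2)           # OLS weights k_i = x_i / Σ x_i²

print(np.isclose(k.sum(), 0.0))          # (i)  Σ k_i = 0
print(np.isclose(np.sum(k * X), 1.0))    # (ii) Σ k_i X_i = 1

# Linearity: β̂1 = Σ k_i Y_i reproduces the Cov/Var formula
beta1_from_weights = np.sum(k * Y)
beta1_from_cov = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
print(np.isclose(beta1_from_weights, beta1_from_cov))
```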
Unbiasedness of β̂1
To prove unbiasedness of β̂1, we use equation (3) and substitute the population
model Yi = β0 + β1 Xi + ui :
$$
\hat{\beta}_1 = \sum k_i (\beta_0 + \beta_1 X_i + u_i) = \beta_0 \sum k_i + \beta_1 \sum k_i X_i + \sum k_i u_i
$$
Given $\sum k_i = 0$ and $\sum k_i X_i = 1$,
$$
\Rightarrow \hat{\beta}_1 = \beta_1 + \sum k_i u_i \quad (4)
$$
To establish unbiasedness of β̂1, take expectations on both sides:
$$
E(\hat{\beta}_1) = E\!\left(\beta_1 + \sum k_i u_i\right) = \beta_1 + E\!\left(\sum k_i u_i\right) \quad (5)
$$
Note that we have assumed that Xi (the regressor) is fixed in repeated sampling
(repeated sampling is needed because unbiasedness is a repeated-sampling property).
Therefore Xi is non-random (non-stochastic). Since ki depends only on the Xi, ki is
also non-stochastic and can be treated as a constant. The only random term in
Σ ki ui is ui, so the expectation is taken only over the ui.
Everything we do here proceeds under the assumption that Xi is given and fixed.
Hence all the OLS properties in the Gauss–Markov theorem are derived under the
assumption of fixed Xi in repeated sampling. Also, when we write E(u), it is really
E(u | Xi), because everything is conditional on the fixed values of Xi.
Now, (5) becomes:
$$
E(\hat{\beta}_1) = \beta_1 + \sum k_i E(u_i) = \beta_1
$$
Therefore,
$$
E(\hat{\beta}_1) = \beta_1
$$
Hence β̂1 is an unbiased estimator of the true population parameter β1.
Unbiasedness of β̂0
We start with:
β̂0 = Ȳ − β̂1 X̄
Since
Y = β0 + β1 X + u,
we have
Ȳ = β0 + β1 X̄ + ū.
Therefore:
β̂0 = β0 + β1 X̄ + ū − β̂1 X̄
=⇒ β̂0 = β0 − X̄(β̂1 − β1 ) + ū
Taking expectations:
E(β̂0 ) = β0 − X̄ E(β̂1 − β1 ) + E(ū)
Since E(ū) = 0 and β̂1 is unbiased (E(β̂1 ) = β1 ), it follows that:
E(β̂0 ) = β0
Thus, β̂0 is an unbiased estimator of β0 .
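Because unbiasedness is a repeated-sampling property, a Monte Carlo simulation is a natural way to illustrate it. The sketch below (an added illustration; parameter values, error variance, and the number of replications are arbitrary) holds X fixed across replications, redraws the errors each time, and compares the average of the estimates with the true parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n, reps = 1.0, 0.7, 100, 5000

# Fixed-in-repeated-samples regressor: X is drawn once and reused in every replication
X = rng.uniform(0, 10, size=n)
x = X - X.mean()

b0_hats, b1_hats = [], []
for _ in range(reps):
    u = rng.normal(0, 2, size=n)                       # fresh errors each replication
    Y = beta0 + beta1 * X + u
    b1 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)   # OLS slope
    b0 = Y.mean() - b1 * X.mean()                      # OLS intercept
    b1_hats.append(b1)
    b0_hats.append(b0)

# The averages of the estimates should be close to the true parameters
print(np.mean(b1_hats), beta1)
print(np.mean(b0_hats), beta0)
```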
Variance and Standard Error of β̂1
The true PRF is given as:
Y = β0 + β1 X + u
Since the OLS estimators β̂1 and β̂0 vary from sample to sample, they have a
sampling distribution. The means of these distributions are β1 and β0, because
E(β̂1) = β1 and E(β̂0) = β0. What, then, is the variance of the sampling
distribution of β̂1 and β̂0?
The variance of the OLS estimator β̂1 is given by:
$$
\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2}
$$
Assumption SLR.5: Homoscedasticity
The error term u has the same variance given any value of the explanatory
variable:
Var(u|X) = σ 2
• “Homo” means equal; “scedasticity” means spread.
• Homoscedasticity means equal variance.
• Homoscedasticity implies that conditional on Xi , the variance of u is con-
stant.
• The variance of u does not change with Xi .
Proof:
From equation (4),
$$
\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{N} k_i u_i
$$
$$
\operatorname{Var}(\hat{\beta}_1) = \operatorname{Var}\!\left(\sum_{i=1}^{N} k_i u_i\right) = \sum_{i=1}^{N} k_i^2 \operatorname{Var}(u_i) + 2 \sum_{i<j} k_i k_j \operatorname{Cov}(u_i, u_j)
$$
[Since ki is non-stochastic, it is only ui that is random. Also, β1 is a constant
population parameter, so Var(β1) = 0.]
From the assumption of homoscedasticity, Var(u|X) = σ². Because the ui are
independent random draws under random sampling, Cov(ui, uj) = 0 for i ≠ j. Hence
$$
\operatorname{Var}(\hat{\beta}_1) = \sigma^2 \sum_{i=1}^{N} k_i^2 = \sigma^2 \sum_{i=1}^{N} \left( \frac{x_i}{\sum x_i^2} \right)^2 = \sigma^2 \frac{\sum x_i^2}{\left( \sum x_i^2 \right)^2} = \frac{\sigma^2}{\sum x_i^2}
$$
Therefore:
$$
\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum x_i^2}
$$
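A simulation sketch of this result (added here for illustration; parameter values are arbitrary, and X is held fixed in repeated samples): the empirical variance of β̂1 across replications should be close to σ²/Σ xi².

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n, reps = 1.0, 0.7, 2.0, 100, 20000

X = rng.uniform(0, 10, size=n)       # fixed in repeated samples
x = X - X.mean()

b1_hats = np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)     # homoscedastic errors (SLR.5)
    Y = beta0 + beta1 * X + u
    b1_hats[r] = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)

print(b1_hats.var())                     # simulated sampling variance of β̂1
print(sigma ** 2 / np.sum(x ** 2))       # theoretical σ² / Σ x_i²
```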
Standard Deviation:
$$
\operatorname{s.d.}(\hat{\beta}_1) = \sqrt{\frac{\sigma^2}{\sum x_i^2}} = \frac{\sigma}{\sqrt{\sum x_i^2}}
$$
Since σ² is not known, we use an estimator:
$$
\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-2}
$$
Because the unknown σ is replaced by the estimate σ̂, the resulting quantity is called
the standard error (rather than the standard deviation) of β̂1. It is given by:
$$
\operatorname{se}(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{\sum x_i^2}}
$$
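As an added illustration (assuming the statsmodels package is available), the hand-computed se(β̂1) can be compared with the standard error reported by a standard OLS routine; the simulated data and parameter values are arbitrary.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 80
X = rng.uniform(0, 10, size=n)
Y = 1.0 + 0.7 * X + rng.normal(0, 2, size=n)

x = X - X.mean()
b1 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - b0 - b1 * X                            # OLS residuals û_i

sigma2_hat = np.sum(resid ** 2) / (n - 2)          # σ̂² = Σ û_i² / (n − 2)
se_b1 = np.sqrt(sigma2_hat / np.sum(x ** 2))       # se(β̂1) = σ̂ / sqrt(Σ x_i²)
print(se_b1)

# Cross-check against statsmodels' reported standard error for the slope
res = sm.OLS(Y, sm.add_constant(X)).fit()
print(res.bse[1])
```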
Estimator of σ²:
$$
\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n-2}
$$
where (n − 2) is the degrees of freedom.
[Intuitively, an estimator of a variance has the form (sum of squared
deviations)/(degrees of freedom):
$$
\hat{\sigma}^2 = \frac{\sum (\hat{u}_i - \bar{\hat{u}})^2}{n-2} = \frac{\sum \hat{u}_i^2}{n-2},
$$
where the second equality uses the fact that the OLS residuals have mean zero, so $\bar{\hat{u}} = 0$.]
Note that we could alternatively have estimated σ² using
$$
\tilde{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n}.
$$
However, it is σ̂² (with the degrees-of-freedom correction) that is an unbiased estimator
of σ², not σ̃². Hence, we will use σ̂² as the estimator of σ².
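A small simulation can illustrate why the degrees-of-freedom correction matters: across many replications, dividing by n − 2 averages close to the true σ², while dividing by n is biased downward. This is an added sketch; the parameter values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, sigma, n, reps = 1.0, 0.7, 2.0, 30, 20000

X = rng.uniform(0, 10, size=n)
x = X - X.mean()

s2_df, s2_n = np.empty(reps), np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)
    Y = beta0 + beta1 * X + u
    b1 = np.sum(x * (Y - Y.mean())) / np.sum(x ** 2)
    b0 = Y.mean() - b1 * X.mean()
    resid = Y - b0 - b1 * X
    s2_df[r] = np.sum(resid ** 2) / (n - 2)   # divide by degrees of freedom
    s2_n[r] = np.sum(resid ** 2) / n          # divide by the sample size

print(s2_df.mean(), sigma ** 2)   # close to σ² = 4 (unbiased)
print(s2_n.mean())                # systematically below σ² (biased downward)
```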
Minimum Variance of OLS Estimator
Introduction
We consider the linear regression model:
Y = β0 + β1 X + u
The OLS estimator β̂1 is the least-squares estimator of β1. It is also the
estimator with minimum variance in the class of all linear unbiased
estimators; hence it is also called an efficient estimator. (Efficiency reflects
the precision of the estimates: an efficient estimator has the smallest standard
errors, i.e., the highest precision.)
Proof Setup
The OLS estimator is given by:
$$
\hat{\beta}_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)} = \frac{\sum (X_i - \bar{X}) Y_i}{\sum (X_i - \bar{X})^2} = \frac{\sum x_i Y_i}{\sum x_i^2}
$$
Let
$$
k_i = \frac{x_i}{\sum x_i^2}.
$$
Then
$$
\hat{\beta}_1 = \sum k_i Y_i.
$$
Thus β̂1 is a weighted sum (linear combination) of the Yi, so β̂1 is linear and unbiased (already proved).
Alternative Linear Estimator
Let us assume there exists another linear and unbiased estimator β1∗ defined as:
$$
\beta_1^* = \sum w_i Y_i
$$
where the weights wi are not necessarily equal to ki.
Note: Recall that in the simple linear regression (SLR) model
Yi = β0 + β1 Xi + ui ,
with the assumption E(ui |Xi ) = 0, we have
E[Yi |Xi ] = β0 + β1 Xi .
Thus, the conditional expectation of Yi is the population regression line,
which is non-random given Xi . Consequently,
Var(E[Yi |Xi ]) = 0.
The variance of Yi arises only from the error term ui :
Var(Yi |Xi ) = Var(ui |Xi ) = σ 2 .
Expectation of β1∗
$$
E(\beta_1^*) = E\!\left(\sum w_i Y_i\right) = \sum w_i E(Y_i) = \sum w_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum w_i + \beta_1 \sum w_i X_i
$$
For β1∗ to be unbiased, E(β1∗) = β1, which requires
$$
\sum w_i = 0 \quad \text{and} \quad \sum w_i X_i = 1.
$$
Variance of β1∗
$$
\operatorname{Var}(\beta_1^*) = \operatorname{Var}\!\left(\sum w_i Y_i\right) = \sum w_i^2 \operatorname{Var}(Y_i) + 2 \sum_{i<j} w_i w_j \operatorname{Cov}(Y_i, Y_j)
$$
$$
= \sigma^2 \sum w_i^2 \qquad (\text{since } \operatorname{Var}(Y_i) = \sigma^2 \text{ and } \operatorname{Cov}(Y_i, Y_j) = 0 \text{ for } i \neq j)
$$
To see why the covariance terms vanish, note that
$$
\operatorname{Cov}(Y_i, Y_j) = \operatorname{Cov}(\beta_0 + \beta_1 X_i + u_i,\; \beta_0 + \beta_1 X_j + u_j).
$$
Since β0, β1, Xi, and Xj are non-random constants,
$$
\operatorname{Cov}(Y_i, Y_j) = \operatorname{Cov}(u_i, u_j).
$$
By the classical assumption of no serial correlation (autocorrelation) of the errors,
$$
\operatorname{Cov}(u_i, u_j) = 0 \quad \text{for } i \neq j.
$$
Therefore, Cov(Yi, Yj) = 0 for i ≠ j, and
$$
\operatorname{Var}(\beta_1^*) = \sigma^2 \sum w_i^2.
$$
Adding and subtracting $\frac{x_i}{\sum x_i^2}$ inside the square and expanding, we get:
$$
\operatorname{Var}(\beta_1^*) = \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} + \frac{x_i}{\sum x_i^2} \right)^2
$$
$$
= \sigma^2 \sum \left[ \left( w_i - \frac{x_i}{\sum x_i^2} \right)^2 + \left( \frac{x_i}{\sum x_i^2} \right)^2 + 2 \left( w_i - \frac{x_i}{\sum x_i^2} \right) \frac{x_i}{\sum x_i^2} \right]
$$
$$
= \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right)^2 + \frac{\sigma^2}{\sum x_i^2} + 2 \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) \frac{x_i}{\sum x_i^2}
$$
(using $\sum \left( \frac{x_i}{\sum x_i^2} \right)^2 = \frac{\sum x_i^2}{\left(\sum x_i^2\right)^2} = \frac{1}{\sum x_i^2}$ for the middle term).
We know that $\frac{1}{\sum x_i^2}$ is a constant (non-random), therefore:
$$
\sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) \frac{x_i}{\sum x_i^2} = \frac{1}{\sum x_i^2} \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) x_i.
$$
So it suffices to show:
$$
\sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) x_i = 0.
$$
Expanding,
$$
\sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) x_i = \sum w_i x_i - \frac{1}{\sum x_i^2} \sum x_i^2 = \sum w_i x_i - 1.
$$
For $\beta_1^* = \sum w_i Y_i$ to be unbiased, we require $\sum w_i X_i = 1$ and $\sum w_i = 0$. Since $x_i = X_i - \bar{X}$, this implies
$$
\sum w_i x_i = \sum w_i X_i - \bar{X} \sum w_i = 1 - 0 = 1,
$$
so that
$$
\sum w_i x_i - 1 = 1 - 1 = 0
\qquad \therefore \qquad
\sum \left( w_i - \frac{x_i}{\sum x_i^2} \right) \frac{x_i}{\sum x_i^2} = 0.
$$
Thus:
$$
\operatorname{Var}(\beta_1^*) = \sigma^2 \sum \left( w_i - \frac{x_i}{\sum x_i^2} \right)^2 + \frac{\sigma^2}{\sum x_i^2}
$$
Minimum Variance Condition
The first term is a sum of squares and hence non-negative; it is zero only when
$$
w_i = \frac{x_i}{\sum x_i^2} \quad \text{for all } i.
$$
Therefore the variance is minimized at exactly this choice of weights, and at the minimum:
$$
\operatorname{Var}(\beta_1^*) = \frac{\sigma^2}{\sum x_i^2} = \operatorname{Var}(\hat{\beta}_1)
$$
Hence the variance of any linear unbiased estimator is minimized when $w_i = \frac{x_i}{\sum x_i^2}$,
which is exactly the weight ki used in the OLS estimator. The OLS estimator β̂1 is therefore BLUE.
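Finally, a simulation sketch of the efficiency claim (an added illustration). The weights wi below, built from zi = sign(Xi − median(X)), are just one arbitrary example of a non-OLS linear unbiased estimator: they satisfy Σ wi = 0 and Σ wi Xi = 1. Both estimators come out unbiased in the simulation, but the OLS estimator shows the smaller sampling variance.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, sigma, n, reps = 1.0, 0.7, 2.0, 100, 20000

X = rng.uniform(0, 10, size=n)          # fixed in repeated samples
x = X - X.mean()
k = x / np.sum(x ** 2)                  # OLS weights k_i

# Alternative weights w_i built from z_i = sign(X_i - median(X)).
# They satisfy Σ w_i = 0 and Σ w_i X_i = 1, so Σ w_i Y_i is also linear and unbiased.
z = np.sign(X - np.median(X))
w = (z - z.mean()) / np.sum((z - z.mean()) * X)
print(np.isclose(w.sum(), 0.0), np.isclose(np.sum(w * X), 1.0))

b_ols, b_alt = np.empty(reps), np.empty(reps)
for r in range(reps):
    u = rng.normal(0, sigma, size=n)
    Y = beta0 + beta1 * X + u
    b_ols[r] = np.sum(k * Y)            # OLS estimator Σ k_i Y_i
    b_alt[r] = np.sum(w * Y)            # alternative linear unbiased estimator Σ w_i Y_i

print(b_ols.mean(), b_alt.mean())       # both close to β1 = 0.7 (unbiased)
print(b_ols.var(), b_alt.var())         # OLS variance is the smaller of the two
```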