Class 6a EEFE 510 - Fall 2024

The document discusses the properties of the least squares (LS) estimator, including its unbiasedness, variance-covariance, and the implications of the Gauss-Markov theorem. It outlines the assumptions of the classical linear model and provides methods for estimating the variance of the LS estimator. Additionally, it covers hypothesis testing using t-tests and examples from a lab context to illustrate these concepts.


Announcements:

• Exam 1 has been returned to you. Mean: 41; Median: 40

Today’s Objectives
Part 2: b is random!
• So what are the statistical properties of the LS estimator?
• Properties of b rely on our “starting point” assumptions about the randomness of ε
• MVLUE (Gauss-Markov Theorem)
• Estimating the variance of b, the LS estimator

Assumptions of the Classical Linear Model

A.1. Linearity
yi = xi1β1 + xi2β2 + … + xiKβK + εi
A.2. Full Rank
No exact linear relationship among any of the explanatory variables.
A.3. Exogeneity of the independent variables.
E[εi | xj1, xj2, … xjK] = 0 for all j. Don’t focus on the word “exogeneity”;
instead know that the conditional mean is zero.
A.4. Homoscedasticity & Nonautocorrelation.
The variance of each εi is σ²; εi is not correlated with εj.
A.5. Exogenously generated data
X could include random elements, but the uncorrelatedness of X and ε is
crucial.
A.6. Normal distribution
Each disturbance term is normally distributed: εi ~ N(0, σ²).
NOTE: Highlighted assumptions deal with the error term, ε.

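A minimal sketch (Python/NumPy; the sample size, coefficients, and regressors are made-up illustrative values) of a data-generating process that satisfies A.1–A.6:

import numpy as np

rng = np.random.default_rng(0)

n = 200                                  # illustrative sample size
beta = np.array([1.0, 0.5, -2.0])        # hypothetical "true" coefficients
sigma = 2.0                              # hypothetical disturbance std. dev.

# A.2 / A.5: exogenously generated regressors, no exact linear dependence
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.normal(0, 1, n)])

# A.3, A.4, A.6: disturbances are N(0, sigma^2), mean zero given X,
# homoscedastic, and uncorrelated across observations
eps = rng.normal(0, sigma, n)

# A.1: linearity
y = X @ beta + eps
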
Properties of b, the LS estimator: Unbiasedness

Let’s re-write b to put it in terms of the disturbance, ε:

b = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + ε) = β + (X′X)⁻¹X′ε

Why? Because we want to know about b’s randomness, and the only information
we have is on ε.

Taking expectations, conditional on X:

Assuming X is nonrandom: E[b] = β + (X′X)⁻¹X′E[ε]
Conditional on (random) X: E[b | X] = β + (X′X)⁻¹X′E[ε | X]
By Assumption A.3, the second term is 0.
So E[b | X] = β. Good news, b is unbiased!
The interpretation of this result is that for any particular set of observations, X,
the least squares estimator b has expectation β.
(If X were random:) If we were to average this result over other possible values
of X, i.e., iterating over X, then the unconditional mean is also β.

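A small Monte Carlo sketch of this result (hypothetical parameter values): holding X fixed and redrawing ε many times, the average of the b draws is close to β.

import numpy as np

rng = np.random.default_rng(1)
n, sigma = 100, 1.5
beta = np.array([2.0, -0.7])                              # hypothetical true parameters
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])   # fixed regressor matrix

A = np.linalg.inv(X.T @ X) @ X.T                          # (X'X)^(-1) X'

draws = []
for _ in range(5000):                                     # redraw the disturbances only
    eps = rng.normal(0, sigma, n)
    draws.append(A @ (X @ beta + eps))                    # b = (X'X)^(-1) X'y

print(np.mean(draws, axis=0))                             # close to [2.0, -0.7]
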
Properties of b, the LS estimator: Linearity

Note:

b = β + (X′X)⁻¹X′ε

can be rewritten as:

b = β + Aε, where A = (X′X)⁻¹X′

Therefore, b is a linear function of the disturbance term.

It is therefore a Linear estimator and an Unbiased estimator.

Properties of b, the LS estimator: Variance-Covariance of b

Remember from Part 0 the definition of the variance of a random variable x: Var[x] = E[(x – μ)²]
For a vector x, Var[x] = E[(x – μx)(x – μx)′]

So the variance of b is similar,

Var[b | X] = E[(b – β)(b – β)′ | X]

Remember, b – β = (X′X)⁻¹X′ε,
and E[εε′ | X] = σ²I.

Properties of b, the LS estimator: Variance-Covariance

So the variance of b is similar,

Var[b | X] = (X′X)⁻¹X′E[εε′ | X]X(X′X)⁻¹ = σ²(X′X)⁻¹

with E[εε′ | X] = σ²I.

Remarks:
1. σ²(X′X)⁻¹ is read as the “variance-covariance” matrix for b.
2. In general, the greater the variation in X, the smaller (X′X)⁻¹ becomes, and b will have
less variance (i.e., be more precise).

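Continuing the same kind of simulation (illustrative values), the sample covariance matrix of the simulated b draws should be close to σ²(X′X)⁻¹:

import numpy as np

rng = np.random.default_rng(2)
n, sigma = 100, 1.5
beta = np.array([2.0, -0.7])
X = np.column_stack([np.ones(n), rng.uniform(0, 5, n)])
A = np.linalg.inv(X.T @ X) @ X.T

bs = np.array([A @ (X @ beta + rng.normal(0, sigma, n)) for _ in range(20000)])

print(np.cov(bs, rowvar=False))               # empirical variance-covariance of b
print(sigma**2 * np.linalg.inv(X.T @ X))      # theoretical sigma^2 (X'X)^(-1)
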
Other Linear Unbiased Estimators?

To include the LS estimator and all other linear estimators…

Let A = (X′X)⁻¹X′.
Let C = A + D be a k x n matrix that does not depend on y.
Let b0 = Cy be our new linear estimator.
Note: We trivially recover the LS estimator for the special case D = 0.

Taking expectations (for now, let’s ignore conditional expectations)…

E[b0] = E[(A + D)(Xβ + ε)] = β + DXβ + (A + D)E[ε] = β + DXβ

• b0 will be biased except in cases when DXβ is a null vector.
• For b0 to be unbiased, DX = 0 must hold for all β.
• Implication: Another linear and unbiased estimator is possible, but important
restrictions hold.
With restrictions: E[b0] = β

Variance of other Linear Unbiased Estimators?

Derive the Cov matrix of b0 within the class of linear, unbiased estimators (C = A + D):
i) Requires DX = 0 (for unbiasedness), so b0 = Cy = (A + D)(Xβ + ε) = β + (A + D)ε
ii) b0 – β = (A + D)ε

Var[b0 | X] = E[(A + D)εε′(A + D)′ | X] = σ²(AA′ + AD′ + DA′ + DD′)

But with our unbiasedness restriction, DX = 0 (and hence AD′ = DA′ = 0),

Var[b0 | X] = σ²(X′X)⁻¹ + σ²DD′

Variance of other Linear Unbiased Estimators?

Compare the Var-Cov matrix of b0, a linear, unbiased estimator, to the Var-Cov
matrix of b, our unbiased, least squares estimator:

Var[b0 | X] = σ²(X′X)⁻¹ + σ²DD′ = Var[b | X] + σ²DD′

Or,

Var[b0 | X] – Var[b | X] = σ²DD′

Variance of other Linear Unbiased Estimators?

What do we know about DD′?

Remarks:
1. The diagonal elements of DD′ are sums of squares (i.e., ≥ 0).
2. The variance of each element of b0 is therefore greater than or equal to the variance of
each element of b.
3. DD′ is positive semidefinite (or nonnegative definite),
so Var(b0) exceeds Var(b) in the matrix sense.
(For any q ≠ 0, q′DD′q ≥ 0;
or, if z = D′q, then z′z ≥ 0.)
4. If the diagonal elements of Var(b0) are equal to the diagonal elements of Var(b), then DD′
must be zero.
=> D is a null matrix
=> b0 = b

Gauss-Markov Theorem

For a regressor matrix X,

the LS estimator b is the minimum variance linear unbiased estimator of β.

So,
when dealing with the class of Linear Unbiased Estimators,
the LS estimator is best (best = minimum variance).

MVLUE: Minimum Variance Linear Unbiased Estimator

BLUE: Best Linear Unbiased Estimator (Old terminology)

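A simulation sketch of the theorem (made-up numbers): build another linear unbiased estimator b0 = (A + D)y with DX = 0 and compare its sampling variances with those of the LS estimator.

import numpy as np

rng = np.random.default_rng(3)
n, sigma = 60, 1.0
beta = np.array([1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(0, 1, n)])

A = np.linalg.inv(X.T @ X) @ X.T          # LS weights
M = np.eye(n) - X @ A                     # residual maker, MX = 0
D = 0.05 * rng.normal(size=(2, n)) @ M    # any D of this form satisfies DX = 0
C = A + D                                 # alternative linear unbiased estimator b0 = Cy

b_ls, b_0 = [], []
for _ in range(10000):
    y = X @ beta + rng.normal(0, sigma, n)
    b_ls.append(A @ y)
    b_0.append(C @ y)

print(np.var(b_ls, axis=0))   # element by element, smaller than...
print(np.var(b_0, axis=0))    # ...the variances of b0 (Gauss-Markov)
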
Estimating the Variance of the LS Estimator

Recall, Var[b | X] = σ²(X′X)⁻¹,
where σ² = unobserved population parameter
= Var[εi] = E[εi²]
We also know that ei is an estimate of εi.
So by analogy,
e′e/n = (1/n)Σei² …could be a natural choice! (Let’s investigate.)
In other words, can we find that E[e′e/n] = σ²?

LS residuals are…
e = y – Xb = Mε (since MX = 0). Recall M = I – X(X′X)⁻¹X′.

An estimator of σ² will be based on the sum of squared residuals (which is a scalar):
e′e = ε′M′Mε = ε′Mε (since M is symmetric and idempotent)

Estimating the Variance of the LS Estimator

Expectations: E[e′e] = E[ε′Mε]

Remarks:
1. e′e and ε′Mε are scalars.
2. The trace of a matrix is equal to the sum of its diagonal elements.
3. A scalar (a 1×1 matrix) is equal to its trace! (This means the
expected value of the trace is the trace of the expected value.)
4. Short cut: these expectations should actually be conditional on X, but it
doesn’t affect the outcome.
5. tr(ABCD) = tr(BCDA) = tr(CDAB) = tr(DABC)
6. We’ll use these trace results to find E[e′e].
7. We are hoping to find E[e′e/n] = σ².

Estimating the Variance of the LS Estimator

E[e′e | X] = E[ε′Mε | X] = E[tr(ε′Mε) | X] = E[tr(Mεε′) | X]
= tr(M·E[εε′ | X]) = tr(M·σ²I) = σ²tr(M)
tr(M) = tr(In) – tr(X(X′X)⁻¹X′) = n – tr((X′X)⁻¹X′X) = n – tr(IK) = n – K
So E[e′e | X] = σ²(n – K), and E[e′e/n | X] = σ²(n – K)/n ≠ σ² !!

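A quick numerical check (arbitrary illustrative regressors) of the trace result used above, tr(M) = n – K:

import numpy as np

rng = np.random.default_rng(4)
n, K = 36, 10
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.trace(M))            # = n - K = 26 (up to floating-point error)
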
Estimating the Variance of the LS Estimator

So, the natural estimator of σ², e′e/n, is biased.

Note that the bias becomes smaller as n → ∞:
Bias = E[e′e/n] – σ² = –(K/n)σ²

Consequently, E[e′e/(n – K)] = σ².

So, s² = e′e/(n – K) is an unbiased estimator of σ².

With this estimator in hand, we can compute the estimated variance of the LS
estimator:

Est. Var[b | X] = s²(X′X)⁻¹

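A simulation sketch of the bias result (hypothetical σ², n, and K): averaged over many draws, e′e/n falls short of σ² by the factor (n – K)/n, while e′e/(n – K) is on target.

import numpy as np

rng = np.random.default_rng(5)
n, K, sigma2 = 36, 10, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

biased, unbiased = [], []
for _ in range(20000):
    e = M @ rng.normal(0, np.sqrt(sigma2), n)   # e = M*eps (beta drops out of the residuals)
    biased.append(e @ e / n)
    unbiased.append(e @ e / (n - K))

print(np.mean(biased))     # about sigma2 * (n - K)/n = 4 * 26/36, roughly 2.89
print(np.mean(unbiased))   # about sigma2 = 4.0
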
Summary: Estimating the Variance of the LS Estimator

Recall the following logic “chain”,

• σ² is an unobserved population parameter, σ² = Var[εi] = E[εi²].
• However, our LS residual, ei, is an estimate of εi.
• So by analogy, e′e/n could be a natural choice.
• But e′e/n is biased: E[e′e/n] = σ²(n – K)/n, not σ².
• We can show that s² = e′e/(n – K) is an unbiased estimator of σ².

Statistical properties (distribution) of s²

• If σ² were known, then statistical inference on βk could be based on
zk = (bk – βk)/√(σ²Skk) ~ N(0, 1), where Skk is the k-th diagonal element of (X′X)⁻¹.

• But σ² is generally not known. So we use s² instead of σ².

• While we know that s² is an unbiased estimator for σ², we don’t yet know its
statistical properties (i.e., we don’t know its distribution).

• That means we have another big step to do. If we can rewrite s² in terms
of ε, then we might be able to find its distribution.

(n – K)s²/σ² = e′e/σ² = (ε/σ)′M(ε/σ)

• This is a “quadratic form” of a standard normal vector (ε/σ).

Therefore, it is Chi-Square with rank(M) degrees of freedom, where rank(M) = trace(M) = n – K.
So,
(n – K)s²/σ² ~ χ²(n – K)

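A simulation sketch (hypothetical values) comparing the simulated quantiles of (n – K)s²/σ² with the χ²(n – K) distribution:

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, K, sigma2 = 36, 10, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

draws = []
for _ in range(20000):
    e = M @ rng.normal(0, np.sqrt(sigma2), n)
    s2 = e @ e / (n - K)
    draws.append((n - K) * s2 / sigma2)

print(np.quantile(draws, [0.05, 0.5, 0.95]))         # simulated quantiles
print(stats.chi2.ppf([0.05, 0.5, 0.95], df=n - K))   # chi-square(26) quantiles
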
Independence of b and e (and s²)

Independence of b and e:
Based on our original assumptions of the Classical Linear Regression
Model…
• The distribution of ε conditional on X is N(0, σ²In).

• Thus, the distribution of ε conditional on X does not depend on X.

• Both b and e are linear functions of ε: They are jointly normal
conditional on X, but also they are uncorrelated, conditional on X.
• In other words, b and e are independently distributed, conditional on X.
• Therefore all functions of e (including s²) are also independent of b.
• We need b to be independent from s².

Recap: Distributions

We showed that b | X ~ N(β, σ²(X′X)⁻¹).
Therefore: bk | X ~ N(βk, σ²Skk), where Skk is the k-th diagonal element of (X′X)⁻¹.
And zk = (bk – βk)/√(σ²Skk) ~ N(0, 1).
We also noted that (n – K)s²/σ² ~ χ²(n – K).
From Part 0 – If we have a ratio of a Standard Normal variable to the square root of
a Chi-squared variable divided by its d.o.f., what is the distribution of the ratio?
A Student’s t:

tk = (bk – βk)/√(s²Skk)

tk follows a t distribution with n – K degrees of freedom.

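A simulation sketch of the recap (made-up parameters): with normal disturbances, the simulated tk values line up with the Student’s t(n – K) quantiles.

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, K, sigma = 36, 10, 2.0
beta = np.zeros(K)                                 # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T

t_draws = []
for _ in range(20000):
    y = X @ beta + rng.normal(0, sigma, n)
    b = A @ y
    e = y - X @ b
    s2 = e @ e / (n - K)
    t_draws.append((b[1] - beta[1]) / np.sqrt(s2 * XtX_inv[1, 1]))

print(np.quantile(t_draws, [0.025, 0.975]))        # close to...
print(stats.t.ppf([0.025, 0.975], df=n - K))       # ...the t(26) quantiles
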
Hypothesis testing: a t-Test

Step 1: Construct a null hypothesis, H0: βk = βk0, and then use the t-ratio

tk = (bk – βk0)/SE(bk)

Note 1: A “large” deviation of tk from 0 is a sign of a failure of the null
hypothesis. The next step specifies how large is too large.
Note 2: If βk0 = 0, then the test is a probability statement on the estimate being statistically
different from zero.
Step 2: Go to a t-table (or a computer program) and look up the entry for n – K degrees
of freedom. Find the critical value, t(1–α)/2(n – K), such that the area in the
t-distribution to the right of t(1–α)/2 is (1 – α)/2.
Prob{ –t(1–α)/2(n – K) < tk < t(1–α)/2(n – K)} = α (Often, α is picked to be 0.95)

[Figure: t distribution centered at 0, with critical values –t(1–α)/2(n – K) and t(1–α)/2(n – K)
and area (1 – α)/2 in each tail.]

Hypothesis testing: a t-Test

Step 3: Fail to reject if –t(1–α)/2(n – K) < tk < t(1–α)/2(n – K). Reject otherwise.
A convenient feature of the t-test is that the critical value does not depend on X.
Therefore, there is no need to calculate critical values for each sample.

A decision rule based on p-Value:

Step 1: Same as above.
Step 2: Calculate p = Prob(t > |tk|)
Step 3: Accept if p > 1 – α. Reject otherwise (if p < 1 – α).
I.e., reject if p < 0.05

Finally, note that a very common test is whether βk is significantly different from
zero.
Thus, H0: βk = 0,
and tk = bk/SE(bk). This is standard output from computer programs.

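A sketch of the decision rule in code. The estimate, standard error, and degrees of freedom below are made-up, and the sketch uses the conventional two-sided p-value reported by most software rather than the one-sided Prob(t > |tk|) written above:

from scipy import stats

b_k, se_k, df = 0.8, 0.3, 26        # hypothetical estimate, standard error, and n - K
beta_k0 = 0.0                       # null-hypothesis value

t_k = (b_k - beta_k0) / se_k        # t-ratio
crit = stats.t.ppf(0.975, df)       # critical value for a 95% two-sided test
p = 2 * stats.t.sf(abs(t_k), df)    # two-sided p-value

print(t_k, crit, p)
print("Reject H0" if abs(t_k) > crit else "Fail to reject H0")
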
An Example from our Lab

gpercapi = β1 constanti + β2 pgi + β3 yi + β4 pnci + β5 puci
+ β6 ppti + β7 pdi + β8 pni + β9 psi + β10 yeari + εi

tpg = ?
From the output, tpg = –4.61. Does it also mean bpg is statistically different from zero?
YES

An Example from our Lab

tpg = –4.61
What’s the H0 associated with this?
H0: βpg = 0.

tpg = (bpg – βpg)/SE(bpg) = (–0.1206771 – 0)/0.0261592 = –4.61

What’s the conclusion?
|tpg| = 4.61 > t(1–α)/2(n – K)   (“t-stat” > “critical value”)
=> Reject => βpg ≠ 0 with 95% confidence.
Or,
p = Prob(t(n – K) > |tpg|) = 0.000 < (1 – α) = (1 – .95) = 0.05 => Reject

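The same numbers checked in code (taking n – K = 26 from the “F( 9, 26)” line of the output, and using the two-sided p-value convention):

from scipy import stats

b_pg, se_pg, df = -0.1206771, 0.0261592, 26

t_pg = (b_pg - 0) / se_pg
p = 2 * stats.t.sf(abs(t_pg), df)

print(round(t_pg, 2))     # -4.61
print(round(p, 5))        # roughly 0.0001, which prints as 0.000 at three decimals
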
An Example from our Lab

What’s the meaning of “F( 9, 26)” and “Prob > F” ?

Another hypothesis test that we’ll explore:
H0: β2 = β3 = … = β10 = 0.
How many “joint” hypotheses?
Nine. (That is, nine “equal signs”.)
We’ll use our Restricted Least Squares estimator with nine restrictions.

An Example from our Lab

Can you use the output to find s²?

Two ways:
1. s² = e′e/(n – K) = 0.000451
2. (Root MSE)² = s²: (0.02133)² = 0.000455

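A one-line arithmetic check of method 2:

print(0.02133 ** 2)       # 0.000455 (squaring the reported Root MSE)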
