Jeffrey M. Wooldridge - Introductory Econometrics: A Modern Approach - South-Western College Pub (2016) - 113-115

CHAPTER 3  Multiple Regression Analysis: Estimation

Just as the OLS estimates can be obtained for any given sample, so can the standard errors. Since
$\text{se}(\hat{\beta}_j)$ depends on $\hat{\sigma}$, the standard error has a sampling distribution, which will play a role in Chapter 4.
We should emphasize one thing about standard errors. Because (3.58) is obtained directly from
the variance formula in (3.51), and because (3.51) relies on the homoskedasticity Assumption MLR.5,
it follows that the standard error formula in (3.58) is not a valid estimator of $\text{sd}(\hat{\beta}_j)$ if the errors
exhibit heteroskedasticity. Thus, while the presence of heteroskedasticity does not cause bias in the $\hat{\beta}_j$,
it does lead to bias in the usual formula for $\text{Var}(\hat{\beta}_j)$, which then invalidates the standard errors. This is
important because any regression package computes (3.58) as the default standard error for each coef-
ficient (with a somewhat different representation for the intercept). If we suspect heteroskedasticity,
then the “usual” OLS standard errors are invalid, and some corrective action should be taken. We will
see in Chapter 8 what methods are available for dealing with heteroskedasticity.
For some purposes it is helpful to write

$$\text{se}(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{n}\,\text{sd}(x_j)\sqrt{1 - R_j^2}}, \qquad [3.59]$$

in which we take $\text{sd}(x_j) = \sqrt{n^{-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2}$ to be the sample standard deviation where the total
sum of squares is divided by n rather than n − 1. The importance of equation (3.59) is that it shows
how the sample size, n, directly affects the standard errors. The other three terms in the formula, $\hat{\sigma}$,
$\text{sd}(x_j)$, and $R_j^2$, will change with different samples, but as n gets large they settle down to constants.
Therefore, we can see from equation (3.59) that the standard errors shrink to zero at the rate $1/\sqrt{n}$.
This formula demonstrates the value of getting more data: the precision of the $\hat{\beta}_j$ increases as n
increases. (By contrast, recall that unbiasedness holds for any sample size subject to being able to
compute the estimators.) We will talk more about large sample properties of OLS in Chapter 5.
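To make equation (3.59) concrete, here is a minimal NumPy sketch (not from the text; the simulated data and all variable names are illustrative). It computes $\text{se}(\hat{\beta}_j)$ from its three components and cross-checks the result against the usual matrix formula based on $\hat{\sigma}^2(X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3                               # sample size and number of regressors
X = rng.normal(size=(n, k))
X[:, 1] += 0.5 * X[:, 0]                    # correlate the regressors so that R_j^2 > 0
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])       # design matrix with an intercept
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (n - k - 1))    # degrees-of-freedom-adjusted sigma-hat

j = 0                                       # standard error of the first slope coefficient
xj = X[:, j]
others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
gamma, *_ = np.linalg.lstsq(others, xj, rcond=None)
u_j = xj - others @ gamma                   # residuals from regressing x_j on the other x's
R2_j = 1.0 - (u_j @ u_j) / np.sum((xj - xj.mean()) ** 2)

sd_xj = np.sqrt(np.mean((xj - xj.mean()) ** 2))     # divides by n, as in the text
se_eq359 = sigma_hat / (np.sqrt(n) * sd_xj * np.sqrt(1.0 - R2_j))

se_matrix = np.sqrt(sigma_hat**2 * np.linalg.inv(Xd.T @ Xd)[j + 1, j + 1])
print(se_eq359, se_matrix)                  # the two numbers agree up to rounding
```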

3-5  Efficiency of OLS: The Gauss-Markov Theorem


In this section, we state and discuss the important Gauss-Markov Theorem, which justifies the use
of the OLS method rather than using a variety of competing estimators. We know one justification for
OLS already: under Assumptions MLR.1 through MLR.4, OLS is unbiased. However, there are many
unbiased estimators of the $\beta_j$ under these assumptions (for example, see Problem 13). Might there be
other unbiased estimators with variances smaller than the OLS estimators?
If we limit the class of competing estimators appropriately, then we can show that OLS is best
within this class. Specifically, we will argue that, under Assumptions MLR.1 through MLR.5, the
OLS estimator $\hat{\beta}_j$ for $\beta_j$ is the best linear unbiased estimator (BLUE). To state the theorem, we
need to understand each component of the acronym “BLUE.” First, we know what an estimator is:
it is a rule that can be applied to any sample of data to produce an estimate. We also know what an
unbiased estimator is: in the current context, an estimator, say, $\tilde{\beta}_j$, of $\beta_j$ is an unbiased estimator of $\beta_j$
if $E(\tilde{\beta}_j) = \beta_j$ for any $\beta_0, \beta_1, \ldots, \beta_k$.
What about the meaning of the term “linear”? In the current context, an estimator $\tilde{\beta}_j$ of $\beta_j$ is linear
if, and only if, it can be expressed as a linear function of the data on the dependent variable:

$$\tilde{\beta}_j = \sum_{i=1}^{n} w_{ij}\, y_i, \qquad [3.60]$$

where each $w_{ij}$ can be a function of the sample values of all the independent variables. The OLS
estimators are linear, as can be seen from equation (3.22).
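A small Python illustration of this point (not from the text; the simulated data are for illustration only): the OLS weights come from $W = (X'X)^{-1}X'$, which depends only on the independent variables, so $\hat{\beta} = Wy$ is a linear function of y, exactly as in equation (3.60).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept plus two regressors
y = X @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n)

W = np.linalg.inv(X.T @ X) @ X.T        # weight matrix: a function of the x's only
beta_hat_linear = W @ y                 # each beta_hat_j is sum_i w_ij * y_i
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat_linear, beta_hat_lstsq))   # True: the two computations agree
```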
Finally, how do we define “best”? For the current theorem, best is defined as having the smallest
variance. Given two unbiased estimators, it is logical to prefer the one with the smallest variance (see
Appendix C).

Now, let $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ denote the OLS estimators in model (3.31) under Assumptions MLR.1
through MLR.5. The Gauss-Markov Theorem says that, for any estimator $\tilde{\beta}_j$ that is linear and unbiased,
$\text{Var}(\hat{\beta}_j) \le \text{Var}(\tilde{\beta}_j)$, and the inequality is usually strict. In other words, in the class of linear
unbiased estimators, OLS has the smallest variance (under the five Gauss-Markov assumptions).
Actually, the theorem says more than this. If we want to estimate any linear function of the $\beta_j$, then
the corresponding linear combination of the OLS estimators achieves the smallest variance among all
linear unbiased estimators. We conclude with a theorem, which is proven in Appendix 3A.

Theorem 3.4 (Gauss-Markov Theorem)

Under Assumptions MLR.1 through MLR.5, $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_k$ are the best linear unbiased estimators
(BLUEs) of $\beta_0, \beta_1, \ldots, \beta_k$, respectively.

It is because of this theorem that Assumptions MLR.1 through MLR.5 are known as the
Gauss-Markov assumptions (for cross-sectional analysis).
The importance of the Gauss-Markov Theorem is that, when the standard set of assumptions
holds, we need not look for alternative unbiased estimators of the form in (3.60): none will be better
than OLS. Equivalently, if we are presented with an estimator that is both linear and unbiased, then
we know that the variance of this estimator is at least as large as the OLS variance; no additional cal-
culation is needed to show this.
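As a numerical check on this claim, here is a small Monte Carlo sketch (not from the text; the alternative estimator, the data-generating process, and all names are illustrative). It compares the OLS slope with a "grouping" estimator that contrasts mean y above versus below the median of x; both are linear in y and unbiased, but OLS shows the smaller sampling variance, as the theorem predicts.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n, reps = 1.0, 0.5, 200, 5000
ols_draws, grp_draws = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(size=n)        # MLR.1 through MLR.5 hold here
    ols_draws.append(np.polyfit(x, y, 1)[0])          # OLS slope estimate
    high = x > np.median(x)                           # split the sample at the median of x
    grp_draws.append((y[high].mean() - y[~high].mean())
                     / (x[high].mean() - x[~high].mean()))

print(np.mean(ols_draws), np.mean(grp_draws))   # both averages are close to beta1 = 0.5
print(np.var(ols_draws), np.var(grp_draws))     # the OLS variance is the smaller of the two
```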
For our purposes, Theorem 3.4 justifies the use of OLS to estimate multiple regression models. If
any of the Gauss-Markov assumptions fail, then this theorem no longer holds. We already know that
failure of the zero conditional mean assumption (Assumption MLR.4) causes OLS to be biased, so
Theorem 3.4 also fails. We also know that heteroskedasticity (failure of Assumption MLR.5) does not
cause OLS to be biased. However, OLS no longer has the smallest variance among linear unbiased
estimators in the presence of heteroskedasticity. In Chapter 8, we analyze an estimator that improves
upon OLS when we know the form of heteroskedasticity.

3-6  Some Comments on the Language of Multiple Regression Analysis
It is common for beginners, and not unheard of for experienced empirical researchers, to report that
they “estimated an OLS model.” While we can usually figure out what someone means by this state-
ment, it is important to understand that it is wrong—on more than just an aesthetic level—and reflects
a misunderstanding about the components of a multiple regression analysis.
The first thing to remember is that ordinary least squares (OLS) is an estimation method, not a
model. A model describes an underlying population and depends on unknown parameters. The linear
model that we have been studying in this chapter can be written—in the population—as
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u, \qquad [3.61]$$
where the parameters are the $\beta_j$. Importantly, we can talk about the meaning of the $\beta_j$ without ever
looking at data. It is true we cannot hope to learn much about the $\beta_j$ without data, but the interpreta-
tion of the $\beta_j$ is obtained from the linear model in equation (3.61).
Once we have a sample of data we can estimate the parameters. While it is true that we have so
far only discussed OLS as a possibility, there are actually many more ways to use the data than we can
even list. We have focused on OLS due to its widespread use, which is justified by using the statisti-
cal considerations we covered previously in this chapter. But the various justifications for OLS rely
on the assumptions we have made (MLR.1 through MLR.5). As we will see in later chapters, under
different assumptions, different estimation methods are preferred, even though our model can still be
represented by equation (3.61). Just a few examples include weighted least squares in Chapter 8, least
absolute deviations in Chapter 9, and instrumental variables in Chapter 15.
One might argue that the discussion here is overly pedantic, and that the phrase “estimating an
OLS model” should be taken as a useful shorthand for “I estimated a linear model by OLS.” This
stance has some merit, but we must remember that we have studied the properties of the OLS estima-
tors under different assumptions. For example, we know OLS is unbiased under the first four Gauss-
Markov assumptions, but it has no special efficiency properties without Assumption MLR.5. We have
also seen, through the study of the omitted variables problem, that OLS is biased if we do not have
Assumption MLR.4. The problem with using imprecise language is that it leads to vagueness on the
most important considerations: what assumptions are being made on the underlying linear model?
The issue of the assumptions we are using is conceptually different from the estimator we wind up
applying.
Ideally, one writes down an equation like (3.61), with variable names that are easy to decipher,
such as

$$math4 = \beta_0 + \beta_1 classize4 + \beta_2 math3 + \beta_3 \log(income) + \beta_4 motheduc + \beta_5 fatheduc + u \qquad [3.62]$$

if we are trying to explain outcomes on a fourth-grade math test. Then, in the context of equation
(3.62), one includes a discussion of whether it is reasonable to maintain Assumption MLR.4, focus-
ing on the factors that might still be in u and whether more complicated functional relationships are
needed (a topic we study in detail in Chapter 6). Next, one describes the data source (which ideally
is obtained via random sampling) as well as the OLS estimates obtained from the sample. A proper
way to introduce a discussion of the estimates is to say “I estimated equation (3.62) by ordinary least
squares. Under the assumption that no important variables have been omitted from the equation, and
assuming random sampling, the OLS estimator of the class size effect, $\beta_1$, is unbiased. If the error
term u has constant variance, the OLS estimator is actually best linear unbiased.” As we will see in
Chapters 4 and 5, we can often say even more about OLS. Of course, one might want to admit that
while controlling for third-grade math score, family income, and parents’ education might account for
important differences across students, it might not be enough—for example, u can include motivation
of the student or parents—in which case OLS might be biased.
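As a concrete illustration (a minimal sketch, not from the text), equation (3.62) could be estimated by ordinary least squares with the statsmodels formula interface. The DataFrame below is simulated purely so the snippet runs; in practice one would load the actual fourth-grade test data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({
    "classize4": rng.integers(15, 31, size=n),
    "math3": rng.normal(70, 10, size=n),
    "income": rng.lognormal(mean=10.5, sigma=0.5, size=n),
    "motheduc": rng.integers(8, 19, size=n),
    "fatheduc": rng.integers(8, 19, size=n),
})
df["math4"] = (60 - 0.5 * df["classize4"] + 0.3 * df["math3"]
               + 2.0 * np.log(df["income"]) + 0.5 * df["motheduc"]
               + 0.4 * df["fatheduc"] + rng.normal(0, 5, size=n))

# "I estimated equation (3.62) by ordinary least squares."
results = smf.ols("math4 ~ classize4 + math3 + np.log(income) + motheduc + fatheduc",
                  data=df).fit()
print(results.summary())    # OLS estimates with the usual (homoskedasticity-based) SEs
```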
A more subtle reason for being careful in distinguishing between an underlying population model
and an estimation method used to estimate a model is that estimation methods such as OLS can be
used essentially as an exercise in curve fitting or prediction, without explicitly worrying about an
underlying model and the usual statistical properties of unbiasedness and efficiency. For example, we
might just want to use OLS to estimate a line that allows us to predict future college GPA for a set of
high school students with given characteristics.
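A minimal sketch of that purely predictive use of OLS (assumed data and variable names, not from the text):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 200
hs_chars = rng.normal(size=(n, 2))         # e.g., standardized high school GPA and test score
col_gpa = 2.8 + hs_chars @ np.array([0.4, 0.3]) + rng.normal(0, 0.3, size=n)

fit = LinearRegression().fit(hs_chars, col_gpa)   # OLS used purely as curve fitting
new_students = rng.normal(size=(5, 2))            # characteristics of new applicants
print(fit.predict(new_students))                  # predicted college GPAs
```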

Summary
1. The multiple regression model allows us to effectively hold other factors fixed while examining the ef-
fects of a particular independent variable on the dependent variable. It explicitly allows the independ-
ent variables to be correlated.
2. Although the model is linear in its parameters, it can be used to model nonlinear relationships by ap-
propriately choosing the dependent and independent variables.
3. The method of ordinary least squares is easily applied to estimate the multiple regression model. Each
slope estimate measures the partial effect of the corresponding independent variable on the dependent
variable, holding all other independent variables fixed.
