
Introductory Applied Econometrics
EEP/IAS 118, Section #5
Andrew Crane-Droesch

Spring 2014 (Feb 26, 2014)

1 Omitted Variable Bias: Part I


Remember that a key assumption needed to get an unbiased estimate of β1 in the simple linear regression
is that E[u|x] = 0. If this assumption does not hold, then we can't expect our estimate β̂1 to be close to the
true value β1. When the assumption fails because a key variable was left out of the model, we call the
problem omitted variable bias: because we did not include that key variable, E[β̂1] ≠ β1. The motivation
for multiple regression is therefore to take this key variable out of the error term by including it in our
estimation.

2 Omitted Variable Bias: Part II


The formula for omitted variable bias can be a little confusing, so to start we'll go through a few things
much more slowly. Remember those assumptions SLR.1-5 we talked about last time? Prof. Buck stated in
lecture that if SLR.1-4 hold for a given model, then our estimates β̂ will be unbiased. First we're going
to take a closer look at what goes wrong once we start thinking about omitted variables.

SLR.4 fails because of an omitted variable: E[u|X] ≠ 0

The Baseline: SLR.1-4 hold, and our estimates are unbiased

Population Model:
y = β0 + β1 x + u
Sample Regression:
ŷi = β̂0 + β̂1 xi

What’s the OLS formula for β̂1 ?


\[
\hat{\beta}_1 = \frac{\mathrm{Cov}(x_i, y_i)}{\mathrm{Var}(x_i)}
= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
= \frac{\sum_i (x_i - \bar{x})\, y_i}{\sum_i (x_i - \bar{x})\, x_i}
\qquad \text{(see Appendix to these notes)}
\]

We can use what we know about the population model, plug y = β0 + β1 x + u into our formula for β̂1 and
simplify:
\[
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i)}{\sum_i (x_i - \bar{x})\, x_i}
= \frac{\beta_0 \sum_i (x_i - \bar{x}) + \beta_1 \sum_i (x_i - \bar{x})\, x_i + \sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}
= \beta_1 + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}
\]
Now, remember that β̂1 is a random variable, so that it has an expected value:

\[
E\left[\hat{\beta}_1\right]
= E\left[\beta_1 + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}\right]
= \beta_1 + E\left[\frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}\right]
= \beta_1
\]

Aha! So under assumptions SLR.1-4, on average our estimates of β̂1 will be equal to the true population
parameter β1 that we were after the whole time.
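The unbiasedness result is easy to check by simulation. Below is a minimal Python sketch (my own illustration, not part of the original notes, which use Stata): it draws many samples from a model where SLR.1-4 hold, computes the OLS slope with the covariance formula above, and averages the estimates. All parameter values here are made up.

```python
# Hypothetical simulation: with E[u|x] = 0, the average of many OLS slope
# estimates should be close to the true beta1. True model: y = 1 + 2x + u.
import random

random.seed(0)
beta0, beta1 = 1.0, 2.0

def ols_slope(x, y):
    # OLS slope via the covariance formula: sum (xi - xbar)(yi - ybar) / sum (xi - xbar)^2
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

estimates = []
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(100)]
    u = [random.gauss(0, 1) for _ in range(100)]   # drawn independently of x
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
    estimates.append(ols_slope(x, y))

print(sum(estimates) / len(estimates))  # close to the true beta1 = 2
```

Each individual estimate misses β1, but the average across samples sits on top of it: that is what unbiasedness means.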

Reality Check: SLR.4 fails, E[u|X] ≠ 0, and our estimates are biased

Population Model (z belongs in the model):
y = β0 + β1 x + β2 z + u
Sample Regression (z is omitted):
ŷi = α̂0 + α̂1 xi

What's the OLS formula for α̂1?

\[
\hat{\alpha}_1 = \frac{\mathrm{Cov}(x_i, y_i)}{\mathrm{Var}(x_i)}
= \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
= \frac{\sum_i (x_i - \bar{x})\, y_i}{\sum_i (x_i - \bar{x})\, x_i}
\]

We can use what we know about the population model, plug y into our formula for α̂1 and simplify:
\[
\hat{\alpha}_1 = \frac{\sum_i (x_i - \bar{x})(\beta_0 + \beta_1 x_i + \beta_2 z_i + u_i)}{\sum_i (x_i - \bar{x})\, x_i}
= \frac{\beta_0 \sum_i (x_i - \bar{x}) + \beta_1 \sum_i (x_i - \bar{x})\, x_i + \beta_2 \sum_i (x_i - \bar{x})\, z_i + \sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}
= \beta_1 + \beta_2 \frac{\sum_i (x_i - \bar{x})\, z_i}{\sum_i (x_i - \bar{x})\, x_i} + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}
\]

There's an extra term! The second term, β2 · [Σi (xi − x̄)zi / Σi (xi − x̄)xi], is a result of our omission of the
variable z that affects y. When SLR.1-4 hold, on average our regression estimates will be close to the true
parameters. But here, SLR.1-4 do not hold! If we take the expectation of α̂1:

\[
E[\hat{\alpha}_1]
= E\left[\beta_1 + \beta_2 \frac{\sum_i (x_i - \bar{x})\, z_i}{\sum_i (x_i - \bar{x})\, x_i} + \frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}\right]
= \beta_1 + \beta_2 E\left[\frac{\sum_i (x_i - \bar{x})\, z_i}{\sum_i (x_i - \bar{x})\, x_i}\right] + E\left[\frac{\sum_i (x_i - \bar{x})\, u_i}{\sum_i (x_i - \bar{x})\, x_i}\right]
= \beta_1 + \beta_2 \rho_1
\]

where ρ1 is the slope coefficient from regressing z on x.

If E[α̂1] ≠ β1 then we say α̂1 is biased. What this means is that on average, our regression estimate is going
to miss the true population parameter by β2ρ1.
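The bias formula E[α̂1] = β1 + β2ρ1 can also be verified by simulation. The Python sketch below (my own illustration with made-up parameter values, not from the notes) generates data from a two-regressor model, runs the short regression that omits z, and compares the average short-regression slope to β1 + β2ρ1:

```python
# Hypothetical simulation: omitting z biases the short-regression slope on x
# by beta2 * rho1, where rho1 is the slope of z on x.
import random

random.seed(1)
beta1, beta2, rho1 = 2.0, 1.5, -0.8

def slope(x, y):
    xbar = sum(x) / len(x); ybar = sum(y) / len(y)
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

alphas = []
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(200)]
    z = [rho1 * xi + random.gauss(0, 1) for xi in x]       # z correlated with x
    y = [beta1 * xi + beta2 * zi + random.gauss(0, 1)
         for xi, zi in zip(x, z)]
    alphas.append(slope(x, y))                             # z omitted!

mean_alpha = sum(alphas) / len(alphas)
print(mean_alpha, beta1 + beta2 * rho1)  # both near 2 + 1.5*(-0.8) = 0.8
```

The short-regression slope centers on 0.8, not on the true β1 = 2: the omitted term β2ρ1 = −1.2 is exactly the gap.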

3 Example: OVB in Action


In this section, I use the wage data (WAGE1.dta) from your textbook to demonstrate the evils of omitted
variable bias and show you that the OVB formula works. Let's pretend (!) that this sample of 526
people is our whole population of interest, so that when we run our regressions, we are actually revealing the
true parameters instead of just estimates. We're interested in the relationship between wages and gender,
and our "omitted" variable will be tenure (how long the person has been at his/her job). Suppose our
population model is:

\[
\log(wage)_i = \beta_0 + \beta_1\, female_i + \beta_2\, tenure_i + u_i \qquad (1)
\]

First let's look at the correlations between our variables and see if we can't predict how omitting tenure will
bias β̂1:
. corr lwage female tenure
| lwage female tenure
-------------+---------------------------
lwage | 1.0000
female | -0.3737 1.0000
tenure | 0.3255 -0.1979 1.0000
If we ran the regression:

\[
\log(wage)_i = \alpha_0 + \alpha_1\, female_i + e_i \qquad (2)
\]

...then the information above tells us that α1 < β1: tenure raises wages, women have less tenure on average,
so omitting tenure makes the female coefficient more negative. Let's see if we were right. Below is the Stata
output from running regressions (1) and (2):
. reg lwage female tenure
Source | SS df MS Number of obs = 526
-------------+------------------------------ F( 2, 523) = 67.64
Model | 30.4831298 2 15.2415649 Prob > F = 0.0000
Residual | 117.846622 523 .225328148 R-squared = 0.2055
-------------+------------------------------ Adj R-squared = 0.2025
Total | 148.329751 525 .28253286 Root MSE = .47469
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -.3421323 .042267 -8.09 0.000 -.4251663 -.2590984
tenure | .0192648 .0029255 6.59 0.000 .0135176 .0250119
_cons | 1.688842 .0343675 49.14 0.000 1.621326 1.756357
------------------------------------------------------------------------------
. reg lwage female
Source | SS df MS Number of obs = 526
-------------+------------------------------ F( 1, 524) = 85.04
Model | 20.7120004 1 20.7120004 Prob > F = 0.0000
Residual | 127.617751 524 .243545326 R-squared = 0.1396
-------------+------------------------------ Adj R-squared = 0.1380
Total | 148.329751 525 .28253286 Root MSE = .4935
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -.3972175 .0430732 -9.22 0.000 -.4818349 -.3126001
_cons | 1.81357 .0298136 60.83 0.000 1.755001 1.872139
------------------------------------------------------------------------------
Just to clarify, we "know" that β1 = −.3421323 and α1 = −.3972175.
This means that our BIAS is equal to α1 − β1 = −.3972175 − (−.3421323) = −.0550852.

There's one more parameter missing from our OVB formula. What regression do we have to run to find its
value?

\[
tenure_i = \rho_0 + \rho_1\, female_i + v_i \qquad (3)
\]
The Stata output from this regression is below:
. reg tenure female
Source | SS df MS Number of obs = 526
-------------+------------------------------ F( 1, 524) = 21.36
Model | 1073.26518 1 1073.26518 Prob > F = 0.0000
Residual | 26327.9839 524 50.244244 R-squared = 0.0392
-------------+------------------------------ Adj R-squared = 0.0373
Total | 27401.249 525 52.1928553 Root MSE = 7.0883
------------------------------------------------------------------------------
tenure | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -2.859373 .618672 -4.62 0.000 -4.074755 -1.643991
_cons | 6.474453 .4282209 15.12 0.000 5.633212 7.315693
------------------------------------------------------------------------------
Just to clarify, our ρ1 = −2.859373.
Now we can plug all of our parameters into the bias formula to check that it in fact gives us the bias from
leaving out tenure from our wage regression:

\[
\alpha_1 = E[\hat{\alpha}_1] = \beta_1 + \beta_2 \rho_1 = -.3421323 + (.0192648)(-2.859373) = -.3972175
\]

which matches the coefficient on female from regression (2).
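The same plug-in arithmetic can be done in Python using the coefficients from the Stata output above:

```python
# Plugging the Stata coefficients into the OVB formula alpha1 = beta1 + beta2*rho1.
beta1 = -0.3421323   # coefficient on female in the long regression (1)
beta2 = 0.0192648    # coefficient on tenure in the long regression (1)
rho1 = -2.859373     # coefficient on female in the tenure regression (3)

alpha1 = beta1 + beta2 * rho1
print(alpha1)  # approximately -0.3972175, the short-regression coefficient
```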

4 OVB Intuition
For further intuition on omitted variable bias, I like to think of an archer. When our MLR1-4 hold, the archer
is aiming the arrow directly at the center of the target—if he/she misses, it’s due to random fluctuations
in the air that push the arrow around, or maybe imperfections in the arrow that send it a little off course.
When MLR1-4 do not all hold, like when we have an omitted variable, the archer is no longer aiming at the
center of the target. There are still puffs of air and feather imperfections that send the arrow off course, but
the course wasn’t even the right one to begin with! The arrow (which you should think of as our β̂) misses
the center of the target (which you should think of as our true β) systematically.
To demonstrate this, I did the following:

- Take a random sample of 150 people out of the 526 that are in WAGE1.dta.
- Estimate β̂1 using OLS, controlling for tenure, with these 150 people.
- Estimate α̂1 using OLS (NOT controlling for tenure) with these 150 people.
- Repeat 6000 times.

At the end of all of the above, I end up with 6000 biased estimates (α̂1) and 6000 unbiased estimates (β̂1).
I plotted the kernel density of the biased estimates alongside that of the unbiased estimates. You can see
how the biased distribution is shifted to the left, indicating a downward bias!
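The resampling exercise above can be sketched in Python. Since WAGE1.dta is not bundled here, the code builds a synthetic "population" with made-up coefficients chosen to mimic the setup (tenure is lower for women and raises log wages); everything below is my illustration, not the notes' actual Stata code.

```python
# Repeated subsampling from a fixed "population": the short regression (tenure
# omitted) is systematically further left than the long regression.
import random

random.seed(2)
N = 526
female = [1.0 if random.random() < 0.5 else 0.0 for _ in range(N)]
tenure = [max(0.0, 6.5 - 2.9 * f + random.gauss(0, 7)) for f in female]
lwage = [1.69 - 0.34 * f + 0.019 * t + random.gauss(0, 0.45)
         for f, t in zip(female, tenure)]

def slope(x, y):
    xbar = sum(x) / len(x); ybar = sum(y) / len(y)
    return (sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
            / sum((a - xbar) ** 2 for a in x))

def resid(x, z):
    # Residuals from regressing x on z (with an intercept).
    b = slope(z, x)
    a = sum(x) / len(x) - b * sum(z) / len(z)
    return [xi - a - b * zi for xi, zi in zip(x, z)]

biased, unbiased = [], []
for _ in range(2000):  # 2000 rather than 6000 repetitions, to keep it quick
    s = random.sample(range(N), 150)
    f = [female[i] for i in s]
    t = [tenure[i] for i in s]
    y = [lwage[i] for i in s]
    biased.append(slope(f, y))              # tenure omitted
    unbiased.append(slope(resid(f, t), y))  # tenure controlled, via Frisch-Waugh

print(sum(biased) / len(biased), sum(unbiased) / len(unbiased))
```

The coefficient "controlling for tenure" is computed with the Frisch-Waugh trick: partial tenure out of female, then regress log wage on the residual. The mean of the biased estimates lands to the left of the mean of the unbiased ones, just like the shifted kernel density in the figure.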

[Figure 1. Kernel densities for the biased (alphahat_1) and unbiased (betahat_1) estimates. Horizontal
axis: effect of female on ln(wage), from −.6 to −.1. The alphahat_1 density sits to the left of the
betahat_1 density.]

Take home practice problem: How to sign the bias

Traffic fatalities and primary seatbelt laws. Using data from Anderson (2008) for 49 US states, we can
examine how primary seatbelt laws (an officer can pull you over just for not wearing your seatbelt) impact
annual traffic fatalities. From the paper, I have data on the number of traffic fatalities in 2000, whether or
not the state had a primary seatbelt law in place, and the total population of the state. In 2000, just 35% of
the 49 states had primary seatbelt laws (the rest had what's called a secondary seatbelt law). Suppose we
run the following regression:

\[
\widehat{fatalities} = \hat{\beta}_0 + \hat{\beta}_1\, pop + \hat{\beta}_2\, primary
\]

1. Think of another variable or factor that you think affects traffic fatalities:

2. Is this factor positively or negatively correlated with fatalities? + or −

3. Is this factor positively or negatively correlated with primary? + or −

4. Omitting this factor from our regression will bias β̂2: UPWARD or DOWNWARD
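The logic of steps 2-4 can be summarized in a tiny Python helper (mine, not from the notes): the bias on the coefficient of primary is β_omitted · ρ1, so its sign is the product of the two correlation signs you just filled in.

```python
# Sign of the omitted variable bias: sign(corr(factor, y)) * sign(corr(factor, x)),
# since bias = beta_omitted * rho1 and each factor carries one of those signs.
def bias_direction(corr_with_y, corr_with_x):
    s = (1 if corr_with_y > 0 else -1) * (1 if corr_with_x > 0 else -1)
    return "UPWARD" if s > 0 else "DOWNWARD"

# Hypothetical example: "miles driven" raises fatalities (+), and suppose
# high-mileage states are less likely to pass a primary law (-):
print(bias_direction(+1, -1))  # DOWNWARD
```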

Here are my results:


\[
\widehat{fatalities} = 156.002 + 0.1232\, pop + 17.258\, primary
\]
Whoa! According to our estimates, predicted fatalities increase with the implementation of a primary
seatbelt law. Behavioral explanations aside [1], omitted variables are the likely culprits here. What are some
variables that would induce an upward bias in β̂2?
I thought that weather might play a role in this puzzle. States with more “dangerous” weather will have
more traffic fatalities and are also more likely to have a primary seatbelt law:
\[
\widehat{fatalities} = 219.16 + 0.1204\, pop - 78.092\, primary + 68.963\, precip - 579.91\, snow
\]
[1] By this I mean arguments like, "adding safety requirements results in people behaving more recklessly."
While often valid (even in this particular case), we're going to keep it simple in this discussion.

(Clearly, even this specification with controls for weather has some issues: an additional inch of snow per
year decreases predicted fatalities by 579.91 lives?)

5 Confidence Intervals
The simulation that was shown in section demonstrates something pretty profound: even after designing a
random sample, collecting the data, figuring out the population model, and running regressions, there’s
still a chance your estimates are very far from those of the population. Each random sample
yields a different estimate; if you have 100 random samples, you have 100 different values of β̂1 . What can
you do with them? Confidence intervals use the randomness of our sample estimates to say something useful
about where the true population parameter actually is.
You can think of confidence intervals in two different ways:

1. We can think of a confidence interval as a bound for how wrong our sample estimate is. For example,
if a political poll finds that a proposition will receive 53.2% of the vote, we come to very different
conclusions if the “margin of error” is .5% or 5%.
2. Alternatively, we can think of a confidence interval as a measure of where the true, population value
is likely to be. (The wording here is a little misleading, as you’ll see in a bit.) For example, if the
true average wage for US laborers is $7, then it’s unlikely that we’d find a confidence interval from
our sample like [10,14].

The basics

We can think of a sample mean x̄ the same way we think about our β̂s: these are both random variables.
We know even more about x̄ from the Central Limit Theorem: for a random sample of a variable
{x1, ..., xN}, the Central Limit Theorem tells us that for very large samples (large N), the sample average
x̄ ∼ N(µ, σ²x̄).
What this means: if I took 10,000 different random samples of laborers in the US and recorded their wage, I
would end up with 10,000 different sample means {x̄1 , x̄2 , ..., x̄10,000 }. If I plotted a histogram of all of these
sample means, it would look very much like a normal distribution and the center of the distribution would
be very close to the true average wage, µ, in the US. Because it’s easy to get confused when we’re talking
about a random variable X and another random variable x̄, which is the sample mean of X, here’s a table
to keep things straight:
Population                            Sample
Mean of X:     µX                     Sample mean:           x̄ = (1/n) Σi xi,               and E[x̄] = µX
Variance of x: Var(x) or σ²x          Sample variance of x:  s² = (1/(n−1)) Σi (xi − x̄)²,   and E[s²] = σ²x
Variance of x̄: Var(x̄) or σ²x̄         Sample variance of x̄:  s²x̄ = s²/n,                    and E[s²x̄] = σ²x̄
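To keep the table concrete, here is a quick Python check of the sample formulas on a tiny made-up sample:

```python
# Computing the sample-column quantities from the table for x = [2, 4, 6, 8].
x = [2.0, 4.0, 6.0, 8.0]
n = len(x)

xbar = sum(x) / n                                   # sample mean
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)    # sample variance (n-1 divisor)
s2_xbar = s2 / n                                    # sample variance of xbar

print(xbar, s2, s2_xbar)  # 5.0, then 20/3 and 5/3
```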
Normal distributions are tricky to work with, and it's easier to standardize normally distributed variables so
that they have a mean of 0 and a variance of 1. Remember our formulas for the expected value and variance
of a transformed variable... If v is normally distributed with expected value E[v] and Var(v) = σ²v:

\[
E\left[\frac{v - E[v]}{\sigma_v}\right] = 0
\qquad\qquad
\mathrm{Var}\left[\frac{v - E[v]}{\sigma_v}\right] = 1
\]

Since we're interested in the distribution of x̄ (which is normal), we can standardize it just like above so
that:

\[
\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \sim N(0, 1)
\]

Now we can use what we know about the distribution of standard normal variables to help us say something
meaningful about what the true population mean µX might be:

- We know that for any standard normal variable v, Pr(−1.96 < v < 1.96) = 95%.
- We know that (x̄ − µX)/σx̄ is standard normal.

But we're not really interested in the variable (x̄ − µX)/σx̄. The whole point of this is to learn more about
µX! So we need to do some manipulation to isolate µX:

A standard normal distribution is the familiar bell curve, symmetric around 0.

If we draw a number z from a standard normal distribution, then we know Pr(−1.96 < z < 1.96) = 95%.
Above we argued that (x̄ − µ)/σx̄ ∼ N(0, 1), which means that Pr(−1.96 < (x̄ − µ)/σx̄ < 1.96) = 95%.
Just like in lecture, we can rearrange terms to see that Pr(x̄ − 1.96σx̄ < µ < x̄ + 1.96σx̄) = 95%.
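That rearranged probability statement is exactly the coverage property of a confidence interval, and it can be checked by simulation. The Python sketch below (my own illustration with made-up values for µ, σ, and n) draws many samples and counts how often x̄ ± 1.96σx̄ covers the true mean:

```python
# Hypothetical coverage simulation: the interval xbar +/- 1.96*sigma_xbar
# covers the true mean mu in about 95% of repeated samples.
import math
import random

random.seed(3)
mu, sigma, n = 7.0, 2.0, 50
sigma_xbar = sigma / math.sqrt(n)

reps = 5000
covered = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - 1.96 * sigma_xbar < mu < xbar + 1.96 * sigma_xbar:
        covered += 1

print(covered / reps)  # close to 0.95
```

Note that µ never moves across repetitions; it is the endpoints of the interval that bounce around, which previews the interpretation point below.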

How to interpret a confidence interval

The most important thing to remember about a confidence interval is that the is
what’s random, not the .
To make another metaphor out of an archaic sport, I like to think of confidence intervals in the context of a
game of horseshoes:

When to use the Student t distribution:

As you could guess from the table, in practice we do not know what σx̄ is, and we have to estimate it
using sample data with sx̄ = s/√n, which we call the standard error. Using the standard error (which is
itself a random variable) changes the distribution of the sample mean a little bit, and we have to use a
Student's t distribution instead of a Normal distribution:

\[
\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}
\]

The formula for a W% confidence interval is:

\[
CI_W = \left[\bar{x} - c_W \frac{s}{\sqrt{n}},\; \bar{x} + c_W \frac{s}{\sqrt{n}}\right]
\]

where cW is found by looking at the t-table for n − 1 degrees of freedom.

Step 1. Determine the confidence level.
If we want to be 95% confident that our interval covers the true population parameter, then our confidence
level is 0.95. Pretty straightforward.
Step 2. Compute your estimates of x̄ and s.
Step 3. Find c from the t-table.
The value of c will depend on both the sample size (n) and the confidence level (always use the 2-tailed
values for confidence intervals):

- If our confidence level is 80% with a sample size of 10: c80 = 1.383 (t-table, 9 degrees of freedom).

- If our confidence level is 95% with a sample size of 1000: c95 = 1.962 (t-table, 999 degrees of freedom).
Step 4. Plug everything into the formula and interpret.


Plug our values of x̄, s and c into the given formula for a confidence interval. The trickiest step is the
interpretation:

- Where does the randomness in a confidence interval come from?

- So when we construct a confidence interval, we interpret it by saying:

Example: I took a random sample of 121 UCB students' heights in inches, and found that x̄ = 65 and
s² = 4. Following the 4 steps above, I can find the 95% confidence interval for the true mean height µ:

1. Confidence level was given: 95%.

2. x̄ = 65 and s² = 4 were given.

3. From the t-table, c = 1.987.

4. The 95% confidence interval is [65 − 1.987 · √4/√121, 65 + 1.987 · √4/√121] = [64.64, 65.36]. This
interval has a 95% chance of covering the true average height of the population.
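Step 4's arithmetic can be reproduced in a few lines of Python:

```python
# Reproducing the height example: CI = xbar +/- c * s / sqrt(n).
import math

xbar, s2, n, c = 65.0, 4.0, 121, 1.987
se = math.sqrt(s2) / math.sqrt(n)     # 2/11, the standard error of xbar
lo, hi = xbar - c * se, xbar + c * se
print(round(lo, 2), round(hi, 2))  # 64.64 65.36
```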

Practice

You have a random sample of housing prices in the Bay Area. After loading the data into Stata, you look
at summary statistics for the prices you observed:
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
price | 88 293.546 102.7134 111 725

Find a 99% confidence interval for the true average housing price:

6 Appendix
Two facts used in the discussion of omitted variable bias:

\[
\sum_i (x_i - \bar{x})(y_i - \bar{y})
= \sum_i (x_i - \bar{x})\, y_i - \sum_i (x_i - \bar{x})\, \bar{y}
= \sum_i (x_i - \bar{x})\, y_i - \bar{y} \sum_i (x_i - \bar{x})
= \sum_i (x_i - \bar{x})\, y_i - \bar{y} \cdot 0
= \sum_i (x_i - \bar{x})\, y_i
\]

Now replace every yi in what's above with an xi, and every ȳ with an x̄, and you can see that the same
steps show Σi (xi − x̄)² = Σi (xi − x̄) xi.
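Both identities hold for any sample, which is easy to confirm numerically (arbitrary made-up numbers below):

```python
# Numerically verifying the two appendix identities on an arbitrary sample.
x = [1.0, 2.0, 4.0, 7.0]
y = [3.0, 1.0, 5.0, 9.0]
xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

lhs1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
rhs1 = sum((xi - xbar) * yi for xi, yi in zip(x, y))

lhs2 = sum((xi - xbar) ** 2 for xi in x)
rhs2 = sum((xi - xbar) * xi for xi in x)

print(abs(lhs1 - rhs1), abs(lhs2 - rhs2))  # both (numerically) zero
```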
