
Week 6: Linear Regression with Two Regressors

Brandon Stewart¹

Princeton

October 17, 19, 2016

¹ These slides are heavily influenced by Matt Blackwell, Adam Glynn and Jens
Hainmueller.
Where We’ve Been and Where We’re Going...
Last Week
I mechanics of OLS with one variable
I properties of OLS
This Week
I Monday:
F adding a second variable
F new mechanics
I Wednesday:
F omitted variable bias
F multicollinearity
F interactions
Next Week
I multiple regression
Long Run
I probability → inference → regression

Questions?

1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Why Do We Want More Than One Predictor?

Summarize more information for descriptive inference

Improve the fit and predictive power of our model

Control for confounding factors for causal inference

Model non-linearities (e.g. Y = β0 + β1 X + β2 X²)

Model interactive effects (e.g. Y = β0 + β1 X1 + β2 X2 + β3 X1 X2)

Example 1: Cigarette Smokers and Pipe Smokers
Consider the following example from Cochran (1968). We have a random sample
of 20,000 smokers and run a regression using:
Y : Deaths per 1,000 Person-Years.
X1 : 0 if person is pipe smoker; 1 if person is cigarette smoker

We fit the regression and find:

\widehat{Death Rate} = 17 − 4 Cigarette Smoker
What do we conclude?
The average death rate is 17 deaths per 1,000 person-years for pipe
smokers and 13 (17 − 4) for cigarette smokers.
So cigarette smoking lowers the death rate by 4 deaths per 1,000
person-years.

When we “control” for age (in years) we find:

\widehat{Death Rate} = 14 + 4 Cigarette Smoker + 10 Age
Why did the sign switch? Which estimate is more useful?
Example 2: Berkeley Graduate Admissions

Berkeley gender bias?

Graduate admissions data from Berkeley, 1973


Acceptance rates:
I Men: 8442 applicants, 44% admission rate
I Women: 4321 applicants, 35% admission rate
Evidence of discrimination toward women in admissions?
This is a marginal relationship
What about the conditional relationship within departments?

Berkeley gender bias?

Within departments:
Men Women
Dept Applied Admitted Applied Admitted
A 825 62% 108 82%
B 560 63% 25 68%
C 325 37% 593 34%
D 417 33% 375 35%
E 191 28% 393 24%
F 373 6% 341 7%

Within departments, women do somewhat better than men!


How? Women apply to more challenging departments.
Marginal relationship (admissions and gender) ≠ conditional
relationship given a third variable (department)

Simpson’s paradox
[Figure: scatterplot of Y against X with two strata, Z = 1 and Z = 0;
the pooled fitted line slopes up while the within-stratum relationships
slope down.]
Overall a positive relationship between Yi and Xi here


But within strata defined by Zi , the opposite
Simpson’s paradox

Simpson’s paradox arises in many contexts, particularly where there is
selection on ability.
It is a particular problem in medical or demographic contexts, e.g. the
kidney stone treatment comparison and the low-birth-weight paradox.
Cochran’s 1968 study is also a case of Simpson’s paradox: he originally
sought to compare cigarette to cigar smoking. Overall, cigar smokers had
higher mortality rates than cigarette smokers, but at any given age level,
cigarette smokers had higher mortality than cigar smokers.

Instance of a more general problem called the ecological inference fallacy

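A toy simulation makes the reversal concrete. This is a hedged sketch
with invented numbers, not data from any of the studies above:

R Code
set.seed(2)
z <- rbinom(500, 1, 0.5)          # stratum indicator
x <- rnorm(500, mean = 2 * z)     # x tends to be higher in the z = 1 stratum
y <- 10 * z - x + rnorm(500)      # within each stratum, x lowers y
coef(lm(y ~ x))["x"]              # marginal slope: positive
coef(lm(y ~ x + z))["x"]          # conditional slope: about -1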
Basic idea

Old goal: estimate the mean of Y as a function of some independent
variable, X:

E[Yi | Xi]

For continuous X's, we modeled the CEF/regression function with a line:

Yi = β0 + β1 Xi + ui
New goal: estimate the relationship of two variables, Yi and Xi ,
conditional on a third variable, Zi :

Yi = β0 + β1 Xi + β2 Zi + ui

β’s are the population parameters we want to estimate

Why control for another variable

Descriptive
I Get a sense for the relationships in the data.
I Describe more precisely our quantity of interest.
Predictive
I We can usually make better predictions about the dependent variable
with more information on independent variables.
Causal
I Block potential confounding, which is when X doesn’t cause Y , but
only appears to because a third variable Z causally affects both of
them.
I Xi : ice cream sales on day i
I Yi : drowning deaths on day i
I Zi : ??

1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Regression with Two Explanatory Variables
Example: data from Fish (2002) “Islam and Authoritarianism.” World
Politics. 55: 4-37. Data from 157 countries.
Variables of interest:
I Y : Level of democracy, measured as the 10-year average of Freedom
House ratings
I X1 : Country income, measured as log(GDP per capita in $1000s)
I X2 : Ethnic heterogeneity (continuous) or British colonial heritage
(binary)
With one predictor we ask: Does income (X1 ) predict or explain the
level of democracy (Y )?

With two predictors we ask questions like: Does income (X1 ) predict
or explain the level of democracy (Y ), once we “control” for ethnic
heterogeneity or British colonial heritage (X2 )?

The rest of this lecture is designed to explain what is meant by


“controlling for another variable” with linear regression.
Simple Regression of Democracy on Income

Let's look at the bivariate regression of Democracy on Income. We have
looked at the regression of Democracy on Income several times in the
course:

ŷi = β̂0 + β̂1 x1i

\widehat{Demo} = −1.26 + 1.6 Log(GDP)

[Figure: scatterplot of Democracy (1–7) against Income (log GDP per
capita, 2.0–4.5) with the fitted regression line.]

Interpretation: A one percent increase in GDP is associated with a .016
point increase in democracy.
Simple Regression of Democracy on Income
We may want to use more information in our prediction equation.

For example, some countries were originally British colonies and others
were not:
I Former British colonies tend to have higher levels of democracy
I Non-colony countries tend to have lower levels of democracy

[Figure: scatterplot of Democracy against Income with former British
colonies and non-colonies plotted separately.]
Adding a Covariate

How do we do this? We can generalize the prediction equation:

ybi = βb0 + βb1 x1i + βb2 x2i

This implies that we want to predict y using the information we have about x1
and x2 , and we are assuming a linear functional form.

Notice that now we write Xji where:


j = 1, ..., k is the index for the explanatory variables
i = 1, ..., n is the index for the observation
we often omit i to avoid clutter

In words:

\widehat{Democracy} = β̂0 + β̂1 Log(GDP) + β̂2 Colony

Interpreting a Binary Covariate

Assume X2i indicates whether country i used to be a British colony.

When X2 = 0, the model becomes:

ŷ = β̂0 + β̂1 x1 + β̂2 · 0
  = β̂0 + β̂1 x1

When X2 = 1, the model becomes:

ŷ = β̂0 + β̂1 x1 + β̂2 · 1
  = (β̂0 + β̂2) + β̂1 x1

What does this mean? We are fitting two lines with the same slope but
different intercepts.

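A minimal sketch of how the two lines can be recovered with predict();
the data frame name fish is an assumption for illustration, while the
variable names match the regression shown on the next slide:

R Code
m <- lm(Democracy ~ Income + BritishColony, data = fish)
# fitted values at Income = 3 for a non-colony and an ex-colony:
predict(m, newdata = data.frame(Income = 3, BritishColony = c(0, 1)))
# the difference between the two predictions is exactly the estimate of beta2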
Regression of Democracy on Income

From R, we obtain estimates for β̂0, β̂1, and β̂2:

lm(Democracy ~ Income + BritishColony)

Coefficients:
(Intercept)       Income    BritishColony
     -1.527        1.711            0.592

Non-British colonies:

ŷ = β̂0 + β̂1 x1 = −1.527 + 1.711 x1

Former British colonies:

ŷ = (β̂0 + β̂2) + β̂1 x1 = −0.935 + 1.711 x1

[Figure: scatterplot of Democracy against Income with the two parallel
fitted lines.]
What does this mean?

Our prediction equation is:

ŷi = −1.527 + 1.711 xi + 0.592 zi

Where do these quantities appear on the graph?
β̂0 = −1.527 is the intercept of the prediction line for non-British
colonies.
β̂1 = 1.711 is the slope of both lines.
β̂2 = 0.592 is the vertical distance between the line for ex-British
colonies and the line for non-colonies.

[Figure: the two parallel prediction lines plotted over Democracy against
Income, with β̂2 marked as the vertical gap between them.]
1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Fitting a regression plane

We have considered an example of multiple regression with one continuous
explanatory variable and one binary explanatory variable.

This is easy to represent graphically in two dimensions because we can
use colors to distinguish the two groups in the data.

[Figure: scatterplot of Democracy against Income with the two groups
shown in different colors.]
Regression of Democracy on Income

These observations are actually located in a three-dimensional space.

We can try to represent this using a 3D scatterplot.

In this view, we are looking at the data from the Income side; the two
regression lines are drawn in the appropriate locations.

[Figure: 3D scatterplot of Democracy against Income and Colony, viewed
from the Income side.]
Regression of Democracy on Income

We can also look at the 3D scatterplot from the British colony side.

While the British colonial status variable is either 0 or 1, there is
nothing in the prediction equation that requires this to be the case.

In fact, the prediction equation defines a regression plane that connects
the lines when x2 = 0 and x2 = 1.

[Figure: 3D scatterplot viewed from the Colony side, with the regression
plane connecting the two lines.]
Regression with two continuous variables

Since we fit a regression plane to the data whenever we have two


explanatory variables, it is easy to move to a case with two
continuous explanatory variables.

For example, we might want to use:


I X1 Income and X2 Ethnic Heterogeneity
I Y Democracy

\widehat{Democracy} = β̂0 + β̂1 Income + β̂2 Ethnic Heterogeneity

Fitting a regression plane

We can plot the points in a 3D scatterplot.

R returns the following estimates:
I β̂0 = −0.717
I β̂1 = 1.573 for Income
I β̂2 = −0.550 for Ethnic Heterogeneity

How does this look graphically? These estimates define a regression plane
through the data.

[Figure: 3D scatterplot of Democracy against Income and Ethnic
Heterogeneity with the fitted regression plane.]
Interpreting a Continuous Covariate

The coefficient estimates have a similar interpretation in this case as


they did in the Income-British Colony example.

For example, βb1 = 1.6 represents our prediction of the difference in


Democracy between two observations that differ by one unit of
Income but have the same value of Ethnic Heterogeneity.

The slope estimates have partial effect or ceteris paribus
interpretations:

∂(β0 + β1 X1 + β2 X2)/∂X1 = β1

What does this mean?

Again, we can think of this as defining a regression line for the
relationship between Democracy and Income at every level of Ethnic
Heterogeneity.

All of these lines are parallel since they all have slope β̂1 = 1.6.

The lines shift up or down based on the value of Ethnic Heterogeneity.

[Figure: Democracy against Income with parallel fitted lines at Ethnic
Heterogeneity = 0, 0.25, 0.5, and 0.75.]
More Complex Predictions

We can also use the coefficient estimates for more complex


predictions that involve changing multiple variables simultaneously.
Consider our results for the regression of democracy on X1 income
and X2 ethnic heterogeneity:
I βb0 = −.71
I βb1 = 1.6
I βb2 = −.6

What is the predicted difference in democracy between


I Chile with X1 = 3.5 and X2 = .06
I China with X1 = 2.5 and X2 = .5?
Predicted democracy is
I −.71 + 1.6 · 3.5 − .6 · .06 = 4.8 for Chile
I −.71 + 1.6 · 2.5 − .6 · 0.5 = 3 for China.
Predicted difference is thus: 1.8 or (3.5 − 2.5)βb1 + (.06 − .5)βb2

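The same arithmetic in R, using the rounded estimates from this slide:

R Code
b <- c(-0.71, 1.6, -0.6)                   # beta0, beta1 (income), beta2 (heterogeneity)
chile <- b[1] + b[2] * 3.5 + b[3] * 0.06   # about 4.8
china <- b[1] + b[2] * 2.5 + b[3] * 0.5    # about 3.0
chile - china                              # about 1.8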
1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

AJR Example

[Figure: Log GDP per capita (4–11) against Strength of Property Rights
(0–10); African countries cluster at lower incomes and weaker property
rights than non-African countries.]
Basics

Ye olde model:
Ybi = βb0 + βb1 Xi
Zi = 1 to indicate that i is an African country
Zi = 0 to indicate that i is a non-African country
Concern: AJR might be picking up an “African effect”:
I African countries have low incomes and weak property rights
I “Control for” country being in Africa or not to remove this
I Effects are now within Africa or within non-Africa, not between
New model:
Ybi = βb0 + βb1 Xi + βb2 Zi

AJR model

##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.65556 0.31344 18.043 < 2e-16 ***
## avexpr 0.42416 0.03971 10.681 < 2e-16 ***
## africa -0.87844 0.14707 -5.973 3.03e-08 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.6253 on 108 degrees of freedom
## (52 observations deleted due to missingness)
## Multiple R-squared: 0.7078, Adjusted R-squared: 0.7024
## F-statistic: 130.8 on 2 and 108 DF, p-value: < 2.2e-16

Two lines in one regression

How can we interpret this model?


Plug in two possible values for Zi and rearrange
When Zi = 0:
Ybi = βb0 + βb1 Xi + βb2 Zi
= βb0 + βb1 Xi + βb2 × 0
= βb0 + βb1 Xi
When Zi = 1:
Ybi = βb0 + βb1 Xi + βb2 Zi
= βb0 + βb1 Xi + βb2 × 1
= (βb0 + βb2 ) + βb1 Xi
Two different intercepts, same slope

Example interpretation of the coefficients
Let’s review what we’ve seen so far:
Intercept for Xi Slope for Xi
Non-African country (Zi = 0) βb0 βb1
African country (Zi = 1) βb0 + βb2 βb1

In this example, we have:


Ybi = 5.656 + 0.424 × Xi − 0.878 × Zi
We can read these as:
I βb0 : average log income for non-African country (Zi = 0) with property
rights measured at 0 is 5.656
I β̂1 : a one-unit increase in property rights is associated with a 0.424
increase in average log incomes, holding Africa status fixed (comparing
two African countries, or two non-African countries)
I β̂2 : there is a -0.878 average difference in log income per capita
between African and non-African countries conditional on property
rights
General interpretation of the coefficients

Ybi = βb0 + βb1 Xi + βb2 Zi

βb0 : average value of Yi when both Xi and Zi are equal to 0


βb1 : A one-unit change in Xi is associated with a βb1 -unit change in Yi
conditional on Zi
βb2 : average difference in Yi between Zi = 1 group and Zi = 0 group
conditional on Xi

Adding a binary variable, visually

[Figure: Log GDP per capita against Strength of Property Rights with two
parallel fitted lines: intercept β̂0 = 5.656 for non-African countries,
intercept β̂0 + β̂2 for African countries, common slope β̂1 = 0.424, and
vertical gap β̂2 = −0.878 between the lines.]
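A short sketch that reproduces the two lines in the figure from the AJR
estimates (the grid of x values is chosen for illustration):

R Code
b0 <- 5.656; b1 <- 0.424; b2 <- -0.878
x <- 0:10                            # strength of property rights
yhat.nonafrica <- b0 + b1 * x        # line for Z = 0
yhat.africa <- (b0 + b2) + b1 * x    # same slope, intercept shifted down by 0.878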
Adding a continuous variable

Ye olde model:
Ybi = βb0 + βb1 Xi
Zi : mean temperature in country i (continuous)
Concern: geography is confounding the effect
I geography might affect political institutions
I geography might affect average incomes (through diseases like malaria)
New model:
Ybi = βb0 + βb1 Xi + βb2 Zi

AJR model, revisited

##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.80627 0.75184 9.053 1.27e-12 ***
## avexpr 0.40568 0.06397 6.342 3.94e-08 ***
## meantemp -0.06025 0.01940 -3.105 0.00296 **
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.6435 on 57 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.6155, Adjusted R-squared: 0.602
## F-statistic: 45.62 on 2 and 57 DF, p-value: 1.481e-12

Interpretation with a continuous Z
Intercept for Xi Slope for Xi
Zi = 0 ◦C βb0 βb1
Zi = 21 ◦ C βb0 + βb2 × 21 βb1
Zi = 24 ◦ C βb0 + βb2 × 24 βb1
Zi = 26 ◦ C βb0 + βb2 × 26 βb1

In this example we have:


Ŷi = 6.806 + 0.406 × Xi − 0.06 × Zi
βb0 : average log income for a country with property rights measured
at 0 and a mean temperature of 0 is 6.806
βb1 : A one-unit change in property rights is associated with a 0.406
change in average log incomes conditional on a country’s mean
temperature
βb2 : A one-degree increase in mean temperature is associated with a
-0.06 change in average log incomes conditional on strength of
property rights
General interpretation

Ybi = βb0 + βb1 Xi + βb2 Zi

The coefficient βb1 measures how the predicted outcome varies in Xi


for a fixed value of Zi .
The coefficient βb2 measures how the predicted outcome varies in Zi
for a fixed value of Xi .

1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Fitted values and residuals

Where do we get our hats? βb0 , βb1 , βb2


To answer this, we first need to redefine some terms from simple
linear regression.
Fitted values for i = 1, . . . , n:

Ybi = βb0 + βb1 Xi + βb2 Zi

Residuals for i = 1, . . . , n:

ubi = Yi − Ybi

Least squares is still least squares

How do we estimate βb0 , βb1 , and βb2 ?


Minimize the sum of the squared residuals, just like before:

(β̂0, β̂1, β̂2) = argmin_{b0,b1,b2} Σᵢ₌₁ⁿ (Yi − b0 − b1 Xi − b2 Zi)²

The calculus is the same as last week, with 3 partial derivatives


instead of 2
Let’s start with a simple recipe and then rigorously show that it holds

OLS estimator recipe using two steps
“Partialling out” OLS recipe:
1 Run regression of Xi on Zi :

Xbi = δb0 + δb1 Zi


2 Calculate residuals from this regression:

rbxz,i = Xi − Xbi
3 Run a simple regression of Yi on the residuals r̂xz,i:

Ŷi = γ̂0 + β̂1 r̂xz,i

The estimate of β̂1 will be the same as from running:

Ŷi = β̂0 + β̂1 Xi + β̂2 Zi

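A sketch of the recipe on simulated data (the names x, z, and y are
illustrative); the slope on the residuals matches the two-regressor slope:

R Code
set.seed(42)
n <- 1000
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)            # x is correlated with z
y <- 1 + 2 * x - 1 * z + rnorm(n)  # true beta1 = 2
r.xz <- residuals(lm(x ~ z))       # steps 1-2: partial z out of x
coef(lm(y ~ r.xz))["r.xz"]         # step 3: same slope as ...
coef(lm(y ~ x + z))["x"]           # ... the regression with both regressors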
Regression property rights on mean temperature

##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.95678 0.82015 12.140 < 2e-16 ***
## meantemp -0.14900 0.03469 -4.295 6.73e-05 ***
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 1.321 on 58 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.2413, Adjusted R-squared: 0.2282
## F-statistic: 18.45 on 1 and 58 DF, p-value: 6.733e-05

Regression of log income on the residuals

## (Intercept) avexpr.res
## 8.0542783 0.4056757

## (Intercept) avexpr meantemp


## 6.80627375 0.40567575 -0.06024937

Residual/partial regression plot

Useful for plotting the conditional relationship between property rights and
income given temperature:

[Figure: Log GDP per capita plotted against the residuals from the
regression of property rights on mean temperature; the fitted slope is
0.406, matching the multiple regression coefficient on avexpr.]
Deriving the Linear Least Squares Estimator

In simple regression, we chose (βb0 , βb1 ) to minimize the sum of the


squared residuals
We use the same principle for picking (βb0 , βb1 , βb2 ) for regression with
two regressors (xi and zi ):
(β̂0, β̂1, β̂2) = argmin_{β̃0,β̃1,β̃2} Σᵢ₌₁ⁿ ûᵢ²
             = argmin_{β̃0,β̃1,β̃2} Σᵢ₌₁ⁿ (yi − ŷi)²
             = argmin_{β̃0,β̃1,β̃2} Σᵢ₌₁ⁿ (yi − β̃0 − xi β̃1 − zi β̃2)²

(The same works more generally for k regressors, but this is done
more easily with matrices as we will see next week)

Deriving the Linear Least Squares Estimator

We want to minimize the following quantity with respect to (β̃0, β̃1, β̃2):

S(β̃0, β̃1, β̃2) = Σᵢ₌₁ⁿ (yi − β̃0 − β̃1 xi − β̃2 zi)²

Plan is conceptually the same as before


1 Take the partial derivatives of S with respect to β̃0 , β̃1 and β̃2 .
2 Set each of the partial derivatives to 0 to obtain the first order
conditions.
3 Substitute β̂0 , β̂1 , β̂2 for β̃0 , β̃1 , β̃2 and solve for β̂0 , β̂1 , β̂2 to obtain
the OLS estimator.

First Order Conditions
Setting the partial derivatives equal to zero leads to a system of 3 linear equations
in 3 unknowns: β̂0 , β̂1 and β̂2
∂S/∂β̃0 = Σᵢ₌₁ⁿ (yi − β̂0 − β̂1 xi − β̂2 zi) = 0
∂S/∂β̃1 = Σᵢ₌₁ⁿ xi (yi − β̂0 − β̂1 xi − β̂2 zi) = 0
∂S/∂β̃2 = Σᵢ₌₁ⁿ zi (yi − β̂0 − β̂1 xi − β̂2 zi) = 0
(dropping the constant factor of −2 from each derivative)

When will this linear system have a unique solution?


More observations than predictors (i.e. n > 2)
x and z are linearly independent, i.e.,
I neither x nor z is a constant
I x is not a linear function of z (or vice versa)
Wooldridge calls this assumption no perfect collinearity
The OLS Estimator
The OLS estimator for (β̂0, β̂1, β̂2) can be written as

β̂0 = ȳ − β̂1 x̄ − β̂2 z̄

β̂1 = [Cov(x, y) Var(z) − Cov(z, y) Cov(x, z)] / [Var(x) Var(z) − Cov(x, z)²]

β̂2 = [Cov(z, y) Var(x) − Cov(x, y) Cov(z, x)] / [Var(x) Var(z) − Cov(x, z)²]

For (β̂0, β̂1, β̂2) to be well-defined we need:

Var(x) Var(z) ≠ Cov(x, z)²

Condition fails if:
1 x or z is a constant (⇒ Var(x) Var(z) = Cov(x, z) = 0)
2 One explanatory variable is an exact linear function of another
(⇒ Cor(x, z) = ±1 ⇒ Var(x) Var(z) = Cov(x, z)²)

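These formulas can be checked numerically; a sketch with simulated data
(R's var() and cov() use n − 1 denominators, which cancel in the ratios):

R Code
set.seed(1)
x <- rnorm(100); z <- 0.6 * x + rnorm(100)
y <- 1 + 2 * x - 3 * z + rnorm(100)
d <- var(x) * var(z) - cov(x, z)^2
b1 <- (cov(x, y) * var(z) - cov(z, y) * cov(x, z)) / d
b2 <- (cov(z, y) * var(x) - cov(x, y) * cov(x, z)) / d
b0 <- mean(y) - b1 * mean(x) - b2 * mean(z)
c(b0, b1, b2)             # matches coef(lm(y ~ x + z))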
“Partialling Out” Interpretation of the OLS Estimator
Assume Y = β0 + β1 X + β2 Z + u. Another way to write the OLS estimator is:

β̂1 = Σᵢⁿ r̂xz,i yi / Σᵢⁿ r̂²xz,i

where r̂xz,i are the residuals from the regression of X on Z:

X = λ + δZ + rxz

In other words, both of these regressions yield identical estimates β̂1:

y = γ̂0 + β̂1 r̂xz    and    y = β̂0 + β̂1 x + β̂2 z

δ captures the linear relationship between X and Z (δ = 0 exactly when
they are uncorrelated). What is our estimator β̂1 if δ = 0?

rxz = x − λ̂ = x − x̄, so β̂1 = Σᵢⁿ r̂xz,i yi / Σᵢⁿ r̂²xz,i = Σᵢⁿ (xi − x̄) yi / Σᵢⁿ (xi − x̄)²

That is, the same as the simple regression of Y on X alone.


Origin of the Partial Out Recipe
Assume Y = β0 + β1 X + β2 Z + u. Another way to write the OLS estimator is:

β̂1 = Σᵢⁿ r̂xz,i yi / Σᵢⁿ r̂²xz,i

where r̂xz,i are the residuals from the regression of X on Z:

X = λ + δZ + rxz

In other words, both of these regressions yield identical estimates β̂1:

y = γ̂0 + β̂1 r̂xz    and    y = β̂0 + β̂1 x + β̂2 z

δ captures the linear relationship between X and Z.

The residuals r̂xz are the part of X that is uncorrelated with Z. Put
differently, r̂xz is X after the effect of Z on X has been partialled out
or netted out.

We can use the same equation with k explanatory variables; r̂xz will then
come from a regression of X on all the other explanatory variables.
OLS assumptions for unbiasedness

When we have more than one independent variable, we need the


following assumptions in order for OLS to be unbiased:

1 Linearity
Yi = β0 + β1 Xi + β2 Zi + ui
2 Random/iid sample
3 No perfect collinearity
4 Zero conditional mean error

E[ui |Xi , Zi ] = 0

New assumption

Assumption 3: No perfect collinearity


(1) No explanatory variable is constant in the sample and (2) there are no
exactly linear relationships among the explanatory variables.

Two components
1 Both Xi and Zi have to vary.
2 Zi cannot be a deterministic, linear function of Xi .
Part 2 rules out anything of the form:

Zi = a + bXi

Notice how this is linear (equation of a line) and there is no error, so


it is deterministic.
What's the correlation between Zi and Xi? ±1!

Perfect collinearity example (I)

Simple example:
I Xi = 1 if a country is not in Africa and 0 otherwise.
I Zi = 1 if a country is in Africa and 0 otherwise.
But, clearly we have the following:

Zi = 1 − Xi

These two variables are perfectly collinear.


What about the following:
I Xi = income
I Zi = Xi2
Do we have to worry about collinearity here?
No! Because while Zi is a deterministic function of Xi , it is not a
linear function of Xi .

R and perfect collinearity

R, and all other packages, will drop one of the variables if there is
perfect collinearity:

##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.71638 0.08991 96.941 < 2e-16 ***
## africa -1.36119 0.16306 -8.348 4.87e-14 ***
## nonafrica NA NA NA NA
## ---
## Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
##
## Residual standard error: 0.9125 on 146 degrees of freedom
## (15 observations deleted due to missingness)
## Multiple R-squared: 0.3231, Adjusted R-squared: 0.3184
## F-statistic: 69.68 on 1 and 146 DF, p-value: 4.87e-14

Perfect collinearity example (II)

Another example:
I Xi = mean temperature in Celsius
I Zi = 1.8Xi + 32 (mean temperature in Fahrenheit)

## (Intercept) meantemp meantemp.f


## 10.8454999 -0.1206948 NA

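A sketch of how this arises in code; the data frame name ajr and the
outcome name logpgp95 are assumptions for illustration:

R Code
ajr$meantemp.f <- 1.8 * ajr$meantemp + 32               # exact linear function
coef(lm(logpgp95 ~ meantemp + meantemp.f, data = ajr))  # meantemp.f comes back NA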
OLS assumptions for large-sample inference

For large-sample inference and calculating SEs, we need the two-variable


version of the Gauss-Markov assumptions:
1 Linearity
Yi = β0 + β1 Xi + β2 Zi + ui
2 Random/iid sample
3 No perfect collinearity
4 Zero conditional mean error

E[ui |Xi , Zi ] = 0

5 Homoskedasticity
var[ui |Xi , Zi ] = σu2

Inference with two independent variables in large samples

We have our OLS estimate β̂1.

We have an estimate of the standard error for that coefficient, ŜE[β̂1].

Under assumptions 1-5, in large samples, we'll have the following:

(β̂1 − β1) / ŜE[β̂1] ∼ N(0, 1)

The same holds for the other coefficient:

(β̂2 − β2) / ŜE[β̂2] ∼ N(0, 1)

Inference is exactly the same in large samples!


Hypothesis tests and CIs are good to go
The SE’s will change, though

OLS assumptions for small-sample inference
For small-sample inference, we need the Gauss-Markov plus Normal errors:
1 Linearity
Yi = β0 + β1 Xi + β2 Zi + ui
2 Random/iid sample
3 No perfect collinearity
4 Zero conditional mean error

E[ui |Xi , Zi ] = 0

5 Homoskedasticity
var[ui |Xi , Zi ] = σu2

6 Normal conditional errors

ui ∼ N(0, σu2 )

Inference with two independent variables in small samples
Under assumptions 1-6, we have the following small change to our
small-n sampling distribution:

(β̂1 − β1) / ŜE[β̂1] ∼ tn−3

The same is true for the other coefficient:

(β̂2 − β2) / ŜE[β̂2] ∼ tn−3

Why n − 3?
I We’ve estimated another parameter, so we need to take off another
degree of freedom.
This leads to small adjustments to the critical values and the t-values
for our hypothesis tests and confidence intervals.

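For example, with n = 60 the 95% critical value barely moves relative to
the normal benchmark:

R Code
n <- 60
qt(0.975, df = n - 3)   # about 2.00, versus qnorm(0.975) = 1.96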
1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Red State Blue State

Red and Blue States

Rich States are More Democratic

But Rich People are More Republican

Paradox Resolved

If Only Rich People Voted, it Would Be a Landslide

A Possible Explanation

References

Acemoglu, Daron, Simon Johnson, and James A. Robinson. “The colonial


origins of comparative development: An empirical investigation.”
American Economic Review. 91(5). 2001: 1369-1401.
Fish, M. Steven. “Islam and Authoritarianism.” World Politics 55(1).
2002: 4-37.
Gelman, Andrew. Red State, Blue State, Rich State, Poor State: Why
Americans Vote the Way They Do. Princeton University Press, 2009.

Where We’ve Been and Where We’re Going...
Last Week
I mechanics of OLS with one variable
I properties of OLS
This Week
I Monday:
F adding a second variable
F new mechanics
I Wednesday:
F omitted variable bias
F multicollinearity
F interactions
Next Week
I multiple regression
Long Run
I probability → inference → regression

Questions?

1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Remember This?
Assumption                 Data          Unbiasedness/   Asymptotic    Small-Sample
                           Description   Consistency     Inference     Inference
Variation in X                 X              X              X              X
Random Sampling                               X              X              X
Linearity in Parameters                       X              X              X
Zero Conditional Mean                         X              X              X
Homoskedasticity                                             X              X
Normality of Errors                                                         X

Asymptotic inference rests on the Gauss-Markov assumptions (OLS is BLUE);
small-sample inference with t and F tests adds normality of the errors
(the classical linear model, OLS is BUE).
Unbiasedness revisited

True model:
Yi = β0 + β1 Xi + β2 Zi + ui
Assumptions 1-4 ⇒ we get unbiased estimates of the coefficients
What happens if we ignore the Zi and just run the simple linear
regression with just Xi ?
Misspecified model:

Yi = β0 + β1 Xi + ui*,   where ui* = β2 Zi + ui

OLS estimates from the misspecified model:

Ŷi = β̃0 + β̃1 Xi

Omitted Variable Bias: Simple Case
True Population Model:

Voted Republican = β0 + β1 Watch Fox News + β2 Strong Republican + u

Underspecified Model that we use:

Voted Republican = β̃0 + β̃1 Watch Fox News

Q: Which statement is correct?


1 β1 > β̃1
2 β1 < β̃1
3 β1 = β̃1
4 Can’t tell
Answer: β̃1 is upward biased, since being a strong Republican is positively
correlated with both watching Fox News and voting Republican. We have
β1 < β̃1.
Omitted Variable Bias: Simple Case
True Population Model:

Survival = β0 + β1 Hospitalized + β2 Health + u

Under-specified Model that we use:

Survival = β̃0 + β̃1 Hospitalized

Q: Which statement is correct?


1 β1 > β̃1
2 β1 < β̃1
3 β1 = β̃1
4 Can’t tell
Answer: The negative coefficient β̃1 is downward biased compared to the
true β1 so β1 > β̃1 . Being hospitalized is negatively correlated with health,
and health is positively correlated with survival.
Omitted Variable Bias: Simple Case
True Population Model:

Y = β0 + β1 X1 + β2 X2 + u

Underspecified Model that we use:

ỹ = β̃0 + β̃1 x1

We can show that the relationship between β̃1 and β̂1 is:

β̃1 = β̂1 + β̂2 · δ̃

where:
δ̃ is the slope of a regression of x2 on x1 . If δ̃ > 0 then cor(x1 , x2 ) > 0 and if
δ̃ < 0 then cor(x1 , x2 ) < 0.
β̂2 is from the true regression and measures the relationship between x2 and
y , conditional on x1 .
Q. When will β̃1 = β̂1 ?
A. If δ̃ = 0 or β̂2 = 0.
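The identity β̃1 = β̂1 + β̂2 · δ̃ holds exactly in any sample; a sketch
with simulated data:

R Code
set.seed(3)
x1 <- rnorm(500); x2 <- 0.5 * x1 + rnorm(500)
y <- 1 + 2 * x1 + 3 * x2 + rnorm(500)
b <- coef(lm(y ~ x1 + x2))          # the "long" regression
delta <- coef(lm(x2 ~ x1))["x1"]    # slope of x2 on x1
b["x1"] + b["x2"] * delta           # equals ...
coef(lm(y ~ x1))["x1"]              # ... the "short" regression slope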
Omitted Variable Bias: Simple Case
We take expectations to see what the bias will be:

β̃1 = β̂1 + β̂2 · δ̃


E [β̃1 | X ] = E [β̂1 + β̂2 · δ̃ | X ]
= E [β̂1 | X ] + E [β̂2 | X ] · δ̃ (δ̃ nonrandom given x)
= β1 + β2 · δ̃ (given assumptions 1-4)

So

Bias[β̃1 | X ] = E [β̃1 | X ] − β1 = β2 · δ̃

So the bias depends on the relationship between x2 and x1 , our δ̃, and the
relationship between x2 and y , our β2 .

Any omitted variable that is correlated with an included X and with the
outcome Y is called a confounder.

Omitted Variable Bias: Simple Case

Direction of the bias of β̃1 compared to β1 is given by:


cov(X1 , X2 ) > 0 cov(X1 , X2 ) < 0 cov(X1 , X2 ) = 0
β2 > 0 Positive bias Negative Bias No bias
β2 < 0 Negative bias Positive Bias No bias
β2 = 0 No bias No bias No bias

Further points:
Magnitude of the bias matters too
If you miss an important confounder, your estimates are biased and
inconsistent.
In the more general case with more than two covariates the bias is
more difficult to discern. It depends on all the pairwise correlations.

Including an Irrelevant Variable: Simple Case

True Population Model:

y = β0 + β1 x1 + β2 x2 + u where β2 = 0

and Assumptions I–IV hold.


Overspecified Model that we use:

ỹ = β̃0 + β̃1 x1 + β̃2 x2

Q: Which statement is correct?


1 β1 > β̃1
2 β1 < β̃1
3 β1 = β̃1
4 Can’t tell

Including an Irrelevant Variable: Simple Case

Recall: Given Assumptions I–IV, we have:

E [β̂j ] = βj

for all values of βj . So, if β2 = 0, we get

E [β̂0 ] = β0 , E [β̂1 ] = β1 , E [β̂2 ] = 0

and thus including the irrelevant variable does not generally affect the
unbiasedness. The sampling distribution of β̂2 will be centered about zero.

1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Sampling variance for simple linear regression

Under simple linear regression, we found that the distribution of the
slope was the following:

var(β̂1) = σᵤ² / Σᵢ₌₁ⁿ (Xi − X̄)²

Factors affecting the standard errors (the square root of these
sampling variances):
I The error variance σᵤ² (higher conditional variance of Yi leads to
bigger SEs)
I The total variation in Xi : Σᵢ₌₁ⁿ (Xi − X̄)² (lower variation in Xi
leads to bigger SEs)

Sampling variation for linear regression with two covariates
Regression with an additional independent variable:

var(β̂1) = σᵤ² / [(1 − R₁²) Σᵢ₌₁ⁿ (Xi − X̄)²]

Here, R₁² is the R² from the regression of Xi on Zi :

X̂i = δ̂0 + δ̂1 Zi

Factors now affecting the standard errors:
I The error variance (higher conditional variance of Yi leads to bigger
SEs)
I The total variation of Xi (lower variation in Xi leads to bigger SEs)
I The strength of the relationship between Xi and Zi (stronger
relationships mean higher R₁² and thus bigger SEs)

What happens with perfect collinearity? R₁² = 1 and the variances are
infinite.
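A sketch verifying this variance formula against lm()'s own standard
error (simulated data, illustrative names):

R Code
set.seed(5)
z <- rnorm(300); x <- 2 * z + rnorm(300)   # x strongly related to z
y <- 1 + x + z + rnorm(300)
fit <- lm(y ~ x + z)
r2.xz <- summary(lm(x ~ z))$r.squared
sqrt(summary(fit)$sigma^2 / ((1 - r2.xz) * sum((x - mean(x))^2)))  # formula
sqrt(diag(vcov(fit)))["x"]                                         # lm's SE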
Multicollinearity

Definition
Multicollinearity is defined to be high, but not perfect, correlation between
two independent variables in a regression.

With multicollinearity, we'll have R₁² ≈ 1, but not exactly.

The stronger the relationship between Xi and Zi , the closer R₁² will be
to 1, and the higher the SEs will be:

var(β̂1) = σᵤ² / [(1 − R₁²) Σᵢ₌₁ⁿ (Xi − X̄)²]

Given the symmetry, it will increase var(β̂2) as well.

Intuition for multicollinearity

Remember the OLS recipe:


I βb1 from regression of Yi on rbxz,i
I rbxz,i are the residuals from the regression of Xi on Zi
Estimated coefficient:

β̂1 = Σᵢ₌₁ⁿ r̂xz,i Yi / Σᵢ₌₁ⁿ r̂²xz,i

When Zi and Xi have a strong relationship, then the residuals will


have low variation
We explain away a lot of the variation in Xi through Zi .
Low variation in an independent variable (here, r̂xz,i ) ⇒ high SEs
Basically, there is less residual variation left in Xi after “partialling
out” the effect of Zi

Effects of multicollinearity
No effect on the bias of OLS.
Only increases the standard errors.
Really just a sample size problem:
I If Xi and Zi are extremely highly correlated, you’re going to need a
much bigger sample to accurately differentiate between their effects.

How Do We Detect Multicollinearity?
The best practice is to directly compute Cor(X1 , X2 ) before running your
regression.

But you might (and probably will) forget to do so. Even then, you can
detect multicollinearity from your regression results:
I Large changes in the estimated regression coefficients when a predictor
variable is added or deleted
I Lack of statistical significance despite a high R²
I Estimated regression coefficients with the opposite sign from the one
predicted

A more formal indicator is the variance inflation factor (VIF):

VIF(β̂j) = 1 / (1 − Rj²)

which measures how much V[β̂j | X] is inflated compared to (hypothetical)
uncorrelated data (where Rj² is the coefficient of determination from the
partialling-out equation).
In R, vif() in the car package.
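A sketch of both routes to the VIF on simulated data (the car package
must be installed):

R Code
set.seed(7)
x1 <- rnorm(200); x2 <- 0.9 * x1 + 0.3 * rnorm(200)  # highly correlated pair
y <- 1 + x1 + x2 + rnorm(200)
r2 <- summary(lm(x1 ~ x2))$r.squared
1 / (1 - r2)                 # VIF for x1, by hand
car::vif(lm(y ~ x1 + x2))    # same quantity via the car package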
So How Should I Think about Multicollinearity?

Multicollinearity does NOT lead to bias; estimates will be unbiased


and consistent.
Multicollinearity should in fact be seen as a problem of
micronumerosity, or “too little data.” You can't ask the OLS
estimator to distinguish the partial effects of X1 and X2 if they are
essentially the same.
If X1 and X2 are almost the same, why would you want a unique β1
and a unique β2 ? Think about how you would interpret them.

Relax, you've got more important things to worry about!


If possible, get more data
Drop one of the variables, or combine them
Or maybe linear regression is not the right tool

1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Why Dummy Variables?

A dummy variable (a.k.a. indicator variable, binary variable, etc.) is a


variable that is coded 1 or 0 only.
We use dummy variables in regression to represent qualitative information
through categorical variables such as different subgroups of the sample (e.g.
regions, old and young respondents, etc.)
By including dummy variables in our regression function, we can easily
obtain the conditional mean of the outcome variable for each category.
I E.g. does average income vary by region? Are Republicans smarter
than Democrats?
Dummy variables are also used to examine conditional hypotheses via
interaction terms
I E.g. does the effect of education differ by gender?

How Can I Use a Dummy Variable?

Consider the easiest case with two categories. The type of electoral
system of country i is given by:
Xi ∈ {Proportional, Majoritarian}

For this we use a single dummy variable which is coded as:

Di = 1 if country i has a Majoritarian Electoral System
Di = 0 if country i has a Proportional Electoral System

Hint: Informative variable names help (e.g. call it MAJORITARIAN)

Let’s regress GDP on this dummy variable and a constant:


Y = β0 + β1 D + u

Example: GDP per capita on Electoral System
R Code
> summary(lm(REALGDPCAP ~ MAJORITARIAN, data = D))

Call:
lm(formula = REALGDPCAP ~ MAJORITARIAN, data = D)

Residuals:
Min 1Q Median 3Q Max
-5982 -4592 -2112 4293 13685

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7097.7 763.2 9.30 1.64e-14 ***
MAJORITARIAN -1053.8 1224.9 -0.86 0.392
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1

Residual standard error: 5504 on 83 degrees of freedom


Multiple R-squared: 0.008838, Adjusted R-squared: -0.003104
F-statistic: 0.7401 on 1 and 83 DF, p-value: 0.3921

Example: GDP per capita on Electoral System
R Code
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7097.7 763.2 9.30 1.64e-14 ***
MAJORITARIAN -1053.8 1224.9 -0.86 0.392

R Code
> gdp.pro <- D$REALGDPCAP[D$MAJORITARIAN == 0]
> summary(gdp.pro)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1116 2709 5102 7098 10670 20780

> gdp.maj <- D$REALGDPCAP[D$MAJORITARIAN == 1]


> summary(gdp.maj)
Min. 1st Qu. Median Mean 3rd Qu. Max.
530.2 1431.0 3404.0 6044.0 11770.0 18840.0

So this is just like a difference-in-means two-sample t-test!

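A quick check (sketch), reusing the gdp.pro and gdp.maj vectors defined
above:

R Code
mean(gdp.maj) - mean(gdp.pro)               # reproduces the -1053.8 coefficient
t.test(gdp.maj, gdp.pro, var.equal = TRUE)  # the equivalent two-sample t-test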
Dummy Variables for Multiple Categories
More generally, let’s say X measures which of m categories each unit
i belongs to. E.g. the type of electoral system or region of country i
is given by:
I Xi ∈ {Proportional, Majoritarian} so m = 2
I Xi ∈ {Asia, Africa, LatinAmerica, OECD, Transition} so m = 5

To incorporate this information into our regression function we usually


create m − 1 dummy variables, one for each of the m − 1 categories.

Why not all m? Including all m category indicators as dummies would


violate the no perfect collinearity assumption:

Dm = 1 − (D1 + · · · + Dm−1 )

The omitted category is our baseline case (also called a reference


category) against which we compare the conditional means of Y for
the other m − 1 categories.
Example: Regions of the World
Consider the case of our “polytomous” variable world region with
m = 5:
Xi ∈ {Asia, Africa, LatinAmerica, OECD, Transition}
This five-category classification can be represented in the regression
equation by introducing m − 1 = 4 dummy regressors:

Category D1 D2 D3 D4
Asia 1 0 0 0
Africa 0 1 0 0
LatinAmerica 0 0 1 0
OECD 0 0 0 1
Transition 0 0 0 0

Our regression equation is:

Y = β0 + β1 D1 + β2 D2 + β3 D3 + β4 D4 + u

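In R, a factor does this bookkeeping automatically; a sketch with assumed
names, where the first level becomes the omitted baseline:

R Code
d$region <- factor(d$region,
  levels = c("Transition", "Asia", "Africa", "LatinAmerica", "OECD"))
lm(Y ~ region, data = d)   # Transition is the reference category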
1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Why Interaction Terms?

Interaction terms will allow you to let the slope on one variable vary
as a function of another variable
Interaction terms are central in regression analysis to:
I Model and test conditional hypotheses (do the returns to education
vary by gender?)
I Make model of the conditional expectation function more realistic by
letting coefficients vary across subgroups

We can interact:
I two or more dummy variables
I dummy variables and continuous variables
I two or more continuous variables

Interactions often confuse researchers, and mistakes in use and
interpretation occur frequently (even in top journals)

Return to the Fish Example

Data comes from Fish (2002), “Islam and Authoritarianism.”


Basic relationship: does more economic development lead to more
democracy?
We measure economic development with log GDP per capita
We measure democracy with a Freedom House score, 1 (less free) to
7 (more free)

Let’s see the data

[Figure: Democracy (1–7) against Log GDP per capita (2.0–4.5), with
Muslim and non-Muslim countries labeled; Muslim countries cluster at low
democracy scores across income levels.]
Fish argues that Muslim countries are less likely to be democratic no


matter their economic development
Controlling for Religion Additively

[Figure: the additive fit: Democracy against Log GDP per capita with
parallel prediction lines for Muslim and non-Muslim countries.]

But the regression is a poor fit for Muslim countries


Can we allow for different slopes for each group?
Interactions with a binary variable

Let Zi be binary
In this case, Zi = 1 for the country being Muslim
We can add another covariate to the baseline model that allows the
effect of income to vary by Muslim status.
This covariate is called an interaction term and it is the product of
the two marginal variables of interest: incomei × muslimi
Here is the model with the interaction term:

Ybi = βb0 + βb1 Xi + βb2 Zi + βb3 Xi Zi

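In R the interaction can be written directly; a sketch with assumed
variable names:

R Code
m <- lm(democracy ~ income * muslim, data = fish)
# income * muslim expands to income + muslim + income:muslim,
# so the lower order terms are included automatically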
Two lines in one regression

Ybi = βb0 + βb1 Xi + βb2 Zi + βb3 Xi Zi

How can we interpret this model?


We can plug in the two possible values of Zi
When Zi = 0:
Ybi = βb0 + βb1 Xi + βb2 Zi + βb3 Xi Zi
= βb0 + βb1 Xi + βb2 × 0 + βb3 Xi × 0
= βb0 + βb1 Xi

When Zi = 1:
Ybi = βb0 + βb1 Xi + βb2 Zi + βb3 Xi Zi
= βb0 + βb1 Xi + βb2 × 1 + βb3 Xi × 1
= (βb0 + βb2 ) + (βb1 + βb3 )Xi

Example interpretation of the coefficients
Intercept for Xi Slope for Xi
Non-Muslim country (Zi = 0) βb0 βb1
Muslim country (Zi = 1) βb0 + βb2 βb1 + βb3
[Figure: Democracy against Log GDP per capita with separate fitted lines
for Muslim and non-Muslim countries; the Muslim-country line has its own
intercept (β̂0 + β̂2) and slope (β̂1 + β̂3).]
General interpretation of the coefficients

βb0 : average value of Yi when both Xi and Zi are equal to 0


βb1 : a one-unit change in Xi is associated with a βb1 -unit change in Yi
when Zi = 0
βb2 : average difference in Yi between Zi = 1 group and Zi = 0 group
when Xi = 0
βb3 : change in the effect of Xi on Yi between Zi = 1 group and Zi = 0

Lower order terms
Principle of Marginality: Always include the marginal effects
(sometimes called the lower order terms)
Imagine we omitted the lower order term for muslim:
[Figure: fitted lines when the lower order term for muslim is omitted; the two lines are forced to cross at Log GDP per capita = 0]

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 111 / 132
Omitting lower order terms

Ŷi = β̂0 + β̂1 Xi + 0 × Zi + β̂3 Xi Zi

                                Intercept for Xi      Slope for Xi
Non-Muslim country (Zi = 0)          β̂0                   β̂1
Muslim country (Zi = 1)            β̂0 + 0              β̂1 + β̂3

Implication: no difference between Muslims and non-Muslims when
income is 0
Distorts slope estimates.
Very rarely justified.
Yet for some reason people keep doing it.

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 112 / 132
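To see the restriction in code (continuing the simulated sketch from above, so this is illustrative only), R lets you omit the lower order term with x + x:z, but you almost never should:

# Omitting the lower order term for z forces both groups through the
# same intercept at x = 0, which distorts the slope estimates
bad_fit <- lm(y ~ x + x:z)
coef(bad_fit)   # note: no separate coefficient for z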
Interactions with two continuous variables

Now let Zi be continuous
Zi is the percent growth in GDP per capita from 1975 to 1998
Is the effect of economic development for rapidly developing countries
higher or lower than for stagnant economies?
We can still define the interaction:

incomei × growthi

And include it in the regression:

Ŷi = β̂0 + β̂1 Xi + β̂2 Zi + β̂3 Xi Zi

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 113 / 132
Interpretation

With a continuous Zi, there are more than two values that it can take on:

              Intercept for Xi          Slope for Xi
Zi = 0             β̂0                       β̂1
Zi = 0.5       β̂0 + β̂2 × 0.5           β̂1 + β̂3 × 0.5
Zi = 1         β̂0 + β̂2 × 1             β̂1 + β̂3 × 1
Zi = 5         β̂0 + β̂2 × 5             β̂1 + β̂3 × 5

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 114 / 132
General interpretation

Ŷi = β̂0 + β̂1 Xi + β̂2 Zi + β̂3 Xi Zi

The coefficient β̂1 measures how the predicted outcome varies in Xi
when Zi = 0.
The coefficient β̂2 measures how the predicted outcome varies in Zi
when Xi = 0.
The coefficient β̂3 is the change in the effect of Xi given a one-unit
change in Zi:

∂E[Yi | Xi, Zi] / ∂Xi = β1 + β3 Zi

The coefficient β̂3 is also the change in the effect of Zi given a one-unit
change in Xi:

∂E[Yi | Xi, Zi] / ∂Zi = β2 + β3 Xi

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 115 / 132
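A small sketch of this interpretation (again using the simulated fit from earlier, so names and numbers are illustrative): the estimated marginal effect of Xi is just β̂1 + β̂3 Zi, evaluated at whatever values of Zi are of interest.

# Estimated marginal effect of x at several values of z: b1 + b3 * z
z_vals <- c(0, 0.5, 1, 5)
me_x <- b["x"] + b["x:z"] * z_vals
cbind(z = z_vals, marginal_effect_of_x = me_x)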
Additional Assumptions

Interaction effects are particularly susceptible to model dependence. We
are making two assumptions for the estimated effects to be meaningful:
1 Linearity of the interaction effect
2 Common support (variation in X throughout the range of Z )

We will talk about checking these assumptions in a few weeks.

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 116 / 132
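A minimal sketch of a common-support check, assuming a continuous X and a binary Z as in the simulated sketches above (we return to better diagnostics later):

# Compare the distribution of X across levels of Z; regions with no
# overlap mean the interaction is extrapolating
tapply(x, z, summary)
boxplot(x ~ z, xlab = "Z", ylab = "X", main = "Support of X by Z")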
Example: Common Support
Chapman 2009 analysis
example and reanalysis from Hainmueller, Mummolo, Xu 2016

[Figure: scatterplots of rallies against US affinity with UN Security Council, in separate panels for UN authorization = 0 and UN authorization = 1; the UN authorization = 1 panel contains only a handful of observations, so there is little common support for estimating the interaction]

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 117 / 132
Summary for Interactions
Do not omit lower order terms (unless you have a strong theory that tells
you to) because this usually imposes unrealistic restrictions.
Do not interpret the coefficients on the lower order terms as marginal effects
(they give the marginal effect only for the case where the other variable is
equal to zero).
Produce tables or figures that summarize the conditional marginal effects of
the variable of interest at different plausible levels of the other variable; use
the correct formula to compute the variance of these conditional effects (they
are sums of coefficients, so the covariance between coefficients enters); see
the sketch below.
In simple cases the p-value on the interaction term can be used as a test
against the null of no interaction, but significance tests for the lower order
terms rarely make sense.
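A minimal sketch of that variance calculation, reusing the simulated fit <- lm(y ~ x * z) from earlier (illustrative only): the conditional effect β̂1 + β̂3 z is a sum of coefficients, so its variance needs the covariance term.

# SE of the conditional marginal effect of x at z = z0:
# Var(b1 + b3 * z0) = Var(b1) + z0^2 * Var(b3) + 2 * z0 * Cov(b1, b3)
V <- vcov(fit)
z0 <- 1
me <- coef(fit)["x"] + coef(fit)["x:z"] * z0
se <- sqrt(V["x", "x"] + z0^2 * V["x:z", "x:z"] + 2 * z0 * V["x", "x:z"])
c(effect = me, se = se)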
Further Reading: Brambor, Clark, and Golder. 2006. Understanding Interaction
Models: Improving Empirical Analyses. Political Analysis 14 (1): 63-82.
Hainmueller, Mummolo, Xu. 2016. How Much Should We Trust Estimates from
Multiplicative Interaction Models? Simple Tools to Improve Empirical Practice.
Working Paper
Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 118 / 132
1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 119 / 132
Polynomial terms

Polynomial terms are a special case of the continuous variable
interactions.
For example, when X1 = X2 in the previous interaction model, we get
a quadratic:

Y = β0 + β1 X1 + β2 X2 + β3 X1 X2 + u
Y = β0 + (β1 + β2) X1 + β3 X1 X1 + u
Y = β0 + β̃1 X1 + β̃2 X1² + u

This is called a second order polynomial in X1
A third order polynomial is given by:

Y = β0 + β1 X1 + β2 X1² + β3 X1³ + u

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 120 / 132
Polynomial Example: Income and Age

Let's look at data from the U.S. and define xi to be age for
individual i and yi to be income category
We examine the relationship between Y: income and X: age
We see that a simple linear specification does not fit the data very well:

Y = β0 + β1 X1 + u

A second order polynomial in age fits the data a lot better:

Y = β0 + β1 X1 + β2 X1² + u

[Figure (quadratic age effect): scatterplot of jitter(income) against jitter(age); the fitted quadratic ŷi = β̂0 + β̂1 xi + β̂2 xi² produces a much better fit to the data than the linear specification]
Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 121 / 132
Polynomial Example: Income and Age

Y = β0 + β1 X1 + β2 X1² + u

Is β1 the marginal effect of age on income?
No! The marginal effect of age depends on the level of age:

∂Y / ∂X1 = β̂1 + 2 β̂2 X1

Here the effect of age changes monotonically from positive to negative.
If β2 > 0 we get a U-shape, and if β2 < 0 we get an inverted U-shape.
The maximum/minimum occurs at |β1 / (2β2)|. Here the turning point is at X1 = 50.

[Figure (quadratic age effect): the same scatterplot of jitter(income) against jitter(age) with the fitted quadratic ŷi = β̂0 + β̂1 xi + β̂2 xi²]
Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 122 / 132
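A sketch of this quadratic fit in R, with simulated income/age data (the slide's dataset is not reproduced here, so the numbers are made up to match the story):

# Simulate an inverted-U income/age relationship and fit a quadratic
set.seed(1)
age <- sample(18:80, 500, replace = TRUE)
income <- 2 + 0.6 * age - 0.006 * age^2 + rnorm(500, sd = 2)

quad <- lm(income ~ age + I(age^2))   # I() protects the square inside a formula
b <- coef(quad)

# Marginal effect of age depends on age: b1 + 2 * b2 * age
b["age"] + 2 * b["I(age^2)"] * c(20, 50, 70)

# Turning point of the parabola: -b1 / (2 * b2), about 50 here
-b["age"] / (2 * b["I(age^2)"])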
Higher Order Polynomials

[Figure: approximating data generated with a sine function; the red line is a first degree polynomial, the green line is second degree, the orange line is third degree, and the blue line is fourth degree]
Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 123 / 132
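A sketch that reproduces the idea behind this figure (my own simulation, not the original code), using the same color scheme for degrees one through four:

# Noisy data from a sine function, approximated by polynomials of degree 1-4
set.seed(2)
x <- seq(0, 2 * pi, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

plot(x, y, pch = 16, col = "grey")
cols <- c("red", "green", "orange", "blue")
for (d in 1:4) {
  fit <- lm(y ~ poly(x, d))           # orthogonal polynomial of degree d
  lines(x, predict(fit), col = cols[d], lwd = 2)
}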
Conclusion

In this brave new world with 2 independent variables:
1 β's have slightly different interpretations
2 OLS is still minimizing the sum of the squared residuals
3 Small adjustments to OLS assumptions and inference
4 Adding or omitting variables in a regression can affect the bias and
the variance of OLS
5 We can optionally consider interactions, but must take care to
interpret them correctly

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 124 / 132
Next Week

OLS in its full glory


Reading:
I Practice up on matrices
I Fox Chapter 9.1-9.4 (skip 9.1.1-9.1.2) Linear Models in Matrix Form
I Aronow and Miller 4.1.2-4.1.4 Regression with Matrix Algebra
I Optional: Fox Chapter 10 Geometry of Regression
I Optional: Imai Chapter 4.3-4.3.3
I Optional: Angrist and Pischke Chapter 3.1 Regression Fundamentals

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 125 / 132
1 Two Examples
2 Adding a Binary Variable
3 Adding a Continuous Covariate
4 Once More With Feeling
5 OLS Mechanics and Partialing Out
6 Fun With Red and Blue
7 Omitted Variables
8 Multicollinearity
9 Dummy Variables
10 Interaction Terms
11 Polynomials
12 Conclusion
13 Fun With Interactions

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 126 / 132
Fun With Interactions

Remember that time I mentioned people doing strange things with
interactions?

Brooks and Manza (2006). "Social Policy Responsiveness in Developed
Democracies." American Sociological Review.
Breznau (2015). "The Missing Main Effect of Welfare State Regimes: A
Replication of 'Social Policy Responsiveness in Developed Democracies.'"
Sociological Science.

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 127 / 132
Original Argument

Public preferences shape welfare state trajectories over the long term
Democracy empowers the masses, and that empowerment helps
define social outcomes
Key model is interaction between liberal/non-liberal and public
preferences on social spending
but. . . they leave out a main effect.

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 128 / 132
Omitted Term

They omit the marginal term for liberal/non-liberal
This forces the two regression lines to intersect at public preferences
= 0.
They mean-center, so 0 represents the average over the entire sample

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 129 / 132
What Happens?

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 130 / 132
Moral of the Story

Seriously, don’t omit lower order terms.

<drops mic>

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 131 / 132
References

Acemoglu, Daron, Simon Johnson, and James A. Robinson. 2001. "The
Colonial Origins of Comparative Development: An Empirical Investigation."
American Economic Review 91(5): 1369-1401.
Fish, M. Steven. 2002. "Islam and Authoritarianism." World Politics 55(1):
4-37.
Gelman, Andrew. 2009. Red State, Blue State, Rich State, Poor State: Why
Americans Vote the Way They Do. Princeton: Princeton University Press.

Stewart (Princeton) Week 6: Two Regressors October 17, 19, 2016 132 / 132
