Understanding Instrumental Variables in Econometrics

Old School IV

Master Joshway

ASSA Continuing Education: January 2020

Instruments Ready?
Organizing IV
I tell the IV story in two iterations, first with constant effects,
then with heterogeneous potential outcomes.

• The constant effects framework focuses on selection bias and essential IV mechanics
• Many reasons to instrument ... here’s one:
• The regression we want is long, say:

$$y_i = \alpha + \rho s_i + A_i'\gamma + v_i = \alpha + \rho s_i + \eta_i \qquad (1)$$

Since $s_i$ and $\eta_i = A_i'\gamma + v_i$ are correlated,

$$\frac{Cov(y_i, s_i)}{V(s_i)} \neq \rho$$

• The short regression suffers from "ability bias"
• IV recovers long-regression $\rho$ without observing $A_i$
• Why go long? Must be a causal story!

IV Goes Long (in pursuit of causal effects)

• Potential earnings modeled as a function of ability:

$$y_{0i} = \alpha + \eta_i = \alpha + A_i'\gamma + v_i,$$
where we’re happy to assume $E[v_i s_i] = 0$
• Moving from $s - 1$ to $s$ years of schooling yields constant returns:

$$y_{s,i} - y_{s-1,i} = \rho,$$
making eq. (1) into a causal model
• A valid instrument, $z_i$, is:
1 correlated with $s_i$
2 uncorrelated with $\eta_i = A_i'\gamma + v_i$ and hence with $y_{0i}$

• $z_i$ is excluded from the causal model of interest

• Given these assumptions, we have:
$$\rho = \frac{Cov(y_i, z_i)}{Cov(s_i, z_i)} = \frac{Cov(y_i, z_i)/V(z_i)}{Cov(s_i, z_i)/V(z_i)} = \frac{\text{"RF"}}{\text{"1st"}} \qquad (2)$$
Abraham (Wald) Meets Jacob (Bernoulli)
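• Repeat our long regression, equation (1):

$$y_i = \alpha + \rho s_i + A_i'\gamma + v_i = \alpha + \rho s_i + \eta_i$$

• By linear CEF, the RF for a Bernoulli (dummy) instrument, $z_i$, is

$$\frac{Cov(y_i, z_i)}{V(z_i)} = E[y_i | z_i = 1] - E[y_i | z_i = 0],$$
with an analogous formula for $\frac{Cov(s_i, z_i)}{V(z_i)}$. Substituting both into (2) gives the Wald formula:

$$\rho = \frac{E[y_i | z_i = 1] - E[y_i | z_i = 0]}{E[s_i | z_i = 1] - E[s_i | z_i = 0]} \qquad (3)$$

• Equivalently, because the instrument is unrelated to $\eta_i$:

$$E[y_i | z_i] = \alpha + \rho E[s_i | z_i] \qquad (4)$$

Solving (4) for $\rho$ yields (3)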
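The equivalence between the covariance ratio and the Wald difference-in-means ratio can be checked on simulated data. The sketch below is illustrative only: the data-generating process, the parameter values, and all variable names are assumptions, not anything taken from AK-91.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = 0.10                                    # assumed true return to schooling

ability = rng.normal(size=n)                  # unobserved A_i
z = rng.integers(0, 2, size=n)                # dummy instrument, independent of ability
s = 12 + 0.5 * z + ability + rng.normal(size=n)                # schooling
y = 1.0 + rho * s + 0.3 * ability + rng.normal(scale=0.5, size=n)

def cov(a, b):
    """Sample covariance (divides by n)."""
    return np.mean((a - a.mean()) * (b - b.mean()))

ols = cov(y, s) / cov(s, s)                   # short regression, contaminated by ability
iv = cov(y, z) / cov(s, z)                    # ratio of RF to first stage
wald = (y[z == 1].mean() - y[z == 0].mean()) / (s[z == 1].mean() - s[z == 0].mean())

print(f"OLS {ols:.3f}  IV {iv:.3f}  Wald {wald:.3f}")
```

With a dummy instrument the IV and Wald numbers agree up to floating-point error, while the short regression overstates the assumed return because schooling and ability move together in this design.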

Angrist and Krueger (1991): Compulsory IV

• Children born in late quarters start school younger, so are kept in school longer by birthday-based compulsory schooling laws
• There’s a powerful first stage supporting this
• Late-quarter births have more years of schooling
• This is driven by high school and not college, consistent with the AK-91 CSL story

• Mean schooling and wages by YOB/QOB appear in Figure 4.1.1

• The IV estimator is the sample analog of (2); with a dummy instrument this becomes the sample analog of (3)

• AK-91 Wald IV for the economic returns to schooling compares average schooling and earnings for early- and later-quarter births
• The instrument here is $Z_i = 1[QOB_i = 1]$
996 QUARTERLY JOURNAL OF ECONOMICS

TABLE III
PANEL A: WALD ESTIMATES FOR 1970 CENSUS, MEN BORN 1920-1929 (a)

| | (1) Born in 1st quarter of year | (2) Born in 2nd, 3rd, or 4th quarter of year | (3) Difference (std. error), (1) - (2) |
|---|---|---|---|
| ln (wkly. wage) | 5.1484 | 5.1574 | -0.00898 (0.00301) |
| Education | 11.3996 | 11.5252 | -0.1256 (0.0155) |
| Wald est. of return to education | | | 0.0715 (0.0219) |
| OLS return to education (b) | | | 0.0801 (0.0004) |

PANEL B: WALD ESTIMATES FOR 1980 CENSUS, MEN BORN 1930-1939

| | (1) Born in 1st quarter of year | (2) Born in 2nd, 3rd, or 4th quarter of year | (3) Difference (std. error), (1) - (2) |
|---|---|---|---|
| ln (wkly. wage) | 5.8916 | 5.9027 | -0.01110 (0.00274) |
| Education | 12.6881 | 12.7969 | -0.1088 (0.0132) |
| Wald est. of return to education | | | 0.1020 (0.0239) |
| OLS return to education | | | 0.0709 (0.0003) |

a. The sample size is 247,199 in Panel A, and 327,509 in Panel B. Each sample consists of males born in the United States who had positive earnings in the year preceding the survey. The 1980 Census sample is drawn from the 5 percent sample, and the 1970 Census sample is from the State, County, and Neighborhoods 1 percent samples.
b. The OLS return to education was estimated from a bivariate regression of log weekly earnings on years of education.
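As a quick check on how the Wald rows of Table III are built, the snippet below divides the reported difference in log weekly wages by the reported difference in education for each panel. Only the point estimates are reproduced; the dictionary layout and names are illustrative.

```python
# Differences (column 3 of Table III) and the Wald estimates the table reports
panels = {
    "A (1970 Census, born 1920-1929)": {"d_lnwage": -0.00898, "d_educ": -0.1256, "reported": 0.0715},
    "B (1980 Census, born 1930-1939)": {"d_lnwage": -0.01110, "d_educ": -0.1088, "reported": 0.1020},
}

# Wald estimate = reduced-form difference / first-stage difference
wald = {name: p["d_lnwage"] / p["d_educ"] for name, p in panels.items()}

for name, est in wald.items():
    print(f"Panel {name}: Wald = {est:.4f} (table reports {panels[name]['reported']:.4f})")
```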

... estimate in this case because unobserved earnings determinants (e.g., ability) are likely to be uniformly distributed across people born on different dates of the year.15
The last row of each panel in Table III provides the OLS ...

15. We note that our procedure will slightly understate the return to education because first-quarter births, whose birthdays occur midterm, are more likely to attend some schooling beyond their last year completed. Consequently, the difference in years of school attended between first and later quarters of birth is less than the difference in years of school completed. Since the difference in completed education rather than the difference in years of school attended appears in the denominator of the Wald estimator, our estimate is biased downward. In practice, however, this is a small bias because the difference in completion rates is small.

Two-Stage Least Squares (2SLS)

• We do IV by doing 2SLS
• This accommodates covariates (controls, $X_i$) and multiple instruments (dummy or otherwise):
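$$y_i = \alpha' X_i + \rho s_i + \eta_i, \qquad (5)$$

The first stage and reduced form are

$$s_i = X_i'\pi_{10} + \pi_{11}' z_i + \xi_{1i} = \hat{s}_i + \xi_{1i} \qquad (6)$$
$$y_i = X_i'\pi_{20} + \pi_{21}' z_i + \xi_{2i} \qquad (7)$$
• The 2SLS "second stage" is obtained by substituting (6) into (5):

$$\begin{aligned}
y_i &= \alpha' X_i + \rho[X_i'\pi_{10} + \pi_{11}' z_i] + \rho\xi_{1i} + \eta_i \qquad (8) \\
    &= \alpha' X_i + \rho\hat{s}_i + \rho\xi_{1i} + \eta_i \\
    &= \alpha' X_i + \rho\hat{s}_i + \xi_{2i}
\end{aligned}$$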
2SLS Notes
• 2SLS subs $\hat{s}_i$ for $s_i$ in (5):

$$y_i = \alpha' X_i + \rho\hat{s}_i + [\eta_i + \rho\xi_{1i}], \qquad (9)$$

• Because $\hat{s}_i$ and $\xi_{2i} = \eta_i + \rho\xi_{1i}$ are uncorrelated, OLS estimation of (9) identifies $\rho$

• In practice, let Stata ivregress do it
• Likewise, we get the RF this way

$$\begin{aligned}
y_i &= \alpha' X_i + \rho[X_i'\pi_{10} + \pi_{11}' z_i] + \rho\xi_{1i} + \eta_i \qquad (10) \\
    &= X_i'[\alpha + \rho\pi_{10}] + \rho\pi_{11}' z_i + [\rho\xi_{1i} + \eta_i] \\
    &= X_i'\pi_{20} + \pi_{21}' z_i + \xi_{2i}
\end{aligned}$$
• 2SLS implicitly computes the ratio of RF to 1st for each IV:
$$\frac{\pi_{21}}{\pi_{11}} = \rho$$
In old-school SEMs, the sample analog of this ratio is an Indirect Least Squares estimator of $\rho$
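The ratio-of-coefficients reading of 2SLS can be verified numerically in the just-identified case. The sketch below uses made-up data with one covariate and one instrument (all names and parameter values are assumptions) and runs the two stages by hand only to expose the algebra; for real work the slides' advice to let packaged software do it still applies.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
rho = 0.10                                     # assumed causal effect

x1 = rng.normal(size=n)                        # one observed covariate
X = np.column_stack([np.ones(n), x1])          # X_i includes a constant
z = rng.normal(size=n) + 0.5 * x1              # instrument, correlated with the covariate
ability = rng.normal(size=n)                   # unobserved, part of eta_i
s = 12 + 0.3 * x1 + 0.4 * z + ability + rng.normal(size=n)
y = 1 + 0.2 * x1 + rho * s + 0.5 * ability + rng.normal(size=n)

def ols(W, v):
    """OLS coefficients from regressing v on the columns of W."""
    return np.linalg.lstsq(W, v, rcond=None)[0]

XZ = np.column_stack([X, z])
pi_first = ols(XZ, s)                          # first stage, eq. (6)
pi_rf = ols(XZ, y)                             # reduced form, eq. (7)
ils = pi_rf[-1] / pi_first[-1]                 # indirect least squares: pi_21 / pi_11

s_hat = XZ @ pi_first                          # first-stage fitted values
tsls = ols(np.column_stack([X, s_hat]), y)[-1] # second stage, eq. (9)
ols_short = ols(np.column_stack([X, s]), y)[-1]

print(f"ILS {ils:.4f}  2SLS {tsls:.4f}  OLS {ols_short:.4f}")
```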

2SLS in AK-91

• We now see that it’s the QOB × YOB first stage and reduced form that are plotted in Figure 4.1.1
• The corresponding 2SLS estimates appear in Table 4.1.1

• 2SLS matches the QOB earnings pattern (RF) to the QOB pattern in schooling (first stage):
$$\pi_{21} = \rho\pi_{11}$$
The key p-p-p-pattern here is ...

• Covariates include year-of-birth and state-of-birth dummies, as well as linear and quadratic functions of age in quarters
2SLS is a many-splendored thing
• 2SLS is IV where the instrument is $\hat{s}_i^*$, the residual from a regression of $\hat{s}_i$ on $X_i$:
$$\frac{Cov(y_i, \hat{s}_i^*)}{V(\hat{s}_i^*)} = \frac{Cov(y_i, \hat{s}_i^*)}{Cov(s_i, \hat{s}_i^*)}$$
• One-instrument 2SLS is IV where the instrument is $\tilde{z}_i$, the residual from a regression of $z_i$ on the covs, $X_i$:
$$\frac{Cov(y_i, \hat{s}_i^*)}{V(\hat{s}_i^*)} = \frac{Cov(y_i, \tilde{z}_i)}{Cov(s_i, \tilde{z}_i)}$$
• One-instrument 2SLS is ILS:
$$\frac{Cov(y_i, \hat{s}_i^*)}{V(\hat{s}_i^*)} = \frac{Cov(y_i, \hat{s}_i^*)}{Cov(s_i, \hat{s}_i^*)} = \frac{Cov(y_i, \tilde{z}_i)}{Cov(s_i, \tilde{z}_i)} = \frac{\pi_{21}}{\pi_{11}}$$
• Over-identified 2SLS is a weighted average of these just-identified IV=ILS estimates (MHE 4.5.1)
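These equalities are exact in any sample, which a few lines of linear algebra can confirm. The following sketch uses simulated data with two covariates and one instrument; the design and the helper names `fit` and `resid` are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # constant plus two covariates
z = X[:, 1] - 0.5 * X[:, 2] + rng.normal(size=n)             # instrument correlated with covariates
u = rng.normal(size=n)                                       # unobserved confounder
s = X @ np.array([10.0, 0.3, -0.2]) + 0.5 * z + u + rng.normal(size=n)
y = X @ np.array([1.0, 0.1, 0.4]) + 0.1 * s + 0.6 * u + rng.normal(size=n)

def fit(W, v):
    """Fitted values from an OLS regression of v on W."""
    return W @ np.linalg.lstsq(W, v, rcond=None)[0]

def resid(W, v):
    """Residuals from an OLS regression of v on W."""
    return v - fit(W, v)

s_hat = fit(np.column_stack([X, z]), s)                      # first-stage fitted values
tsls = np.linalg.lstsq(np.column_stack([X, s_hat]), y, rcond=None)[0][-1]

s_hat_star = resid(X, s_hat)                                 # s_hat with covariates partialled out
z_tilde = resid(X, z)                                        # instrument with covariates partialled out

# Residuals have mean zero, so dot products are proportional to covariances
iv_var_form = (y @ s_hat_star) / (s_hat_star @ s_hat_star)   # Cov(y, s*) / V(s*)
iv_cov_form = (y @ s_hat_star) / (s @ s_hat_star)            # Cov(y, s*) / Cov(s, s*)
iv_z_tilde = (y @ z_tilde) / (s @ z_tilde)                   # Cov(y, z~) / Cov(s, z~)

print(tsls, iv_var_form, iv_cov_form, iv_z_tilde)
```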

2SLS Mistakes

• 2SLS . . . so simple a fool can do it . . .

• and many do!

• What can go wrong?

• As explained in MHE 4.6.1, three mistakes stubbornly persist:

• Manual 2SLS
• Covariate ambivalence
• Forbidden regressions (from the left and the right)

• These are the bitter fruit of attempts to "improve" upon orthodox 2SLS protocols

• 2SLS is already awesome: let Stata do it!
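To see why manual 2SLS is on the list, note that a hand-run second stage builds its residuals from the fitted values rather than from the actual endogenous variable, so its reported standard errors are off even though the point estimate is fine. The sketch below illustrates this with simulated data and a simple homoskedastic standard-error formula; the design, the large assumed effect of 1.0, and the helper `se_slope` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(size=n)                     # instrument
u = rng.normal(size=n)                     # unobserved confounder
s = 0.5 * z + u + rng.normal(size=n)       # endogenous regressor
y = 1.0 + 1.0 * s + 2.0 * u + rng.normal(size=n)

const = np.ones(n)
Z = np.column_stack([const, z])
s_hat = Z @ np.linalg.lstsq(Z, s, rcond=None)[0]       # first-stage fitted values

W_hat = np.column_stack([const, s_hat])                # second-stage regressors
W = np.column_stack([const, s])                        # same, with actual s_i
b = np.linalg.lstsq(W_hat, y, rcond=None)[0]           # 2SLS point estimates

resid_manual = y - W_hat @ b     # what a hand-rolled second stage reports
resid_2sls = y - W @ b           # proper 2SLS residual uses actual s_i

def se_slope(res):
    """Homoskedastic standard error of the slope, given a residual vector."""
    sigma2 = res @ res / (n - 2)
    return np.sqrt(sigma2 * np.linalg.inv(W_hat.T @ W_hat)[1, 1])

se_manual = se_slope(resid_manual)
se_correct = se_slope(resid_2sls)
print(f"slope {b[1]:.3f}  manual SE {se_manual:.4f}  2SLS SE {se_correct:.4f}")
```

In this design the manual standard error is too large; in other designs it can be too small. Either way it is not the 2SLS standard error.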

Group Work

Wald Serves in Vietnam

• Key variables

$z_i$ = randomly assigned draft-eligibility in 1970-72 draft lotteries
$d_i$ = a dummy indicating Vietnam-era veterans
$y_i$ = earnings after service
• The causal effect of Vietnam-era military service is the draft-eligibility RF divided by the draft-eligibility first stage
• $d_i$ is also a dummy, so the first stage is a diff in probs:

$$\frac{Cov(d_i, z_i)}{V(z_i)} = E[d_i | z_i = 1] - E[d_i | z_i = 0] = P[d_i = 1 | z_i = 1] - P[d_i = 1 | z_i = 0]$$

• Angrist (1990), Figures 1-2 and MHE Table 4.1.3

• Updated: Angrist, Chen, and Song (2011)
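As a worked example, the Wald column of MHE Table 4.1.3 can be reproduced from the table's own entries by dividing each earnings eligibility effect by the veteran-status eligibility effect. Using the single reported first stage of .159 for both earnings years is an assumption made here for illustration; it matches the table's reported Wald estimates after rounding.

```latex
\[
\hat{\rho}_{1981} = \frac{-435.8}{0.159} \approx -2{,}741,
\qquad
\hat{\rho}_{1971} = \frac{-325.9}{0.159} \approx -2{,}050
\]
```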
Multiple groups and 2SLS

• More to the draft lottery than draft-eligibility: Angrist and Chen (2011), Figure 1
• Let $r_i = j \in \{1, ..., J\}$ denote lottery numbers. Draft-eligibility Wald uses $1[r_i < 195]$ as an instrument in a just-identified setup
• Using fine-grained info on $r_i$, we have

$$E[y_i | r_i] = \alpha + \rho P[d_i = 1 | r_i], \qquad (11)$$

since $P[d_i = 1 | r_i] = E[d_i | r_i]$. So we can estimate $\rho$ by fitting:

$$\bar{y}_j = \alpha + \rho\hat{p}_j + \bar{\eta}_j; \quad j = 1, ..., J \qquad (12)$$

• Efficient GLS for this grouped constant-effects linear model is weighted least squares, weighted by the inverse of $V(\bar{\eta}_j)$
• $V(\bar{\eta}_j) = \sigma_\eta^2 / n_j$ under homoskedasticity, so the weights are proportional to $n_j$

Visual IV, Grouping, and GLS

• Equation (12) in action: Angrist (1990), Figure 3.

• This illustrates visual instrumental variables (VIV)

• GLS applied to equation (12) is 2SLS

• The instruments here are lottery-number indicators. Define
$Z_i \equiv \{r_{ji} = 1[r_i = j]; \; j = 1, ..., J - 1\}$
• The first stage for $d_i$ on $Z_i$ plus a constant is saturated, so fitted values are cond. means, $\hat{p}_j$, repeated $n_j$ times for each $j$
• The second stage slope estimate is therefore weighted least squares on the grouped equation, (12), weighted by $n_j$
• Because GLS is efficient, 2SLS is also the efficient linear combination of the underlying just-identified IV (Wald) estimates
• That’s why we call Figure 3 "VIV"
• Sargan/Hansen overid tests the fit of this line

• Fig. 3 also illustrates two-sample IV: $\bar{y}_j$ from one smpl, $\hat{p}_j$ from another (details in AK 1992, 1995; Inoue and Solon, 2010)
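The claim that 2SLS with a saturated first stage equals weighted least squares on group means can be checked directly. This is a minimal sketch on simulated data: the 20 groups stand in for lottery-number cells, and the treatment rule, effect size, and names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
J, n = 20, 5_000
r = rng.integers(0, J, size=n)                  # group index (stand-in for lottery-number cells)
cut = np.linspace(-1.5, 0.5, J)                 # assumed group-specific treatment thresholds
u = rng.normal(size=n)                          # unobserved confounder
d = (u + rng.normal(size=n) < cut[r]).astype(float)   # endogenous treatment dummy
y = 2.0 - 1.0 * d + u + rng.normal(size=n)      # outcome, assumed effect of -1

n_j = np.bincount(r, minlength=J)                        # group sizes
p_hat = np.bincount(r, weights=d, minlength=J) / n_j     # group treatment rates
y_bar = np.bincount(r, weights=y, minlength=J) / n_j     # group mean outcomes

# 2SLS on micro data: the saturated first stage gives fitted values p_hat[r]
A = np.column_stack([np.ones(n), p_hat[r]])
tsls = np.linalg.lstsq(A, y, rcond=None)[0][1]

# Weighted least squares on the J group means, weights n_j
w = np.sqrt(n_j)
G = np.column_stack([np.ones(J), p_hat])
wls = np.linalg.lstsq(G * w[:, None], y_bar * w, rcond=None)[0][1]

print(f"2SLS {tsls:.6f}  grouped WLS {wls:.6f}")
```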
There’s Weakness in Numbers
(of instruments)

2SLS is Biased, Yo
• OLS estimates are unbiased and consistent for the corresponding pop reg (maybe not the reg you want, but nicely estimated)
• 2SLS estimates are consistent for causal FX but biased
• Endogenous var. is vector $x$; dep. var. is vector $y$; no covs:

$$y = \beta x + \eta \qquad (13)$$

The $N \times q$ matrix of instruments is $Z$, with first-stage

$$x = Z\pi + \xi \qquad (14)$$

Outcome error $\eta_i$ is correlated with $\xi_i$. Instruments are uncorrelated with $\xi_i$ by construction and with $\eta_i$ by assumption
• The 2SLS estimator is
$$\hat{\beta}_{2SLS} = (x' P_Z x)^{-1} x' P_Z y = \beta + (x' P_Z x)^{-1} x' P_Z \eta$$

where $P_Z = Z(Z'Z)^{-1}Z'$ produces fitted values

Bias and First-stage F

• A Bekker (1994) approximation generates:

$$E[\hat{\beta}_{2SLS} - \beta] \approx \frac{\sigma_{\eta\xi}}{\sigma_\xi^2} \frac{1}{F + 1} \qquad (15)$$

where
$$F \equiv (1/\sigma_\xi^2) E[\pi' Z' Z \pi] / q$$
is the "population first stage F"
• As $F$ gets small, the bias of 2SLS approaches $\sigma_{\eta\xi} / \sigma_\xi^2$
• The bias of the OLS estimator is $\sigma_{\eta\xi} / \sigma_x^2$, which also equals $\sigma_{\eta\xi} / \sigma_\xi^2$ if $\pi = 0$

• 2SLS estimates are therefore said to be "biased towards OLS estimates" when the first stage is weak
• The bias of 2SLS vanishes as $F$ increases, as it should when $\pi \neq 0$ and sample size grows

First-stage F (cont.)

• Bias grows as the number of instruments grows (if the instruments are weak)
• Adding instruments with no effect on the first-stage R-squared, the model sum of squares, $E(\pi' Z' Z \pi)$, and the residual variance, $\sigma_\xi^2$, are fixed while $q$ increases
• From this we learn that the addition of weak instruments decreases $F$ and therefore increases bias

• Holding the first-stage sum of squares fixed, bias is least in the just-ID case when the number of instruments is as low as it can get

• 2SLS bias is a consequence of first-stage estimation error. We’d like to use $\hat{x}_{pop} = Z\pi$ as an instrument since these fits are uncorrelated with second stage resids
• In practice, we use $\hat{x} = P_Z x = Z\pi + P_Z \xi$
• 2SLS bias arises from the corr between $P_Z \xi$ and $\eta$
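The bias approximation in (15) is easy to tabulate. The sketch below holds the first-stage signal fixed and adds worthless instruments; the function name and the choice to parameterize by the ratio of the first-stage sum of squares to the residual variance are assumptions made for illustration. The numbers plugged in (error covariance 0.8, unit residual variance, signal ratio 10) are likewise illustrative.

```python
def approx_2sls_bias(sigma_eta_xi, sigma_xi_sq, signal_ratio, q):
    """Approximate 2SLS bias from eq. (15).

    signal_ratio is E[pi'Z'Z pi] / sigma_xi^2, so the population F is signal_ratio / q.
    """
    F = signal_ratio / q
    return (sigma_eta_xi / sigma_xi_sq) / (F + 1.0)

# Same first-stage signal, spread over 1 versus 20 instruments
bias_q1 = approx_2sls_bias(0.8, 1.0, signal_ratio=10.0, q=1)
bias_q20 = approx_2sls_bias(0.8, 1.0, signal_ratio=10.0, q=20)

# Limit as F goes to zero: the bias approaches sigma_eta_xi / sigma_xi^2
bias_no_signal = approx_2sls_bias(0.8, 1.0, signal_ratio=0.0, q=20)

print(bias_q1, bias_q20, bias_no_signal)
```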
IV Without Bias or Tears
• The reduced form is unbiased: if the relationship you’re after is invisible in the reduced form, then it ain’t there!
• In just-identified models, the p-value for the reduced-form effect of the instrument is approximately the p-value from the second stage
• Chernozhukov and Hansen (2008) develop reduced-form-based inference for over-identified models

• LIML is approximately median-unbiased for constant-effects (but beware heteroskedasticity)
• Just-identified 2SLS is approximately unbiased (proof: just-ID=LIML)
• The just-ID and LIML sampling distributions have no official moments, yet their medians are where they should be
• Split-sample IV (SSIV) and jackknife IV (JIVE) are Bekker-unbiased (Angrist and Krueger 1995; Angrist, Imbens, and Krueger 1999)
• Updates include Hausman, Newey, Woutersen, Chao, and Swanson (2012), many others

Monte Carlo for Many-Weak

$$y_i = \beta x_i + \eta_i$$
$$x_i = \sum_{j=1}^{q} \pi_j z_{ij} + \xi_i$$

with $\beta = 1$, $\pi_1 = 0.1$, $\pi_j = 0 \; \forall j > 1$, joint normal errors with $corr(\eta_i, \xi_i) = .8$, where the instruments, $z_{ij}$, are independent, standard normals. The sample size is 1000.
• Figure 4.6.1: OLS, just identified IV (q=1, labeled IV; F=11.1), 2SLS (q=2, labeled 2SLS; F=6.0), LIML (q=2)
• Figure 4.6.2: OLS, 2SLS, and LIML with q=20 (1 good instrument, 19 worthless; F=1.51)
• Figure 4.6.3: OLS, 2SLS, and LIML with q=20 but $\pi_j = 0$; $j = 1, ..., 20$ (all 20 worthless; F=1.0)
• Quarter of birth estimates of the returns to schooling (reprise): Table 4.6.2
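A scaled-down version of this design can be simulated in a few lines. The sketch below follows the stated parameters but assumes unit error variances, uses 300 replications, and reports medians only; it does not reproduce the figures, and LIML is omitted. The function `simulate` and its arguments are hypothetical names.

```python
import numpy as np

def simulate(q, reps=300, n=1000, beta=1.0, pi1=0.1, corr=0.8, seed=0):
    """Return the medians of the OLS and 2SLS estimates across replications."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, corr], [corr, 1.0]])   # unit error variances are an assumption
    ols = np.empty(reps)
    tsls = np.empty(reps)
    for k in range(reps):
        Z = rng.normal(size=(n, q))              # independent standard normal instruments
        eta, xi = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        x = pi1 * Z[:, 0] + xi                   # only the first instrument matters
        y = beta * x + eta
        x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # P_Z x
        ols[k] = (x @ y) / (x @ x)
        tsls[k] = (x_hat @ y) / (x_hat @ x)      # (x'P_Z x)^{-1} x'P_Z y
    return np.median(ols), np.median(tsls)

med_ols, med_iv = simulate(q=1)                  # just-identified IV
_, med_2sls_20 = simulate(q=20)                  # 1 good instrument, 19 worthless

print(f"median OLS {med_ols:.2f}  IV (q=1) {med_iv:.2f}  2SLS (q=20) {med_2sls_20:.2f}")
```

The pattern to look for is the one the slides describe: the just-identified median sits near the true value of 1, while the 20-instrument 2SLS median is pulled most of the way toward OLS.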
Welcome to the Machine

New Models and Methods

• Belloni, Chen, Chernozhukov, and Hansen (2012) use machine learning to pick a few instruments when you’re blessed w/an abundance thereof
• The leading ML method here is lasso, a type of "regularized regression", minimizing
$$\min_{\{b_j\}} \; \underbrace{E\Big[Y_i - \sum_j b_j x_j\Big]^2}_{\text{squared error}} + \underbrace{\frac{\lambda}{n} \sum_j |b_j|}_{\text{penalty term}} \qquad (16)$$

where $\lambda$ is a user-chosen penalty

• Lasso favors lower-dimensional "sparse" models and small coefficients
• The absolute value inside the penalty term causes lasso to drop some regressors, while shrinking others
• Post-lasso runs conventional OLS on the regressors lasso retains
• BCH (2012) discuss the theory behind a post-lasso 2SLS first stage
• Sounds promising!
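The drop-and-shrink behavior of lasso, and the post-lasso refit, can be seen in a small self-contained example. The sketch below is a plain coordinate-descent solver written for illustration, not the BCH procedure: it minimizes a rescaled version of the objective, (1/(2n)) times the sum of squared residuals plus alpha times the sum of absolute coefficients, and the penalty value, data, and function names are assumptions.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a toward zero by t, returning exactly zero when |a| <= t."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Minimize (1/(2n))||y - Xb||^2 + alpha * sum|b_j| by coordinate descent."""
    n, k = X.shape
    b = np.zeros(k)
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y - X @ b
    for _ in range(n_iter):
        for j in range(k):
            resid += X[:, j] * b[j]              # remove x_j's current contribution
            rho_j = X[:, j] @ resid / n
            b[j] = soft_threshold(rho_j, alpha) / col_ss[j]
            resid -= X[:, j] * b[j]              # add back the updated contribution
    return b

rng = np.random.default_rng(5)
n, k = 500, 10
X = rng.normal(size=(n, k))
true_b = np.zeros(k)
true_b[:2] = [2.0, -1.5]                         # only two regressors matter
y = X @ true_b + rng.normal(size=n)

b_lasso = lasso_cd(X, y, alpha=0.2)
selected = np.flatnonzero(b_lasso)               # regressors lasso retains

# Post-lasso: conventional OLS on the retained regressors only
b_post = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]

print("selected:", selected, "lasso:", b_lasso[selected].round(2), "post-lasso:", b_post.round(2))
```

The lasso coefficients on the retained regressors are shrunk toward zero, and the post-lasso refit undoes that shrinkage.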
What have we found? The same old fears . . .

• Sims based on AK91 with 180 instruments (QOB*YOB; QOB*POB) and even 1530 (QOB*YOB*POB) show that LIML and SSIV beat ML for bias and MAE
• Lasso for instrument selection faces two challenges
• 2SLS is (still) biased, yo
• 2SLS w/a lassoed first stage is pretesting

• Details
• The good behavior of lasso is predicated on the assumption of "approximate sparsity," which implies the sample grows relative to the number of first-stage parameters
• The Bekker sequence reveals the finite sample behavior of 2SLS, SSIV, LIML etc. by fixing the number of obs/parameter; Bekker isn’t sparse
• Hall, et al. (1996) show the dangers of test-based solutions to the weak instruments problem (Andrews, Stock, and Sun 2019 update this)
• Better to use a Bekker-unbiased estimator from the get-go (Angrist and Frandsen 2019)

Tables and Figures

[Figure 4.1.1, Panel A: Average Education by Quarter of Birth (first stage). Vertical axis: Years of Education; horizontal axis: Year of Birth (30 to 39); points are labeled by quarter of birth, 1 to 4.]
[Figure 4.1.1, Panel B: Average Weekly Wage by Quarter of Birth (reduced form). Vertical axis: Log Weekly Earnings; horizontal axis: Year of Birth (30 to 39); points are labeled by quarter of birth, 1 to 4.]

Figure 4.1.1: Graphical depiction of first stage and reduced form for IV estimates of the economic return to schooling using quarter of birth (from Angrist and Krueger 1991).

Table 4.1.1
2SLS estimates of the economic returns to schooling

| | OLS (1) | OLS (2) | 2SLS (3) | 2SLS (4) | 2SLS (5) | 2SLS (6) | 2SLS (7) | 2SLS (8) |
|---|---|---|---|---|---|---|---|---|
| Years of education | .071 (.0004) | .067 (.0004) | .102 (.024) | .13 (.020) | .104 (.026) | .108 (.020) | .087 (.016) | .057 (.029) |
| Exogenous covariates | | | | | | | | |
| Age (in quarters) | | | | | | | | ✓ |
| Age (in quarters) squared | | | | | | | | ✓ |
| 9 year-of-birth dummies | | ✓ | | | ✓ | ✓ | ✓ | ✓ |
| 50 state-of-birth dummies | | ✓ | | | ✓ | ✓ | ✓ | ✓ |
| Instruments | | | | | | | | |
| dummy for QOB = 1 | | | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| dummy for QOB = 2 | | | | ✓ | | ✓ | ✓ | ✓ |
| dummy for QOB = 3 | | | | ✓ | | ✓ | ✓ | ✓ |
| QOB dummies interacted with year-of-birth dummies (30 instruments total) | | | | | | | ✓ | ✓ |

Notes: The table reports OLS and 2SLS estimates of the returns to schooling using the Angrist and Krueger (1991) 1980 census sample. This sample includes native-born men, born 1930–39, with positive earnings and nonallocated values for key variables. The sample size is 329,509. Robust standard errors are reported in parentheses. QOB denotes quarter of birth.

Table 4.1.3

IV Estimates of the Effects of Military Service on the Earnings of White Men born in 1950

| Earnings year | Earnings: Mean (1) | Earnings: Eligibility Effect (2) | Veteran Status: Inelig. Mean (3) | Veteran Status: Eligibility Effect (4) | Wald Estimate of Veteran Effect (5) |
|---|---|---|---|---|---|
| 1981 | 16,461 | -435.8 (210.5) | .182 | .159 (.040) | -2,741 (1,324) |
| 1971 | 3,338 | -325.9 (46.6) | | | -2050 (293) |
| 1969 | 2,299 | -2.0 (34.5) | | | |

Note: Adapted from Table 5 in Angrist and Krueger (1999) and author tabulations. Standard errors are shown in parentheses. Earnings data are from Social Security administrative records. Figures are in nominal dollars. Veteran status data are from the Survey of Program Participation. There are about 13,500 individuals in the sample.

[Figure: annual estimates plotted with bands at estimate + 1.96*se and estimate - 1.96*se; horizontal axis: years 1970 to 2006.]

Figure 1. Draft-lottery Estimates of Vietnam-era Service Effects on ln(Earnings) for White Men Born 1950-52

102 AMERICAN ECONOMIC JOURNAL: APPLIED ECONOMICS APRIL 2011

[Figure: Panel A (Whites) and Panel B (Nonwhites). Vertical axis: P(Veteran|RSN), 0.05 to 0.45; horizontal axis: RSN, 1 to 365; one series per year of birth, 1950 to 1953.]

Figure 1. The Conditional Probability of Military Services by Random Sequence Number

Notes: This figure plots the probability of Vietnam-era military service against draft lottery numbers. Data are from the 2000 census.

[Figure: estimator distributions; vertical axis 0 to 1, horizontal axis 0 to 2.5; legend: OLS, IV, 2SLS, LIML.]

Figure 4.6.1: Distribution of the OLS, IV, 2SLS, and LIML estimators. IV uses one instrument, while 2SLS and LIML use two instruments.

[Figure: estimator distributions; vertical axis 0 to 1, horizontal axis 0 to 2.5; legend: OLS, 2SLS, LIML.]

Figure 4.6.2: Distribution of the OLS, 2SLS, and LIML estimators with 20 instruments

[Figure: estimator distributions; vertical axis 0 to 1, horizontal axis 0 to 2.5; legend: OLS, 2SLS, LIML.]

Figure 4.6.3: Distribution of the OLS, 2SLS, and LIML estimators with 20 worthless instruments

Table 4.6.2
Alternative IV estimates of the economic returns to schooling

| | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| 2SLS | .105 (.020) | .435 (.450) | .089 (.016) | .076 (.029) | .093 (.009) | .091 (.011) |
| LIML | .106 (.020) | .539 (.627) | .093 (.018) | .081 (.041) | .106 (.012) | .110 (.015) |
| F-statistic (excluded instruments) | 32.27 | .42 | 4.91 | 1.61 | 2.58 | 1.97 |
| Controls | | | | | | |
| Year of birth | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| State of birth | | | | | ✓ | ✓ |
| Age, age squared | | ✓ | | ✓ | | ✓ |
| Excluded instruments | | | | | | |
| Quarter-of-birth dummies | ✓ | ✓ | | | | |
| Quarter of birth*year of birth | | | ✓ | ✓ | ✓ | ✓ |
| Quarter of birth*state of birth | | | | | ✓ | ✓ |
| Number of excluded instruments | 3 | 2 | 30 | 28 | 180 | 178 |

Notes: The table compares 2SLS and LIML estimates using alternative sets of instruments and controls. The age and age squared variables measure age in quarters. The OLS estimate corresponding to the models reported in columns 1–4 is .071; the OLS estimate corresponding to the models reported in columns 5 and 6 is .067. Data are from the Angrist and Krueger (1991) 1980 census sample. The sample size is 329,509. Standard errors are reported in parentheses.

The first column in the table reports 2SLS and LIML estimates of a model using three quarter-of-birth dummies as instruments, with year-of-birth dummies as covariates. The OLS estimate for this specification is 0.071, while the 2SLS estimate is a [Link], at 0.105. The first-stage F-statistic is over 32, well out of the danger zone. Not surprisingly, the LIML estimate is almost identical to 2SLS in this case.
Angrist and Krueger (1991) experimented with models that include age and age squared measured in quarters as additional controls. These controls are meant to pick up omitted age effects that might confound the quarter-of-birth instruments. The addition of age and age squared reduces the number of instruments to two, since age in quarters, year of birth, and quarter of birth are linearly dependent. As shown in column 2, the first-stage F-statistic drops to 0.4 when age and age squared are included as controls, a sure sign of trouble. But the 2SLS ...

Columns (1)-(5): 180 Instruments (QOB*YOB; POB*YOB; Average F=2.5). Columns (6)-(10): 1530 Instruments (QOB*YOB*POB; Average F=1.7).

| Estimator | Avg. IVs retained (1) | Bias (2) | Standard deviation (3) | Median abs. dev. (4) | Median abs. error (5) | Avg. IVs retained (6) | Bias (7) | Standard deviation (8) | Median abs. dev. (9) | Median abs. error (10) |
|---|---|---|---|---|---|---|---|---|---|---|
| OLS | | 0.107 | 0.0004 | 0.0003 | 0.1070 | | | | | |
| 2SLS | 180 | 0.0403 | 0.0108 | 0.0075 | 0.0397 | 1530 | 0.0611 | 0.0046 | 0.0032 | 0.0611 |
| Post-lasso IV (CV penalty) | 74.0 | 0.0390 | 0.0120 | 0.0082 | 0.0384 | 99.0 | 0.0559 | 0.0084 | 0.0059 | 0.0560 |
| Post-lasso IV (plug-in penalty, IVs selected)* | 2.1 | 0.0143 | 0.0346 | 0.0218 | 0.0279 | 1.6 | 0.0149 | 0.0367 | 0.0224 | 0.0271 |
| Split-Sample IV | 180 | -0.0009 | 0.0237 | 0.0158 | 0.0158 | 1530 | -0.0001 | [...] | [...] | [...] |
| Post-lasso SSIV (CV penalty) | 63.1 | -0.0015 | 0.0258 | 0.0172 | 0.0173 | 63.0 | -0.0013 | [...] | [...] | [...] |
| Post-lasso SSIV (plug-in penalty, IVs selected)** | 2.1 | -0.0724 | 1.3168 | 0.0274 | 0.0287 | 3.4 | 0.0197 | 0.0504 | 0.0228 | 0.0292 |
| Post-lasso (IV choice split only, CV penalty) | 63.1 | 0.0429 | 0.0144 | 0.0097 | 0.0431 | 63.0 | 0.0460 | 0.0141 | 0.0093 | 0.0459 |
| LIML | 180 | -0.0016 | 0.0185 | 0.0123 | 0.0124 | 1530 | -0.0034 | 0.0117 | 0.0079 | 0.0083 |
| Post-lasso LIML (CV penalty) | 74.0 | 0.0222 | 0.0152 | 0.0102 | 0.0220 | 99.0 | 0.0484 | 0.0094 | 0.0066 | 0.0483 |
| Post-lasso LIML (plug-in penalty, IVs selected)* | 2.1 | 0.0126 | 0.0347 | 0.0221 | 0.0273 | 1.6 | 0.0138 | 0.0366 | 0.0221 | 0.0257 |
| Pretested LIML (t => 3.12 for 180, t=>2.3 for 1530) | 18 | 0.0222 | 0.0236 | 0.0148 | 0.0238 | 153 | 0.0385 | 0.0163 | 0.0111 | 0.0393 |
| Random forest first stage, 2SLS using RF fits as instruments (min leaf size=1) | | 0.0611 | 0.0047 | 0.0030 | 0.0612 | | | | | |
| Random forest 2SLS, min leaf size = 800 | | 0.0567 | 0.0065 | 0.0045 | 0.0567 | | | | | |
| Random forest first stage, SSIV using RF fits as instruments (min leaf size =1) | | -0.0003 | 0.0158 | 0.0109 | 0.0108 | | | | | |
| Random forest SSIV, min leaf size = 800 | | -0.0005 | 0.0158 | 0.0104 | 0.0103 | | | | | |

[Columns (8)-(10) of the Split-Sample IV and Post-lasso SSIV (CV penalty) rows cannot be separated in the source; the six values, in source order, are 0.0183, 0.0115, 0.0164, 0.0280, 0.0112, 0.0183.]

Notes: The table describes simulation results for 999 Monte Carlo estimates of the economic returns to schooling using simulated samples constructed from the
Angrist and Krueger (1991) census sample of men born 1930-39 (N=329,509). The causal effect of schooling is calibrated to 0.1; the OLS estimand is 0.207. The
instruments used to compute the estimates described by columns 1-5 consist of 30 quarter-of-birth-by-year-of-birth and 150 quarter-of-birth-by-state-of-birth
interactions (average F-stat = 2.5, average concentration parameter = 270). The instruments used to compute the estimates described by columns 6-10 are quarter-
of-birth-by-year-of-birth-by-state-of-birth interactions (average F-stat = 1.7, average concentration parameter = 1050). All models include saturated year of birth by
state of birth controls. Columns 1 and 6 report the average number of instruments retained by lasso. Post-lasso estimates are computed as described in the
appendix. Split-Sample IV uses first stage coefficients estimated in one half-sample to construct a cross-sample fitted value used for IV in the other. Sample-
splitting procedures average results from complementary splits. Post-lasso with an IV-choice split only uses post-lasso in half the sample to pick instruments, doing
2SLS with these and own-sample fitted values in the other half. "Post-lasso LIML" is LIML using the instrument set selected by a post-lasso first stage. "Pretested
LIML" estimates are computed using conventional LIML, retaining only instruments with a first-stage t-statistic in the upper decile of t-statistics for the full set of
instruments. Simulation sets choose lasso penalties once, using the original AK91 data. Random forest routines are described in the appendix.
*The plug-in penalty generates a lasso first stage that includes no instruments in 11 simulation runs with 180 instruments and in 57 simulation runs with 1530
instruments. Statistics reported in these rows are for runs completed.
**Post-lasso SSIV with a plug-in penalty picks zero instruments in 670 of 180-instrument runs, and in 893 of 1530-instrument runs. Statistics reported in these rows
are for runs completed.
