[Stata] Instrumental Variables Regression:
ivregress, ivreg2
Table of Contents
What are Instrumental Variables?
Stata Commands for Instrumental Variables
Instrumental Variable Regression with Stata: ivreg2
o Step 1. Specify your regression model
o Step 2. Adding Control Variables
o Step 3. Testing Endogeneity: ivendog command
o Step 4. Testing heteroskedasticity
o Reference
What are Instrumental Variables?
Ref: Filippo Pisello
Instrumental variable regression is a statistical method used when you suspect that there’s a
hidden bias affecting the relationship between your variables. It’s like having a sneaky
confounder that you can’t measure directly, but you know it’s there, messing with your
results. So, you bring in an instrumental variable—a kind of secret agent—to help you
uncover the true effect of your variable of interest.
Imagine you’re studying the effect of a new counseling program (treatment) on reducing
stress levels (outcome) among social workers. However, you suspect that those who choose to
participate might already be more motivated or less stressed, which could bias your results.
Step 1: Find Your Instrument. You need an instrumental variable that is related to the
likelihood of participating in the program (relevance) but affects stress levels only through
participation (the exclusion restriction). Let’s say you find that social workers who live
closer to the counseling center are more likely to participate. Proximity to the center
becomes your instrumental variable.
Step 2: First Stage Regression. You first run a regression with the instrumental variable
(proximity) predicting the treatment (participation in the program). This gives you the
predicted values of treatment, which vary only with proximity and are therefore free from the
bias of the unmeasured confounder (motivation or initial stress levels).
Step 3: Second Stage Regression. Next, you use these predicted values from the first stage as
your ‘clean’ treatment variable to predict the outcome (stress levels). This second regression
tells you the effect of the counseling program on stress levels, without the bias introduced by
the unmeasured confounder.
Suppose you have data on social workers’ stress levels and their participation in the
counseling program. You also know how far each social worker lives from the center.
1. First Stage: You find that living closer to the center significantly predicts higher
participation.
2. Second Stage: Using the predicted participation from the first stage, you find that
participating in the counseling program leads to lower stress levels.
By using the instrumental variable of proximity, you’ve managed to isolate the effect of the
counseling program on stress levels, accounting for the potential bias of self-selection into the
program.
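The two stages above can be sketched on simulated data. Everything in this block is an assumption made for illustration: the variable names (motivation, distance, participate, stress), the sample size, and the coefficients, including a true program effect of -5. It runs the two stages by hand and then with ivregress 2sls. The point estimates should agree, but only ivregress reports valid standard errors, because the manual second stage ignores that the treatment was predicted.

```stata
* Simulated illustration; all names and numbers are hypothetical
clear
set seed 12345
set obs 5000
* motivation is the unmeasured confounder, distance is the instrument
generate double motivation  = rnormal()
generate double distance    = runiform(0, 20)
generate byte   participate = (1 - 0.1*distance + motivation + rnormal() > 0)
generate double stress      = 50 - 5*participate - 3*motivation + rnormal(0, 2)

* Naive OLS: biased because motivation is omitted
regress stress participate

* First stage: instrument predicts treatment
regress participate distance
predict double participate_hat, xb

* Second stage: outcome on predicted treatment (standard errors not valid)
regress stress participate_hat
scalar b_manual = _b[participate_hat]

* Same point estimate in one step, with correct standard errors
ivregress 2sls stress (participate = distance)
display "manual 2SLS: " b_manual "   ivregress: " _b[participate]
```

In practice you would run only the one-step command; the manual version is shown just to make the mechanics visible.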
Stata Commands for Instrumental Variables
ivregress: ivregress is a built-in command provided by Stata for instrumental variables
regression.
ivreg2: As a user-created command, ivreg2 extends the functionality of ivregress. It
provides additional features, such as testing for endogeneity, weak instruments, and
overidentification.
ivprobit: Probit model for a binary outcome with continuous endogenous covariates
ivpoisson: Poisson model for a count outcome with endogenous covariates
xtivreg: Instrumental variables and two-stage least squares for panel-data models
o xtoverid: Stata module to calculate tests of overidentifying restrictions
after xtreg, xtivreg, xtivreg2, xthtaylor
ivreghdfe: Extended instrumental variable regressions with multiple levels of fixed
effects
ivreg2h: Stata module to perform instrumental variables estimation using
heteroskedasticity-based instruments
imperfectiv: Stata module to estimate bounds with “Imperfect Instrumental Variables”
(Nevo and Rosen, 2012)
ivmediate: Stata module to perform causal mediation analysis in instrumental-
variables regressions
spatial_hac_iv: Stata module to estimate an instrumental variable regression,
adjusting standard errors for spatial correlation, heteroskedasticity, and autocorrelation
pariv: Stata module to perform nearly-collinear robust instrumental-variables
regression
cqiv: Stata module to perform censored quantile instrumental variables regression
Tests
ivhettest: Stata module to perform Pagan-Hall and related heteroskedasticity tests
after IV
testex: Stata module for a statistical test of the exclusion restriction of an
instrumental variable (IV)
weakiv: Stata module to perform weak-instrument-robust tests and confidence
intervals for instrumental-variable (IV) estimation of linear, probit and tobit models
ivendog: Stata module to calculate Durbin-Wu-Hausman endogeneity test after
ivreg
ivtreatreg: Stata module to estimate binary treatment models with idiosyncratic
average effect
underid: Stata module producing postestimation tests of under-identification and
over-identification after linear IV estimation
ivdesc: Stata module to profile compliers and non-compliers for instrumental
variable analysis
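As a point of comparison for the built-in command listed above, here is a minimal sketch of ivregress with two of its postestimation commands. It borrows the nlswork variables used later in this guide; the choice of vce(robust) is an assumption for illustration, not something the guide prescribes.

```stata
* Built-in command: no installation needed
webuse nlswork, clear
ivregress 2sls ln_wage (grade = msp), vce(robust)

* First-stage statistics for judging instrument strength
estat firststage

* Test of whether grade can be treated as exogenous
estat endogenous
```

These cover part of what ivreg2 prints by default, which is one reason many users prefer ivreg2 for routine work.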
Instrumental Variable Regression with Stata: ivreg2
You can load the nlswork.dta dataset from the default Stata Press website using
the webuse command:
Stata
webuse nlswork, clear
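Before specifying a model, it can help to confirm what each variable measures by reading its label and checking for missing values. A small sketch using only built-in commands and the variables that appear in the steps below:

```stata
webuse nlswork, clear

* Variable labels and storage types
describe ln_wage grade msp ttl_exp tenure

* Counts, means and ranges; a lower Obs count signals missing values
summarize ln_wage grade msp ttl_exp tenure
```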
Step 1. Specify your regression model
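For example, let’s say we want to estimate the effect of years of education on wages,
instrumenting education with msp.
outcome: log wages (ln_wage)
predictor: years of education (grade)
instrumental variable: msp (in nlswork this is an indicator for being married with spouse
present, not mother’s education; it serves here only to illustrate the syntax)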
Stata
* ivreg2 is user-written and also needs the ranktest package
ssc install ranktest
ssc install ivreg2
ivreg2 ln_wage (grade = msp)
ln_wage (dependent variable):
o The coefficient for grade is approximately 0.2313, and it is statistically
significant (p-value < 0.001). This suggests that grade has a positive effect
on ln_wage.
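Identification Tests:
o The Anderson canonical correlation LM statistic tests for
underidentification. The p-value is 0.0001, so the null hypothesis that the model is
underidentified is rejected.
o The Cragg-Donald Wald F statistic tests for weak identification. ivreg2 reports
no p-value for it; compare the statistic with the Stock-Yogo critical values printed
beneath it. A statistic above those critical values suggests the instrument is not weak.
o Since the equation is exactly identified (one instrument for one endogenous
regressor), no overidentification test is possible. The Sargan statistic is shown as
0.000 with the note that the equation is exactly identified; this is not a p-value.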
Instrumentation:
o grade is instrumented by msp.
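The statistics discussed above can also be read back from the stored results rather than from the printed table. This is a sketch under one assumption: the result names e(idstat), e(idp) and e(widstat) are taken from memory of the ivreg2 help file, so check them against the ereturn list output on your installation. It assumes ivreg2 is already installed.

```stata
webuse nlswork, clear
ivreg2 ln_wage (grade = msp)

* Show everything ivreg2 stored
ereturn list

* Assumed names: verify against the list printed above
display "Underidentification LM statistic: " e(idstat) "   p-value: " e(idp)
display "Weak-identification F statistic:  " e(widstat)
```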
The basic command for ivreg2 is as follows. Please replace y with your dependent
variable, x1 with your endogenous regressor, z1 with your instrument, and x2 with other
control variables.
Stata
ivreg2 y (x1 = z1) x2
Some key options:
robust: robust standard errors
cluster(varname): clustered standard errors
first: report first-stage regression estimates
savefirst: save first-stage estimates
ffirst: report only the first-stage summary statistics, not the full first-stage regressions
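A short sketch of how these options combine, assuming ivreg2 is installed. Clustering on idcode is an illustrative choice: nlswork follows the same women over several years, so observations on one person are unlikely to be independent.

```stata
webuse nlswork, clear

* Heteroskedasticity-robust standard errors, with the full first stage shown
ivreg2 ln_wage (grade = msp), robust first

* Standard errors clustered by person, with first-stage summary statistics only
ivreg2 ln_wage (grade = msp), cluster(idcode) ffirst
```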
Step 2. Adding Control Variables
To add control variables, list them outside the parentheses, for example directly after the
dependent variable. They are treated as exogenous and enter both stages. For instance, to
control for experience and tenure:
Stata
ivreg2 ln_wage ttl_exp tenure (grade = msp)
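To see how the controls change the estimate, the two specifications can be stored and shown side by side. A minimal sketch, assuming ivreg2 is installed; the names no_controls and with_controls are arbitrary labels chosen here.

```stata
webuse nlswork, clear

ivreg2 ln_wage (grade = msp)
estimates store no_controls

ivreg2 ln_wage ttl_exp tenure (grade = msp)
estimates store with_controls

* Coefficients, standard errors and sample sizes in one table
estimates table no_controls with_controls, b(%9.4f) se(%9.4f) stats(N)
```

The sample sizes can differ between columns because observations with missing control variables are dropped.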
Step 3. Testing Endogeneity: ivendog command
The Wu-Hausman F test and the Durbin-Wu-Hausman chi-sq test are used to test for
endogeneity of a regressor (in this case, the variable grade). You can perform this test by
running the ivendog command (Baum et al. 2007; available from SSC) after the ivreg2 command.
Stata
ssc install ivendog
ivendog
1. Wu-Hausman F Test:
o Null Hypothesis (H0): The regressor (in this case, grade) is exogenous (i.e.,
not correlated with the error term).
o The test statistic is 8.77348, and the associated p-value is 0.00306.
o Since the p-value is less than 0.05, we reject the null hypothesis.
o Interpretation: There is evidence to suggest
that grade is endogenous (correlated with the error term) in the regression
model.
2. Durbin-Wu-Hausman Chi-Square Test:
o This test is another way to assess endogeneity.
o Null Hypothesis (H0): The regressor (again, grade) is exogenous.
o The test statistic is 8.77170, and the associated p-value is also 0.00306.
o Similar to the Wu-Hausman F test, the p-value is less than 0.05, leading us to
reject the null hypothesis.
o Conclusion: The evidence supports the idea that grade is endogenous.
In summary, both tests reject the exogeneity of grade in your regression model. This means
that OLS estimates of its effect are likely inconsistent, for example because of omitted
variables, and the IV estimates are preferred. Note that both tests are only as reliable as
the instrument: they assume the instrument itself is valid.
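Two other ways to run an endogeneity test are sketched here. The first uses the endog() option of ivreg2. The second is the regression-based version with built-in commands only: save the first-stage residuals, add them to the OLS regression, and test their coefficient. The variable name v_hat is an arbitrary choice, and the F statistic from the second approach should correspond to the Wu-Hausman F test only under homoskedasticity.

```stata
webuse nlswork, clear

* Option built into ivreg2 (assumes ivreg2 is installed)
ivreg2 ln_wage ttl_exp tenure (grade = msp), endog(grade)

* Regression-based version with built-in commands
* First stage on the sample where the outcome is observed
regress grade msp ttl_exp tenure if !missing(ln_wage)
predict double v_hat if e(sample), residuals

* Augmented regression: a significant v_hat points to endogeneity of grade
regress ln_wage grade ttl_exp tenure v_hat
test v_hat
```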
Step 4. Testing heteroskedasticity
The Pagan-Hall general test statistic is used to test for heteroskedasticity in the context of
instrumental variables (IV) estimation. You can perform this test by running the
ivhettest command (developed by Schaffer 2023) after the ivreg2 command.
Stata
ssc install ivhettest, replace
ivhettest
Null Hypothesis (H0): The disturbance (error term) is homoskedastic (i.e., the
variance of the error term is constant across observations).
Since the p-value is 0.5596, which is much greater than 0.05, we fail to reject the null
hypothesis. This means that there is no statistical evidence of heteroskedasticity in the
model; the assumption of homoskedasticity is not rejected.
In simpler terms, the test finds no sign that the variance of the error terms in your IV
regression model changes with the levels of the instrumental variables, so this test gives no
reason to adjust for heteroskedasticity. Failing to reject is not proof of homoskedasticity.
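If the test had rejected homoskedasticity, the usual response would be to re-estimate with robust standard errors or with efficient GMM. A sketch of both, assuming ivreg2 and ivhettest are installed; the model with ttl_exp and tenure as controls is taken from Step 2.

```stata
webuse nlswork, clear

* Baseline model followed by the Pagan-Hall test
ivreg2 ln_wage ttl_exp tenure (grade = msp)
ivhettest

* Same point estimates, heteroskedasticity-robust standard errors
ivreg2 ln_wage ttl_exp tenure (grade = msp), robust

* Two-step efficient GMM with a robust weighting matrix
ivreg2 ln_wage ttl_exp tenure (grade = msp), gmm2s robust
```

With a single instrument the model is exactly identified, so the GMM and 2SLS point estimates coincide; the GMM form matters once there are more instruments than endogenous regressors.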
Reference
Instrumental Variables Regressions
Instrumental Variable Regression | DATA with STATA (ubc.ca)
Instrumental Variables (slides).pdf
Stata How-to: Instrumental Variables using 2SLS.pdf
andrewproctor.github.io/assets/StataSeminar4.pdf
mayoral.iae-csic.org/IV_2015/StataIV_baum.pdf
fmwww.bc.edu/EC-C/S2014/823/EC823.S2014.nn02.slides.pdf