eregress — Extended linear regression
Description
eregress fits a linear regression model that accommodates any combination of endogenous covariates,
nonrandom treatment assignment, and endogenous sample selection. Continuous, binary, and
ordinal endogenous covariates are allowed. Treatment assignment may be endogenous or exogenous.
A probit or tobit model may be used to account for endogenous sample selection.
xteregress fits a random-effects linear regression model that accommodates endogenous covariates,
treatment, and sample selection in the same way as eregress and also accounts for correlation
of observations within panels or within groups.
Quick start
Regression of y on x with continuous endogenous covariate y2 modeled by x and z
eregress y x, endogenous(y2 = x z)
As above, but adding continuous endogenous covariate y3 modeled by x and z2
eregress y x, endogenous(y2 = x z) endogenous(y3 = x z2)
Regression of y on x with binary endogenous covariate d modeled by x and z
eregress y x, endogenous(d = x z, probit)
Regression of y on x with endogenous treatment recorded in trtvar and modeled by x and z
eregress y x, entreat(trtvar = x z)
Regression of y on x with exogenous treatment recorded in trtvar
eregress y x, extreat(trtvar)
Random-effects regression of y on x using xtset data
xteregress y x
Regression of y on x with endogenous sample-selection indicator selvar modeled by x and z
eregress y x, select(selvar = x z)
As above, but adding endogenous covariate y2 modeled by x and z2
eregress y x, select(selvar = x z) endogenous(y2 = x z2)
As above, but adding endogenous treatment recorded in trtvar and modeled by x and z3
eregress y x, select(selvar = x z) endogenous(y2 = x z2) ///
entreat(trtvar = x z3)
As above, but with random effects and without endogenous treatment
xteregress y x, select(selvar = x z) endogenous(y2 = x z2)
Menu
eregress
Statistics > Endogenous covariates > Models adding selection and treatment > Linear regression
xteregress
Statistics > Longitudinal/panel data > Endogenous covariates > Models adding selection and treatment > Linear
regression (RE)
Syntax
Basic linear regression with endogenous covariates
eregress depvar [indepvars], endogenous(depvars_en = varlist_en) [options]
Linear regression combining random effects, endogenous covariates, treatment, and selection
xteregress depvar [indepvars] [if] [in] [, extensions options]
extensions Description
Model
endogenous(enspec) model for endogenous covariates; may be repeated
entreat(entrspec) model for endogenous treatment assignment
extreat(extrspec) exogenous treatment
select(selspec) probit model for selection
tobitselect(tselspec) tobit model for selection
options Description
Model
noconstant suppress constant term
offset(varname_o) include varname_o in model with coefficient constrained to 1
constraints(numlist) apply specified linear constraints
SE/Robust
vce(vcetype) vcetype may be oim, robust, cluster clustvar, opg, bootstrap,
or jackknife
Reporting
level(#) set confidence level; default is level(95)
nocnsreport do not display constraints
display_options control columns and column formats, row spacing, line width,
display of omitted variables and base and empty cells, and
factor-variable labeling
Integration
intpoints(#) set the number of integration (quadrature) points for integration over
four or more dimensions; default is intpoints(128)
triintpoints(#) set the number of integration (quadrature) points for integration over
three dimensions; default is triintpoints(10)
reintpoints(#) set the number of integration (quadrature) points for
random-effects integration; default is reintpoints(7)
reintmethod(intmethod) integration method for random effects; intmethod may be
mvaghermite (the default) or ghermite
Maximization
maximize_options control the maximization process; seldom used
collinear keep collinear variables
coeflegend display legend instead of statistics
enspec is depvars_en = varlist_en [, enopts]
where depvars_en is a list of endogenous covariates. Each variable in depvars_en specifies an
endogenous covariate model using the common varlist_en and options.
entrspec is depvar_tr = varlist_tr [, entropts]
where depvar_tr is a variable indicating treatment assignment. varlist_tr is a list of covariates
predicting treatment assignment.
extrspec is tvar [, extropts]
where tvar is a variable indicating treatment assignment.
selspec is depvar_s = varlist_s [, selopts]
where depvar_s is a variable indicating selection status. depvar_s must be coded as 0, indicating
that the observation was not selected, or 1, indicating that the observation was selected. varlist_s
is a list of covariates predicting selection.
tselspec is depvar_s = varlist_s, tselopts
where depvar_s is a continuous variable. varlist_s is a list of covariates predicting depvar_s. The
censoring status of depvar_s indicates selection, where a censored depvar_s indicates that the
observation was not selected and a noncensored depvar_s indicates that the observation was
selected.
enopts Description
Model
probit treat endogenous covariate as binary
oprobit treat endogenous covariate as ordinal
povariance estimate a different variance for each level of a binary or an ordinal
endogenous covariate
pocorrelation estimate different correlations for each level of a binary or an ordinal
endogenous covariate
nomain do not add endogenous covariate to main equation
nore do not include random effects in model for endogenous covariate
noconstant suppress constant term
nore is available only with xteregress.
entropts Description
Model
povariance estimate a different variance for each potential outcome
pocorrelation estimate different correlations for each potential outcome
nomain do not add treatment indicator to main equation
nointeract do not interact treatment with covariates in main equation
nore do not include random effects in model for endogenous treatment
noconstant suppress constant term
offset(varname_o) include varname_o in model with coefficient constrained to 1
nore is available only with xteregress.
extropts Description
Model
povariance estimate a different variance for each potential outcome
pocorrelation estimate different correlations for each potential outcome
nomain do not add treatment indicator to main equation
nointeract do not interact treatment with covariates in main equation
selopts Description
Model
nore do not include random effects in selection model
noconstant suppress constant term
offset(varname_o) include varname_o in model with coefficient constrained to 1
nore is available only with xteregress.
tselopts Description
Model
* ll(varname | #) left-censoring variable or limit
* ul(varname | #) right-censoring variable or limit
main add censored selection variable to main equation
nore do not include random effects in tobit selection model
noconstant suppress constant term
offset(varname_o) include varname_o in model with coefficient constrained to 1
* You must specify either ll() or ul().
nore is available only with xteregress.
indepvars, varlist_en, varlist_tr, and varlist_s may contain factor variables; see [U] 11.4.3 Factor variables.
depvar, indepvars, depvars_en, varlist_en, depvar_tr, varlist_tr, tvar, depvar_s, and varlist_s may contain time-series
operators; see [U] 11.4.4 Time-series varlists.
bootstrap, by, collect, jackknife, and statsby are allowed with eregress and xteregress. rolling and
svy are allowed with eregress. See [U] 11.1.10 Prefix commands.
Weights are not allowed with the bootstrap prefix; see [R] bootstrap.
vce() and weights are not allowed with the svy prefix; see [SVY] svy.
fweights, iweights, and pweights are allowed with eregress; see [U] 11.1.6 weight.
reintpoints() and reintmethod() are available only with xteregress.
collinear and coeflegend do not appear in the dialog box.
See [U] 20 Estimation and postestimation commands for more capabilities of estimation commands.
Options
Model
endogenous(enspec), entreat(entrspec), extreat(extrspec), select(selspec),
tobitselect(tselspec); see [ERM] ERM options.
noconstant, offset(varname_o), constraints(numlist); see [R] Estimation options.
SE/Robust
vce(vcetype); see [ERM] ERM options.
Reporting
level(#), nocnsreport; see [R] Estimation options.
display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels,
allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt),
sformat(%fmt), and nolstretch; see [R] Estimation options.
Integration
intpoints(#), triintpoints(#), reintpoints(#), reintmethod(intmethod); see [ERM] ERM
options.
Maximization
maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace,
gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#),
nrtolerance(#), nonrtolerance, and from(init_specs); see [R] Maximize.
The default technique for eregress is technique(nr). The default technique for xteregress
is technique(bhhh 10 nr 2).
Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).
The following options are available with eregress and xteregress but are not shown in the dialog
box:
collinear, coeflegend; see [R] Estimation options.
[ERM] Intro 6 covers random-effects models for panel data and other grouped data. It discusses
xteregress and the other ERM commands for panel data.
[ERM] Intro 7 discusses interpretation of results. You can interpret coefficients from eregress
and xteregress in the usual way, but this introduction goes beyond the interpretation of
coefficients. We demonstrate how to find answers to interesting questions by using margins. If
your model includes an endogenous covariate or an endogenous treatment, the use of margins
differs from its use after other estimation commands, so we strongly recommend reading this
intro if you are fitting these types of models.
[ERM] Intro 8 will be helpful if you are familiar with heckman, ivregress, etregress,
xtreg, or xtivreg and other commands that address endogenous covariates, sample selection,
nonrandom treatment assignment, or random effects. This introduction is a Rosetta stone that
maps the syntax of those commands to the syntax of eregress and xteregress.
[ERM] Intro 9 walks you through an example that gives insight into the concepts of endogenous
covariates, treatment assignment, and sample selection while fitting models with eregress
that address these complications. This intro also demonstrates how to interpret results by using
margins and estat teffects.
Additional examples are presented in [ERM] Example 1a–[ERM] Example 9. For examples using
eregress, see
[ERM] Example 1a Linear regression with continuous endogenous covariate
[ERM] Example 2a Linear regression with binary endogenous covariate
[ERM] Example 2b Linear regression with exogenous treatment
[ERM] Example 2c Linear regression with endogenous treatment
For examples using xteregress, see
[ERM] Example 7 Random-effects regression with continuous endogenous covariate
[ERM] Example 8a Random-effects regression with constraint and endogenous covariate
[ERM] Example 8b Random-effects, endogenous covariate, and endogenous sample selection
See Examples in [ERM] Intro for an overview of all the examples. All of the examples may be of
interest, not only those listed here, because they handle complications in the same way.
eregress and xteregress fit many models discussed in the literature. For example, eregress
can fit the linear regression model with endogenous sample selection (Heckman 1976), the linear
regression model with an endogenous treatment (Heckman 1978; Maddala 1983), and the linear
regression model with a tobit selection equation (Amemiya 1985; Wooldridge 2010, sec. 19.7).
eregress also supports the linear regression model with endogenous regressors and endogenous
sample selection discussed in Wooldridge (2010, sec. 19.6) along with the tobit selection regression
with endogenous regressors discussed in Wooldridge (2010, sec. 19.7).
For panel data, xteregress can fit the linear regression model with random effects discussed in
Baltagi (2013, chap. 2) and Wooldridge (2020, chap. 14). The xteregress command can also fit the
linear regression model with an endogenous treatment and random effects discussed in Drukker (2016)
and the linear regression model with random effects and endogenous covariates discussed in Baltagi
(2013). Roodman (2011) investigated linear regression models with endogenous covariates and
endogenous sample selection and demonstrated how multiple observational data complications could
be addressed with a triangular model structure. He and Tamás Bartus showed how random effects
could be used in the triangular model structure in Bartus and Roodman (2014). Roodman’s work has
been used to model processes like the effect of aphid infestations and virus outbreaks on crop yields
(Elbakidze, Lu, and Eigenbrode 2011) and the effect of calorie intake per day on food security in
poor neighborhoods (Maitra and Rao 2014).
Stored results
eregress stores the following in e():
Scalars
e(N) number of observations
e(N_selected) number of selected observations
e(N_nonselected) number of nonselected observations
e(k) number of parameters
e(k_cat#) number of categories for the #th depvar, ordinal
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(k_aux) number of auxiliary parameters
e(df_m) model degrees of freedom
e(ll) log likelihood
e(N_clust) number of clusters
e(chi2) χ²
e(p) p-value for model test
e(n_quad) number of integration points for multivariate normal
e(n_quad3) number of integration points for trivariate normal
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) eregress
e(cmdline) command as typed
e(depvar) names of dependent variables
e(tsel_ll) left-censoring limit for tobit selection
e(tsel_ul) right-censoring limit for tobit selection
e(wtype) weight type
e(wexp) weight expression
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset#) offset for the #th depvar, where # is determined by equation order in output
e(chi2type) Wald; type of model χ² test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. err.
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(cat#) categories for the #th depvar, ordinal
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Note that results stored in r() are updated when the command is replayed and will be replaced when
any r-class command is run after the estimation command.
xteregress stores the following in e():
Scalars
e(N) number of observations
e(N_g) number of groups
e(N_selected) number of selected observations
e(N_nonselected) number of nonselected observations
e(k) number of parameters
e(k_cat#) number of categories for the #th depvar, ordinal
e(k_eq) number of equations in e(b)
e(k_eq_model) number of equations in overall model test
e(k_dv) number of dependent variables
e(k_aux) number of auxiliary parameters
e(df_m) model degrees of freedom
e(ll) log likelihood
e(N_clust) number of clusters
e(chi2) χ²
e(p) p-value for model test
e(n_quad) number of integration points for multivariate normal
e(n_quad3) number of integration points for trivariate normal
e(n_requad) number of integration points for random effects
e(g_min) smallest group size
e(g_avg) average group size
e(g_max) largest group size
e(rank) rank of e(V)
e(ic) number of iterations
e(rc) return code
e(converged) 1 if converged, 0 otherwise
Macros
e(cmd) xteregress
e(cmdline) command as typed
e(depvar) names of dependent variables
e(tsel_ll) left-censoring limit for tobit selection
e(tsel_ul) right-censoring limit for tobit selection
e(ivar) variable denoting groups
e(title) title in estimation output
e(clustvar) name of cluster variable
e(offset#) offset for the #th depvar, where # is determined by equation order in output
e(chi2type) Wald; type of model χ² test
e(vce) vcetype specified in vce()
e(vcetype) title used to label Std. err.
e(reintmethod) integration method for random effects
e(opt) type of optimization
e(which) max or min; whether optimizer is to perform maximization or minimization
e(ml_method) type of ml method
e(user) name of likelihood-evaluator program
e(technique) maximization technique
e(properties) b V
e(estat_cmd) program used to implement estat
e(predict) program used to implement predict
e(marginsok) predictions allowed by margins
e(marginsnotok) predictions disallowed by margins
e(asbalanced) factor variables fvset as asbalanced
e(asobserved) factor variables fvset as asobserved
Matrices
e(b) coefficient vector
e(cat#) categories for the #th depvar, ordinal
e(Cns) constraints matrix
e(ilog) iteration log (up to 20 iterations)
e(gradient) gradient vector
e(V) variance–covariance matrix of the estimators
e(V_modelbased) model-based variance
Functions
e(sample) marks estimation sample
Note that results stored in r() are updated when the command is replayed and will be replaced when
any r-class command is run after the estimation command.
Introduction
A linear regression of outcome $y_i$ on covariates $x_i$ may be written as
$$y_i = x_i\beta + \epsilon_i$$
where the error $\epsilon_i$ is normal with mean 0 and variance $\sigma^2$. The log likelihood is
$$\ln L = \sum_{i=1}^{N} w_i \ln\phi(y_i - x_i\beta, \sigma^2)$$
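As a minimal sketch of the log likelihood above, the following Python evaluates the weighted sum of normal log densities at the residuals. It assumes that $\phi(r, \sigma^2)$ denotes the mean-zero normal density with variance $\sigma^2$ and that $w_i$ are observation weights defaulting to 1; the function names are hypothetical and this is not how eregress is implemented.

```python
import math

def normal_logpdf(r, var):
    """Log density of a N(0, var) variable evaluated at r (assumed meaning of ln phi)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + r * r / var)

def linreg_loglik(y, X, beta, var, w=None):
    """Weighted log likelihood: sum_i w_i * ln phi(y_i - x_i beta, var)."""
    if w is None:
        w = [1.0] * len(y)  # unweighted case
    total = 0.0
    for yi, xi, wi in zip(y, X, w):
        xb = sum(a * b for a, b in zip(xi, beta))  # linear prediction x_i beta
        total += wi * normal_logpdf(yi - xb, var)
    return total
```

Maximizing this function over beta and var reproduces ordinary maximum likelihood for the linear model; the extended models add probability terms for the auxiliary equations.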
If you are willing to take our word for some derivations and notation, the following is complete.
Longer explanations and derivations for some terms and functions are provided in Methods and
formulas of [ERM] eprobit. For example, we need the two-sided probability function $\Phi^*_d$ that is
discussed in Introduction in [ERM] eprobit.
If you are interested in all the details, we suggest you read Methods and formulas of [ERM] eprobit
in its entirety before reading this section. Here we mainly show how the complications that arise in
ERMs are handled in a linear regression framework.
Endogenous covariates
Continuous endogenous covariates
The vector $z_{ci}$ contains variables from $x_i$ and other covariates that affect $w_{ci}$. For the model to
be identified, $z_{ci}$ must contain one extra exogenous covariate not in $x_i$ for each of the endogenous
regressors in $w_{ci}$. The unobserved errors $\epsilon_i$ and $\epsilon_{ci}$ are multivariate normal with mean 0 and covariance
$$\Sigma = \begin{bmatrix} \sigma^2 & \sigma_{1c}' \\ \sigma_{1c} & \Sigma_c \end{bmatrix}$$
where
$$r_i = [\, y_i - x_i\beta \quad w_{ci} - z_{ci}A_c \,]$$
The model for the outcome can be formulated with or without different variance and correlation
parameters for each level of $w_{bi}$. Level-specific parameters are obtained by specifying povariance
or pocorrelation in the endogenous() option.
If the variance and correlation parameters are not level specific, we have
$$y_i = x_i\beta + \mathrm{wind}_{b1i}\beta_{b1} + \cdots + \mathrm{wind}_{bBi}\beta_{bB} + \epsilon_i$$
The $\mathrm{wind}_{bji}$ vectors are defined in Binary and ordinal endogenous covariates in [ERM] eprobit. The
binary and ordinal endogenous errors $\epsilon_{b1i}, \ldots, \epsilon_{bBi}$ and outcome error $\epsilon_i$ are multivariate normal
with mean 0 and covariance
$$\Sigma = \begin{bmatrix} \Sigma_b & \sigma_{1b} \\ \sigma_{1b}' & \sigma^2 \end{bmatrix}$$
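From here, we discuss the model with ordinal endogenous covariates. The results for binary
endogenous covariates are similar.
Using results from Likelihood for multiequation models in [ERM] eprobit, we can write the joint
density of $y_i$ and $w_{bi}$ using the conditional density of $\epsilon_{b1i}, \ldots, \epsilon_{bBi}$ on $\epsilon_i$.
Define
$$r_i = y_i - (x_i\beta + \mathrm{wind}_{b1i}\beta_{b1} + \cdots + \mathrm{wind}_{bBi}\beta_{bB})$$
Let
$$\mu_{b|1,i} = \frac{\sigma_{1b}'}{\sigma^2}\, r_i = [\, e_{b1i} \; \ldots \; e_{bBi} \,]$$
$$\Sigma_{b|1} = \Sigma_b - \frac{\sigma_{1b}\sigma_{1b}'}{\sigma^2}$$
For $j = 1, \ldots, B$ and $h = 0, \ldots, B_j$, let
$$c_{bjih} = \begin{cases} -\infty & h = 0 \\ \kappa_{bjh} - z_{bji}\alpha_{bj} - e_{bji} & h = 1, \ldots, B_j - 1 \\ \infty & h = B_j \end{cases}$$
Let
$$l_i = [\, l_{b1i} \; \ldots \; l_{bBi} \,]$$
$$u_i = [\, u_{b1i} \; \ldots \; u_{bBi} \,]$$
So, the log likelihood for this model is
$$\ln L = \sum_{i=1}^{N} w_i \ln\left\{ \Phi^*_B(l_i, u_i, \Sigma_{b|1})\, \phi(r_i, \sigma^2) \right\}$$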
The expected value of $y_i$ conditional on $w_{bi}$ can be calculated using the techniques discussed in
Predictions using the full model in [ERM] eprobit postestimation.
When the endogenous ordinal variables are different treatments, holding the variance and correlation
parameters constant over the treatment levels is a constrained form of the potential-outcome model. In
an unconstrained potential-outcome model, the variance of the outcome and the correlations between
the outcome and the treatments (the endogenous ordinal regressors $w_{bi}$) vary over the levels of each
treatment.
In this unconstrained model, there is a different potential-outcome error for each level of each
treatment. For example, when the endogenous treatment variable w1 has three levels (0, 1, and 2) and
the endogenous treatment variable w2 has four levels (0, 1, 2, and 3), the unconstrained model has
12 = 3 × 4 outcome errors. So there are 12 outcome error variance parameters. Because there is a
different correlation between each potential outcome and each endogenous treatment, there are 2 × 12
correlation parameters between the potential outcomes and the treatments in this example model.
We denote the number of different combinations of values for the endogenous treatments $w_{bi}$ by
$M$, and we denote the vector of values in each combination by $v_j$ ($j \in \{1, 2, \ldots, M\}$). Letting
$k_{wp}$ be the number of levels of endogenous ordinal treatment variable $p \in \{1, 2, \ldots, B\}$ implies that
$M = k_{w1} \times k_{w2} \times \cdots \times k_{wB}$.
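The counting argument above can be checked with a short Python sketch. It enumerates the treatment-level combinations for the two-treatment example in the text (three and four levels) and counts the variance and correlation parameters of the unconstrained model. The function name and the coding of levels as 0, 1, 2, ... are illustrative assumptions.

```python
from itertools import product

def treatment_combinations(levels_per_treatment):
    """All combinations v_j of levels, with levels coded 0, 1, ..., k-1 per treatment."""
    return list(product(*[range(k) for k in levels_per_treatment]))

levels = [3, 4]                      # w1 has 3 levels, w2 has 4 levels
combos = treatment_combinations(levels)
M = len(combos)                      # number of potential-outcome errors
n_variances = M                      # one outcome error variance per combination
n_correlations = len(levels) * M     # one correlation per combination and treatment
print(M, n_variances, n_correlations)
```

With these inputs the script prints 12 12 24, matching the 12 outcome errors and 2 × 12 correlation parameters described above.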
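Denoting the outcome errors $\epsilon_{1i}, \ldots, \epsilon_{Mi}$, we have
For $j = 1, \ldots, M$, the endogenous errors $\epsilon_{b1i}, \ldots, \epsilon_{bBi}$ and outcome error $\epsilon_{ji}$ are multivariate
normal with 0 mean and covariance
$$\Sigma_j = \begin{bmatrix} \Sigma_b & \sigma_{j1b} \\ \sigma_{j1b}' & \sigma_j^2 \end{bmatrix}$$
Now let
$$\sigma_{i,b} = \sum_{j=1}^{M} 1(w_{bi} = v_j)\, \sigma_j$$
$$\Sigma_{i,b|1} = \sum_{j=1}^{M} 1(w_{bi} = v_j) \left( \Sigma_b - \frac{\sigma_{j1b}\sigma_{j1b}'}{\sigma_j^2} \right)$$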
As in the other case, the expected value of $y_i$ conditional on $w_{bi}$ can be calculated using the
techniques discussed in Predictions using the full model in [ERM] eprobit postestimation.
Treatment
In the potential-outcomes framework, the treatment $t_i$ is a discrete variable taking $T$ values,
indexing the $T$ potential outcomes of the outcome $y_i$: $y_{1i}, \ldots, y_{Ti}$.
When we observe treatment $t_i$ with levels $v_1, \ldots, v_T$, we have
$$y_i = \sum_{j=1}^{T} 1(t_i = v_j)\, y_{ji}$$
So for each observation, we observe only the potential outcome associated with that observation’s
treatment value.
For exogenous treatments, our approach is equivalent to the regression adjustment treatment-effect
estimation method. See [TE] teffects intro advanced. We do not model the treatment assignment
process. The formulas for the treatment effects and potential-outcome means (POMs) are equivalent
to what we provide here for endogenous treatments. The treatment effect on the treated for xi for an
exogenous treatment is equivalent to what we provide here for the endogenous treatment when the
correlation parameter between the outcome and treatment errors is set to 0. The average treatment
effects (ATEs) and POMs for exogenous treatments are estimated as predictive margins in an analogous
manner to what we describe here for endogenous treatments. We can also obtain different variance
parameters for the different exogenous treatment groups by specifying povariance in extreat().
From here, we assume an endogenous treatment $t_i$. As in Treatment in [ERM] eprobit, we model
the treatment assignment process with a probit or ordered probit model, and we call the treatment
assignment error $\epsilon_{ti}$. A linear regression of $y_i$ on exogenous covariates $x_i$ and endogenous treatment
$t_i$ taking values $v_1, \ldots, v_T$ has the form
$$\begin{aligned} y_{1i} &= x_i\beta_1 + \epsilon_{1i} \\ &\;\;\vdots \\ y_{Ti} &= x_i\beta_T + \epsilon_{Ti} \\ y_i &= \sum_{j=1}^{T} 1(t_i = v_j)\, y_{ji} \end{aligned}$$
This model can be formulated with or without different variance and correlation parameters for each
potential outcome. Potential-outcome specific parameters are obtained by specifying povariance or
pocorrelation in the entreat() option.
If the variance and correlation parameters are not potential-outcome specific, for $j = 1, \ldots, T$,
$\epsilon_{ji}$ and $\epsilon_{ti}$ are bivariate normal with mean 0 and covariance
$$\Sigma = \begin{bmatrix} \sigma^2 & \sigma\rho_{1t} \\ \sigma\rho_{1t} & 1 \end{bmatrix}$$
The treatment is exogenous if $\rho_{1t} = 0$. Note that we did not specify the structure of the correlations
between the potential-outcome errors. We do not need information about these correlations to estimate
POMs and treatment effects because all covariates and the outcome are observed in observations from
each group.
From here, we discuss a model with an ordinal endogenous treatment. The results for binary
treatment models are similar.
As in Binary and ordinal endogenous covariates, using the results from Likelihood for multiequation
models in [ERM] eprobit, we can write the joint density of $y_i$ and $t_i$ using the conditional density
of the treatment error $\epsilon_{ti}$ on the outcome errors $\epsilon_{1i}, \ldots, \epsilon_{Ti}$.
Define
$$r_i = y_i - x_i\beta_j \quad \text{if } t_i = v_j$$
The log likelihood is
$$\ln L = \sum_{i=1}^{N} w_i \ln\left\{ \Phi^*_1\!\left( l_{ti} - \frac{\rho_{1t}}{\sigma} r_i,\; u_{ti} - \frac{\rho_{1t}}{\sigma} r_i,\; 1 - \rho_{1t}^2 \right) \phi(r_i, \sigma^2) \right\}$$
where $l_{ti}$ and $u_{ti}$ are the limits for the treatment probability given in Treatment in [ERM] eprobit.
The treatment effect $y_{ji} - y_{1i}$ is the difference between the outcome for individual $i$ if the individual
receives the treatment $t_i = v_j$ and what the outcome would have been if the individual received the
control treatment $t_i = v_1$ instead.
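The conditional POM for treatment group $j$ is
$$\mathrm{POM}_j(x_i) = E(y_{ji} \mid x_i) = x_i\beta_j$$
For treatment group $j$, the treatment effect on the treated (TET) in group $h$ for covariates $x_i$ is
$$\begin{aligned} \mathrm{TET}_j(x_i, t_i = v_h) &= E(y_{ji} - y_{1i} \mid x_i, t_i = v_h) \\ &= x_i\beta_j - x_i\beta_1 + E(\epsilon_{ji} \mid x_i, t_i = v_h) - E(\epsilon_{1i} \mid x_i, t_i = v_h) \end{aligned}$$
Remembering that the outcome errors and the treatment error $\epsilon_{ti}$ are multivariate normal, for
$j = 1, \ldots, T$, we can decompose $\epsilon_{ji}$ such that
$$\epsilon_{ji} = \sigma\rho_{1t}\epsilon_{ti} + \psi_{ji}$$
where $\psi_{ji}$ has mean 0 and is independent of $\epsilon_{ti}$.
We can take the expectation of these conditional predictions over the covariates to get population
average parameters. The estat teffects or margins command is used to estimate the expectations
as predictive margins once the model is estimated with eregress. The POM for treatment group $j$ is
$$\mathrm{POM}_j = E(y_{ji}) = E\{\mathrm{POM}_j(x_i)\}$$
For treatment group $j$, the average treatment effect on the treated (ATET) in treatment group $h$ is
$$\mathrm{ATET}_{jh} = E(y_{ji} - y_{1i} \mid t_i = v_h) = E\{\mathrm{TET}_j(x_i, t_i = v_h) \mid t_i = v_h\}$$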
In Predictions using the full model in [ERM] eprobit postestimation, we discuss how the conditional
mean of $\epsilon_i$ is calculated.
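If the variance and correlation parameters are potential-outcome specific, for $j = 1, \ldots, T$, $\epsilon_{ji}$
and $\epsilon_{ti}$ are bivariate normal with mean 0 and covariance
$$\Sigma_j = \begin{bmatrix} \sigma_j^2 & \sigma_j\rho_{jt} \\ \sigma_j\rho_{jt} & 1 \end{bmatrix}$$
Now define
$$\rho_i = \sum_{j=1}^{T} 1(t_i = v_j)\, \rho_{jt}$$
$$\sigma_i = \sum_{j=1}^{T} 1(t_i = v_j)\, \sigma_j$$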
The definitions for the potential-outcome means and treatment effects are the same as in the case
where the variance and correlation parameters did not vary by potential outcome. For the treatment
effect on the treated (TET) of group $j$ in group $h$, we have
$$\begin{aligned} \mathrm{TET}_j(x_i, t_i = v_h) &= E(y_{ji} - y_{1i} \mid x_i, t_i = v_h) \\ &= x_i\beta_j - x_i\beta_1 + E(\epsilon_{ji} \mid x_i, t_i = v_h) - E(\epsilon_{1i} \mid x_i, t_i = v_h) \end{aligned}$$
The outcome errors and the treatment error $\epsilon_{ti}$ are multivariate normal, so for $j = 1, \ldots, T$, we can
decompose $\epsilon_{ji}$ such that
$$\epsilon_{ji} = \sigma_j\rho_{jt}\epsilon_{ti} + \psi_{ji}$$
where $\psi_{ji}$ has mean 0 and is independent of $\epsilon_{ti}$.
It follows that
$$\begin{aligned} \mathrm{TET}_j(x_i, t_i = v_h) &= E(y_{ji} - y_{1i} \mid x_i, t_i = v_h) \\ &= x_i\beta_j - x_i\beta_1 + (\sigma_j\rho_{jt} - \sigma_1\rho_{1t})\, E(\epsilon_{ti} \mid x_i, t_i = v_h) \end{aligned}$$
The mean of $\epsilon_{ti}$ conditioned on $t_i$ and the exogenous covariates $x_i$ can be determined using
the formulas discussed in Predictions using the full model in [ERM] eprobit postestimation. It is
nonzero. So the treatment effect on the treated equals the conditional treatment effect
$x_i\beta_j - x_i\beta_1$ only under an exogenous treatment or when the correlation and variance
parameters are identical between the potential outcomes.
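To make the last TET expression concrete, here is a minimal Python sketch for the special case of a binary treatment modeled by a probit, $t_i = 1(z_{ti}\alpha_t + \epsilon_{ti} > 0)$. In that case the conditional mean of the treatment error is the familiar truncated-normal mean (an inverse Mills ratio). The binary-probit setup, the argument names, and the function names are assumptions made for illustration; the general ordinal case uses the formulas in [ERM] eprobit postestimation.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def mean_treatment_error(z_alpha, treated):
    """E(eps_t | t) when t = 1(z_alpha + eps_t > 0) and eps_t ~ N(0, 1)."""
    if treated:
        return norm_pdf(z_alpha) / norm_cdf(z_alpha)
    return -norm_pdf(z_alpha) / (1.0 - norm_cdf(z_alpha))

def tet(xb_j, xb_1, sigma_j, rho_j, sigma_1, rho_1, z_alpha, treated=True):
    """x_i b_j - x_i b_1 + (sigma_j rho_j - sigma_1 rho_1) * E(eps_t | x_i, t_i)."""
    shift = (sigma_j * rho_j - sigma_1 * rho_1) * mean_treatment_error(z_alpha, treated)
    return xb_j - xb_1 + shift
```

When the two potential outcomes share the same variance and correlation parameters, the second term vanishes and tet() returns xb_j - xb_1, which is the point made in the paragraph above.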
As in the other case, we can take the expectation of these conditional predictions over the
covariates to get population-averaged parameters. The estat teffects or margins command is
used to estimate the expectations as predictive margins once the model is fit with eregress.
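$$y_i = x_i\beta + \epsilon_i$$
$$s_i = 1(z_{si}\alpha_s + \epsilon_{si} > 0)$$
where $x_i$ are covariates that affect the outcome and $z_{si}$ are covariates that affect selection. The
outcome $y_i$ is observed if $s_i = 1$ and is not observed if $s_i = 0$. The unobserved errors $\epsilon_i$ and $\epsilon_{si}$ are
normal with mean 0 and covariance
$$\Sigma = \begin{bmatrix} \sigma^2 & \sigma\rho_{1s} \\ \sigma\rho_{1s} & 1 \end{bmatrix}$$
As in the previous section, using the results from Likelihood for multiequation models in [ERM] eprobit,
we can write the joint density of $y_i$ and $s_i$ using the conditional density of the selection error
$\epsilon_{si}$ on the outcome error $\epsilon_i$.
For the selection indicator $s_i$, we have lower and upper limits
$$l_{si} = \begin{cases} -\infty & s_i = 0 \\ -z_{si}\alpha_s - \dfrac{\rho_{1s}}{\sigma}(y_i - x_i\beta) & s_i = 1 \end{cases} \qquad u_{si} = \begin{cases} -z_{si}\alpha_s & s_i = 0 \\ \infty & s_i = 1 \end{cases}$$
The log likelihood is
$$\ln L = \sum_{i=1}^{N} w_i \ln\Phi^*_1(l_{si}, u_{si}, 1 - s_i\rho_{1s}^2) + \sum_{i \in S} w_i \ln\phi(y_i - x_i\beta, \sigma^2)$$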
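The per-observation contribution to this log likelihood can be sketched in Python as follows. The sketch assumes that $\Phi^*_1(l, u, v)$ is the probability that a mean-zero normal variable with variance $v$ falls between $l$ and $u$, which is how the two-sided probability function is understood here (its formal definition is in [ERM] eprobit), and that $S$ is the set of selected observations. All function and argument names are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_logpdf(r, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + r * r / var)

def selection_loglik_i(s, z_alpha, rho, sigma, resid=None, w=1.0):
    """Log-likelihood contribution of one observation under probit selection.

    s       : 1 if the outcome is observed, 0 otherwise
    z_alpha : linear prediction z_si * alpha_s of the selection equation
    resid   : y_i - x_i * beta (needed only when s == 1)
    """
    if s == 1:
        # limits: l = -z_alpha - (rho/sigma) * resid, u = +inf, variance 1 - rho^2
        lower = -z_alpha - (rho / sigma) * resid
        sd = math.sqrt(1.0 - rho * rho)
        prob = 1.0 - norm_cdf(lower / sd)
        return w * (math.log(prob) + normal_logpdf(resid, sigma * sigma))
    # limits: l = -inf, u = -z_alpha, variance 1
    return w * math.log(norm_cdf(-z_alpha))
```

For a nonselected observation the contribution reduces to the log probability of nonselection, and for a selected observation it is the log selection probability conditional on the outcome residual plus the log outcome density.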
$$y_i = x_i\beta + \epsilon_i$$
We observe the selection indicator $s_i$, which indicates the censoring status of the latent selection
variable $s_i^{\star}$,
$$s_i^{\star} = z_{si}\alpha_s + \epsilon_{si}$$
$$s_i = \begin{cases} l_i & s_i^{\star} \le l_i \\ s_i^{\star} & l_i < s_i^{\star} < u_i \\ u_i & s_i^{\star} \ge u_i \end{cases}$$
where $z_{si}$ are covariates that affect selection and $l_i$ and $u_i$ are fixed lower and upper limits.
The outcome $y_i$ is observed when $s_i^{\star}$ is not censored ($l_i < s_i^{\star} < u_i$). The outcome $y_i$ is not
observed when $s_i^{\star}$ is left-censored ($s_i^{\star} \le l_i$) or $s_i^{\star}$ is right-censored ($s_i^{\star} \ge u_i$). The unobserved errors
$\epsilon_i$ and $\epsilon_{si}$ are normal with mean 0 and covariance
$$\Sigma = \begin{bmatrix} \sigma^2 & \sigma_{1s} \\ \sigma_{1s} & \sigma_s^2 \end{bmatrix}$$
where $S$ is the set of observations for which $y_i$ is observed, $L$ is the set of observations where $s_i^{\star}$
is left-censored, and $U$ is the set of observations where $s_i^{\star}$ is right-censored. The lower and upper
limits for selection ($l_{li}$, $u_{li}$, $l_{ui}$, and $u_{ui}$) are defined in Tobit endogenous sample selection in
[ERM] eprobit.
When $s_i$ is not a covariate in $x_i$, we use the standard conditional mean formula,
$$E(y_i \mid x_i) = x_i\beta$$
Otherwise, we use
$$E(y_i \mid x_i, s_i, z_{si}) = x_i\beta + \frac{\sigma_{1s}}{\sigma_s^2}(s_i - z_{si}\alpha_s)$$
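The two conditional mean formulas can be written as one small Python function. This is only an illustrative sketch with hypothetical names: passing no selection value gives the standard mean, and passing the selection variable adds the linear correction based on the selection residual.

```python
def conditional_mean(xb, s=None, z_alpha=None, sigma_1s=None, var_s=None):
    """E(y | x) = xb, or E(y | x, s, z) = xb + (sigma_1s / var_s) * (s - z_alpha).

    xb       : linear prediction x_i * beta
    s        : observed (uncensored) selection variable, or None if s is not in x_i
    z_alpha  : linear prediction z_si * alpha_s of the tobit selection equation
    sigma_1s : covariance between the outcome and selection errors
    var_s    : variance of the selection error
    """
    if s is None:
        return xb
    return xb + (sigma_1s / var_s) * (s - z_alpha)
```

For example, with xb = 1, s = 2, z_alpha = 1, sigma_1s = 0.5, and var_s = 2, the correction is 0.25 × 1, so the conditional mean is 1.25.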
Random effects
For a linear regression with random effects, we observe panel data. For panel $i = 1, \ldots, N$ and
observation $j = 1, \ldots, N_i$, a linear regression of outcome $y_{ij}$ on covariates $x_{ij}$ may be written as
$$y_{ij} = x_{ij}\beta + \epsilon_{ij} + u_i$$
The random effect $u_i$ is normal with mean 0 and variance $\sigma_u^2$. It is independent of the observation-level
error $\epsilon_{ij}$, which is normal with mean 0 and variance $\sigma^2$.
We derive the likelihood by using the conditional density of $y_{ij}$ on the random effect $u_i$ and the
marginal density of $u_i$. Multiplying them together, we have the joint density, which is integrated over
$u_i$.
Let
$$l_{ij}(u) = \phi(y_{ij} - x_{ij}\beta - u, \sigma^2)$$
We can approximate this integral using Gauss–Hermite quadrature. For $q$-point Gauss–Hermite
quadrature, let the abscissa and weight pairs be denoted by $(a_{ki}, w_{ki})$, $k = 1, \ldots, q$. The Gauss–Hermite
quadrature approximation is then
$$\int_{-\infty}^{\infty} f(x)\exp(-x^2)\,dx \approx \sum_{k=1}^{q} w_{ki} f(a_{ki})$$
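The quadrature approximation can be illustrated with a short Python sketch that hard-codes the standard 3-point Gauss–Hermite rule (abscissas 0 and ±√(3/2), weights 2√π/3 and √π/6). The helper expect_normal, which rescales the abscissas to integrate over a N(0, σ_u²) random effect, is an assumption made for illustration; it uses plain, nonadaptive quadrature and is not the mean-variance adaptive method that xteregress uses by default.

```python
import math

# Standard 3-point Gauss-Hermite abscissas and weights for weight function exp(-x^2).
NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
WEIGHTS = [math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0]

def gauss_hermite(f):
    """Approximate the integral of f(x) * exp(-x^2) over the real line."""
    return sum(w * f(a) for a, w in zip(NODES, WEIGHTS))

def expect_normal(g, sigma_u):
    """Approximate E[g(u)] for u ~ N(0, sigma_u^2) via the substitution u = sqrt(2)*sigma_u*x."""
    return gauss_hermite(lambda x: g(math.sqrt(2.0) * sigma_u * x)) / math.sqrt(math.pi)
```

A q-point rule is exact for polynomials f of degree up to 2q − 1, so this 3-point rule recovers, for instance, the second moment of the random effect exactly: expect_normal(lambda u: u * u, 2.0) evaluates to 4 up to rounding.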
Combinations of features
Extended linear regression models that involve multiple features can be formulated using the
techniques discussed in Likelihood for multiequation models in [ERM] eprobit. Essentially, the
density of the observed endogenous covariates can be written in terms of the unobserved normal
errors. The observed endogenous and exogenous covariates determine the range of the errors, and the
joint density can be evaluated as multivariate normal probabilities and densities.
Confidence intervals
The estimated variances will always be nonnegative, and the estimated correlations will always fall
in (−1, 1). To obtain confidence intervals that accommodate these ranges, we must use transformations.
We use the log transformation to obtain the confidence intervals for variance parameters and
the atanh transformation to obtain confidence intervals for correlation parameters. For details, see
Confidence intervals in [ERM] eprobit.
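The idea behind the transformed intervals can be sketched in Python. The sketch assumes delta-method standard errors on the transformed scale (se/σ² for the log of a variance and se/(1 − ρ²) for the atanh of a correlation) and a normal critical value; the exact formulas Stata uses are those in Confidence intervals in [ERM] eprobit, and the function names here are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal critical value for a 95% interval

def variance_ci(var_hat, se, z=Z_95):
    """Interval for a variance, built on the log scale and mapped back with exp()."""
    center = math.log(var_hat)
    se_log = se / var_hat                 # delta method: d ln(v) / dv = 1 / v
    return math.exp(center - z * se_log), math.exp(center + z * se_log)

def correlation_ci(rho_hat, se, z=Z_95):
    """Interval for a correlation, built on the atanh scale and mapped back with tanh()."""
    center = math.atanh(rho_hat)
    se_atanh = se / (1.0 - rho_hat ** 2)  # delta method: d atanh(r) / dr = 1 / (1 - r^2)
    return math.tanh(center - z * se_atanh), math.tanh(center + z * se_atanh)
```

Because exp() is always positive and tanh() always lies strictly between −1 and 1, the back-transformed limits respect the parameter ranges even when the untransformed interval estimate ± z × se would not.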
References
Amemiya, T. 1985. Advanced Econometrics. Cambridge, MA: Harvard University Press.
Baltagi, B. H. 2013. Econometric Analysis of Panel Data. 5th ed. Chichester, UK: Wiley.
Bartus, T., and D. Roodman. 2014. Estimation of multiprocess survival models with cmp. Stata Journal 14: 756–777.
Drukker, D. M. 2016. A generalized regression-adjustment estimator for average treatment effects from panel data.
Stata Journal 16: 826–836.
Elbakidze, L., L. Lu, and S. Eigenbrode. 2011. Evaluating vector-virus-yield interactions for peas and lentils under
climatic variability: A limited dependent variable analysis. Journal of Agricultural and Resource Economics 36:
504–520. https://doi.org/10.22004/ag.econ.119177.
Heckman, J. 1976. The common structure of statistical models of truncation, sample selection and limited dependent
variables and a simple estimator for such models. Annals of Economic and Social Measurement 5: 475–492.
Heckman, J. 1978. Dummy endogenous variables in a simultaneous equation system. Econometrica 46: 931–959.
https://doi.org/10.2307/1909757.
Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.
https://doi.org/10.2307/1912352.
Keshk, O. M. G. 2003. Simultaneous equations models: What are they and how are they estimated. Program in
Statistics and Methodology, Department of Political Science, Ohio State University.
https://polisci.osu.edu/sites/polisci.osu.edu/files/Simultaneous Equations.pdf.
Maddala, G. S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University
Press.
Maddala, G. S., and L.-F. Lee. 1976. Recursive Models with Qualitative Endogenous Variables. Annals of Economic
and Social Measurement 5: 525–545.
Maitra, C., and P. Rao. 2014. An empirical investigation into measurement and determinants of food security in slums
of Kolkata. School of Economics Discussion Paper No. 531, School of Economics, University of Queensland.
espace.library.uq.edu.au/view/UQ:352184.
Roodman, D. 2011. Fitting fully observed recursive mixed-process models with cmp. Stata Journal 11: 159–206.
White, H. L., Jr. 1996. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
Wooldridge, J. M. 2014. Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory
variables. Journal of Econometrics 182: 226–234. https://doi.org/10.1016/j.jeconom.2014.04.020.
Wooldridge, J. M. 2020. Introductory Econometrics: A Modern Approach. 7th ed. Boston: Cengage.
Also see
[ERM] eregress postestimation — Postestimation tools for eregress and xteregress
[ERM] eregress predict — predict after eregress and xteregress
[ERM] predict advanced — predict’s advanced features
[ERM] predict treatment — predict for treatment statistics
[ERM] estat teffects — Average treatment effects for extended regression models
[ERM] Intro 9 — Conceptual introduction via worked example
[R] heckman — Heckman selection model
[R] ivregress — Single-equation instrumental-variables regression
[R] regress — Linear regression
[SVY] svy estimation — Estimation commands for survey data
[TE] etregress — Linear regression with endogenous treatment effects
[XT] xtheckman — Random-effects regression with sample selection
[XT] xtreg — Fixed-, between-, and random-effects and population-averaged linear models
[XT] xtivreg — Instrumental variables and two-stage least squares for panel-data models
[U] 20 Estimation and postestimation commands