Class 10 Multilevel Models
Announcements
Final class!
Papers due today
Topics:
Presentations
Multilevel models
EHA: Shared Frailty
EHA: Heterogeneous Diffusion Models.
Multilevel Data
Simple example: 2-level data
[Diagram: Level 1 units (students S1, S2, S3) nested within Class 1, Class 2, and Class 3]
Solutions:
Each has benefits, disadvantages
1. OLS regression
2. Aggregation (between effects model)
3. Robust Standard Errors
4. Robust Cluster Standard Errors
5. Dummy variables (Fixed Effects Model)
6. Random effects models
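$Y_{ij} = \alpha_j + \beta X_{ij} + \varepsilon_{ij}$
For i cases within j groups
Therefore $\alpha_j$ is a separate intercept for each group
It is equivalent to looking solely at within-group variation:
$Y_{ij} - \bar{Y}_j = \beta (X_{ij} - \bar{X}_j) + \varepsilon_{ij} - \bar{\varepsilon}_j$
$\bar{X}_j$ is the mean of X for group j, etc.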
Model is within group because all variables are
centered around mean of each group.
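A short derivation may help show why the dummy-variable model and the group-mean-centered model give the same slope. The symbols are assumptions made for this sketch: $\alpha_j$ is the group intercept, $\beta$ the common slope, and $\varepsilon_{ij}$ the error; bars denote means within group j.

```latex
\begin{align*}
Y_{ij} &= \alpha_j + \beta X_{ij} + \varepsilon_{ij} \\
\bar{Y}_j &= \alpha_j + \beta \bar{X}_j + \bar{\varepsilon}_j
  && \text{(average over the cases in group } j\text{)} \\
Y_{ij} - \bar{Y}_j &= \beta\,(X_{ij} - \bar{X}_j) + (\varepsilon_{ij} - \bar{\varepsilon}_j)
  && \text{(subtract; } \alpha_j \text{ cancels)}
\end{align*}
```

Because $\alpha_j$ drops out, anything constant within a group cannot be estimated in this model.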
$Y_{ij} = \beta_0 + \zeta_j + \varepsilon_{ij}$
Where $\beta_0$ is the main intercept
Zeta ($\zeta_j$) is a random effect for each group
Allowing each of j groups to have its own intercept
Assumed to be independent & normally distributed
R-sq:  within  = 0.0220                         Number of obs      =     27807
       between = 0.0371                         Number of groups   =        26
       overall = 0.0240                         Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154
                                                Wald chi2(7)       =    625.50
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038709   .0008152    -4.75   0.000    -.0054688   -.0022731
        male |   .0978732   .0229632     4.26   0.000     .0528661    .1428802
        dmar |   .0030441   .0252075     0.12   0.904    -.0463618      .05245
        demp |  -.0737466   .0252831    -2.92   0.004    -.1233007   -.0241926
        educ |   .0857407   .0061501    13.94   0.000     .0736867    .0977947
   incomerel |   .0090308   .0059314     1.52   0.128    -.0025945    .0206561
         ses |    .131528   .0134248     9.80   0.000     .1052158    .1578402
       _cons |   5.924611   .1287468    46.02   0.000     5.672272     6.17695
-------------+----------------------------------------------------------------
     sigma_u |  .59876138
     sigma_e |  1.8701896
         rho |  .09297293   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Note: assumes normal u_j, uncorrelated with X vars
sigma_u = SD of u (intercepts); sigma_e = SD of e; rho = intra-class correlation
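The rho row can be reproduced from the two standard deviations reported with it. The following is a minimal Python sketch, not part of the original Stata session; the function name is made up for illustration, and the two inputs are copied from the sigma_u and sigma_e rows of the output above.

```python
def intraclass_correlation(sigma_u: float, sigma_e: float) -> float:
    """Share of total variance that lies between groups (random intercepts)."""
    var_u = sigma_u ** 2  # between-group variance
    var_e = sigma_e ** 2  # within-group (residual) variance
    return var_u / (var_u + var_e)


# Values copied from the sigma_u and sigma_e rows of the output.
rho = intraclass_correlation(0.59876138, 1.8701896)
print(f"rho = {rho:.5f}")
```

The result agrees with the reported rho of .09297293: roughly 9% of the variance in supportenv lies between countries.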
Ex:
egen meanvar1 = mean(var1), by(groupid)
gen withinvar1 = var1 - meanvar1
Include mean (aggregate) & within variable in model.
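To make the two derived variables concrete, here is a small Python sketch of the same split: each case gets its group mean (the between part) and its deviation from that mean (the within part). The data and names are hypothetical toy values, not the course dataset.

```python
from collections import defaultdict


def split_between_within(values, groups):
    """Return (group_means, within_deviations), one entry per case."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    means = [totals[g] / counts[g] for g in groups]
    within = [v - m for v, m in zip(values, means)]
    return means, within


# Hypothetical toy data: ages of six respondents in two countries.
age = [20, 30, 40, 50, 60, 70]
country = ["A", "A", "A", "B", "B", "B"]
mean_age, within_age = split_between_within(age, country)
print(mean_age)    # between component: the same value for everyone in a country
print(within_age)  # within component: sums to zero inside each country
```

Entering both columns as predictors lets the model estimate a contextual (between) effect and an individual (within) effect separately.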
                                                Number of obs      =     27807
                                                Number of groups   =        26
                                                Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154
                                                LR chi2(8)         =    620.41
                                                Prob > chi2        =    0.0000
------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     meanage |   .0268506   .0239453     1.12   0.262    -.0200812    .0737825
   withinage |   -.003903   .0008156    -4.79   0.000    -.0055016   -.0023044
        male |   .0981351   .0229623     4.27   0.000     .0531299    .1431403
        dmar |    .003459   .0252057     0.14   0.891    -.0459432    .0528612
        demp |  -.0740394     .02528    -2.93   0.003    -.1235873   -.0244914
        educ |   .0856712   .0061483    13.93   0.000     .0736207    .0977216
   incomerel |    .008957   .0059298     1.51   0.131    -.0026651    .0205792
         ses |    .131454   .0134228     9.79   0.000     .1051458    .1577622
       _cons |   4.687526   .9703564     4.83   0.000     2.785662     6.58939
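$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \varepsilon_{ij}$
Which can be written as:
$Y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j}) X_{ij} + \varepsilon_{ij}$
Where zeta-1 ($\zeta_{1j}$) is a random intercept component
Zeta-2 ($\zeta_{2j}$) is a random slope component.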
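The grouped form of the equation says each group has its own line: the overall intercept and slope plus that group's random deviations. A minimal Python sketch, with hypothetical numbers and function names chosen only for illustration:

```python
def group_line(beta1, beta2, zeta1_j, zeta2_j):
    """Return (intercept, slope) for group j: fixed part plus group deviations."""
    return beta1 + zeta1_j, beta2 + zeta2_j


def predict(beta1, beta2, zeta1_j, zeta2_j, x):
    """Predicted Y for a case with value x in group j (error term omitted)."""
    intercept, slope = group_line(beta1, beta2, zeta1_j, zeta2_j)
    return intercept + slope * x


# Hypothetical fixed effects and group-level deviations.
fixed = (5.0, 0.5)  # (overall intercept, overall slope)
deviations = {"group A": (1.0, 0.25), "group B": (-1.0, -0.25)}

for name, (z1, z2) in deviations.items():
    print(name, group_line(*fixed, z1, z2), predict(*fixed, z1, z2, x=4))
```

In an estimated model the deviations are not free parameters; the model estimates their variances (and covariance), and group-specific values are predicted afterwards.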
                                                Number of obs      =     27807
                                                Number of groups   =        26
                                                Obs per group: min =       511
                                                               avg =    1069.5
                                                               max =      2154
                                                Wald chi2(7)       =    625.75
Log likelihood = -56919.098                     Prob > chi2        =    0.0000
------------------------------------------------------------------------------
  supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0038662   .0008151    -4.74   0.000    -.0054638   -.0022687
        male |   .0978558   .0229613     4.26   0.000     .0528524    .1428592
        dmar |   .0031799   .0252041     0.13   0.900    -.0462193    .0525791
        demp |  -.0738261   .0252797    -2.92   0.003    -.1233734   -.0242788
        educ |   .0857707   .0061482    13.95   0.000     .0737204     .097821
   incomerel |   .0090639   .0059295     1.53   0.126    -.0025578    .0206856
         ses |   .1314591   .0134228     9.79   0.000     .1051509    .1577674
       _cons |   5.924237    .118294    50.08   0.000     5.692385    6.156089
------------------------------------------------------------------------------
[remainder of output cut off]
Note: xtmixed yields identical results to xtreg, mle
Non-zero SD indicates that intercepts vary
Model fits a different slope & intercept for each group!
[Figure omitted; axis label: highest educational level attained]
Random Coefficients
Why bother with random coefficients?
1. A solution for clustering (non-independence)
Usually people just use random intercepts, but slopes may be
an issue also
3. Better predictions
Attention to group-specific random effects can yield better
predictions (e.g., slopes) for each group
Rather than just looking at average slope for all groups.
4. Multilevel models explicitly put attention on
levels of causality
Higher level / contextual effects versus individual /
unit-level effects
A technology for separating out between-group and within-group effects
NOTE: this can be done w/out random effects
But it goes hand-in-hand with clustered data
Ex: Classrooms
Is it student SES, or contextual class/school SES?
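$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \varepsilon_{ij}$
However, it is common to separate levels:
Level 1 equation
$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij}$
Intercept equation
$\beta_1 = \gamma_1 + u_{1j}$
Slope equation
$\beta_2 = \gamma_2 + u_{2j}$
Gamma = constant; u = random effect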
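Substituting the intercept and slope equations into the level 1 equation recovers the single-equation form. The sketch below assumes the notation $\beta$ for level 1 coefficients, $\gamma$ for the constants, and $u$ for the random effects.

```latex
\begin{align*}
Y_{ij} &= \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij} \\
       &= (\gamma_1 + u_{1j}) + (\gamma_2 + u_{2j})\, X_{ij} + \varepsilon_{ij} \\
       &= \gamma_1 + u_{1j} + \gamma_2 X_{ij} + u_{2j} X_{ij} + \varepsilon_{ij}
\end{align*}
```

The last line has the same structure as the combined random coefficient model: a constant intercept, a random intercept component, a constant slope, and a random slope component. Only the letters differ.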
Rules:
1. Specify an OLS model, just like normal
2. Consider which OLS coefficients should have a
random component
These could be the intercept or any X (slope) coefficient
Cross-Level Interactions
Does context (i.e., level-2) influence the effect
of level-1 variables?
Example: Effect of poverty on homelessness
Does it interact with welfare state variables?
Idea: specify a level-2 variable that affects a
level-1 slope
Level 1 equation
$Y_{ij} = \beta_1 + \beta_2 X_{ij} + \varepsilon_{ij}$
Intercept equation
$\beta_1 = \gamma_1 + u_{1j}$
Cross-level interaction:
$\beta_2 = \gamma_2 + \gamma_3 Z_j + u_{2j}$
Cross-level interaction in single-equation form:
Random Coefficient Model with cross-level interaction
$Y_{ij} = \beta_1 + \zeta_{1j} + \beta_2 X_{ij} + \zeta_{2j} X_{ij} + \beta_3 X_{ij} Z_j + \varepsilon_{ij}$
Stata strategy: manually compute cross-level
interaction variables
Ex: Poverty*WelfareState, Gender*SingleSexSchool
Then, put interaction variable in the fixed model
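Computing the interaction variable by hand amounts to multiplying each case's level-1 value by its group's level-2 value. A minimal Python sketch of that step follows; the data are hypothetical toy values, and the variable names only echo the kind of names used in the Stata examples.

```python
def cross_level_interaction(level1_values, group_ids, level2_by_group):
    """Multiply each case's level-1 value by its group's level-2 value."""
    return [x * level2_by_group[g] for x, g in zip(level1_values, group_ids)]


# Hypothetical toy data: education for five people in two countries,
# plus one country-level value (e.g. mean income) per country.
educ = [2, 4, 6, 3, 5]
country = ["A", "A", "A", "B", "B"]
income_mean = {"A": 1.5, "B": 3.0}

inc_meanXeduc = cross_level_interaction(educ, country, income_mean)
print(inc_meanXeduc)
```

The product varies across cases within a group even though the level-2 variable does not, which is why it belongs in the fixed part of the model alongside the level-1 variable.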
Pro-environmental attitudes
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses ||
country: income_mean , mle cov(unstr)
Mixed-effects ML regression
Group variable: country
-------------------------------------------------------------------------------
   supportenv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          age |  -.0038786   .0008148    -4.76   0.000    -.0054756   -.0022817
         male |   .1006206   .0229617     4.38   0.000     .0556165    .1456246
         dmar |   .0041417    .025195     0.16   0.869    -.0452395    .0535229
         demp |  -.0733013   .0252727    -2.90   0.004    -.1228348   -.0237678
         educ |   -.035022   .0297683    -1.18   0.239    -.0933668    .0233227
   income_dev |   .0081591    .005936     1.37   0.169    -.0034753    .0197934
inc_meanXeduc |   .0265714   .0064013     4.15   0.000     .0140251    .0391177
          ses |   .1307931   .0134189     9.75   0.000     .1044926    .1570936
        _cons |   5.892334    .107474    54.83   0.000     5.681689    6.102979
-------------------------------------------------------------------------------
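With the interaction in the model, the educ coefficient is the education slope only where income_mean equals zero; elsewhere the slope is the educ coefficient plus the interaction coefficient times income_mean. A minimal Python sketch using the two coefficients from the table above; the income_mean values in the loop are hypothetical, since the scale of that variable is not shown here.

```python
def educ_slope(income_mean: float,
               b_educ: float = -0.035022,
               b_interaction: float = 0.0265714) -> float:
    """Fixed-part slope of educ at a given country-level income_mean."""
    return b_educ + b_interaction * income_mean


# Value of income_mean at which the education slope changes sign.
zero_crossing = 0.035022 / 0.0265714

for z in (0.0, 2.0, 4.0):  # hypothetical income_mean values
    print(z, round(educ_slope(z), 4))
print("slope is zero near income_mean =", round(zero_crossing, 3))
```

Read this way, the positive interaction suggests education is more strongly associated with environmental support in countries with higher mean income.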
Random part of output (cont'd from last slide)
. xtmixed supportenv age male dmar demp educ income_dev inc_meanXeduc ses ||
country: income_mean , mle cov(unstr)
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Unstructured        |
                sd(income~n) |   .5419256   .2095339       .253995    1.156256
                   sd(_cons) |   2.326379   .8679172       1.11974      4.8333
        corr(income~n,_cons) |  -.9915202   .0143006      -.999692   -.7893791
-----------------------------+------------------------------------------------
                sd(Residual) |   1.869388   .0079307      1.853909    1.884997
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  2124.20   Prob > chi2 = 0.0000
Random components:
Income_mean slope allowed to have random variation
Intercepts (_cons) allowed to have random variation
cov(unstr) allows for the possibility of correlation between
random slopes & intercepts; generally a good idea.
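The unstructured option reports two standard deviations and a correlation; together they define the 2x2 covariance matrix of the random slope and intercept. A minimal Python sketch that rebuilds that matrix from the three estimates in the output (the function name is made up for illustration):

```python
def covariance_matrix(sd_slope: float, sd_intercept: float, corr: float):
    """2x2 covariance matrix of (random slope, random intercept)."""
    cov = corr * sd_slope * sd_intercept
    return [[sd_slope ** 2, cov],
            [cov, sd_intercept ** 2]]


# Estimates copied from the sd(income~n), sd(_cons) and corr(...) rows.
m = covariance_matrix(0.5419256, 2.326379, -0.9915202)
for row in m:
    print([round(v, 4) for v in row])
```

The off-diagonal term is negative: countries with higher intercepts tend to have lower income_mean slopes. A correlation this close to -1 with only 26 groups is worth treating with some caution.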
Panel Data
Panel data is a multilevel structure
Cases measured repeatedly over time
Measurements are nested within cases
Person 1: T1 T2 T3 T4 T5
Person 2: T1 T2 T3 T4 T5
Person 3: T1 T2 T3 T4 T5
Person 4: T1 T2 T3 T4 T5
Panel Data
Issue: panel data may involve clustering
across cases & time
Good news: Stata's xt commands were
made for this
Allow specification of both ID and TIME clusters
Ex: xtreg var1 var2 var3, mle i(countryid) t(year)