This is an outstanding introduction to the topic of multilevel modeling. The new edition
is even more detailed with key chapter revisions and additions, all combined with
insightful computer-based examples and discussions. It is an excellent resource for
anyone wanting to learn about multilevel analysis.
—George A. Marcoulides, University of California, Santa Barbara
This is a comprehensive book that takes the reader from the basics of multilevel
modeling through to advanced extensions into models used for meta-analysis and
survival analysis. It also describes the links with structural equation modeling and
other latent models such as path models and factor analysis models. The book offers a
great exposition of both the models and the estimation methods used to fit them and is
accessible and links each chapter well to available software for the models described.
The book also covers topics such as Bayesian estimation and power calculations in
the multilevel setting. This edition is a valuable addition to the multilevel modeling
literature.
—William Browne, Centre for Multilevel Modelling, University of Bristol
This book has been a staple in my research diet. The author team is at the developing
edge of multilevel modeling and as they state about multilevel analysis in general,
‘both the statistical techniques and the software tools are evolving rapidly.’ Their book
is the perfect melding of being an introduction to multilevel modeling as well as a
researcher’s resource when it comes to the recent advances (e.g., Bayesian multilevel
modeling, bootstrap estimation). It’s clearly written. With a light and unpretentious
voice, the book narrative is not only accessible, it is also inviting.
—Todd D. Little, Director and Founder, Institute for Measurement, Methodology,
Analysis, and Policy, Texas Tech University; Director and Founder of Stats Camp
Multilevel Analysis
Applauded for its clarity, this accessible introduction helps readers apply multilevel
techniques to their research. The book also includes advanced extensions, making it useful
as both an introduction for students and as a reference for researchers. Basic models and
examples are discussed in nontechnical terms with an emphasis on understanding the
methodological and statistical issues involved in using these models. The estimation and
interpretation of multilevel models is demonstrated using realistic examples from various
disciplines including psychology, education, public health, and sociology. Readers are
introduced to a general framework for multilevel modeling that covers both observed and
latent variables in the same model, while most other books focus on observed variables. In
addition, Bayesian estimation is introduced and applied using accessible software.
Mirjam Moerbeek is Associate Professor of Statistics for the Social Sciences at Utrecht
University, the Netherlands.
Quantitative Methodology Series
This series presents methodological techniques to investigators and students. The goal is
to provide an understanding and working knowledge of each method with a minimum of
mathematical derivations. Each volume focuses on a specific method (e.g. factor analysis,
multilevel analysis, structural equation modeling).
Proposals are invited from interested authors. Each proposal should consist of: a brief
description of the volume’s focus and intended market; a table of contents with an outline of
each chapter; and a curriculum vitae. Materials may be sent to Dr. George A. Marcoulides,
University of California – Santa Barbara, [email protected].
Published titles
Marcoulides • Modern Methods for Business Research
Marcoulides/Moustaki • Latent Variable and Latent Structure Models
Heck • Studying Educational and Social Policy: Theoretical Concepts and
Research Methods
van der Ark/Croon/Sijtsma • New Developments in Categorical Data
Analysis for the Social and Behavioral Sciences
Duncan/Duncan/Strycker • An Introduction to Latent Variable Growth
Curve Modeling: Concepts, Issues, and Applications, Second Edition
Cardinet/Johnson/Pini • Applying Generalizability Theory Using EduG
Creemers/Kyriakides/Sammons • Methodological Advances in Educational
Effectiveness Research
Heck/Thomas/Tabata • Multilevel Modeling of Categorical Outcomes
Using IBM SPSS
Heck/Thomas/Tabata • Multilevel and Longitudinal Modeling with IBM
SPSS, Second Edition
McArdle/Ritschard • Contemporary Issues in Exploratory Data Mining in
the Behavioral Sciences
Heck/Thomas • An Introduction to Multilevel Modeling Techniques: MLM
and SEM Approaches Using Mplus, Third Edition
Hox/Moerbeek/van de Schoot • Multilevel Analysis: Techniques and
Applications, Third Edition
Multilevel Analysis
Third Edition
The right of Joop J. Hox, Mirjam Moerbeek, and Rens van de Schoot to be
identified as authors of this work has been asserted by them in accordance
with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
Preface
This book is intended as an introduction to multilevel analysis for students and researchers.
The term ‘multilevel’ refers to a hierarchical or nested data structure, usually subjects within
organizational groups, but the nesting may also consist of repeated measures within subjects,
or respondents within clusters, as in cluster sampling. The expression multilevel model is
used as a generic term for all models for nested data. Multilevel analysis is used to examine
relations between variables measured at different levels of the multilevel data structure.
This book presents two types of multilevel model in detail: the multilevel regression model
and the multilevel structural equation model. Although multilevel analysis is used in many
research fields, the examples in this book are mainly from the social and behavioral sciences.
In the past decades, multilevel analysis software has become available that is both
powerful and accessible, either as special packages or as part of a general software package.
In addition, several handbooks have been published, including the earlier editions of this
book. There is a continuing interest in multilevel analysis, as evidenced by the appearance
of several reviews and monographs, applications in different fields ranging from psychology
and sociology to education and medicine, a thriving Internet discussion list with more than
1400 subscribers, and a biennial International Multilevel Conference that has been running
for more than 20 years. The view of ‘multilevel analysis’ as applying only to individuals nested within groups has given way to the view that multilevel models and analysis software offer a very flexible way to model complex data. Thus, multilevel modeling has contributed to the
analysis of traditional individuals-within-groups data, repeated measures and longitudinal
data, sociometric modeling, twin studies, meta-analysis and analysis of cluster randomized
trials.
This book treats two classes of multilevel models: multilevel regression models, and
multilevel structural equation models (MSEM).
Multilevel regression models are essentially a multilevel version of the familiar multiple
regression model. As Cohen and Cohen (1983), Pedhazur (1997) and others have shown, the
multiple regression model is very versatile. Using dummy coding for categorical variables,
it can be used to analyze analysis of variance (ANOVA) type models, as well as the more
usual multiple regression models. Since the multilevel regression model is an extension
of the classical multiple regression model, it too can be used in a wide variety of research
problems.
Chapter 2 of this book contains a basic introduction to the multilevel regression model,
also known as the hierarchical linear model, or the random coefficient model. Chapter 3 and
Chapter 4 discuss estimation procedures, and a number of important methodological and
statistical issues. They also discuss some technical issues that are not specific to multilevel
regression analysis, such as centering of predictors and interpreting interactions.
Chapter 5 introduces the multilevel regression model for longitudinal data. The model is
a straightforward extension of the standard multilevel regression model, but there are some
specific complications, such as autocorrelated errors, which are discussed.
Chapter 6 treats the generalized linear model for dichotomous data and proportions.
When the response (dependent) variable is dichotomous or a proportion, standard regression
models should not be used. This chapter discusses the multilevel version of the logistic and
the probit regression model.
Chapter 7 extends the generalized linear model introduced in Chapter 6 to analyze data
that are ordered categorically and data that are counts of events. In the context of counts,
it presents models that take an overabundance of zeros into account.
Chapter 8 introduces multilevel modeling of survival or event history data. Survival
models are for data where the outcome is the occurrence or non-occurrence of a certain
event, in a certain observation period. If the event has not occurred when the observation
period ends, the outcome is said to be censored, since we do not know whether or not the
event has taken place after the observation period ended.
Chapter 9 discusses cross-classified models. Some data are multilevel in nature, but
do not have a neat hierarchical structure. Examples are longitudinal school research
data, where pupils are nested within schools, but may switch to a different school in later
measurements, and sociometric choice data. Multilevel models for such cross-classified
data can be formulated, and estimated with standard software provided that it can handle
restrictions on estimated parameters.
Chapter 10 discusses multilevel regression models for multivariate outcomes. These can
also be used to assess the reliability of multilevel measurements.
Chapter 11 describes a variant of the multilevel regression model that can be used in
meta-analysis. It resembles the weighted regression model often recommended for meta-
analysis. Using standard multilevel regression procedures, it is a flexible analysis tool,
especially when the meta-analysis includes multivariate outcomes.
Chapter 12 deals with the sample size needed for multilevel modeling, and the problem
of estimating the power of an analysis given a specific sample size. An obvious complication
in multilevel power analysis is that there are different sample sizes at the distinct levels
which should be taken into account.
Chapter 13 discusses the statistical assumptions made and presents some ways to check
these. It also discusses more robust estimation methods, such as the profile likelihood
method and robust standard errors for establishing confidence intervals, and multilevel
bootstrap methods for estimating bias-corrected point-estimates and confidence intervals.
This chapter also contains an introduction into Bayesian (MCMC) methods for estimation
and inference.
Multilevel structural equation models (MSEM) are a powerful tool for the analysis of multilevel data. Recent versions of structural equation modeling software, such as LISREL and Mplus, include at least some multilevel features. The general statistical model for
multilevel covariance structure analysis is quite complicated. Chapter 14 describes two
different approaches to estimation in multilevel confirmatory factor analysis. In addition,
it deals with issues of calculating standardized coefficients and goodness-of-fit indices in
multilevel structural models. Chapter 15 extends this to multilevel path models.
Chapter 16 describes structural models for latent curve analysis. This is an SEM
approach to analyzing longitudinal data, which is very similar to the multilevel regression
models treated in Chapter 5.
This book is intended as an introduction to the world of multilevel analysis. Most of
the chapters on multilevel regression analysis should be readable by social and behavioral
scientists who have a good general knowledge of analysis of variance and classical multiple
regression analysis. Some of these chapters contain material that is more difficult, but these
are generally a discussion of specialized problems, which can be skipped at first reading.
An example is the chapter on longitudinal models, which contains a long discussion of
techniques to model specific structures for the covariances between adjacent time points.
This discussion is not needed in understanding the essentials of multilevel analysis
of longitudinal data, but it may become important when one is actually analyzing such
data. The chapters on multilevel structural equation modeling obviously require a strong
background in multivariate statistics and some background in structural equation modeling,
equivalent to, for example, the material covered in Tabachnick and Fidell’s (2013) book on
multivariate analysis. On the other hand, in addition to an adequate background in structural
equation modeling, the chapters on multilevel structural equation modeling do not require
knowledge of advanced mathematical statistics. In all these cases, we have tried to keep
the discussion of the more advanced statistical techniques theoretically sound, but non-
technical.
In addition to its being an introduction, this book describes many extensions and special
applications. As an introduction, it is usable in courses on multilevel modeling in a variety
of social and behavioral fields, such as psychology, education, sociology, and business.
The various extensions and special applications also make it useful to researchers who
work in applied or theoretical research, and to methodologists who have to consult with
these researchers. The basic models and examples are discussed in non-technical terms; the
emphasis is on understanding the methodological and statistical issues involved in using
these models. Some of the extensions and special applications contain discussions that are
more technical, either because that is necessary for understanding what the model does,
or as a helpful introduction to more advanced treatments in other texts. Thus, in addition
to its role as an introduction, the book should be useful as a standard reference for a large
variety of applications. The chapters that discuss specialized problems, such as the chapter
on cross-classified data, the meta-analysis chapter, and the chapter on advanced issues in
estimation and testing, can be skipped entirely if preferred.
One important change compared to the second edition is the introduction of two co-authors.
This reflects the expansion of multilevel analysis; the field has become so broad that it is
virtually impossible for a single author to keep up with the new developments, both in
statistical theory and in software.
Compared to the second edition, some chapters have changed much, while other
chapters have mostly been updated to reflect recent developments in statistical research and
software development. One important development is the increased use of Bayesian estimation and the development of robust maximum likelihood estimation. We have chosen not to add a
separate chapter on Bayesian estimation; instead, Bayesian estimation is discussed in those
places where its use improves estimation. The chapters on multilevel logistic regression and
on multilevel ordered regression have been expanded with a better treatment of the linked
problems of latent scale and explained variance. In multilevel structural equation modeling
(MSEM) the developments have been so fast that the chapters on multilevel confirmatory
factor analysis and on multilevel path analysis have been significantly revised, in part by
removing discussion of estimation methods that are now clearly outdated. The chapter on
sample size and power and the chapter on multilevel survival analysis have been extensively
rewritten.
An updated website (https://multilevel-analysis.sites.uu.nl/) holds the data sets for all
the text examples formatted using the latest versions of SPSS, HLM, MLwiN and Mplus,
plus some software introductions with updated screen shots for each of these programs. Most
analyses in this book can be carried out by any multilevel regression program, although the
majority of the multilevel regression analyses were carried out in HLM and MLwiN. The
multilevel SEM analyses all use Mplus. System files and setups using these packages are
also available at the website.
Some of the example data are real, while others have been simulated especially for this
book. The data sets are quite varied so as to appeal to those in several disciplines, including
education, sociology, psychology, family studies, medicine, and nursing; Appendix E
describes the various data sets used in this book in detail. Further example data will be
added to the website for use in computer labs.
Acknowledgments
We thank Dick Carpenter, Lawrence DeCarlo, Brian Gray, Ellen Hamaker, Don Hedeker,
Peter van der Heijden, Herbert Hoijtink, Suzanne Jak, Bernet Sekasanvu Kato, Edith de
Leeuw, Cora Maas, George Marcoulides, Cameron McIntosh, Herb Marsh, Allison O’Mara,
Ian Plewis, Ken Rowe, Elif Unal, Godfried van den Wittenboer, and Bill Yeaton for their
comments on the manuscript of the current book or on earlier editions. Their critical
comments still shape this book. We also thank numerous students for the feedback they
gave us in our multilevel courses.
We thank our colleagues at the Department of Methodology and Statistics of the Faculty
of Social Sciences at Utrecht University for providing us with many discussions and a
generally stimulating research environment. Our research has also benefited from the
lively discussions by the denizens of the Internet Multilevel Modeling and the Structural
Equations Modeling (SEMNET) discussion lists.
We also express our gratitude to the reviewers who reviewed our proposal for the new
edition. They provided valuable feedback on the contents and the structure of the proposed
book.
As always, any errors remaining in the book are entirely our own responsibility. We
appreciate hearing about them, and will keep a list of errata on the homepage of this book.
Joop J. Hox
Mirjam Moerbeek
Rens van de Schoot
1 Introduction to Multilevel Analysis

Summary
Social research regularly involves questions about the relationship between individuals and the social contexts in which they live, work, or learn. The general concept
is that individuals interact with the social contexts to which they belong, that individual
persons are influenced by the contexts or groups to which they belong, and that those groups
are in turn influenced by the individuals who make up that group. The individuals and
the social groups are conceptualized as a hierarchical system of individuals nested within
groups, with individuals and groups defined at separate levels of this hierarchical system.
Naturally, such systems can be observed at different hierarchical levels, and variables may
be defined at each level. This leads to research into the relationships between variables
characterizing individuals and variables characterizing groups, a kind of research that is
generally referred to as ‘multilevel research’.
In multilevel research, the data structure in the population is hierarchical, and the sample
data are a sample from this hierarchical population. For example, in educational research,
the population typically consists of classes and pupils within these classes, with classes
organized within schools. The sampling procedure often proceeds in successive stages: first,
we take a sample of schools, next we take a sample of classes within each sampled school,
and finally we take a sample of pupils within each sampled class. Of course, in real research
one may have a convenience sample of schools, or one may decide not to sample pupils but to
study all available pupils in each class. Nevertheless, one should keep firmly in mind that the
central statistical model in multilevel analysis is one of successive sampling from each level
of a hierarchical population.
In this example, pupils are nested within classes. Other examples are cross-national
studies where the individuals are nested within their national units, organizational research
with individuals nested within departments within organizations, family research with
family members within families and methodological research into interviewer effects with
respondents nested within interviewers. Less obvious applications of multilevel models
are longitudinal research and growth curve research, where a series of several distinct
observations are viewed as nested within individuals, and meta-analysis where the subjects
are nested within different studies.
In multilevel research, variables can be defined at any level of the hierarchy. Some of these
variables may be measured directly at their ‘own’ natural level; for example, at the school
level we may measure school size and denomination, at the class level we measure class
size, and at the pupil level, intelligence and school success. In addition, we may move
variables from one level to another by aggregation or disaggregation. Aggregation means
that the variables at a lower level are moved to a higher level, for instance, by assigning to
the classes the class mean of the pupils’ intelligence scores. Disaggregation means moving
variables to a lower level, for instance by assigning to all pupils in the schools a variable
that indicates the denomination of the school they belong to.
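As an illustration of these two operations, the following minimal Python/pandas sketch (with invented data and column names) aggregates a pupil-level variable to the class level and disaggregates the class means back to the pupils:

import pandas as pd

# Hypothetical pupil-level data: one row per pupil, with a class identifier.
pupils = pd.DataFrame({
    "class_id": [1, 1, 1, 2, 2, 2],
    "iq": [95, 100, 105, 110, 115, 120],
})

# Aggregation: move a pupil-level variable to the class level by
# computing the class mean of the pupils' intelligence scores.
class_means = pupils.groupby("class_id")["iq"].mean()

# Disaggregation: move a class-level variable to the pupil level by
# assigning every pupil the value of the class it belongs to.
pupils["class_mean_iq"] = pupils.groupby("class_id")["iq"].transform("mean")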
The lowest level (level 1) is usually defined by the individuals. However, this is not
always the case. For instance, in longitudinal designs, repeated measures within individuals
are the lowest level. In such designs, the individuals are at level 2 and groups are at level 3. Most software allows for at least three levels, and some software has no formal limit
to the number of levels. However, models with many levels can be difficult to estimate, and
even if estimation is successful, they are unquestionably more difficult to interpret.
At each level in the hierarchy, we may have several types of variables. The distinctions
made in the following are based on the typology offered by Lazarsfeld and Menzel (1961),
with some simplifications. In our typology, we distinguish between global, structural and
contextual variables.
Global variables are variables that refer only to the level at which they are defined,
without reference to other units or levels. A pupil’s intelligence or gender would be a global
variable at the pupil level. School denomination and class size would be global variables at
the school and class level. Simply put: a global variable is measured at the level at which
that variable actually exists.
Structural variables are operationalized by referring to the sub-units at a lower level.
They are constructed from variables at a lower level, for example, in defining the class
variable ‘mean intelligence’ as the mean of the intelligence scores of the pupils in that
class. Using the mean of a lower-level variable as an explanatory variable at a higher level
is called aggregation, and it is a common procedure in multilevel analysis. Other functions
of the lower-level variables are less common, but may also be valuable. For instance, using
the standard deviation of a lower-level variable as an explanatory variable at a higher level
could be used to test hypotheses about the effect of group heterogeneity on the outcome
variable (cf. Klein and Kozlowski, 2000).
Contextual variables result from disaggregation; all units at the lower level
receive the value of a global variable for the context to which they belong at the higher
level. For instance, we can assign to all pupils in a school the school size, or the mean
intelligence, as a pupil-level variable. Disaggregation is not needed in a proper multilevel
analysis. For convenience, multilevel data are often stored in a single data file, in which
the group-level variables are repeated for each individual within a group, but the statistical
model and the software will correctly recognize these as a single value at a higher level.
The term contextual variable, however, is still used to denote a variable that models how
the context influences an individual.
In order to analyze multilevel models, it is not important to assign each variable to its
proper place in the typology. The benefit of the scheme is conceptual; it makes clear to
which level a measurement properly belongs. Historically, multilevel problems have led to
analysis approaches that moved all variables by aggregation or disaggregation to one single
level of interest followed by an ordinary multiple regression, analysis of variance, or some
other ‘standard’ analysis method. However, analyzing variables from different levels at one
single common level is inadequate, and leads to two distinct types of problems.
The first problem is statistical. If data are aggregated, the result is that different data
values from many sub-units are combined into fewer values for fewer higher-level units. As
a result, much information is lost, and the statistical analysis loses power. On the other hand,
if data are disaggregated, the result is that a few data values from a small number of super-
units are ‘blown up’ into many more values for a much larger number of sub-units. Ordinary
statistical tests treat all these disaggregated data values as independent information from the
much larger sample of sub-units. The proper sample size for these variables is of course the
number of higher-level units. Using the larger number of disaggregated cases for the sample
size leads to significance tests that reject the null-hypothesis far more often than the nominal
alpha level suggests. In other words, investigators come up with many ‘significant’ results
that are totally spurious.
The second problem is conceptual. If the analyst is not very careful in the interpretation
of the results, s/he may commit the fallacy of the wrong level, which consists of analyzing
the data at one level, and formulating conclusions at another level. Probably the best-known
fallacy is the ecological fallacy, which is interpreting aggregated data at the individual
level. It is also known as the ‘Robinson effect’ after Robinson (1950). Robinson presents
aggregated data describing the relationship between the percentage of blacks and the
illiteracy level in nine geographic regions in 1930. The ecological correlation, that is, the
correlation between the aggregated variables at the region level is 0.95. In contrast, the
individual-level correlation between these global variables is 0.20. Robinson concludes
that in practice an ecological correlation is almost certainly not equal to its corresponding
individual-level correlation. For a statistical explanation, see Robinson (1950) or Kreft and
de Leeuw (1987). Formulating inferences at a higher level based on analyses performed at a
lower level is just as misleading. This fallacy is known as the atomistic fallacy.
A better way to look at multilevel data is to realize that there is not one ‘proper’ level
at which the data should be analyzed. Rather, all levels present in the data are important
in their own way. This becomes clear when we investigate cross-level hypotheses, or
multilevel problems. A multilevel problem is a problem that concerns the relationships
between variables that are measured at a number of different hierarchical levels. For
example, a common question is how a number of individual and group variables influence
one single individual outcome variable. Typically, some of the higher-level explanatory
variables may be structural variables, for example the aggregated group means of lower-
level global (individual) variables. The goal of the analysis is to determine the direct effect
of individual- and group-level explanatory variables, and to determine if the explanatory
variables at the group level serve as moderators of individual-level relationships. If group-
level variables moderate lower-level relationships, this shows up as a statistical interaction
between explanatory variables from different levels. In the past, such data were analyzed
using conventional multiple regression analysis with one dependent variable at the lowest
(individual) level and a collection of disaggregated explanatory variables from all available
levels (cf. Boyd & Iversen, 1979). This approach is completely outdated, since it analyzes
all available data at one single level, it suffers from all of the conceptual and statistical
problems mentioned above.
Multilevel research concerns a population with a hierarchical structure. A sample from such
a population can be described as a multistage sample: first, we take a sample of units from
the higher level (e.g., schools), and next we sample the sub-units from the available units
(e.g., we sample pupils from the schools). In such samples, the individual observations are
in general not independent. For instance, pupils in the same school tend to be similar to
each other, because of selection processes (for instance, some schools may attract pupils
from higher social economic status (SES) levels, while others attract lower SES pupils) and
because of the common history the pupils share by going to the same school. As a result,
the average correlation (expressed as the so-called intraclass correlation) between variables
measured on pupils from the same school will be higher than the average correlation between
variables measured on pupils from different schools. Standard statistical tests lean heavily on
the assumption of independence of the observations. If this assumption is violated (and with
nested data this is almost always the case) the estimates of the standard errors of conventional
statistical tests are much too small, and this results in many spuriously ‘significant’ results.
The effect is generally not negligible; even small dependencies, in combination with medium to large group sizes, still result in large biases in the standard errors. The strong biases that can result from violating the assumption of independent observations have been known for a long time (Walsh, 1947), and independence remains a very important assumption to check in statistical analyses (Stevens, 2009).
The problem of dependencies between individual observations also occurs in survey
research, if the sample is not taken at random but cluster sampling from geographical areas
is used instead. For similar reasons as in the school example given above, respondents from
the same geographical area will be more similar to each other than respondents from different
geographical areas are. This leads again to estimates for standard errors that are too small
and produce spurious ‘significant’ results. In survey research, this effect of cluster sampling
is well known (cf. Kish, 1965, 1987). It is called a ‘design effect’, and various methods are
used to deal with it. A convenient correction procedure is to compute the standard errors by
ordinary analysis methods, estimate the intraclass correlation between respondents within
clusters, and finally apply a correction formula to the standard errors. For instance, Kish (1965, p. 259) corrects the sampling variance using veff = v[1 + (nclus − 1)ρ], where veff is the effective sampling variance, v is the sampling variance calculated by standard methods assuming simple random sampling, nclus is the cluster size, and ρ is the intraclass correlation.
The intraclass correlation is described in Chapter 2, together with its estimation. The
following example makes clear how important the assumption of independence is. Suppose
that we take a sample of 10 classes, each with 20 pupils. This comes to a total sample
size of 200. We are interested in a variable with an intraclass correlation of 0.10, which is
a rather low intraclass correlation. However, the effective sample size in this situation is
200 / [1 + (20 – 1)0.1] = 69.0, which is far less than the apparent total sample size of 200.
Clearly, using a sample size of 200 will lead to standard errors that are much too low.
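This design effect correction is easy to script; the short Python sketch below (the function name is ours, not from the text) reproduces the worked example above:

def effective_sample_size(n_total, n_clus, rho):
    # Kish's design effect: deff = 1 + (n_clus - 1) * rho.
    deff = 1 + (n_clus - 1) * rho
    return n_total / deff

# The example from the text: 10 classes of 20 pupils each, ICC = 0.10.
print(effective_sample_size(200, 20, 0.10))  # about 69.0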
Since the design effect depends on both the intraclass correlation and the cluster size, large
intraclass correlations are partly compensated by small group sizes. Conversely, small intraclass
correlations at the higher levels are offset by the usually large cluster sizes at these levels.
Some of the correction procedures developed for cluster and other complex samples are
quite powerful (cf. Skinner et al., 1989). In principle such correction procedures could also
be applied in analyzing multilevel data, by adjusting the standard errors of the statistical
tests. However, multilevel models are multivariate models, and in general the intraclass
correlation and hence the effective N is different for different variables. In addition, in most
multilevel problems we have not only clustering of individuals within groups, but we also
have variables measured at all available levels, and we are interested in the relationships
between all of these variables. Combining variables from different levels in one statistical
model is a different and more complicated problem than estimating and correcting for
design effects. Multilevel models are designed to analyze variables from different levels
simultaneously, using a statistical model that properly includes the dependencies.
To provide an example of a clearly multilevel problem, consider the ‘frog pond’ theory
that has been utilized in educational and organizational research. The ‘frog pond’ theory
refers to the notion that a specific individual frog may be a medium-sized frog in a pond
otherwise filled with large frogs, or a medium-sized frog in a pond otherwise filled with
small frogs. Applied to education, this metaphor points out that the effect of an explanatory
variable such as ‘intelligence’ on school career may depend on the average intelligence of
the other pupils in the school. A moderately intelligent pupil in a highly intelligent context
may become demotivated and thus become an underachiever, while the same pupil in a
considerably less intelligent context may gain confidence and become an overachiever.
Thus, the effect of an individual pupil’s intelligence depends on the average intelligence
of the other pupils in the class. A popular approach in educational research to investigate
‘frog pond’ effects has been to aggregate variables like the pupils’ IQ into group means, and
then to disaggregate these group means again to the individual level. As a result, the data
file contains both individual-level (global) variables and higher-level (contextual) variables
in the form of disaggregated group means. Already in 1976 the educational researcher
Cronbach suggested expressing the individual scores as deviations from their respective
group means (Cronbach, 1976), a procedure that has become known as centering on the
group mean, or group mean centering. Centering on the group means makes very explicit
that the individual scores should be interpreted relative to their group’s mean. The example
of the ‘frog pond’ theory and the corresponding practice of centering the predictor variables
makes clear that combining and analyzing information from different levels within one
statistical model is central to multilevel modeling.
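As a small illustration of group mean centering, the following Python/pandas sketch (data and column names invented) subtracts each class mean from the individual scores:

import pandas as pd

df = pd.DataFrame({"class_id": [1, 1, 2, 2], "iq": [100, 110, 90, 130]})

# Group mean centering (Cronbach, 1976): express each pupil's score
# as a deviation from the mean of his or her own class.
df["iq_class_mean"] = df.groupby("class_id")["iq"].transform("mean")
df["iq_centered"] = df["iq"] - df["iq_class_mean"]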
Multilevel data must be described by multilevel theories, an area that seems underdeveloped
compared to the advances made in the modeling and computing machinery. Multilevel models
in general require that the grouping criterion is clear, and that variables can be assigned
unequivocally to their appropriate level. In reality, group boundaries are sometimes fuzzy
and somewhat arbitrary, and the assignment of variables is not always obvious and simple.
In multilevel research, decisions about group membership and operationalizations involve a
range of theoretical assumptions (Klein & Kozlowski, 2000). If there are effects of the social
context on individuals, these effects must be mediated by intervening processes that depend
on characteristics of the social context. When the number of variables at the different levels
is large, there is an enormous number of possible cross-level interactions (discussed in more
detail in Chapter 2). Ideally, a multilevel theory should specify which direct effects and cross-
level interaction effects can be expected. Theoretical interpretation of cross-level interaction
effects between the individual and the context level requires a specification of processes within
individuals that cause those individuals to be differentially influenced by certain aspects of the
context. Attempts to identify such processes have been made by, among others, Stinchcombe
(1968), Erbring and Young (1979), and Chan (1998). The common core in these theories is
that they all postulate processes that mediate between individual variables and group variables.
Since a global explanation by ‘group telepathy’ is generally not acceptable, communication
processes and the internal structure of groups become important concepts. These are often
measured as a structural variable. In spite of their theoretical relevance, structural variables
are infrequently used in multilevel research. Another theoretical area that has been largely
neglected by multilevel researchers is the influence of individuals on the group. In multilevel
modeling, the focus is on models where the outcome variable is at the lowest level. Models
that investigate the influence of individual variables on group outcomes are scarce. For a
review of this issue see DiPrete and Forristal (1994); an example is discussed by Alba and
Logan (1992). Croon and van Veldhoven (2007) discuss analysis models for multilevel data
where the outcome variable is at the highest level.
2 The Basic Two-Level Regression Model

Summary
The multilevel regression model has become known in the research literature under a
variety of names, such as ‘random coefficient model’ (Kreft & de Leeuw, 1998), ‘variance
component model’ (Searle et al., 1992; Longford, 1993), and ‘hierarchical linear model’
(Raudenbush & Bryk, 2002; Snijders & Bosker, 2012). Statistically oriented publications
generally refer to the model as a ‘mixed-effects’ or ‘mixed linear model’ (Littell et al.,
1996) and sociologists refer to it as ‘contextual analysis’ (Lazarsfeld & Menzel, 1961).
The models described in these publications are not exactly the same, but they are highly
similar, and we refer to them collectively as ‘multilevel regression models’. The multilevel
regression model assumes that there is a hierarchical data set, often consisting of subjects
nested within groups, with one single outcome or response variable that is measured at the
lowest level, and explanatory variables at all existing levels. The multilevel regression model
can be extended by adding an extra level for multiple outcome variables (see Chapter 10),
while multilevel structural equation models are fully multivariate at all levels (see Chapter
14 and Chapter 15). Conceptually, it is useful to view the multilevel regression model as
a hierarchical system of regression equations. In this chapter, we explain the multilevel
regression model for two-level data, providing both the equations and an example, and later
extend this model with a three-level example.
2.1 Example
Assume that we have data from J classes, with a different number of pupils nj in each class.
On the pupil level, we have the outcome variable ‘popularity’ (Y), measured by a self-rating
scale that ranges from 0 (very unpopular) to 10 (very popular). We have two explanatory
variables on the pupil level: pupil gender (X1: 0 = boy, 1 = girl) and pupil extraversion
(X2, measured on a self-rating scale ranging from 1–10), and one class-level explanatory
variable teacher experience (Z: in years, ranging from 2–25). There are data on 2000 pupils
in 100 classes, so the average class size is 20 pupils. The data are described in Appendix
E. The data files and other support materials are also available online (at https://multilevel-
analysis.sites.uu.nl/).
To analyze these data, we can set up separate regression equations in each class to predict
the outcome variable Y using the explanatory variables X as follows:

Yij = β0j + β1jX1ij + β2jX2ij + eij ,  (2.1)

or, using the variable names:

popularityij = β0j + β1j genderij + β2j extraversionij + eij .  (2.2)
In this regression equation, β0j is the intercept, β1j is the regression coefficient (regression
slope) for the dichotomous explanatory variable gender (i.e., the difference between boys
and girls), β2j is the regression coefficient (slope) for the continuous explanatory variable
extraversion, and eij is the usual residual error term. The subscript j is for the classes (j = 1…J)
and the subscript i is for individual pupils (i = 1…nj). The difference with the usual regression
model is that we assume that each class has a different intercept coefficient β0j, and different
slope coefficients β1j and β2j. This is indicated in Equations 2.1 and 2.2 by attaching a subscript
j to the regression coefficients. The residual errors eij are assumed to have a mean of zero, and
a variance to be estimated. Most multilevel software assumes that the variance of the residual
errors is the same in all classes. Different authors (cf. Goldstein, 2011; Raudenbush & Bryk,
2002) use different systems of notation. This book uses σe² to denote the variance of the lowest-level residual errors.
Figure 2.1 shows a single-level regression line for a dependent variable Y regressed on a
single explanatory variable X. The regression line represents the predicted values ŷ for Y; the regression coefficient b0 is the intercept, the predicted value for Y if X = 0. The regression slope b1 indicates the predicted increase in Y if X increases by one unit.
Since in multilevel regression the intercept and slope coefficients vary across the classes,
they are often referred to as random coefficients. Of course, we hope that this variation is
not totally random, so we can explain at least some of the variation by introducing higher-
level variables. Generally, we do not expect to explain all variation, so there will be some
unexplained residual variation. In our example, the specific values for the intercept and the
slope coefficients are a class characteristic. In general, a class with a high intercept is predicted
to have more popular pupils than a class with a low value for the intercept. Since the model
contains a dummy variable for gender, the value of the intercept reflects the predicted value
for the boys (who are coded as zero). Varying intercepts shift the average value for the
entire class, both boys and girls. Differences in the slope coefficient for gender or extraversion
indicate that the relationship between the pupils’ gender or extraversion and their predicted
popularity is not the same in all classes. Some classes may have a high value for the slope
coefficient of gender; in these classes, the difference between boys and girls is relatively large.
Other classes may have a low value for the slope coefficient of gender; in these classes, gender
has a small effect on the popularity, which means that the difference between boys and girls
is small. Variance in the slope for pupil extraversion is interpreted in a similar way; in classes
with a large coefficient for the extraversion slope, pupil extraversion has a large impact on their
popularity, and vice versa.
Figure 2.2 presents an example with two groups. The panel on the left portrays two groups
with no slope variation, and as a result the two slopes are parallel. The intercepts for both groups
are different. The panel on the right portrays two groups with different slopes, or slope variation.
Note that variation in slopes also has an effect on the difference between the intercepts!
Across all classes, the regression coefficients β0j … β2j are assumed to have a multivariate
normal distribution. The next step in the hierarchical regression model is to explain the
variation of the regression coefficients β0j … β2j by introducing explanatory variables at the
class level, for the intercept:

β0j = γ00 + γ01Zj + u0j ,  (2.3)

and for the slopes:

β1j = γ10 + γ11Zj + u1j
β2j = γ20 + γ21Zj + u2j .  (2.4)
Figure 2.2 Two groups without (left) and with (right) random slopes.
Equation 2.3 predicts the average popularity in a class (the intercept β0j) by the teacher’s
experience (Z). Thus, if γ01 is positive, the average popularity is higher in classes with a more
experienced teacher. Conversely, if γ01 is negative, the average popularity is lower in classes
with a more experienced teacher. The interpretation of the equations under 2.4 is a bit more
complicated. The first equation under 2.4 states that the relationship, as expressed by the slope
coefficient β1j, between the popularity (Y) and the gender (X) of the pupil, depends upon the
amount of experience of the teacher (Z). If γ11 is positive, the gender effect on popularity is
larger with experienced teachers. Conversely, if γ11 is negative, the gender effect on popularity
is smaller with more experienced teachers. Similarly, the second equation under 2.4 states, if
γ21 is positive, that the effect of extraversion is larger in classes with an experienced teacher.
Thus, the amount of experience of the teacher acts as a moderator variable for the relationship
between popularity and gender or extraversion; this relationship varies according to the value
of the moderator variable.
The u-terms u0j, u1j and u2j in Equations 2.3 and 2.4 are (random) residual error terms at the
class level. These residual errors uj are assumed to have a mean of zero, and to be independent
from the residual errors eij at the individual (pupil) level. The variance of the residual errors
u0j is specified as σu0², and the variances of the residual errors u1j and u2j are specified as σu1² and σu2². The covariances between the residual error terms are denoted by σu01, σu02 and σu12, which are generally not assumed to be zero.
Note that in Equations 2.3 and 2.4 the regression coefficients γ are not assumed to vary
across classes. They therefore have no subscript j to indicate to which class they belong.
Because they apply to all classes, they are referred to as fixed coefficients. All between-
class variation left in the β coefficients, after predicting these with the class variable Zj, is
assumed to be residual error variation. This is captured by the residual error terms uj, which
do have subscripts j to indicate to which class they belong.
Our model with two pupil-level and one class-level explanatory variables can be written
as a single complex regression equation by substituting Equations 2.3 and 2.4 into Equation
2.1. Substitution and rearranging terms gives:
Yij = γ00 + γ10X1ij + γ20X2ij + γ01Zj + γ11X1ijZj + γ21X2ijZj
      + u1jX1ij + u2jX2ij + u0j + eij  (2.5)
The segment [γ00 + γ10X1ij + γ20X2ij + γ01Zj + γ11X1ijZj + γ21X2ijZj] in Equation 2.5 contains the fixed coefficients. It is often called the fixed (or deterministic) part of the model. The segment [u1jX1ij + u2jX2ij + u0j + eij] in Equation 2.5 contains the random error terms, and it is often called
the random (or stochastic) part of the model. The terms X1ijZj and X2ijZj are interaction terms
that appear in the model as a consequence of modeling the varying regression slope βj of a
pupil-level variable Xij with the class-level variable Zj. Thus, the moderator effect of Z on the
relationship between the dependent variable Y and the predictor X, is expressed in the single
equation version of the model as a cross-level interaction. The interpretation of interaction
terms in multiple regression analysis is complex, and this is treated in more detail in Chapter
4. In brief, the point made in Chapter 4 is that the substantive interpretation of the coefficients
in models with interactions is much simpler if the variables making up the interaction are
expressed as deviations from their respective means.
Note that the random error terms u1j and u2j are multiplied by the explanatory variables X1ij and X2ij. Since the explanatory variable and the corresponding error term are multiplied, the resulting total error will be different for different values of the explanatory variable, a situation that in ordinary
multiple regression analysis is called ‘heteroscedasticity’. The usual multiple regression
model assumes ‘homoscedasticity’, which means that the variance of the residual errors is
independent of the values of the explanatory variables. If this assumption is not true, ordinary
multiple regression does not perform very well. This is another reason why analyzing
multilevel data with ordinary multiple regression techniques does not perform well.
As explained in the introduction in Chapter 1, multilevel models are needed because
grouped data observations from the same group are generally more similar to each
other than the observations from different groups, and this violates the assumption of
independence of all observations. The amount of dependence can be expressed as a
correlation coefficient: the intraclass correlation. The methodological literature contains a
number of different formulas to estimate the intraclass correlation ρ. For example, if we
use one-way analysis of variance with the grouping variable as independent variable to test
the group effect on our outcome variable, the intraclass correlation is given by ρ = [MS(B)-
MS(error)] / [MS(B) + (n-1) × MS(error)], where MS(B) is the between-groups mean
square and n is the common group size. Shrout and Fleiss (1979) give an overview of
formulas for the intraclass correlation for a variety of research designs.
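For readers who want to compute this ANOVA-based estimate directly, here is a minimal Python sketch under the assumption of equal group sizes (the function is ours, for illustration only):

import numpy as np

def icc_anova(groups):
    # rho = [MS(B) - MS(error)] / [MS(B) + (n - 1) * MS(error)],
    # for k groups with common group size n.
    k = len(groups)
    n = len(groups[0])
    grand_mean = np.concatenate(groups).mean()
    group_means = np.array([np.mean(g) for g in groups])
    ms_between = n * np.sum((group_means - grand_mean) ** 2) / (k - 1)
    ms_error = sum(np.sum((np.asarray(g) - np.mean(g)) ** 2) for g in groups) / (k * (n - 1))
    return (ms_between - ms_error) / (ms_between + (n - 1) * ms_error)

# Example: three classes of four pupils each.
print(icc_anova([np.array([5, 6, 5, 6]),
                 np.array([7, 8, 8, 7]),
                 np.array([4, 5, 4, 5])]))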
The multilevel regression model can also be used to produce an estimate of the intraclass
correlation. The model used for this purpose is a model that contains no explanatory
variables at all, the so-called intercept-only or empty model (also referred to as baseline
model). The intercept-only model is derived from Equations 2.1 and 2.3 as follows. If there
are no explanatory variables X at the lowest level, Equation 2.1 reduces to

Yij = β0j + eij .  (2.6)

Likewise, if there are no explanatory variables Z at the highest level, Equation 2.3 reduces to

β0j = γ00 + u0j .  (2.7)

Substituting Equation 2.7 into Equation 2.6 gives

Yij = γ00 + u0j + eij .  (2.8)
The intercept-only model of Equation 2.8 does not explain any variance in Y. It only decomposes the variance into two independent components: σe², which is the variance of the lowest-level errors eij, and σu0², which is the variance of the highest-level errors u0j. These two variances sum up to the total variance; hence they are often referred to as variance components. Using this model, we can define the intraclass correlation ρ by the equation

ρ = σu0² / (σu0² + σe²) .  (2.9)
The intraclass correlation ρ indicates the proportion of the total variance explained by the
grouping structure in the population. Equation 2.9 simply states that the intraclass correlation
is the proportion of group-level variance compared to the total variance.1 The intraclass
correlation ρ can also be interpreted as the expected correlation between two randomly drawn
units that are in the same group.
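The examples in this book are fitted with HLM and MLwiN; as a hedged illustration of the same computation in open-source software, the intercept-only model and the ICC of Equation 2.9 might be obtained in Python with statsmodels as follows (the file name and column names are assumptions for this sketch, not the book's own files):

import pandas as pd
import statsmodels.formula.api as smf

# One row per pupil; 'popular' is the outcome, 'class_id' the grouping variable.
df = pd.read_csv("popularity.csv")

# Intercept-only (empty) model: a random intercept per class, no predictors.
m0 = smf.mixedlm("popular ~ 1", df, groups=df["class_id"]).fit()

var_u0 = m0.cov_re.iloc[0, 0]    # class-level intercept variance
var_e = m0.scale                 # pupil-level residual variance
icc = var_u0 / (var_u0 + var_e)  # Equation 2.9
print(icc)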
In the intercept-only model we defined the variance of the lowest-level errors and the variance of the highest-level errors. Both can be interpreted as unexplained variance at their respective levels, since no predictors have been specified in the model yet. After adding predictors, just as in ordinary regression analysis, the R², interpreted as the proportion of variance modeled by the explanatory variables, can be calculated. In multilevel analyses, however, there is variance to be explained at every level (and also for random slope factors). The interpretation of these separate R² values depends on the ICC values. For example, if the R² at the highest level appears to be 0.20 and the ICC is 0.40, then, of the 40 percent of total variance that resides at the highest level, 20 percent is explained. This is further explained in Chapter 4.
The intercept-only model is useful as a null-model that serves as a benchmark with which
other models are compared. For our pupil popularity example data, the intercept-only model
is written as

popularityij = γ00 + u0j + eij .
The model that includes pupil gender, pupil extraversion and teacher experience, but not
the cross-level interactions, is written as
popularityij = γ00 + γ10 genderij + γ20 extraversionij + γ01 experiencej + u1j genderij
+ u2j extraversionij + u0j + eij.
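The book's own analyses of this model are run in HLM and MLwiN; purely as an illustration, a rough Python/statsmodels equivalent might look as follows, where the file name popularity.csv and the column names popular, sex, extrav, texp and class_id are assumptions for this sketch:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("popularity.csv")

# Fixed effects for gender, extraversion and teacher experience;
# random intercept plus random slopes for gender and extraversion.
m1 = smf.mixedlm(
    "popular ~ sex + extrav + texp",
    df,
    groups=df["class_id"],
    re_formula="~sex + extrav",
).fit()
print(m1.summary())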
Table 2.1 presents the parameter estimates and standard errors for both models.2 For
comparison, the first column presents the parameter estimates of a single-level model. The
intercept is estimated correctly, but the variance term combines the level-one and level-two
variances, and is for that reason not meaningful. M0, the intercept-only two-level model, splits
this variance term in a variance at the first and a variance at the second level. The intercept-
only two-level model estimates the intercept as 5.08, which is simply the average popularity
across all classes and pupils. The variance of the pupil-level residual errors, symbolized
by σe², is estimated as 1.22. The variance of the class-level residual errors, symbolized by σu0², is estimated as 0.69. All parameter estimates are much larger than the corresponding standard errors, and calculation of the Z-test shows that they are all significant at p < 0.005.3 The intraclass correlation, calculated by Equation 2.9 as ρ = σu0² / (σu0² + σe²), is 0.69 / 1.91,
which equals 0.36. Thus, 36 percent of the variance of the popularity scores is at the group
level, which is very high for social science data. Since the intercept-only model contains
no explanatory variables, the residual variances represent unexplained error variance. The
deviance reported in Table 2.1 is a measure of model misfit; when we add explanatory
variables to the model, the deviance will go down.
The second model in Table 2.1 includes pupil gender and extraversion and teacher
experience as explanatory variables. The regression coefficients for all three variables are
significant. The regression coefficient for pupil gender is 1.25. Since pupil gender is coded
0 = boy, 1 = girl, this means that on average the girls score 1.25 points higher than boys on the
popularity measure, when all other variables are kept constant. The regression coefficient for
pupil extraversion is 0.45, which means that with each scale point higher on the extraversion
measure, the popularity is expected to increase by 0.45 scale points. The regression
coefficient for teacher experience is 0.09, which means that for each year of experience of
the teacher, the average popularity score of the class goes up by 0.09 points. This does not
seem very much, but the teacher experience in our example data ranges from 2 to 25 years,
so the predicted difference between the least experienced and the most experienced teacher is
(25 – 2 = ) 23 × 0.09 = 2.07 points on the popularity measure. The value of the intercept is
generally not interpreted; it is the expected value of the dependent variable if all explanatory
variables have the value zero. We can use the standard errors of the regression coefficients
reported in Table 2.1 to construct a 95 percent confidence interval. For the regression coefficient
of pupil gender, the 95 percent confidence interval runs from 1.17 to 1.33, the confidence
interval for pupil extraversion runs from 0.39 to 0.51, and the 95 percent confidence interval
for the regression coefficient of teacher experience runs from 0.07 to 0.11.4 Note that the
interpretation of the regression coefficients in the fixed part is no different than in any other
regression model (cf. Aiken & West, 1991).
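As a quick check on these intervals, the usual Wald construction (estimate plus or minus 1.96
standard errors) can be applied directly; in the sketch below the standard error of 0.04 for
pupil gender is inferred from the reported interval, not copied from Table 2.1 itself.

from scipy.stats import norm

z = norm.ppf(0.975)  # 1.96

# Coefficient for pupil gender; the standard error is inferred from the
# reported interval of 1.17 to 1.33.
coef, se = 1.25, 0.04
print(f"95% CI: {coef - z * se:.2f} to {coef + z * se:.2f}")  # 1.17 to 1.33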
The model with the explanatory variables includes variance components for the regression
coefficients of pupil gender and pupil extraversion, symbolized by σ²u1 and σ²u2 in Table 2.1.
The variance of the regression coefficients for pupil extraversion across classes is estimated
as 0.03, with a standard error of 0.008. The variance of the regression coefficients for pupil
gender is estimated as zero and not significant, so the hypothesis that the regression slopes for
pupil gender vary across classes is not supported by the data. We should remove the residual
variance term for the gender slopes from the model, and estimate the new model again. Table
2.2 presents the estimates for the model with a fixed slope for the effect of pupil gender.
Table 2.2 also includes the covariance between the class-level errors for the intercept and the
extraversion slope. These covariances are rarely interpreted (for an exception see Chapter 5
and Chapter 16 where growth models are discussed), and for that reason they are often not
included in the reported tables. However, as Table 2.2 demonstrates, they can be quite large
and significant, so as a rule they are always included in the model.
The significant variance of the regression slopes for pupil extraversion implies that we
should not interpret the estimated value of 0.45 without considering this variation. In an
ordinary regression model, without multilevel structure, the value of 0.45 means that for each
point difference on the extraversion scale, the pupil popularity goes up by 0.45, for all pupils
in all classes. In our multilevel model, the regression coefficient for extraversion varies across
the classes, and the value of 0.45 is just the expected value (the mean) across all classes. The
varying regression slopes for pupil extraversion are assumed to follow a normal distribution.
The variance of this distribution is in our example estimated as 0.034. Interpretation of this
variation is easier when we consider the standard deviation, which is the square root of the
variance and equal to 0.18 in our example data. A useful characteristic of the standard deviation
is that with normally distributed observations, about 67 percent of the observations lie between
one standard deviation below and above the mean, and about 95 percent of the observations lie
between two standard deviations below and above the mean. If we apply this to the regression
coefficients for pupil extraversion, we conclude that about 67 percent of the regression coefficients
are expected to lie between (0.45 – 0.18 = ) 0.27 and (0.45 + 0.18 = ) 0.63, and about 95 percent
are expected to lie between (0.45 – 0.37 = ) 0.08 and (0.45 + 0.37 = ) 0.82. The more precise
value of Z.975 = 1.96 leads to the 95 percent predictive interval calculated as 0.09 – 0.81. We can
also use the standard normal distribution to estimate the percentage of regression coefficients
that are negative. As it turns out, if the mean regression coefficient for pupil extraversion is
0.45, given the estimated slope variance, less than 1 percent of the classes are expected to have
a regression coefficient that is actually negative. Note that the 95 percent interval computed
here is totally different from the 95 percent confidence interval for the regression coefficient of
pupil extraversion, which runs from 0.41 to 0.50. The 95 percent confidence interval applies to
γ20, the mean value of the regression coefficients across all the classes. The 95 percent interval
calculated here is the 95 percent predictive interval, which expresses that 95 percent of the
regression coefficients of the variable ‘pupil extraversion’ in the classes are predicted to lie
between 0.09 and 0.81.
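These calculations are easy to reproduce; a minimal sketch with scipy, using the mean slope of
0.45 and the slope standard deviation of 0.18 (the square root of 0.034) given above:

from math import sqrt
from scipy.stats import norm

mean_slope = 0.45
sd_slope = sqrt(0.034)  # about 0.18

# 95 percent predictive interval for the class-specific extraversion slopes.
lower = mean_slope - 1.96 * sd_slope
upper = mean_slope + 1.96 * sd_slope
print(f"95% predictive interval: {lower:.2f} to {upper:.2f}")  # 0.09 to 0.81

# Expected proportion of classes with a negative extraversion slope.
p_negative = norm.cdf(0, loc=mean_slope, scale=sd_slope)
print(f"proportion negative: {p_negative:.3f}")  # well below 1 percent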
Given the significant variance of the regression coefficient of pupil extraversion across the
classes, it is attractive to attempt to predict its variation using class-level variables. We have
one class-level variable: teacher experience. The individual-level regression equation for this
example, using variable labels instead of symbols, is given by

popularityij = β0j + β1 genderij + β2j extraversionij + eij.
The regression coefficient β1 for pupil gender does not have a subscript j, because it is not
assumed to vary across classes. The regression equations predicting β0j, the intercept in class
j, and β2j, the regression slope of pupil extraversion in class j, are given by Equation 2.3 and
Equation 2.4, which are rewritten below using variable labels
β0j = γ00 + γ01 experiencej + u0j,
β2j = γ20 + γ21 experiencej + u2j. (2.11)
The algebraic manipulations of the equations above make clear that to explain the variance
of the regression slopes β2j, we need to introduce an interaction term in the model. This
interaction, between the variables pupil extraversion and teacher experience, is a cross-level
interaction, because it involves explanatory variables from different levels. Table 2.3 presents
the estimates from a model with this cross-level interaction. For comparison, the estimates for
the model without this interaction are also included in Table 2.3.
The estimates for the fixed coefficients in Table 2.3 are similar for the effect of pupil gender,
but the regression slopes for pupil extraversion and teacher experience are considerably larger in
the cross-level model. The interpretation remains the same: extraverted pupils are more popular.
The regression coefficient for the cross-level interaction is –0.03, which is small but significant.
This interaction is formed by multiplying the scores for the variables ‘pupil extraversion’ and
‘teacher experience’, and the negative value means that with experienced teachers, the advantage
of being extraverted is smaller than expected from the direct effects alone. Thus, the difference between
extraverted and introverted pupils is smaller with more experienced teachers.
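In formula-based software the cross-level interaction is simply the product of the two
variables; a sketch in statsmodels (file and variable names hypothetical), with a random
intercept and a random extraversion slope at the class level:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("popularity.csv")  # hypothetical file name

# Fixed effects for gender, extraversion, experience, and the cross-level
# interaction extraversion x experience; random intercept and random
# extraversion slope at the class level.
model = smf.mixedlm(
    "popularity ~ gender + extraversion + experience + extraversion:experience",
    df,
    groups=df["class"],
    re_formula="~extraversion",
)
result = model.fit(reml=False)
print(result.summary())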
Comparison of the other results between the two models shows that the variance component
for pupil extraversion goes down from 0.03 in the main effects model to 0.005 in the cross-level
model. Apparently, the cross-level model explains some of the variation of the slopes for pupil
extraversion. The deviance also goes down, which indicates that this model fits better than the
previous model. The other differences in the random part are more difficult to interpret. Much
of the difficulty in reconciling the estimates in the two models in Table 2.3 stems from adding
an interaction effect. This issue is discussed in more detail in Chapter 4.
The coefficients in the tables are all unstandardized regression coefficients. To interpret
them properly, we must take the scale of the explanatory variables into account. In multiple
regression analysis, and structural equation models (SEM) for that matter, the regression
coefficients are often standardized because that facilitates the interpretation when one wants
to compare the effects of different variables within one sample. If the goal of the analysis is
to compare parameter estimates from different samples to each other, one should always use
unstandardized coefficients. To standardize the regression coefficients, as presented in Table
2.1 or Table 2.3, one could standardize all variables before putting them into the multilevel
analysis. However, this would in general also change the estimates of the variance components,
and their standard errors as well. Therefore, it is better to derive the standardized regression
coefficients from the unstandardized coefficients:

standardized coefficient = unstandardized coefficient ×
    (standard deviation of explanatory variable / standard deviation of outcome variable). (2.13)
In our example data, the standard deviations are: 1.38 for popularity, 0.51 for gender, 1.26
for extraversion, and 6.55 for teacher experience. Table 2.4 presents the unstandardized and
standardized coefficients for the second model in Table 2.2. It also presents the estimates that
we obtain if we first standardize all variables, and then carry out the analysis.
Table 2.4 shows that the standardized regression coefficients are almost the same as
the regression coefficients estimated for standardized variables. The small differences in
Table 2.4 are simply due to rounding errors. However, if we use standardized variables
in our analysis, we find very different variance components and a very different value
for the deviance. This is not only the effect of scaling the variables differently; the
covariance between the slope for pupil extraversion and the intercept is significant for
the unstandardized variables, but not significant for the standardized variables. This kind
of difference in results is a general phenomenon. The fixed part of the multilevel regression model is
invariant for linear transformations, just as the regression coefficients in the ordinary
single-level regression model. This means that if we change the scale of our explanatory
variables, the regression coefficients and the corresponding standard errors change by
the same multiplication factor, and all associated p-values remain exactly the same.
However, the random part of the multilevel regression model is not invariant for linear
transformations. The estimates of the variance components in the random part can and do
change, sometimes dramatically. This is discussed in more detail in Section 4.2 in Chapter
4. The conclusion to be drawn here is that, if we have a complicated random part, including
random components for regression slopes, we should think carefully about the scale of our
explanatory variables. If our only goal is to present standardized coefficients in addition
to the unstandardized coefficients, applying Equation 2.13 is safer than transforming our
variables. On the other hand, we may estimate the unstandardized results, including the
random part and the deviance, and then re-analyze the data using standardized variables,
merely using this analysis as a computational trick to obtain the standardized regression
coefficients without having to do hand calculations.
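Applying Equation 2.13 takes only a few lines; the sketch below uses the standard deviations
listed above together with the coefficient values reported earlier in the text.

# Standardized coefficient = unstandardized coefficient
#   * sd(explanatory variable) / sd(outcome variable)   (Equation 2.13)
sd_popularity = 1.38
sds = {"gender": 0.51, "extraversion": 1.26, "experience": 6.55}
coefs = {"gender": 1.25, "extraversion": 0.45, "experience": 0.09}

for name, b in coefs.items():
    beta = b * sds[name] / sd_popularity
    print(f"{name}: b = {b:.2f}, standardized = {beta:.2f}")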
In principle, the extension of the two-level regression model to three and more levels is
straightforward. There is an outcome variable at the first, the lowest level. In addition,
there may be explanatory variables at all available levels. The problem is that three-
and more level models can become complicated very fast. In addition to the usual fixed
regression coefficients, we must entertain the possibility that regression coefficients for
first-level explanatory variables may vary across units of both the second and the third
levels. Regression coefficients for second-level explanatory variables may vary across units
of the third level. To explain such variation, we must include cross-level interactions in the
model. Regression slopes for the cross-level interaction between first-level and second-level
variables may themselves vary across third-level units. To explain such variation, we
need a three-way interaction involving variables at all three levels.
The equations for such models are complicated, especially when we do not use the more
compact summation notation but write out the complete single equation-version of the model
in an algebraic format (for a note on notation see Section 2.4).
The resulting models are not only difficult to follow from a conceptual point of view;
they may also be difficult to estimate in practice. The number of estimated parameters is
considerable, and at the same time the highest level sample size tends to become relatively
smaller. As DiPrete and Forristal (1994, p. 349) put it, the imagination of the researchers “…
can easily outrun the capacity of the data, the computer, and current optimization techniques
to provide robust estimates.”
Nevertheless, three- and more level models have their place in multilevel analysis.
Intuitively, three-level structures such as pupils in classes in schools, or respondents nested
within households, nested within regions, appear to be both conceptually and empirically
manageable. If the lowest level is repeated measures over time, having repeated measures on
pupils nested within schools again does not appear to be overly complicated. In such cases, the
solution for the conceptual and statistical problems mentioned is to keep models reasonably
small. In particular, the specification of the higher-level variances and covariances should be driven
by theoretical considerations. A higher-level variance for a specific regression coefficient
implies that this regression coefficient is assumed to vary across units at that level. A higher-
level covariance between two specific regression coefficients implies that these regression
coefficients are assumed to covary across units at that level. Especially when models become
large and complicated, it is advisable to avoid higher-order interactions, and to include in the
random part only those elements for which there is strong theoretical or empirical justification.
This implies that an exhaustive search for second-order and higher-order interactions is not
a good idea. In general, we should look for higher-order interactions only if there is strong
theoretical justification for their importance, or if an unusually large variance component for
a regression slope calls for explanation. For the random part of the model, there are usually
more convincing theoretical reasons for the higher-level variance components than for the
covariance components. Especially if the covariances are small and non-significant, analysts
sometimes do not include all possible covariances in the model. This is defensible, with some
exceptions. First, it is recommended that the covariances between the intercept and the random
slopes are always included. Second, it is recommended to include covariances corresponding
to slopes of dummy variables belonging to the same categorical variable, and for variables that
are involved in an interaction or belong to the same polynomial expression.
ρ = σ²u0 / (σ²u0 + σ²e). (2.9, repeated)
The intraclass correlation is an indication of the proportion of variance at the second level,
and it can also be interpreted as the expected (population) correlation between two randomly
chosen individuals within the same group.
If we have a three-level model, for instance pupils nested within classes, nested within
schools, there are two ways to calculate the intraclass correlation. First, we estimate an
intercept-only model for the three-level data, for which the single-equation model can be
written as follows:

Yijk = γ000 + v0k + u0jk + eijk.
The variances at the first, second, and third level are respectively σ²e, σ²u0, and σ²v0. The first
method (cf. Davis & Scott, 1995) defines the intraclass correlations at the class and school level as

ρclass = σ²u0 / (σ²v0 + σ²u0 + σ²e), (2.16)

and

ρschool = σ²v0 / (σ²v0 + σ²u0 + σ²e). (2.17)

The second method (cf. Siddiqui et al., 1996) defines the intraclass correlations at the class
and school level as

ρclass = (σ²v0 + σ²u0) / (σ²v0 + σ²u0 + σ²e), (2.18)

and

ρschool = σ²v0 / (σ²v0 + σ²u0 + σ²e). (2.19)
Actually, both methods are correct (Algina, 2000). The first method identifies the
proportion of variance at the class and school level. This should be used if we are interested
in a decomposition of the variance across the available levels, or if we are interested in how
much variance is located at each level (a topic discussed in Section 4.5). The second method
represents an estimate of the expected (population) correlation between two randomly chosen
elements in the same group. So ρclass as calculated in Equation 2.18 is the expected correlation
between two pupils within the same class, and it correctly takes into account that two pupils
who are in the same class must by definition also be in the same school. For this reason, the
variance components for classes and schools must both be in the numerator of Equation 2.18.
If the two sets of estimates are different, which may happen if the amount of variance at the
school level is large, there is no contradiction involved. Both sets of equations express two
different aspects of the data, which happen to coincide when there are only two levels. The first
method, which identifies the proportion of variance at each level, is the one most often used.
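Both definitions are straightforward to compute from the three variance estimates; a minimal
sketch, with hypothetical variance values:

def icc_proportion(var_e, var_u0, var_v0):
    """Method 1 (Davis & Scott): proportion of variance at each level."""
    total = var_e + var_u0 + var_v0
    return var_u0 / total, var_v0 / total  # class, school

def icc_correlation(var_e, var_u0, var_v0):
    """Method 2 (Siddiqui et al.): expected correlation within the same group."""
    total = var_e + var_u0 + var_v0
    return (var_u0 + var_v0) / total, var_v0 / total  # class, school

# Hypothetical variance estimates for pupils within classes within schools.
print(icc_proportion(var_e=0.5, var_u0=0.3, var_v0=0.2))   # (0.3, 0.2)
print(icc_correlation(var_e=0.5, var_u0=0.3, var_v0=0.2))  # (0.5, 0.2)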
The data in this example are from a hypothetical study on stress in hospitals. The data
are from nurses working in wards nested within hospitals. In each of 25 hospitals, four
wards are selected and randomly assigned to an experimental and control condition. In the
experimental condition, a training program is offered to all nurses to cope with job-related
stress. After the program is completed, a sample of about 10 nurses from each ward is given
a test that measures job-related stress. Additional variables are: nurse age (years), nurse
experience (years), nurse gender (0 = male, 1 = female), type of ward (0 = general care,
1 = special care), and hospital size (0 = small, 1 = medium, 2 = large).
This is an example of an experiment where the experimental intervention is carried
out on a higher level, in this example the ward level. In biomedical research this design is
known as a multisite cluster randomized trial. Such trials are also quite common in educational
and organizational research, where entire classes or schools are assigned to experimental and
control conditions. Since the design variable Experimental versus Control group (ExpCon)
is manipulated at the second (ward) level, we can study whether the experimental effect is
different in different hospitals, by defining the regression coefficient for the ExpCon variable as
random at the hospital level.
In this example, the variable ExpCon is of main interest, and the other variables are
covariates. Their function is to control for differences between the groups, which can occur
even if randomization is used, especially with small samples, and to explain variance in the
outcome variable stress. To the extent that these variables successfully explain variance, the
power of the test for the effect of ExpCon will be increased. Therefore, although logically
we can test if explanatory variables at the first level have random coefficients at the second
or third level, and if explanatory variables at the second level have random coefficients at the
third level, these possibilities are not pursued. We do test a model with a random coefficient
for ExpCon at the third level, where there turns out to be significant slope variation. This
varying slope can be predicted by adding a cross-level interaction between the variables
expcon and hospsize. In view of this interaction, the variables expcon and hospsize have been
centered on their overall mean.5 Table 2.5 presents the results for a series of models.
The equation for the first model, the intercept-only model, is

stressijk = γ000 + v0k + u0jk + eijk.
This produces the variance estimates in the M0 column of Table 2.5. The proportion of
variance (ICC) is 0.52 at the ward level, and 0.17 at the hospital level, calculated following
Equations 2.16 and 2.17. The nurse-level and the ward-level variances are evidently significant.
Model                    M0             M1             M2             M3
Fixed part (entries are coefficient (s.e.))
Intercept                5.00 (0.11)    5.50 (.12)     5.46 (.12)     5.50 (.11)
ExpCon a                                –0.70 (.12)    –0.70 (.18)    –0.50 (.11)
Age                                     0.02 (.002)    0.02 (.002)    0.02 (.002)
Gender                                  –0.45 (.03)    –0.45 (.03)    –0.45 (.03)
Experience                              –0.06 (.004)   –0.06 (.004)   –0.06 (.004)
Ward type                               0.05 (.12)     0.05 (.07)     0.05 (.07)
Hospital size a                         0.46 (.12)     0.29 (.12)     0.46 (.12)
Exp × HSize                                                           1.00 (.16)
Random part
σ²e (nurse level)        0.30 (.01)     0.22 (.01)     0.22 (.01)     0.22 (.01)
σ²u0 (ward level)        0.49 (.09)     0.33 (.06)     0.11 (.03)     0.11 (.03)
σ²v0 (hospital level)    0.16 (.09)     0.10 (.05)     0.166 (.06)    0.15 (.05)
σ²u1 (ExpCon slope)                                    0.66 (.22)     0.18 (.09)
Deviance                 1942.4         1604.4         1574.2         1550.8
a Centered on grand mean
The test statistic for the hospital-level variance is Z = 0.162 / 0.0852 = 1.901, which produces
a one-sided p-value of 0.029. The hospital-level variance is significant at the 5 percent level.
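This Wald test divides the variance estimate by its standard error and refers the result to
the standard normal distribution; a minimal check:

from scipy.stats import norm

z = 0.162 / 0.0852           # variance estimate divided by its standard error
p_one_sided = norm.sf(z)     # upper-tail (one-sided) p-value
print(f"Z = {z:.3f}, one-sided p = {p_one_sided:.3f}")  # Z = 1.901, p = 0.029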
The sequence of models in Table 2.5 shows that all predictor variables have a significant effect,
except the ward type, and that the experimental intervention significantly lowers stress. The
experimental effect varies across hospitals, and a large part of this variation can be explained
by hospital size; in large hospitals the experimental effect is smaller.
2.4.1 Notation
In general, there will be more than one explanatory variable at the lowest level and more
than one explanatory variable at the highest level. Assume that we have P explanatory
variables X at the lowest level, indicated by the subscript p (p = 1…P). Likewise, we have
Q explanatory variables Z at the highest level, indicated by the subscript q (q = 1…Q). Then,
Equation 2.5 becomes the more general equation:
Yij = γ00 + γp0 Xpij + γ0q Zqj + γpq ZqjXpij + upj Xpij + u0j + eij . (2.21)
The errors at the lowest level eij are assumed to have a normal distribution with a mean
of zero and a common variance σ²e in all groups. The u-terms u0j and upj are the residual
error terms at the highest level. They are assumed to be independent from the errors eij at
the individual level, and to have a multivariate normal distribution with means of zero.
The variance of the residual errors u0j is the variance of the intercepts between the groups,
symbolized by σ²u0. The variances of the residual errors upj are the variances of the slopes
between the groups, symbolized by σ²up. The covariances between the residual error terms,
σupp′, are generally not assumed to be zero; they are collected in the higher-level variance/
covariance matrix Ω.6
Note that in Equation 2.21, γ00, the regression coefficient for the intercept, is not associated
with an explanatory variable. We can expand the equation by providing an explanatory
variable that is a constant equal to one for all observed units. This yields the equation

Yij = γp0 Xpij + γpq ZqjXpij + upj Xpij + eij, (2.23)

where X0ij = 1, and p = 0…P. Equation 2.23 makes clear that the intercept is a regression
coefficient, just like the other regression coefficients in the equation. Some multilevel
software, for instance HLM (Raudenbush et al., 2011) puts the intercept variable X0 = 1 in
the regression equation by default. Other multilevel software, for instance MLwiN (Rasbash
et al., 2015), requires that the analyst includes a variable in the data set that equals one in all
cases, which must be added explicitly to the regression equation.
Equation 2.23 can be made very general if we let X be the matrix of all explanatory variables
in the fixed part, symbolize the residual errors at all levels by u(l) with l denoting the level, and
associate all error components with predictor variables Z, which may or may not be equal to
the X. This produces the very general matrix formula Y = Xβ + Z(l)u(l) (cf. Goldstein, 2011,
Appendix 2.1). Since this book is more about applications than about mathematical statistics,
it generally uses the algebraic notation, except when multivariate procedures such as structural
equation modeling are discussed.
The notation used in this book is close to the notation used by Goldstein (2011) and Kreft
and de Leeuw (1998). The most important difference is that these authors indicate the higher-
level variance by σ00 instead of our σ²u0. The logic is that, if σ01 indicates the covariance
between variables 0 and 1, then σ00 is the covariance of variable 0 with itself, which is its
variance. Raudenbush and Bryk (2002), and Snijders and Bosker (2012) use a different
notation; they denote the lowest level error terms by rij, and the higher-level error terms by uj.
The lowest-level variance is σ2 in their notation. The higher-level variances and covariances
are indicated by the Greek letter τ (tau); for instance, the intercept variance is given by τ00. The
τpp are collected in the matrix Tau, symbolized as T. The HLM program and manual in part use
a different notation, for instance when discussing longitudinal and three-level models.
In models with more than two levels, two different notational systems are used. One
approach is to use different Greek characters for the regression coefficients at different levels,
and different (Greek or Latin) characters for the variance terms at different levels. With many
levels, this becomes cumbersome, and it is simpler to use the same character, say β for the
regression slopes and u for the residual variance terms, and let the number of subscripts
indicate to which level these belong.
2.4.2 Software
Multilevel models can be formulated in two ways: (1) by presenting separate equations
for each of the levels, and (2) by combining all equations by substitution into a single
model-equation. The programs HLM (Raudenbush et al., 2011) and Mplus (Muthén &
Muthén, 1998–2015) require specification of the separate equations at each available level.
Most other software, e.g., MLwiN (Rasbash et al., 2015), SAS Proc Mixed (Littell et al.,
1996), SPSS command Mixed (Norusis, 2012), and the R package LME4 (Bates et al.,
2015) use the single equation representation. Both representations have their advantages
and disadvantages. The separate-equation representation has the advantage that it is always
clear how the model is built up. The disadvantage is that it hides from view that modeling
regression slopes by other variables is equivalent to adding a cross-level interaction to the
model. As will be explained in Chapter 4, estimating and interpreting interactions correctly
requires careful thinking. On the other hand, while the single-equation representation
makes the existence of interactions obvious, it conceals the role of the complicated error
components that are created by modeling varying slopes. In practice, to keep track of the
model, it is recommended to start by writing the separate equations for the separate levels,
and to use substitution to arrive at the single-equation representation.
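As a sketch using the popularity example (with only the extraversion slope made random, for
brevity), substituting the class-level equations into the pupil-level equation reproduces the
single-equation form and makes the cross-level interaction explicit:

popularityij = β0j + β2j extraversionij + eij,
β0j = γ00 + γ01 experiencej + u0j,
β2j = γ20 + γ21 experiencej + u2j.

Substitution gives

popularityij = γ00 + γ20 extraversionij + γ01 experiencej
             + γ21 extraversionij × experiencej + u2j extraversionij + u0j + eij,

where γ21 extraversionij × experiencej is the cross-level interaction, and u2j extraversionij is
the complicated error component created by the varying slope.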
To take a quote from Singer’s excellent introduction to using SAS Proc Mixed for multilevel
modeling (Singer, 1998, p. 350): ‘Statistical software does not a statistician make. That said,
without software, few statisticians and even fewer empirical researchers would fit the kinds of
sophisticated models being promulgated today.’ Indeed, software does not make a statistician,
but the advent of powerful and user-friendly software for multilevel modeling has had a
large impact in research fields as diverse as education, organizational research, demography,
epidemiology, and medicine. This book focuses on the conceptual and statistical issues that
arise in multilevel modeling of complex data structures. It assumes that researchers who apply
these techniques have access to and familiarity with some software that can estimate these
models. Specific software is mentioned in some places, but only if a technique is discussed
that requires specific software features or is only available in a specific program.
Since statistical software evolves rapidly, with new versions of the software coming
out much faster than new editions of general handbooks such as this, we do not discuss
software setups or output in detail. As a result, this book is more about the possibilities
offered by the various techniques than about how these things can be done in a specific
software package. The techniques are explained using analyses on small but realistic data
sets, with examples of how the results could be presented and discussed. At the same time, if
the analysis requires that the software used have some specific capacities, these are pointed
out. This should enable interested readers to determine whether their software meets these
requirements, and assist them in working out the software setups for their favorite package.
In addition to the relevant program manuals, several software programs have been
discussed in introductory articles. Using SAS Proc Mixed for multilevel and longitudinal data
is discussed by Singer (1998). Peugh and Enders (2005) discuss SPSS Mixed using Singer’s
examples. Both Arnold (1992), and Heck and Thomas (2009) discuss multilevel modeling
using HLM and Mplus as the software tool. Sullivan, Dukes and Losina (1999) discuss HLM
and SAS Proc Mixed. West, Welch and Gałecki (2007) present a series of multilevel analyses
using SAS, SPSS, R, Stata and HLM. Heck, Thomas and Tabata (2012, 2014) discuss SPSS.
Finally, the multilevel modeling program at the University of Bristol maintains a multilevel
homepage that contains a series of software reviews. The homepage for this book contains
links to these and other multilevel resources (at https://multilevel-analysis.sites.uu.nl/).
It also contains the data sets used in the examples, which are described in Appendix E.
Notes
1 The intraclass correlation is an estimate of the proportion of group-level variance in the population. The
proportion of group-level variance in the sample is given by the correlation ratio η² (eta-squared, cf.
Tabachnick & Fidell, 2013, p. 54): η² = SS(B)/SS(Total).
2 For reasons to be explained later, different options for the details of the maximum likelihood
estimation procedure may result in slightly different estimates. So, if you re-analyze the example
data from this book, the results may differ slightly from the results given here. However, these
differences should never be so large that you would draw entirely different conclusions.
3 Testing variances is preferably done with a test based on the deviance, which is explained in Chapter 3.
4 Chapter 3 treats the interpretation of confidence intervals in more detail.
5 Chapter 4 discusses the interpretation of interactions and centering.
6 We may attach a subscript to Ω to indicate to which level it belongs. As long as there is no risk of
confusion, the simpler notation without the subscript is used.
3
Estimation and Hypothesis Testing in
Multilevel Regression
Summary
The usual method to estimate the values of the regression coefficients and the intercept
and slope variances is the maximum likelihood estimation method. This chapter gives a
non-technical explanation of maximum likelihood estimation, to enable analysts to make
informed decisions on the estimation options offered by current software. Some alternatives
to maximum likelihood estimation are briefly discussed. Other estimation methods, such as
Bayesian estimation methods and bootstrapping, are also briefly introduced in this chapter.
Finally, this chapter describes some procedures that can be used to compare nested and
non-nested models, which are especially useful when variance terms are tested.
Maximum likelihood (ML) is the most commonly used estimation method in multilevel
modeling. The results presented in Chapter 2 are all obtained using full ML estimation.
An advantage of the maximum likelihood estimation method is that it is generally robust,
and produces estimates that are asymptotically (i.e., when the sample size approaches
infinity) efficient and consistent. With large samples, ML estimates are usually robust
against mild violations of the assumptions, such as having non-normal errors. Maximum
likelihood estimation proceeds by maximizing a function called the likelihood function.
Two different likelihood functions are used in multilevel regression modeling. One is
full maximum likelihood (FML); in this method, both the regression coefficients and the
variance components are included in the likelihood function. The other estimation method
is restricted maximum likelihood (RML); here only the variance components are included in
the likelihood function, and the regression coefficients are estimated in a second estimation
step. Both methods produce parameter estimates with associated standard errors and an
overall model deviance, which is a function of the likelihood. FML treats the regression
coefficients as fixed but unknown quantities when the variance components are estimated,
but does not take into account the degrees of freedom lost by estimating the fixed effects.
RML estimates the variance components after removing the fixed effects from the model (cf.
Searle et al., 1992, Chapter 6). As a result, FML estimates of the variance components are
biased; they are generally too small. RML estimates have less bias (Longford, 1993). RML
also has the property that, if the groups are balanced (have equal group sizes), the RML
estimates are equivalent to ANOVA estimates, which are optimal (Searle et al., 1992, p. 254).
Since RML is more realistic, it should, in theory, lead to better estimates, especially when
the number of groups is small (Bryk & Raudenbush, 1992; Longford, 1993). In practice, the
differences between the two methods are usually small (cf. Hox, 1998; Kreft & de Leeuw,
1998). For example, if we compare the FML estimates for the intercept-only model for the
popularity data in Table 2.1 with the corresponding RML estimates, the only difference
within two decimals is the intercept variance at level two. FML estimates this as 0.69,
and RML as 0.70. The size of this difference is absolutely trivial. If nontrivial differences
are found, the RML method is preferred (Browne, 1998). FML still continues to be used,
because it has two advantages over RML. Firstly, the computations are generally easier, and
secondly, since the regression coefficients are included in the likelihood function, an overall
chi-square test based on the likelihood can be used to compare two models that differ in the
fixed part (the regression coefficients). With RML, only differences in the random part (the
variance components) can be compared with this test. Most tables in this book have been
produced using FML estimation; if RML is used this is explicitly stated in the text.
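In most programs the choice between the two likelihood functions is a single option; in
statsmodels, for example, it is the reml argument of the fit method (a sketch, with file and
variable names as hypothetical as before):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("popularity.csv")  # hypothetical file name
model = smf.mixedlm("popularity ~ 1", df, groups=df["class"])

fml = model.fit(reml=False)  # full maximum likelihood (FML)
rml = model.fit(reml=True)   # restricted maximum likelihood (RML)

# The intercept variance differs slightly (0.69 under FML versus 0.70 under
# RML for the example data); note that RML deviances should only be used to
# compare models with the same fixed part.
print(fml.cov_re.iloc[0, 0], rml.cov_re.iloc[0, 0])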
Computing the maximum likelihood estimates requires an iterative procedure. At the
start, the computer program generates reasonable starting values for the various parameters
(for example based on single-level regression estimates). In the next step, an ingenious
computation procedure tries to improve upon the starting values, to produce better estimates.
This second step is repeated (iterated) many times. After each iteration, the program inspects
how much the estimates actually changed compared to the previous step. If the changes are
very small, the program concludes that the estimation procedure has converged and that it is
finished. Using multilevel software, we generally take the computational details for granted.
However, computational problems do sometimes occur. A problem common to programs
using an iterative maximum likelihood procedure is that the iterative process is not always
guaranteed to stop. There are models and data sets for which the program may go through an
endless sequence of iterations, which can only be ended by stopping the program. Because of
this, most programs set a built-in limit to the maximum number of iterations. If convergence
is not reached within this limit, the computations can be repeated with a higher limit. If the
computations do not converge after an extremely large number of iterations, we suspect that
they may never converge.1 The problem is how one should interpret a model that does not
converge. The usual interpretation is that a model for which convergence cannot be reached is
a bad model, using the simple argument that if estimates cannot be found, this disqualifies the
model. However, the problem may also lie with the data. Especially with small samples, the
estimation procedure may fail even if the model is valid. In addition, it is even possible that,
if only we had a better computer algorithm, or better starting values, we could find acceptable
estimates. Still, experience shows that if a program does not converge with a data set of
reasonable size, the problem often is a badly misspecified model. In multilevel analysis, non-
convergence often occurs when we try to estimate too many random (variance) components
that are actually close or equal to zero. The solution is to simplify the model by leaving out
some random components; often the estimated values from the non-converged solution
provide an indication which random components can be omitted. The strategy you apply to
solve convergence issues should be reported in your logbook and/or paper.
Generalized least squares (GLS) is an extension of the standard ordinary least squares (OLS)
estimation method that allows for heterogeneity: observations that differ in sampling
variance. GLS estimation approximates ML estimates, and they are asymptotically
equivalent. Asymptotic equivalence means that in very large samples they are in practice
indistinguishable. ‘Expected GLS’ estimates can be obtained from a maximum likelihood
procedure by restricting the number of iterations to one. Since GLS estimates are obviously
faster to compute than full ML estimates, they can be used as a stand-in for ML estimates
in computationally demanding situations, such as the analysis of extremely large data sets. They can also be
used when ML procedures fail to converge; inspecting the GLS results may help to diagnose
the problem. Simulation research has shown that GLS estimates are less efficient, and that
the GLS-derived standard errors are inaccurate (cf. Hox, 1998; van der Leeden et al., 2008;
Kreft, 1996). Therefore, in general, ML estimation should be preferred.
The generalized estimating equations method (GEE, cf. Liang & Zeger, 1986) estimates
the variances and covariances in the random part of the multilevel model directly from
the residuals, which makes them faster to compute than full ML estimates. Typically, the
dependences in the multilevel data are accounted for by a very simple model, represented by
a working correlation matrix. For individuals within groups, the simplest assumption is that
the respondents within the same group all have the same correlation. For repeated measures,
a simple autocorrelation structure is usually assumed. After the estimates for the variance
components are obtained, GLS is used to estimate the fixed regression coefficients. Robust
standard errors are generally used to counteract the approximate estimation of the random
structure. For non-normal data this results in a population average model, where the emphasis
is on estimating average population effects and not on modeling individual differences.
According to Goldstein (2011) and Raudenbush & Bryk (2002), GEE estimates are less
efficient than full ML estimates, but they make weaker assumptions about the structure
of the random part of the multilevel model. If the model for the random part is correctly
specified, ML estimators are more efficient, and the model-based (ML) standard errors are
generally smaller than the GEE-based robust standard errors. If the model for the random
part is incorrect, the GEE-based estimates and robust standard errors are still consistent.
So, provided the sample size is reasonably large, GEE estimators are robust against
misspecification of the random part of the model, including violations of the normality
assumption. A drawback of the GEE approach is that it only approximates the random
effects structure, and therefore the random effects cannot be analyzed in detail. Most
software will simply estimate a full unstructured covariance matrix for the random part,
which makes it impossible to estimate random effects for the intercept or slopes. Given
the general robustness of ML methods, it is preferable to use ML methods when these are
available, and use robust estimators or bootstrap corrections when there is serious doubt
about the assumptions of the ML method. Robust estimators, which are used with GEE
estimators (Burton et al., 1998), are treated in more detail in Chapter 13 of this book.
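A sketch of a GEE analysis in statsmodels with an exchangeable working correlation (all
respondents within the same group equally correlated, as described above); the file and
variable names are hypothetical stand-ins for the nurse example:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("nurses.csv")  # hypothetical file name

# Exchangeable working correlation: every pair of nurses within the same
# hospital is assumed to have the same correlation.
model = smf.gee(
    "stress ~ expcon + age + gender + experience",
    groups="hospital",
    data=df,
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()  # robust standard errors are reported by default
print(result.summary())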
In many different fields, including the field of multilevel analysis, Bayesian statistics is gaining
popularity (van de Schoot et al., 2017), mainly because it can deal with all kinds of technical
issues, for example multicollinearity (Can et al., 2014) or non-normality (see Chapter 13), or
because it can deal with smaller sample sizes on the highest level (e.g., Baldwin & Fellingham,
2013). It is not the aim of this section to provide a full introduction to Bayesian multilevel
modeling; for this we refer to Hamaker and Klugkist (2011). For a very gentle introduction to
Bayesian modeling, we refer the novice reader to, among many others, Kaplan (2014), or
van de Schoot et al. (2014). More detailed information about Bayesian multilevel modeling
can be found in Gelman and Hill (2007). For a discussion in the context of MLwiN see
Browne (2005). In the current chapter (see also Section 13.5) we highlight some important
characteristics of Bayesian estimation.
There are three essential ingredients underlying Bayesian statistics. The first ingredient is
the background knowledge of the parameters in the model being tested. This first ingredient
refers to all knowledge available before seeing the data and is captured in the so-called prior
distribution. The prior is a probability distribution reflecting the researchers’ beliefs about the
value of the parameter in the population, and the amount of uncertainty the researcher has
regarding this belief. Researchers may have a great degree of certainty in their belief, and
therefore specify an “informative prior”—that is, a prior with a low variance. In contrast, they
may have very little certainty in this belief, and consequently specify a non-informative prior—
that is, a prior with a large variance, also known as a diffuse or flat prior. The informativeness
of a prior is governed by hyperparameters. For example, the hyperparameters for a normal
distribution are the mean and variance terms that dictate the location and spread of the normal
distribution. A normally distributed prior would be written N(μ,σ2), where N denotes that the
prior follows a normal distribution (other distributions can also be specified in a model), the
mean of the prior is given by μ, and σ2 is the prior variance. Consequently, μ can be based on
background information about the model parameter value, and σ2 can be used to specify how
certain we are about the value of μ. The more informative a prior, the larger the impact it will
have on final model results, especially if the prior is combined with small sample sizes. If a non-
informative prior is desired, this is accomplished by specifying a very large variance for the
prior. Many simulation studies have shown that the more information is captured in the prior
distribution, the smaller the sample size can be while maintaining power and precision.
The second ingredient in Bayesian estimation is the information in the data itself. It is
the observed evidence expressed in terms of the likelihood function of the data given the
parameters. In other words, the likelihood function asks: “given a set of parameters, such as the
mean and/or the variance, what is the likelihood or probability of the data at hand?”
The third ingredient is based on combining the first two ingredients, which is called
posterior inference. Both (1) and (2) are combined via Bayes Theorem and are summarized
by the so-called posterior distribution, which is a combination of the prior knowledge
and the observed evidence. The posterior distribution reflects one’s updated knowledge,
balancing prior knowledge with observed data. Given that the posterior is a combination of
information from the prior and the data, a more informative prior has a larger impact on the
posterior (or final result).
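Formally, the three ingredients are combined through Bayes' theorem, which states that the
posterior is proportional to the likelihood times the prior:

p(θ | data) ∝ p(data | θ) × p(θ),

where p(θ) is the prior distribution and p(data | θ) is the likelihood function described above.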
The use of prior knowledge is one of the main elements that separate Bayesian and
frequentist methods. However, the process of estimating a Bayesian model can also be quite
different. Typically, Markov chain Monte Carlo (MCMC) methods are used, where estimation
is conducted through the use of a Markov chain—or a chain that captures the nature of the
posterior. Given that the posterior is a distribution (rather than a single, fixed number), we
need to sample from it in order to obtain a “best guess” of what the posterior looks like.
These samples from the posterior distribution form what we refer to as a chain. Every model
parameter has a chain associated with it, and once that chain has converged (i.e., the mean—or
horizontal middle of the chain— and the variance—or height of the chain—have stabilized),
we use the information in the chain to derive the final model estimates. Often, the beginning
portion of the chain is discarded because it represents an unstable part before convergence is
reached; this portion of the chain is called the burn-in phase. The last portion of the chain, the
post burn-in phase of the chain, is then used as the estimated posterior distribution where final
model estimates are obtained.
The prior has the potential to have a rather large impact on final model results (even if
it is non-informative). As a result, it is important to report all details surrounding the prior
(see Depaoli & van de Schoot, 2017), which include: the distribution shape selected, the
hyperparameters (i.e., the level of informativeness), and the source of the prior information.
Equally important is to report a sensitivity analysis of priors to illustrate how robust final
model results are when priors are slightly (or even greatly) modified; this provides a better
understanding of the role of the prior in the analysis. Finally, it is also important to report all
information surrounding the assessment of chain convergence. Final model estimates are only
trustworthy if the Markov chain has successfully converged for every model parameter, and
reporting how this was assessed is a key component to a Bayesian analysis.
Bayesian multilevel estimation methods are discussed in more detail in Chapter 13 where
robust estimation methods are discussed to deal with non-normality, and in Chapter 12 where
sample size issues are discussed.
3.3 Bootstrapping
Bootstrapping is not, by itself, a different estimation method. In its simplest form, the
bootstrap (Efron, 1982; Efron & Tibshirani, 1993) is a method to estimate the parameters
of a model and their standard errors strictly from the sample, without reference to a
theoretical sampling distribution.2 The bootstrap directly follows the logic of statistical
inference. Statistical inference assumes that in repeated sampling, the statistics calculated
in the sample will vary across samples. This sampling variation is modeled by a theoretical
sampling distribution, for instance a normal distribution, and estimates of the expected
value and the variability are taken from this distribution. In bootstrapping, we draw b samples
(with replacement) from the observed sample at hand. In each sample, we estimate
the statistic(s) of interest, and the observed distribution of the b statistics is used for the
sampling distribution. Estimates of the expected value and the variability of the statistics are
taken from this empirical sampling distribution (Stine, 1989; Mooney & Duval, 1993; Yung
& Chan, 1999). Thus, in multilevel bootstrapping, in each bootstrap sample the parameters
of the model must be estimated, which is usually done with ML.
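The logic is easy to see in a minimal sketch that bootstraps the standard error of a simple
sample mean (resampling individual cases; multilevel variants of the bootstrap that respect the
grouping structure are more involved):

import numpy as np

rng = np.random.default_rng(seed=1)
sample = rng.normal(loc=5.0, scale=1.5, size=100)  # hypothetical observed sample

b = 2000  # number of bootstrap samples
boot_means = np.empty(b)
for i in range(b):
    # Draw a sample of the same size, with replacement, from the observed data.
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means[i] = resample.mean()

# The spread of the bootstrap distribution estimates the standard error.
print(f"bootstrap SE of the mean: {boot_means.std(ddof=1):.3f}")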
Since bootstrapping takes the observed data as the sole information about the population,
it needs a reasonable original sample size. Good (1999, p. 107) suggests a minimum sample
size of 50 when the underlying distribution is not symmetric. Yung and Chan (1999) review
the evidence on the use of bootstrapping with small samples. They conclude that it is not
possible to give a simple recommendation for the minimal sample size for the bootstrap
method. However, in general the bootstrap appears to compare favorably over asymptotic
methods. A large simulation study involving complex structural equation models (Nevitt
& Hancock, 2001) suggests that, for accurate results despite large violations of normality
assumptions, the bootstrap needs an observed sample of more than 150. Given such results,
the bootstrap is not the best approach when the major problem is a small sample size.
This section discusses procedures for testing significance and model comparison for the
regression coefficients and variance components.