
Psychological Methods, 2011, Vol. 16, No. 2, 93–115
© 2011 American Psychological Association
1082-989X/11/$12.00  DOI: 10.1037/a0022658

Effect Size Measures for Mediation Models: Quantitative Strategies for Communicating Indirect Effects

Kristopher J. Preacher, University of Kansas
Ken Kelley, University of Notre Dame

The statistical analysis of mediation effects has become an indispensable tool for helping scientists
investigate processes thought to be causal. Yet, in spite of many recent advances in the estimation and
testing of mediation effects, little attention has been given to methods for communicating effect size and
the practical importance of those effect sizes. Our goals in this article are to (a) outline some general
desiderata for effect size measures, (b) describe current methods of expressing effect size and practical
importance for mediation, (c) use the desiderata to evaluate these methods, and (d) develop new methods
to communicate effect size in the context of mediation analysis. The first new effect size index we
describe is a residual-based index that quantifies the amount of variance explained in both the mediator
and the outcome. The second new effect size index quantifies the indirect effect as the proportion of the
maximum possible indirect effect that could have been obtained, given the scales of the variables
involved. We supplement our discussion by offering easy-to-use R tools for the numerical and visual
communication of effect size for mediation effects.

Keywords: mediation, indirect effect, effect size

Supplemental materials: http://dx.doi.org/10.1037/a0022658.supp

Consider the case in which a researcher has established that some regressor (X) explains some of the variance in a criterion or dependent variable (Y) via regression. Equation 1 expresses the model for individual i:

Yi = dY.X + cXi + eY.Xi,  (1)

where c is the regression coefficient quantifying the total effect of X on Y, dY.X is the intercept of the model, and eY.Xi is the error associated with individual i. Mediation analysis consists of estimating the indirect effect of X on Y via an intervening variable called a mediator (M). In the simplest case, the researcher regresses M on X and separately regresses Y on both X and M using the following equations:

Mi = dM.X + aXi + eM.Xi,  (2)

where dM.X is the intercept for M, a is the slope of M regressed on X, and eM.Xi is the error, and

Yi = dY.MX + bMi + c′Xi + eY.MXi,  (3)

where dY.MX is the intercept for Y, b is the slope of Y regressed on M controlling for X, c′ is the slope of Y regressed on X controlling for M, and eY.MXi is the error. The indirect effect, defined as â × b̂, often is used as an index of mediation (where throughout a circumflex [ˆ] above a parameter denotes a sample estimate). In general, â × b̂ = ĉ − ĉ′, and thus ĉ = â × b̂ + ĉ′. Structural equation modeling may also be used to obtain both â and b̂ simultaneously, correct for the attenuating effects of measurement error, and test more complex models, such as those where X, M, and Y are latent. Here we focus on the simplest case of a single mediator (unless otherwise stated) and no latent variables. Tests of mediation effects have become very popular in the managerial, behavioral, educational, and social sciences because they help researchers understand how, or by what means, effects unfold. A path diagram showing a simple mediation model is presented in Figure 1.
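Equations 1–3 and the identity â × b̂ = ĉ − ĉ′ can be checked numerically. The sketch below uses Python with simulated data rather than the paper's own tooling (which consists of R functions in the MBESS package); the `ols` helper is ours, introduced only for illustration:

```python
import numpy as np

# Fit Equations 1-3 by ordinary least squares on simulated data and verify
# that the indirect effect a*b equals c - c' exactly, an algebraic identity
# for OLS estimates in the single-mediator model.
rng = np.random.default_rng(0)
n = 432                                        # same n as the SPBY example
x = rng.normal(size=n)
m = 0.3 * x + rng.normal(size=n)               # data-generating analog of Eq. 2
y = -0.1 * m - 0.01 * x + rng.normal(size=n)   # data-generating analog of Eq. 3

def ols(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept included)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

(c,) = ols(y, x)           # Eq. 1: total effect of X on Y
(a,) = ols(m, x)           # Eq. 2: slope of M on X
b, c_prime = ols(y, m, x)  # Eq. 3: slopes of Y on M and X

# The decomposition c = a*b + c' holds to floating-point precision.
assert np.isclose(a * b, c - c_prime)
print(f"a*b = {a * b:.4f}, c - c' = {c - c_prime:.4f}")
```

Because the identity is algebraic rather than asymptotic, the two printed quantities agree in every sample, not merely on average.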
Author note: This article was published Online First April 18, 2011. Kristopher J. Preacher, Department of Psychology, University of Kansas; Ken Kelley, Department of Management, Mendoza College of Business, University of Notre Dame. This study made use of the Socialization of Problem Behavior in Youth: 1969–1981 data made available by Richard and Shirley Jessor in 1991. The data are available through the Henry A. Murray Research Archive at Harvard University (producer and distributor). We thank Sonya K. Sterba, Scott E. Maxwell, and Robert Perera for valuable input. Correspondence concerning this article should be addressed to Kristopher J. Preacher, Department of Psychology, University of Kansas, 1415 Jayhawk Boulevard, Room 426, Lawrence, KS 66045-7556. E-mail: [email protected]

Many methods have been developed to facilitate significance testing and/or confidence interval formation for indirect effects (MacKinnon, 2008; MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002). We find the increased attention being devoted to appropriate modeling and testing techniques highly encouraging. On the other hand, we believe this emphasis on modeling and statistical significance falls short of the ideal. Despite the recommendation of Baron and Kenny (1986, p. 1177) to consider the absolute size of relevant regression weights in addition to their statistical significance, very little attention has been devoted to quantifying and reporting the effect size of indirect effects in mediation models.

The fourfold purposes of this article are to (a) outline some general desiderata for effect size estimation, (b) review existing
effect sizes proposed in the mediation context, (c) use the desiderata to evaluate how effect size has been quantified and reported in the context of mediation, and (d) suggest new ways to communicate the magnitude of the indirect effect while avoiding the shortcomings of existing methods. The development of quality effect sizes will facilitate meta-analytic work on mediation, something currently lacking in the mediation literature. Finally, we provide R code (R Development Core Team, 2010) with the MBESS[1] package (Kelley & Lai, 2010; Kelley, 2007b) to aid researchers who wish to use the methods we describe in their own research. Graphical methods are an important supplement to quantitative descriptions of mediation and can themselves be useful ways of communicating results. We discuss graphical methods in an online supplement.[2]

Figure 1. Diagram of models in which the effect of X on Y is (upper) versus is not (lower) mediated by M. Circles represent residuals, single-headed arrows represent regression weights, and double-headed arrows represent variance parameters.

Conceptualizing Effect Size: A Definition and Desiderata

Numerous methodologists have recommended that effect size measures accompany reports of statistical significance and nonsignificance. As a result, effect size reporting is now encouraged or mandated by many journal editors, as well as many organizations with scientific oversight, including the National Center for Education Statistics (NCES, 2003), the International Committee of Medical Journal Editors (via the Consolidated Standard of Reporting Trials [CONSORT; Moher et al., 2010]), and the American Educational Research Association (AERA, 2006). Furthermore, as the American Psychological Association (APA) Task Force on Statistical Inference recommended, reporting some measure of effect size is "essential to good research" and "enables readers to evaluate the stability of results across samples, designs, and analyses" (Wilkinson & the Task Force on Statistical Inference, 1999, p. 599). In addition, "it is almost always necessary to include some measure of effect size in the Results section" (American Psychological Association, 2010, p. 34). But even though researchers are now urged to report effect size to supplement or replace statistical significance, researchers who use mediation models have few resources to which to turn. For researchers who desire to report effect size for mediation effects, there simply is not much work that can be referenced (Albert, 2008; MacKinnon, Fairchild, & Fritz, 2007; Preacher & Hayes, 2008a), and many of the methods that do exist have limitations that often go unrecognized.

We begin by offering a general definition of effect size, outlining some desirable properties (desiderata) to which new effect size measures should aspire, and delineating the issues that warrant attention when reporting effect size for mediation effects. Ultimately, we recommend a new effect size measure, developed in a later section, that we believe has desirable properties that will be useful in quantifying the magnitude of the indirect effect in the application of mediation models.

Defining Effect Size

There is almost universal agreement among methodologists that effect size is very important to report whenever possible (Grissom & Kim, 2005; Thompson, 2007; Vacha-Haase, Nilsson, Reetz, Lance, & Thompson, 2000). Yet, there are inconsistencies in how effect size is defined in the methodological literature, with the preponderance of authors favoring either a definition based on the magnitude of departure from a particular null hypothesis or a definition relating effect size to practical importance. For example, Cohen (1988) defined effect size as the "degree to which the phenomenon is present in the population or the degree to which the null hypothesis is false" (pp. 9–10). Similarly, Vacha-Haase and Thompson (2004) defined effect size as a "statistic that quantifies the degree to which sample results diverge from the expectations . . . specified in the null hypothesis" (p. 473). Other major works on effect size have similar definitions (Grissom & Kim, 2005). On the other hand, some authors prefer to regard effect size as any numeric quantity intended to convey the practical significance (or importance) of an effect (Kirk, 1996). Practical importance, in turn, is the substantive importance of an effect in real terms. That is, practical importance is the degree to which scientists, practitioners, executives, consumers, politicians, or the public at large, for example, would consider a finding important and worthy of attention. Yet other authors use both kinds of definition interchangeably (Henson, 2006). These two kinds of definitions, one based on the size of an effect relative to a null hypothesis and the other based on practical importance, imply related but separate concepts.

[1] Originally MBESS stood for Methods for the Behavioral, Educational, and Social Sciences. However, MBESS is now an orphaned acronym, meaning that what was an acronym is now literally its name.
[2] The supplemental material on graphical methods may be found at the Psychological Methods website and at the authors' websites (http://quantpsy.org and https://repository.library.nd.edu/view/5/Mediation_Effect_Sizes.pdf).
In response to the need for a general, inclusive definition of effect size, we define effect size as any measure that reflects a quantity of interest, either in an absolute sense or as compared with some specified value. The quantity of interest might refer to variability, association, difference, odds, rate, duration, discrepancy, proportionality, superiority, or degree of fit or misfit. It is possible for an effect size measure conforming to this definition to be used as an index of practical importance, although practical importance is not tied to our definition of effect size.

Desiderata for Good Effect Size Indices

Some desirable properties for effect size measures, which we term desiderata, are now outlined. First, virtually all effect size indices should be scaled appropriately, given the measurement and the question of interest. Without an interpretable scale, it is difficult to use effect size to communicate results in a meaningful and useful way. Often effect size is associated with standardized effect sizes; indeed, sometimes standardization is a defining characteristic of effect size, and in many cases, standardization frees the researcher from having to prepare a new set of interpretive benchmarks for every new scale or application (Cohen, 1988). Throughout, we define a standardized effect size as one that is not wedded to a particular measurement scale. More formally, it is an effect size that does not change in value based on linear transformations of the variable(s) involved. Although standardized effect sizes can be valuable, they are not always to be preferred over an effect size that is wedded to the original measurement scale, which may already be expressed in meaningful units that appropriately address the question of interest (Baguley, 2009; Frick, 1999). For example, group mean differences in scores on a widely understood instrument for measuring depressive symptoms are already expressed on a metric that is understandable to depression researchers, and to standardize effects involving the scale would only confuse matters.

Second, it should be emphasized that effect size estimates are themselves sample statistics and thus will almost certainly differ from their corresponding population values. Therefore, it is important to report confidence intervals for effect sizes because the real interest lies not in the estimated value but in the population value (Balluerka, Gómez, & Hidalgo, 2005; Bird, 2002; Cumming & Finch, 2001; Fidler & Thompson, 2001; Henson, 2006; Kelley, 2007a; Kirk, 1996; Smithson, 2001; Thompson, 2002, 2007).

Third, although sampling error will affect the uncertainty in any effect size estimate and sampling error will tend to decrease as sample size (n) increases, the point estimate itself should be independent of sample size. Effect sizes are usually considered to have corresponding population values (parameters), so the estimation of an effect should be independent of the arbitrary size of the sample that is collected in order to estimate that population effect. Two researchers should not come to different conclusions about the size of an effect simply because their samples are of different sizes, all other things being equal. None of the effect sizes in common use depends on n for their respective definitions (r, Cohen's d, odds ratios, etc.) other than in a limited fashion that quickly diminishes as n increases. More broadly, the sample estimators of population effect sizes should be unbiased (i.e., the expected value of the effect size should equal the parameter over infinite repeated sampling), consistent (i.e., the effect size estimate should converge on the population value as n increases), and efficient (i.e., the effect size estimator should have reasonably low sampling variability).

Effect Size in the Context of Mediation Analysis

The magnitude of the indirect effect can be informally signified by the a and b coefficients themselves. MacKinnon (2008) and MacKinnon et al. (2007) suggested that either the standardized regression coefficient or the raw correlation can be used as an effect size measure for the a coefficient, and a partial correlation can be used as an effect size measure for the b coefficient. This method is not entirely satisfactory, as a and b alone do not convey the full meaning of an indirect effect. Therefore, it is important to develop a way to gauge the effect size of the product term ab itself. Unfortunately, the indirect effect does not fit any of the classic effect size measures developed in methodological works or reported in research, such as the standardized mean difference (Cohen's d, Hedges' g), association (β, r, rbis), odds ratio (OR), percentage of variance explained (intraclass correlation, R², η², ω²), or the coefficient of variation. In mediation models, the primary effect of interest is an indirect effect. Such an effect is complex because it is the product of (here) two regression coefficients and does not fit conveniently into the framework of existing effect sizes. Thus, it is challenging to adapt existing effect size measures for use in mediation analysis. In developing and evaluating new methods of expressing effect size for indirect effects, it will be important to do so in light of the definition and desiderata outlined earlier. That is, effect sizes suggested for mediation analysis should be on a meaningful metric, should be amenable to the construction of confidence intervals, and should be independent of sample size. A meaningful metric in this context is any metric where the size of the effect can be interpreted in a meaningful way vis-à-vis the constructs under study. Standardized effect sizes are on a meaningful scale in units of standard deviations. For example, in a regression model with a single independent variable and a single dependent variable that are both standardized, a correlation coefficient can be interpreted as the number of standard deviations that the dependent variable is expected to increase for a change of one standard deviation in the independent variable. Our suggestion for effect sizes to be on a meaningful metric implies no preference for standardized or unstandardized effect sizes. The metric that most effectively communicates the particular effect size in the specific context is what we regard as the preferred metric. This will vary by situation.

Illustrative Example

To make our discussion more concrete, we make use of a publicly available data set, Jessor and Jessor's (1991) Socialization of Problem Behavior in Youth 1969–1981 (SPBY). The sample size is n = 432 with complete data. In the applications to follow, the predictor variable is achievement values (VAC), obtained by averaging 10 items from the Personal Values Questionnaire (Jessor & Jessor, 1977) administered in 1969 to high school students in the Boulder area of Colorado. Example items ask respondents how much they like having good grades for entering college and how much they like being on the honor roll.
The mediator variable is attitude toward deviance (ATD), obtained by averaging 30 items from the Attitude Toward Deviance Scale (Jessor & Jessor, 1977) administered to the same students in 1970. Example items ask respondents how wrong it is to break into a locked place or to beat up another kid. Because of the manner in which responses were scored, higher scores on ATD indicate greater intolerance of deviant behavior. The outcome variable is deviant behavior (DVB), obtained by averaging 30 items from the Deviant Behavior Report Scale (Jessor & Jessor, 1977) administered to the same sample in 1971. An example item asks respondents how often they have threatened a teacher out of anger. Basic results for the direct and indirect effects linking VAC, ATD, and DVB are provided in Table 1, and covariances, correlations, and means for the three variables are provided in Table 2. Figure 2 is a Venn diagram depicting the variances of VAC, ATD, and DVB as circles, overlapping to the degree that these variables are related.

Table 1
Regression Results for the Mediation of the Effect of Achievement Values on Deviant Behavior by Attitude Toward Deviance

Model                      Estimate     SE        p      CI (lower)  CI (upper)
Model without mediator
  Intercept                 1.9236    .0698    <.0001      1.7864      2.0608
  VAC → DVB (c)             −.0383    .0095     .0001     −0.0571     −0.0196
  R²Y,X                      .0361                          0.0095      0.0779
Model with mediator
  Intercept                 2.2900    .0704    <.0001      2.1517      2.4282
  VAC → ATD (a)              .2916    .0462    <.0001      0.2008      0.3825
  ATD → DVB (b)             −.0963    .0088    <.0001     −0.1136     −0.0789
  VAC → DVB (c′)            −.0102    .0088     .2472     −0.0276      0.0071
  Indirect effect (a × b)   −.0281                         −0.0390     −0.0189
  R²M,X                      .0848                          0.0408      0.1405
  R²Y,MX                     .2456                          0.1750      0.3155

Note. Regression weights a, b, c, and c′ are illustrated in Figure 1. R²Y,X is the proportion of variance in Y explained by X, R²M,X is the proportion of variance in M explained by X, and R²Y,MX is the proportion of variance in Y explained by X and M. The 95% CI for a × b is obtained by the bias-corrected bootstrap with 10,000 resamples. The CIs for R² indices are obtained analytically. In this example, VAC (achievement values) is the independent variable (X), ATD (attitude toward deviance) is the mediator (M), and DVB (deviant behavior) is the outcome (Y). CI (lower) = lower bound of a 95% confidence interval; CI (upper) = upper bound; → = affects.

Table 2
Correlations, Covariances, and Means for Jessor and Jessor's (1991) Data

           VAC (X)   ATD (M)   DVB (Y)
VAC (X)     2.268      .291     −.190
ATD (M)     0.662     2.276     −.493
DVB (Y)    −0.087    −0.226     0.092
M           7.158     5.893     1.649

Note. Numbers on the diagonal are variances, those below the diagonal are covariances, and those above the diagonal (italicized in the original) are correlations. VAC = (higher) achievement values; ATD = (more intolerant) attitude toward deviance; DVB = (more) deviant behavior.

Figure 2. Venn diagram showing the extent to which VAC, ATD, and DVB share variance in common. Each circle represents the total variance of a variable, and the overlap of two circles represents the portion of variance shared in common by two variables. VAC = (higher) achievement values; ATD = (more intolerant) attitude toward deviance; DVB = (more) deviant behavior.

Existing Methods of Expressing Effect Size for Mediation Effects

In this section, we describe and evaluate existing measures of effect size for mediation effects. Each method is evaluated in light of the definition and desiderata identified above and illustrated using SPBY data.

Verbal Descriptors

The literature about, and using, mediation is fraught with language invoking the idea of effect size but not directly addressing it in a rigorous, quantitative manner. The most popular way to express effect size for mediation is through informal descriptors, such as complete, perfect, or partial mediation (Mathieu & Taylor, 2006). James and Brett (1984) described complete mediation as occurring when the effect of X on Y completely disappears (i.e., c′ = 0) when M is added as a predictor of Y. Baron and Kenny (1986) asserted that "the strongest demonstration of mediation occur[s] when Path [c′] is zero" (p. 1176), effectively proposing a way to judge the effect size of an indirect effect by examining the statistical significance of c′. The condition in which c′ = 0 after the detection of a statistically significant mediation effect they dub perfect mediation (p. 1177). In practice, a researcher may claim that a mediation effect is perfect or complete if c′ is not statistically significantly different from zero, which is to say that perfect mediation exists when there is not sufficient evidence to demonstrate that it does not. In other words, the status quo is to claim perfect mediation when the null hypothesis that c′ = 0 is not rejected by the null hypothesis significance test, thus using the absence of evidence (i.e., a failure to reject the null hypothesis that c′ = 0) as evidence of absence (of the direct effect exerted by X on Y). For example, in the SPBY data, c′ = −.0102 (p = .25, ns; 95% CI [−.028, .007]), and thus the statistically significant indirect effect would signify complete mediation by Baron and Kenny's criterion. Of course, one could fail to reject the null hypothesis that c′ = 0 due to insufficient statistical power from an insufficiently large n. Furthermore, it is not clear what should be done when c′ < 0 by
even a small amount. Baron and Kenny cautioned that at least in psychology, complete mediation is expected to be rare because of the prevalence of multiple mediators. These descriptors are found in common usage and are intended to denote either the practical importance of an effect (describing an effect as complete carries the implication that it is "large" or "important," whereas a partial mediation effect is not as impressive) or the potential for identifying additional mediators (complete implies that there is no room for further mediators, whereas partial potentially indicates a need to continue looking for additional mediators).

The informal descriptors complete and partial do not fulfill the desiderata identified earlier. First, they are not expressed in a meaningfully scaled metric. Although the words complete and partial invoke the idea of proportion, they are not numerical, so the importance attached to the terms is largely subjective. Second, because they are not numerical, it is impossible to compute confidence intervals for them. Third, these descriptors are defined in terms of the statistical significance of c′ and so are not independent of sample size. Because of this, we argue that a researcher is implicitly rewarded for using a small sample with a greater likelihood of obtaining "complete mediation," which runs counter to the universal recommendation to prefer larger samples. Fourth, although they do convey something about practical importance, they are highly imprecise. In general, holding everything else constant, it is more likely that a mediator will completely mediate a relatively small total effect (c) than a relatively large total effect, so an effect in which M partially mediates a relatively large c may be more impressive than one in which M completely mediates a relatively small c.

Ratio Measures of Relative Magnitude

Several quantitative measures of relative magnitude, in addition to the verbal descriptors discussed earlier, have been proposed for mediation effects. Alwin and Hauser (1975) proposed several such measures in their classic article on the decomposition of effects in path analysis (see also MacKinnon, 1994; MacKinnon & Dwyer, 1993; Sobel, 1982). Two measures that are relevant for simple mediation models are the ratio of the indirect effect to the total effect,

PM = ab/(ab + c′) = ab/c = 1 − c′/c,  (4)

and the ratio of the direct effect to the total effect,

1 − PM = 1 − ab/(ab + c′) = 1 − ab/c = c′/c,  (5)

where a is the slope linking X to M, b is the conditional slope linking M to Y, c is the total effect of X on Y, and c′ is the conditional slope linking X to Y (Alwin & Hauser, 1975; Buyse & Molenberghs, 1998; MacKinnon, 2008; MacKinnon et al., 2007; MacKinnon, Warsi, & Dwyer, 1995; Shrout & Bolger, 2002; Tofighi, MacKinnon, & Yoon, 2009; Wang & Taylor, 2002). The sample statistic P̂M is obtained by substituting sample quantities for their corresponding population values. PM is also known as the validation ratio (Freedman, 2001) or mediation ratio (Ditlevsen, Christensen, Lynch, Damsgaard, & Keiding, 2005) in epidemiological research and as the relative indirect effect (Huang, Sivaganesan, Succop, & Goodman, 2004) and is often interpreted loosely as the proportion of the total effect that is mediated. In the SPBY data,

P̂M = âb̂/(âb̂ + ĉ′) = (.2916)(−.0963)/[(.2916)(−.0963) − .0102] = .733 (95% CI [.458, 1.357]),[3]

signifying, if P̂M is to be interpreted as a proportion (an assumption we soon question), that attitudes toward deviance mediate approximately three-fourths of the total effect of achievement values on deviant behavior. The complement of P̂M, if in fact it is interpreted as a proportion, is thus 1 − P̂M = .266.

Sobel (1982) proposed the ratio of the indirect effect to the direct effect:

RM = ab/c′.  (6)

A recent example of the use of RM is provided by Barreto and Ellemers (2005), who reported that the ratio of the indirect to direct effect of type of sexism (hostile vs. benevolent) on perceived sexism through evaluation of the source was 1.7. In the SPBY data,

R̂M = âb̂/ĉ′ = (.2916)(−.0963)/(−.0102) = 2.742 (95% CI [−4.162, 147.689]),

indicating that the indirect effect of VAC on DVB is approximately 2.75 times the size of the direct effect, but this ratio is not statistically significantly different from zero at the 5% level because 0 is contained in the 95% confidence interval.

Although PM and RM are easy to estimate in samples, as measures of effect size they suffer from several limitations; we discuss limitations of PM first, followed by limitations of RM. First, consider that as an index PM can convey misleading estimates of practical importance. Depending on the context, obtaining PM = .9 for a relatively small but statistically significant total effect may not necessarily be as impressive as obtaining PM = .6 for a relatively large and statistically significant total effect, yet the former sounds as if it is somehow more important, whereas the latter seems as though it is less impressive when quantified using a standardized effect size like P̂M. As we discuss later, it is important to be mindful of the distinction between the value of an effect size, even if it seems rather small or large, and the practical importance of the effect size in the specific context. Second, despite the fact that many researchers refer to it as a proportion, P̂M is not a proportion and thus cannot be interpreted as such. The quantity âb̂/(âb̂ + ĉ′) can exceed 1.0 or be negative, depending on the relation of ĉ′ to ĉ (Albert, 2008; Alwin & Hauser, 1975; MacKinnon, 2008), which implies that it is not a proportion. The fact that

[3] This and subsequently reported confidence intervals use bias-corrected and accelerated (BCa) bootstrap confidence limits. Bootstrapping involves treating the original sample as if it were a population and simulating the sampling process assumed to have led to the original sample. An arbitrarily large number B of bootstrap samples of size n are selected with replacement from the original sample of size n. (B is recommended to be several thousand for acceptable precision; we used B = 10,000.) Each of these B "resamples" is used to compute the statistic of interest, resulting in B bootstrap estimates of the statistic. The empirical sampling distribution of these bootstrap estimates serves as a basis for obtaining confidence limits by referring to values at the appropriate percentiles (e.g., 2.5 & 97.5) for what are termed percentile confidence intervals. BCa confidence limits are obtained by adjusting the limits from the percentile confidence intervals according to instructions provided by Efron (1987) and Efron and Tibshirani (1993).
98 PREACHER AND KELLEY

That PM is not literally a proportion is not a limitation of PM per se but rather of how PM has been discussed and used. Nevertheless, since PM cannot be appropriately interpreted as a proportion, it is less useful than its label implies. Measures of explained variance are better suited to bear such proportion interpretations, which we discuss later. Third, focusing on the overall value of PM may neglect additional mediators in models where multiple mediators are plausible (MacKinnon et al., 2007). It is easy to assume that if P̂M seems large (i.e., approaches 1.0, which, as we indicated earlier, is not its upper limit), there is "no room" for additional mediators, when in fact it is possible to identify additional and/or better mediators. An additional mediator may well be correlated with the one already included in the model, in which case the indirect effect would be partitioned into parts unique to each mediator. Fourth, P̂M and R̂M have large variances over repeated samples, and thus they are not very efficient estimators. In fact, MacKinnon (1994) showed that both ratio measures can be unstable and commented that they "should be used only with relatively large sample sizes" (p. 139). Simulation research has shown that P̂M is unstable unless n > 500 (MacKinnon, 1994; MacKinnon et al., 1995). Similarly, R̂M is unstable unless n > 5,000 (MacKinnon et al., 1995). RM is so unstable because the numerator (ab) varies inversely with the denominator (c′). Consequently, minor fluctuations in ab and c′ can lead to large fluctuations in their ratio. These large fluctuations can become enormous when c′ is near zero because RM approaches infinity when c′ approaches zero. Examination of Figure 3 shows this sensitivity for specific situations, where the value of RM abruptly approaches positive or negative infinity as the value of c is approached. Tofighi et al. (2009) similarly reported that very large samples are required for stable estimation of ratio measures. Both of these measures vary in bias and precision as a function of the size of the effects, with larger effects imparting less bias and being more precise. Taken together, these four limitations make us question the usefulness of PM as a population value worth estimating and interpreting.

Although the ratio measure RM does not have any pretensions toward being a proportion, it simply repackages the same information as PM without conveying any additional information [RM = PM/(1 − PM)]. Like PM, RM can assume values that exaggerate relatively small effects or trivialize relatively large ones. Considering the reasonable case where ĉ = .63 and ĉ′ = −.01, the ratio of the indirect to direct effect will equal a nonsensical −64, yet if ĉ = .63 and ĉ′ = +.01, the ratio will equal +62. In addition, if ĉ′ is relatively small but âb̂ is relatively large, the ratio can assume extremely large values, as RM is an unbounded quantity. Conversely, if ĉ′ is relatively large and âb̂ is relatively small, small yet substantively important effects can easily slip through the cracks. Figure 3 shows that for a fixed value of the total effect (c = .4), RM assumes small values for most indirect effects likely to occur (between .0 and .35) and then increases rapidly to +∞ as c is approached from below. For indirect effects above c, RM approaches −∞ as c is approached from above.

Although the limitations of P̂M and R̂M we note above are serious, estimates P̂M and R̂M are currently the most widely used measures of effect size. There are perhaps four reasons why P̂M and R̂M are so widely used. First, consistent with our third desideratum, the estimates P̂M and R̂M are relatively unaffected by sample size. Second, Alwin and Hauser (1975) noted that the proportionate decomposition of effects into direct and indirect components can facilitate interpopulation comparison of such effects, even when the variables of interest are not measured on the same scales across groups. Third, consistent with our second desideratum, both PM and RM are amenable to the construction of confidence intervals. Regarding confidence interval construction, Lin, Fleming, and DeGruttola (1997) gave a confidence interval for PM based on the delta method, and this interval and one based on Fieller's method are discussed by Freedman (2001) and Wang and Taylor (2002).⁴ Sobel (1982) provided derivations necessary for constructing an asymptotic confidence interval for R̂M. MacKinnon et al. (1995) and Tofighi et al. (2009) provided delta method standard errors for both ratio measures, for the cases where b and c′ are either correlated or uncorrelated. However, because neither P̂M nor R̂M is normally distributed except in very large samples, it is not advisable to use any of the above noted confidence interval methods but to use the bootstrap approach, as we have discussed (e.g., Wang & Taylor, 2002). The fourth reason that P̂M and R̂M are so widely used, we believe, is that there really have been no better alternatives proposed in the literature for communicating the magnitude of effect.
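The sign-flip instability described above is easy to reproduce numerically. The sketch below assumes only the definitions PM = ab/c and RM = ab/c′ with ab = c − c′; the helper names and the illustrative values ĉ = .63, ĉ′ = ±.01 are ours, not part of the SPBY data.

```python
# Ratio effect size measures for simple mediation, computed from the
# total effect c and direct effect c' (the indirect effect is ab = c - c').

def p_m(c, c_prime):
    """Ratio of indirect to total effect: P_M = ab / c."""
    return (c - c_prime) / c

def r_m(c, c_prime):
    """Ratio of indirect to direct effect: R_M = ab / c'."""
    return (c - c_prime) / c_prime

# R_M merely repackages P_M: R_M = P_M / (1 - P_M).
pm = p_m(0.63, -0.01)
assert abs(r_m(0.63, -0.01) - pm / (1 - pm)) < 1e-6

# A trivial sign change in a near-zero direct effect flips R_M
# from a nonsensical -64 to +62.
print(round(r_m(0.63, -0.01), 1))  # -64.0
print(round(r_m(0.63, 0.01), 1))   # 62.0
```

The divergence as c′ approaches zero is exactly the behavior plotted in Figure 3.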
Figure 3. Plot of the ratio measure RM (= ab/c′) for total effect c = .4 and indirect effects ab ranging from 0 to .7.

⁴ The delta method is used to derive an approximate probability distribution for a function g(θ̂) of asymptotically normal parameter estimates in the vector θ̂. It proceeds by first finding the first- or second-order (usually higher orders are not necessary) Taylor series expansion of the function, g̃(θ̂), and then applying the definition of a variance, var(g̃(θ̂)) = E[(g̃(θ̂))²] − [E(g̃(θ̂))]². The delta method is commonly used to derive estimated standard errors for functions of parameter estimates, which can then be used to construct confidence intervals for estimates that are assumed normally distributed. Fieller's method involves linearizing the ratio, finding the values of the squared linearized form that are less than or equal to the desired critical value under the χ² distribution, then solving for values of g(θ̂) satisfying the inequality.

Buyse and Molenberghs (1998) suggested a ratio that we abbreviate as SM:

SM = c/a. (7)
SM is a measure of the success of a surrogate endpoint, a measure of an intermediate variable that may be related to an important clinical endpoint. For example, gum inflammation may be treated as a surrogate endpoint for tooth loss, and LDL cholesterol is often treated as a surrogate for heart disease. Thus, although surrogate endpoints share much in common with mediators, the emphasis is on coefficients a and c rather than a and b. The ratio c/a should be about 1.0 if X predicts M to the same extent that it predicts Y (MacKinnon, 2008; Tofighi et al., 2009). Tofighi et al. (2009) provided a delta method standard error for this ratio measure and recommend a sample size of at least 500 for accurate SEs when the regression weights are small. In the SPBY data, ŜM = ĉ/â = −.0383/.2916 = −.131 (95% CI [−.195, −.077]). However, we caution that SM has at least two flaws that limit its usefulness as an effect size measure for mediation. First, it does not incorporate b, a crucial component of the indirect effect. Thus, the indirect effect could be quite small or even zero for even a respectably sized ŜM. Second, because it is a ratio, SM depends on the relative size of the component parameters rather than their absolute magnitudes. As an example of why this might be problematic, consider the case of standardized coefficients â = .0001 and ĉ = .0001. In a situation in which ĉ = .0001 is a trivial effect (even if statistically significant), we probably should not be impressed by ŜM = 1.

Unstandardized Indirect Effect

It often is not appreciated that statistics in their original metrics can be considered effect sizes if they are directly interpretable (Abelson, 1995; Baguley, 2009; Frick, 1999; Ozer, 2007). The most obvious method of expressing the magnitude of the indirect effect is to directly interpret the sample âb̂ as an estimate of the population ab. The unstandardized indirect effect âb̂ is independent of n and can be interpreted using the original scales of the variables in the model. The product ab has a straightforward interpretation as the decrease in the effect of X on Y when M is added to the model or as the amount by which Y is expected to increase indirectly through M per a unit change in X. In the SPBY data, for example, âb̂ = −.0281 (95% CI [−.039, −.019]), implying that DVB is expected to decrease by .0281 units (on its 4-point scale) for every one-unit increase in VAC (on its 10-point scale) if one considers only the indirect influence via ATD.

If the variables X and Y are already on meaningful metrics, simply reporting ab and interpreting it may suffice to communicate effect size and practical importance. As has been discussed in the mediation literature, there are multiple ways to construct confidence intervals for ab, the product term does not depend on n, and the product conveys information about practical importance if the units of X and Y bear meaningful interpretation. If, however, the metric of either X or Y (or both) is arbitrary (as is the case in much applied work), not easily interpretable, or not well calibrated to the phenomenon of interest, it may not be sensible to directly interpret ab. Without knowing more about the scales of VAC and DVB, how they are applied in certain areas, or what should be considered "impressive" in the specific context of predicting deviant behavior using the Deviant Behavior Report Scale, it is difficult to know whether to be impressed by the finding that DVB is expected to decrease by .0281 units per unit change in VAC indirectly through ATD. A disadvantage of using ab as an effect size measure is that it is not robust to changes in scale, which limits its usefulness in meta-analysis.

Partially Standardized Indirect Effect

MacKinnon (2008) suggested that indirect effects may be standardized in the following way:

abps = ab/σY, (8)

which is the ratio of the indirect effect to the standard deviation of Y. This index represents the size of the indirect effect in terms of standard deviation units in Y. Because ab is interpreted in raw units of Y, dividing by σY removes the scale of Y, leaving a metric standardized in Y but not X or M. The interpretation of abps is the number of standard deviations by which Y is expected to increase or decrease per a change in M of size a. Coefficient a, in turn, remains unstandardized. In the SPBY example, âb̂ps = âb̂/sY = (.2916)(−.0963)/.3036 = −.092 (95% CI [−.125, −.064]), implying that DVB is expected to decrease by .092 standard deviations for every one-unit increase in VAC (on its 10-point scale) indirectly via ATD.

Completely Standardized Indirect Effect

Carrying MacKinnon's (2008) logic further, we could fully standardize the indirect effect by multiplying abps by σX. The resulting index would be fully insensitive to the scales of X, M, and Y. Preacher and Hayes (2008a) suggested the term index of mediation for this effect size measure:

abcs = ab(σX/σY). (9)

Alwin and Hauser (1975, p. 41) and Cheung (2009) discussed this index as well, noting that it can be used to compare indirect effects across populations or studies when variables use different metrics in each population. Thus, standardized indirect effects may be useful in meta-analysis. However, as we note later, many authors point out that the standardization factor varies from study to study, implying that standardized effect sizes may be less useful than is generally thought. Bobko and Rieck (1980) also considered indirect effects using standardized variables, and Raykov, Brennan, Reinhardt, and Horowitz (2008) advocated a scale-free correlation structure modeling approach to estimating mediation effects. In the SPBY example, âb̂cs = âb̂(sX/sY) = (.2916)(−.0963)(1.5061/.3036) = −.139 (95% CI [−.187, −.097]), indicating that DVB decreases by .139 standard deviations for every 1 SD increase in VAC indirectly via ATD.

To summarize the three effect size measures just described, note that all three may be expressed in terms of standardized regression weights (β) and standard deviations:

ab = βMXβYM(σY/σX); (10)
abps = βMXβYM(1/σX); (11)

abcs = βMXβYM. (12)

It is interesting to note that the metric of M is absent from all three indices. The formula for coefficient a includes a (σM/σX) term, and the formula for b includes a (σY/σM) term; the σM terms cancel when a and b are multiplied (MacKinnon, 2000; Preacher & Hayes, 2008b). It is a simple matter to construct confidence intervals for any of these indices (the bootstrap is recommended; Cheung, 2009), and none of them depend on sample size. Even though abps is partially standardized, the fact that it relies in part on the metric of X prevents it from being used to compare indirect effects across multiple studies, even though it can be used to quantify effect size for a given study if the scale of X can be meaningfully interpreted. Of the three indices above, only abcs can generally be used in other situations where it is important to compare indirect effects across situations using different metrics for X and/or Y. A possible limitation of abcs is that it is not bounded in the way that a correlation or a proportion is: either component may be negative, and βYM may exceed 1.0. Nevertheless, unlike PM, abcs retains its interpretability when this happens.

On the other hand, not all methodologists support the use of standardized effect sizes. Bond, Wiitala, and Richard (2003), for example, strongly cautioned against the use of standardized mean differences in meta-analysis. Achen (1977), Baguley (2009), Greenland (1998), Greenland, Schlesselman, and Criqui (1986), Kim and Ferree (1981), King (1986), and O'Grady (1982) are decidedly pessimistic about the use of correlations and r² and other standardized effect sizes for expressing effects, as they depend on the variances of the measured variables.

Indices of Explained Variance

A common type of effect size is expressed in terms of explained variance. That is, the researcher often seeks to include predictors of a criterion such that the variance of residuals is reduced by some nontrivial amount. For example, η² and ω² in the analysis of variance framework, intraclass correlation in the mixed-model framework, and R² in the regression framework all can be interpreted as proportions of explained variance. These indices equate effect size with the proportion of the total variance in one variable shared with, or explained by, one or more other variables. They are popular as effect size estimates in part because they use an easily interpretable standardized metric, namely, a proportion metric. Therefore, it is not surprising that such measures should be considered in the mediation context as well.

MacKinnon (2008) suggested three such measures for use in the mediation context. Here they are referred to by his equation numbers (4.5, 4.6, and 4.7) to distinguish among them.

R²4.5 = r²YM − (R²Y,MX − r²YX); (13)

R²4.6 = (r²MX)(r²YM.X); (14)

R²4.7 = (r²MX)(r²YM.X)/R²Y,MX. (15)

The R²Y,MX term in the expressions for R²4.5 and R²4.7 is the proportion of variance in Y together explained by X and M; visually, R²Y,MX corresponds to the proportion of the DVB circle in Figure 2 that is also covered by the VAC or ATD circles. The term r²YX is the squared correlation of X and Y (the proportion of the DVB circle occluded by VAC), and r²YM.X is the squared partial correlation of Y with M, partialling out X (the proportion of the DVB circle not shared with VAC that is shared with ATD). Alternative expressions yielding each of these indices purely in terms of multiple R² (for ease of computation) are

R²4.5 = R²Y,M − (R²Y,MX − R²Y,X); (13b)

R²4.6 = R²M,X(R²Y,MX − R²Y,X)/(1 − R²Y,X); (14b)

R²4.7 = R²M,X(R²Y,MX − R²Y,X)/[R²Y,MX(1 − R²Y,X)]. (15b)

An equivalent expression for R²4.5 is

R²4.5 = r²YM − r²Y(M.X), (16)

where r²Y(M.X) is the squared semipartial correlation of Y with the part of M from which X has been partialed. R²4.5 has a straightforward interpretation as the overlap of the variances of X and Y that also overlaps with the variance of M, or "the variance in Y that is common to both X and M but that can be attributed to neither alone" (Fairchild, MacKinnon, Taborga, & Taylor, 2009, p. 488). Overall, R²4.5 has many of the characteristics of a good effect size measure: (a) It increases as the indirect effect approaches the total effect c and so conveys information useful in judging practical importance; (b) it does not depend on sample size; and (c) it is possible to form a confidence interval for the population value. In the SPBY data example, R²4.5 = r²YM − (R²Y,MX − r²YX) = (−.4932)² − (.2456 − (−.1901)²) = .034 (95% CI [.010, .064]). In some situations, R²4.5 can be negative, as it is not literally the square of another value. Fairchild et al. (2009) noted that a negative R²4.5 can indicate that suppression rather than mediation is occurring. However, because negative values can occur, R²4.5 is not technically a proportion of variance as the label R² would seem to imply (Fairchild et al., 2009). We believe this limits the usefulness of R²4.5 as an effect size, but we do not rule out that it may have heuristic value in certain situations.

Unlike R²4.5, R²4.6 is a product of two squared correlations, in this case the squared correlation between X and M and the squared partial correlation of M and Y, partialling for X. In other words, R²4.6 is the proportion of Y variance that is not associated with X but is associated with M, weighted by the proportion of variance explained in M by X. Like R²4.5, it increases roughly as the indirect effect increases. Like R²4.5, it is standardized and does not depend on n, and it is possible to form confidence intervals for it. However, even though the lower bound is 0, and
it cannot exceed 1,⁵ R²4.6 is difficult to interpret because it is the product of two proportions of variance. Because it is the product of two R² measures that are computed for different variables, it is not itself a proportion of variance as the label R² would imply.⁶ Therefore, it is not appropriate to interpret it on an R² metric. Of the three R² indices suggested by MacKinnon, R²4.6 bears the closest resemblance to ab, and regardless of its interpretability as a proportion, it mirrors effect size very well. In the SPBY data example, R²4.6 = (r²MX)(r²YM.X) = (.2911)²(−.4662)² = .018 (95% CI [.009, .032]). That is, .018 is the proportion of variance in deviant behavior that is not associated with achievement values but is associated with attitude toward deviance, weighted by the proportion of variance in attitude toward deviance explained by achievement values.

R²4.7 is simply R²4.6 divided by R²Y,MX, the proportion of variance in Y together explained by X and M. Because it divides by a number that is between 0 and 1, R²4.7 represents a simple rescaling of R²4.6. Correspondingly, we find R²4.7 difficult to interpret. Whereas it is bounded from below by 0, it can exceed 1, but not in situations likely to correspond to mediation. Because of this, it (like the other two R² indices) cannot be interpreted on a standardized proportion metric. In the SPBY example, R²4.7 = (r²MX)(r²YM.X)/R²Y,MX = .0184/.2456 = .075 (95% CI [.041, .119]).

We present plots to enable readers to anticipate the behavior of various R² statistics. We do not suggest that similar figures be produced in applied research. These figures are intended to help readers better understand the ranges that the values can assume. Each plot was created by generating 15,000 random 3 × 3 correlation matrices,⁷ denoted R; fitting a simple mediation model to each R; and plotting relevant statistics and effect size indices. For example, Figure 4 displays plots of R²4.5 plotted against ab for 15,000 randomly generated indirect effects, holding the standardized total effect c constant at .2 (top) and .8 (bottom). From Figure 4 we can tell that when c is held constant, the most extreme positive score of R²4.5 is c²: the effect size cannot exceed the square of the standardized total effect. R²4.6 is plotted as a function of ab in Figure 5 for 15,000 randomly generated indirect effects, holding the standardized total effect c constant at .2 (top) and .8 (bottom). R²4.7 is plotted as a function of ab in Figure 6 for 15,000 randomly generated indirect effects, holding the standardized total effect c constant at .2 (top) and .8 (bottom).

A related index was suggested by Lindenberger and Pötter (1998). Their shared over simple effects (SOS) index is the ratio of the variance in Y explained by both X and M divided by the variance in Y explained by X:

SOS = (1/r²YX)[r²YM − (1 − r²YX)r²YM.X], (17)

where r²YM.X is the squared partial correlation of M and Y after partialling out X. A simpler expression for SOS in terms of indices already presented is

SOS = R²4.5/r²YX. (18)

The authors describe SOS as the proportion of X-related variance in Y that is shared with M. Positive values of SOS indicate mediation, a value of 0 indicates no indirect effect, and negative values indicate suppression. In the SPBY example, SOS = .034/.036 = .934 (95% CI [.727, .999]). Because SOS can assume values less than zero or greater than one, it is not strictly a proportion, but it does tend to increase with ab.

To summarize the R² indices suggested by MacKinnon (2008) and Lindenberger and Pötter (1998), none can be interpreted as proportions. On the other hand, R²4.6 does fall between 0 and 1 inclusively, and its magnitude does correspond to that of ab (the relationship is slightly concave up). All of the indices suggested by MacKinnon (2008) are standardized and amenable to confidence interval construction.

Despite the obvious appeal of R² indices as effect size indices, Fichman (1999) reviews several reasons why researchers may wish to be cautious when using R² indices to compare theories. According to Fichman (1999), R² indices are not always useful for comparing rival theories, can easily be misapplied or used inconsistently (leading to overinterpretations or underinterpretations of effect size), are context-dependent (Balluerka et al., 2005), and are often less intuitive and more difficult to evaluate than one might think. Researchers often focus on explained variance, but in so doing they often neglect to understand the underlying process itself. Furthermore, explained variance depends on how much variance there is to explain (Fern & Monroe, 1996; Henson, 2006; Nakagawa & Cuthill, 2007), and this quantity may differ between studies, between populations, and between manipulated versus observed versions of the same variable, precluding the use of R² indices for meaningfully comparing effects. Ozer (1985) cautioned that R² may not be interpretable as a proportion of variance in many circumstances, which undermines any effect size index that depends on this interpretation. Further, Sechrest and Yeaton (1982) pointed out that researchers often assume that the amount of variance to be explained is 100%. However, this assumption is rarely met in practice because few variables are measured without error; the explainable variance in Y is often much less than 100%. Sechrest and Yeaton (1982) also pointed out that it is often difficult to decide on the appropriate effect size to use, and different treatment strengths can result in very different effect sizes.

Finally, it could be argued that because population, rather than sample, effect sizes are the true quantities of interest, the researcher ought to adjust these R² indices for positive bias (resulting from using sample values to estimate population quantities) if they are to be used at all. For example, Ezekiel (1930) described an adjusted R² index R̃²Y.X = 1 − (1 − R²Y.X)[(n − 1)/(n − m)], where n is the sample size, m is the number of regression parameters (intercept

⁵ In order for R²4.6 to exactly equal 1, X and M would have to be perfectly correlated, and the squared semipartial correlation of Y with M would have to be exactly 1. Because this cannot occur without introducing perfect collinearity, 1 is a limiting value and is not actually obtainable in practice.

⁶ Tatsuoka (1973, p. 281) reminded us that "the product of two proportions is itself a meaningful proportion only when the second proportion is based on that subset of the universe that is 'earmarked' by the first proportion."

⁷ Matrices were generated using a fast Markov chain neighborhood sampling method that retains generated matrices meeting a positive minimum eigenvalue criterion. For more information, see Preacher (2006). We selected 15,000 matrices to visually convey the relative density of points in different regions of the plots.
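Several of the SPBY effect sizes reported in the preceding sections can be reproduced directly from the summary statistics given in the text. The sketch below uses only those reported (rounded) values, so the results agree with the reported ones to the displayed precision; it is an illustration, not output from the article's R tools.

```python
# Reproducing SPBY effect sizes from reported summary statistics.
a_hat, b_hat = 0.2916, -0.0963   # X -> M slope and M -> Y (given X) slope
s_x, s_y = 1.5061, 0.3036        # standard deviations of X and Y
r_ym, r_yx, r_mx = -0.4932, -0.1901, 0.2911
r_ym_x = -0.4662                 # partial correlation of Y and M, given X
r2_y_mx = 0.2456                 # R^2 from regressing Y on both X and M

ab = a_hat * b_hat               # unstandardized indirect effect
ab_ps = ab / s_y                 # partially standardized (Equation 8)
ab_cs = ab * s_x / s_y           # completely standardized (Equation 9)

r2_45 = r_ym**2 - (r2_y_mx - r_yx**2)   # Equation 13
r2_46 = r_mx**2 * r_ym_x**2             # Equation 14
r2_47 = r2_46 / r2_y_mx                 # Equation 15
sos = r2_45 / r_yx**2                   # Equation 18

# Equation 14b expresses the partial correlation via multiple R^2;
# it matches Equation 14 up to rounding in the reported correlations.
r2_46b = r_mx**2 * (r2_y_mx - r_yx**2) / (1 - r_yx**2)
assert abs(r2_46 - r2_46b) < 1e-3

print(round(ab, 4), round(ab_ps, 3), round(ab_cs, 3))   # -0.0281 -0.092 -0.139
print(round(r2_45, 3), round(r2_46, 3), round(r2_47, 3), round(sos, 2))
# 0.034 0.018 0.075 0.93
```

The small discrepancy between Equations 14 and 14b here comes only from the rounding of the reported correlations; with full-precision input they are algebraically identical.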
and slopes), and X is a vector of regressors. The formula for R²4.5 incorporating these adjustments would thus be

R̃²4.5 = [1 − (1 − r²YM)((n − 1)/(n − 2))] − {[1 − (1 − R²Y,MX)((n − 1)/(n − 3))] − [1 − (1 − r²YX)((n − 1)/(n − 2))]}
      = 1 + (1 − R²Y,MX)((n − 1)/(n − 3)) + (r²YX + r²YM − 2)((n − 1)/(n − 2)). (19)

Owing to the moderately large sample size of n = 432 in the SPBY data, R̃²4.5 = .0333, not very different from the unadjusted value of R²4.5 = .0338. In smaller samples, such adjustments would be more noticeable. Bias-adjusted versions of R²4.6 and R²4.7 are

R̃²4.6 = [1 − (1 − r²MX)((n − 1)/(n − 2))][1 − ((1 − R²Y,MX)/(1 − r²YX))((n − 2)/(n − 3))] (20)

and

R̃²4.7 = [1 − (1 − r²MX)((n − 1)/(n − 2))][1 − ((1 − R²Y,MX)/(1 − r²YX))((n − 2)/(n − 3))] / [1 − (1 − R²Y,MX)((n − 1)/(n − 3))], (21)

respectively. See Wang and Thompson (2007) for an extended discussion of Ezekiel's (1930) and other potential adjustments to r² and R².

Figure 4. Plots of R²4.5 plotted against ab for 15,000 indirect effects, holding the total effect c constant at .2 (top) and .8 (bottom).

Figure 5. Plots of R²4.6 plotted against ab for 15,000 indirect effects, holding the total effect c constant at .2 (top) and .8 (bottom).

Hansen and McNeal's (1996) Effect Size Index for Two Groups

Many applications of mediation analysis involve a binary X (such as gender or experimental condition), where the purpose of the analysis is to determine whether and to what extent the mean
difference in Y can be attributed to X indirectly through a mediator M. Hansen and McNeal (1996) suggested an effect size index for mediation that can be obtained by applying a sample size adjustment to Sobel's (1982) test statistic in such two-group designs. When X is a binary variable,

ES = [ab/√(a²s²b + b²s²a)] √(1/n₁ + 1/n₂), (22)

where n₁ and n₂ are the sample sizes of Group 1 and Group 2, respectively, and sa and sb are the standard errors of the regression coefficients a and b, respectively. Sample values are substituted for their population counterparts. Note that sample size is introduced in the denominator of Sobel's statistic by including the s² terms. The intent of the multiplier added by Hansen and McNeal is to remove that influence of sample size, rendering an index that does not depend on n. ES (effect size) is, in fact, relatively robust to large shifts in sample size. However, use of the ES index is limited to settings in which X is binary. In addition, because the statistic is not bounded, standardized, or robust to changes in scale, it is unclear how to interpret it.

Figure 6. Plots of R²4.7 plotted against ab for 15,000 indirect effects, holding the total effect c constant at .2 (top) and .8 (bottom).

New Methods of Expressing Effect Size for Mediation Effects

Two alternative approaches avoid some of the problems inherent in informal descriptors and ratio measures. These effect sizes conform more closely to the definition and desiderata of good effect size measures identified earlier than do the measures described in the previous section.

A Residual-Based Index

The first new effect size we consider elaborates on a method proposed by Berry and Mielke (2002) for effect size computation in univariate or multivariate regression models. Their original method involves computing functions of residuals for models conforming to a null and alternative hypothesis, obtaining their ratio, and subtracting the result from 1. We propose an index that combines information about the variance in M explained by X and the variance in Y explained by both X and M.

Berry and Mielke consider regression models conforming to null and alternative hypotheses. In the univariate case where M is regressed on a number of X variables, the null and alternative models are, respectively (for case i's data),

Mi = Σ_{j=1}^{m0} Xijβ0j + e0i  and  Mi = Σ_{j=1}^{m1} Xijβ1j + e1i, (23, 24)

where m0 is the number of regressors under the null hypothesis, m1 is the number of regressors under the alternative hypothesis (m1 > m0), i indexes cases, β0j and β1j are coefficients for the Xij regressors in the null and alternative models, respectively, and all variables are mean-centered so that intercepts can be omitted. Residuals for the null and alternative models are given by

e0i = Mi − Σ_{j=1}^{m0} Xijβ0j  and  e1i = Mi − Σ_{j=1}^{m1} Xijβ1j, (25, 26)

respectively. The effect size is then computed as 1 − Σ_{i=1}^{n} √(e²1i) / Σ_{i=1}^{n} √(e²0i) = 1 − Σ_{i=1}^{n} |e1i| / Σ_{i=1}^{n} |e0i|. Because the denominator sum will always exceed the numerator sum, Berry and Mielke's (2002) effect size necessarily lies between 0 and 1.

Mediation analysis, on the other hand, involves residuals for the M equation and the Y equation. Researchers often expect that X will explain a large amount of variance in both M and Y and that M will explain the same variance in Y that X explains. Therefore, the null scenario in mediation analysis is one in which there is no explanation of variance in M or Y. The limiting alternative scenario, on the other hand, is one in which X explains all of the variance in M, while X and M each explain all of the variance in Y. The observed effect size will lie between these two extremes (0 and 1). These extreme values suggest a basis for defining the residuals to be used in a modification of Berry and Mielke's (2002) index appropriate for mediation analysis.

First, we define the null model residuals for the M and Y equations (in which no variance is explained in either) as

e0Mi = Mi − M̄ (27)
and

e0Yi = Yi − Ȳ, (28)

respectively, where M̄ and Ȳ are the means of M and Y. Second, we define alternative model residuals for the M and Y equations (conforming to the estimated model) as

e1Mi = eM.Xi = Mi − dM.X − aXi (29)

and

e1Yi = eY.Xi + eY.Mi − eY.XMi
     = (Yi − dY.X − cXi) + (Yi − dY.M − dMi) − (Yi − dY.XM − bMi − c′Xi)
     = Yi − dY.X − cXi − dY.M − dMi + dY.XM + bMi + c′Xi, (30)

respectively, where a, b, and c′ are as defined earlier, d is the slope relating M to Y with no other regressors in the model, and the subscripted d terms are the intercepts of the indicated regressions. The residuals e1Yi correspond to that part of Y not explained jointly by X and M. Therefore, e1Yi is the part of Y not explained by X, plus the part of Y not explained by M, minus the part these two quantities share (so that it is not counted twice). Equation 30 is analogous to the way in which a joint probability is determined, where two probabilities are added and their intersection removed [i.e., P(A or B) = P(A) + P(B) − P(A and B)]. Ideally, the e1Yi will be as small as possible. These residuals are then combined to produce Γ, a residual-based effect size index:

Γ = 1 − [Σ_{i=1}^{n} (√(e²1Mi) + √(e²1Yi))] / [Σ_{i=1}^{n} (√(e²0Mi) + √(e²0Yi))]
  = 1 − [Σ_{i=1}^{n} (|e1Mi| + |e1Yi|)] / [Σ_{i=1}^{n} (|e0Mi| + |e0Yi|)]. (31)

Γ can be interpreted as a measure of the extent to which variance in M is explained by X, and variance in Y is explained jointly by X and M. It has the advantages of being directly interpretable and lying on a meaningfully scaled metric; Γ is bounded above by 1 and is very rarely less than 0 when mediation is in evidence. G, the sample estimate of Γ, is also independent of sample size. Whereas confidence intervals may be constructed for Γ using bootstrap methods, as of yet, no exact analytic confidence interval formulation procedure is known to us. In the SPBY example, G = Γ̂ = .049 (95% CI [.024, .081]).

One complicating factor should be noted with respect to G: its value depends on the metrics of the variables. A standardized counterpart, γ (with sample estimate g), can be obtained by replacing the residuals in Equation 31 with those obtained from using standardized scores instead of raw scores in the regression model. That is, γ (or g) is Equation 31 applied to the residuals of regression models in which all of the variables have been standardized. In the SPBY example, g = γ̂ = .044 (95% CI [.023, .072]).

A second complicating factor associated with Γ and γ is that they can be nonzero in situations where the indirect effect is absent (i.e., ab = 0 but Γ and γ are nonzero). Nevertheless, we do not consider nonzero residual-based effect sizes (Γ or γ) necessarily problematic. If one considers the theoretically ideal mediation effect as one in which X explains all the variance in M and both X and M explain all the variance in Y, then it is sensible to quantify how close to that ideal we have come. The effect sizes Γ and γ quantify this idea. This is one case in which the effect size measure does not coincide with the way in which the effect itself is commonly operationalized: it is a measure of total variance explained rather than a product of regression coefficients. Therefore, we suggest that Γ and γ can serve as useful supplementary measures to report along with the indirect effect and other effect sizes, such as the unstandardized and standardized maximum possible indirect effect, which we now discuss.

Maximum Possible Indirect Effect and Its Standardized Version

The second effect size we propose, and ultimately recommend, is the magnitude of the indirect effect relative to the maximum possible indirect effect. In general, an effect that may seem trivial in absolute size may in fact be relatively large when one considers the range of potential values the effect could have assumed, given characteristics of the design or distributional characteristics of the variables. Even under ideal distributional conditions and linear relationships, there are real limits on the values that regression weights (and thus indirect effects) can take on, given certain characteristics of the data.

For example, consider a multiple regression model that accounts for "only" .125 (raw) units of variance in the dependent variable. Initially, accounting for only .125 units of variance may seem trivial. However, if the variance of the dependent variable were only .15 units to begin with, the model accounts for 83.33% (.125/.15 = .8333) of the variance that it could have possibly accounted for. Thus, looking at the raw value of the amount of variance accounted for does not necessarily give an accurate portrayal of the effectiveness of a regressor.

As another example, this time in the context of mediation, consider the hypothetical situation in which s²X = s²M = s²Y = 1.0 and the total effect c = .6. Given these constraints, ab is not bounded because b is not bounded. However, consider the case in which we hold a fixed to some conditional value, like .3. When this is true, b is bounded (in fact, b must lie between ±.84), and therefore ab is also bounded (here, to ±.25). Similarly, for a given value of b under the above constraints, the absolute value of a must
value of G is influenced by the scales of M and Y. If these scales lie within a certain range, and therefore ab is again bounded. The
differ, then G will be unduly influenced by either the residuals range of possible standardized indirect effects is presented graph-
associated with M or those associated with Y. Therefore, we ically on the vertical axis of Figure 7 for c ⫽ ⫺.19 (the standard-
suggest a standardized version, ␥ (g in samples), that has the same ized c coefficient from the SPBY example). From Figure 7 it can
formula but draws residuals from standardized regressions rather be seen that in the neighborhood of a ⫽ 0, the possible range of ab
than unstandardized regressions (i.e., replaces the errors in Equa- is restricted to the neighborhood of ab ⫽ 0. As a departs from 0 in
either direction, larger values of b become possible, in turn permitting a greater potential range for values of ab.

[Figure 7. Plot of the indirect effect ab versus a and b when X, M, and Y are standardized and c = -0.19.]

A logical question, then, is how can these bounds on a, b, and ab be determined? Hubert (1972) demonstrated how to obtain lower and upper boundaries for elements of a covariance matrix. Consider the 3 × 3 symmetric matrix S (which in the present case may be considered the covariance matrix of X, M, and Y), partitioned as

S = [σ²_X  σ_MX  σ_YX;  σ_MX  σ²_M  σ_YM;  σ_YX  σ_YM  σ²_Y] = [A  G;  G′  var(Y)].    (32)

S is nonnegative definite if and only if G′A⁻¹G ≤ var(Y). This restriction implies the following permissible range for the a coefficient of a mediation model if b and c are held fixed:

a ∈ { [σ_YM σ_YX ± √(σ²_M σ²_Y - σ²_YM) √(σ²_X σ²_Y - σ²_YX)] / (σ²_X σ²_Y) }    (33)

(where ∈ here means "is contained in") and the following permissible range for the b coefficient if a and c are held fixed:

b ∈ { ± √(σ²_X σ²_Y - σ²_YX) / √(σ²_X σ²_M - σ²_MX) }.    (34)

Given these restrictions, it is possible to derive boundaries for the indirect effect ab given a fixed a and c or a fixed b and c. First, let (·) be an operator that returns the most extreme possible observable value of the argument parameter with the same sign as the corresponding sample parameter estimate. For example, if b̂ = -.10 and the bounds identified for b in Equation 34 are -.21 and .21, (b) = -.21. (b) would not be .21 because b̂ is negative, necessitating that (b) also be negative. Holding b and c constant, the bounds on ab can be derived by beginning with the bounds implied for a and multiplying by the (b) identified in Equation 34, that is, by the most extreme possible value with the same sign as ab. This yields (after a few algebraic steps)

ab ∈ { (b) [σ_YM σ_YX ± √(σ²_M σ²_Y - σ²_YM) √(σ²_X σ²_Y - σ²_YX)] / (σ²_X σ²_Y) }.    (35)

Taking the most extreme limit of the two limits from Equation 35 that is of the same sign as ab provides the maximum possible indirect effect. Holding a and c constant, the equivalent bounds on ab can be derived by beginning with the bounds implied for b and multiplying by (a) obtained from Equation 33, yielding

ab ∈ { ± (a) √(σ²_X σ²_Y - σ²_YX) / √(σ²_X σ²_M - σ²_MX) }.    (36)

As above, taking the most extreme of the two limits from Equation 36 that is of the same sign as ab provides the maximum possible indirect effect. Rather than determining the possible range of ab, the maximum possible indirect effect is obtained by the product of (a) and (b):

(ab) = (a)(b).    (37)

Full derivations of these results can be found in Appendix A. The obtained indirect effect âb̂ can be interpreted in light of this range. (ab) will be identical for both of these methods. Notice also that (ab) can itself be used as an effect size, even though we primarily suggest that it be used as the standardizer in the calculation of another effect size we present below.

In sum, if the research question involves the effect size of an indirect effect, it is sensible to ask what the maximum attainable value of the indirect effect (in the direction of the observed indirect effect) could have been, conditional on the sample variances and on the magnitudes of relationships among some of the variables.⁸ Reporting that an indirect effect is ab = .57 tells us little in isolation (much like the amount of variance accounted for in the previous example), but when it is considered that the most extreme value ab could possibly have attained (given the observed c and conditioning on either a or b) is .62, the effect size may be considered larger than if (ab) were .86.⁹

As an example of computing (ab) in the SPBY data, first note that the covariance matrix of VAC, ATD, and DVB is

S = [2.2683  0.6615  -0.0869;  0.6615  2.2764  -0.2259;  -0.0869  -0.2259  0.0922].    (38)

The permissible ranges of a and b are thus

⁸ In addition to restrictions imposed by the magnitudes of certain variances and coefficients, there is a further restriction on the possible size of an indirect effect. Carroll (1961) and Breaugh (2003) pointed out that, unless the two variables have equivalent distributions (e.g., both normal), their correlation cannot equal 1.0. Because variables are rarely perfectly normally (or even equally) distributed in real applications, the maximum possible effect usually will be lower in practice than in theory.
⁹ If only c is held to be known (rather than either {a and c} or {b and c}), these results imply a bounded region for ab.
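The bound formulas in Equations 33 and 34, the maximum possible indirect effect of Equation 37, and the κ² ratio of Equation 43 amount to a few lines of arithmetic on the covariance matrix. The following Python sketch applies them to the SPBY covariance entries of Equation 38; it is an illustrative re-implementation only (the authors' own tools are the R functions in the MBESS package), and the variable names are ours:

```python
import numpy as np

# SPBY covariance matrix entries (X = VAC, M = ATD, Y = DVB); see Equation 38.
sXX, sMM, sYY = 2.2683, 2.2764, 0.0922
sMX, sYX, sYM = 0.6615, -0.0869, -0.2259

# Sample path coefficients: a from the regression of M on X,
# b as the partial slope of M in the regression of Y on X and M.
a_hat = sMX / sXX
b_hat = (sYM * sXX - sYX * sMX) / (sXX * sMM - sMX**2)
ab_hat = a_hat * b_hat

# Equation 33: permissible range of a, holding b and c fixed.
term = np.sqrt(sMM * sYY - sYM**2) * np.sqrt(sXX * sYY - sYX**2)
a_range = ((sYM * sYX - term) / (sXX * sYY),
           (sYM * sYX + term) / (sXX * sYY))

# Equation 34: permissible range of b, holding a and c fixed.
b_lim = np.sqrt(sXX * sYY - sYX**2) / np.sqrt(sXX * sMM - sMX**2)

# The (.) operator: most extreme permissible value with the estimate's sign.
a_tilde = max(a_range) if a_hat >= 0 else min(a_range)
b_tilde = b_lim if b_hat >= 0 else -b_lim

# Equation 37: maximum possible indirect effect; Equation 43: kappa-squared.
ab_max = a_tilde * b_tilde
kappa2 = ab_hat / ab_max

print(round(ab_hat, 4), round(ab_max, 4), round(kappa2, 3))
```

Run against the SPBY values, this reproduces the hand calculation that the text works through next: âb̂ ≈ -.0281, (ab) ≈ -.1961, and κ̂² ≈ .143.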
a ∈ { [s_YM s_YX ± √(s²_M s²_Y - s²_YM) √(s²_X s²_Y - s²_YX)] / (s²_X s²_Y) }
  ∈ { [(-.2259)(-.0869) ± √((2.2764)(.0922) - (-.2259)²) √((2.2683)(.0922) - (-.0869)²)] / [(2.2683)(.0922)] }
  ∈ {-.762, .950},    (39)

making (a) = .950, and

b ∈ { ± √(s²_X s²_Y - s²_YX) / √(s²_X s²_M - s²_MX) }
  ∈ { ± √((2.2683)(.0922) - (-.0869)²) / √((2.2683)(2.2764) - (.6615)²) }
  ∈ {-.207, .207},    (40)

making (b) = -.207. The sample bounds for ab are obtained using Equation 35 and the outer bound for b:

ab ∈ { (b) [s_YM s_YX ± √(s²_M s²_Y - s²_YM) √(s²_X s²_Y - s²_YX)] / (s²_X s²_Y) }
   ∈ { (-.2065)(-.7618, .9495) }
   ∈ {-.196, .157},    (41)

making (ab) = -.196. Instead, using Equation 36 and the outer bound for a, the sample bounds are

ab ∈ { ± (a) √(s²_X s²_Y - s²_YX) / √(s²_X s²_M - s²_MX) }
   ∈ { ±.9495 √((2.2683)(.0922) - (-.0869)²) / √((2.2683)(2.2764) - (.6615)²) }
   ∈ {-.196, .196},    (42)

making (ab) = -.196, which was already known from Equation 41. Regardless of whether (ab) is calculated directly based on Equation 37 or indirectly based on Equation 35 or Equation 36, the value will always be the same.

Given that (ab) = -.196, the observed âb̂ of -.028 implies that even though the indirect effect is statistically significant, it is much smaller than it could have been. This is a key point: Bounding values of parameters often are not appreciated when interpreting the magnitude and importance of effect sizes.

Rather than considering the maximum value of the indirect effect as an effect size, per se, we use (ab) to define a standardized effect size that compares the value of ab to (ab). That is, we define the standardized effect size, which we denote κ²,

κ² = ab / (ab).    (43)

κ² is interpreted as the proportion of the maximum possible indirect effect that could have occurred, had the constituent effects been as large as the design and data permitted. κ² = 0 implies that there is no linear indirect effect, and κ² = 1 implies that the indirect effect is as large as it potentially could have been. We use the notation kappa-squared (i.e., κ²) to denote that, like the squared multiple correlation coefficient, it (a) cannot be negative, (b) is bounded (inclusively) between 0 and 1, and (c) represents the proportion of the value of a quantity to the maximum value it could have been. Otherwise, κ² and the population squared multiple correlation coefficient have generally different properties. In order to estimate κ², we suggest that sample values of the variances and covariances replace their population counterparts. κ² is a standardized value, as it is not wedded to the original scale of the variables, allows (at least) bootstrap confidence intervals to be formed, and is independent of sample size. We find these qualities to be advantageous. For the SPBY example, the proportion of the maximum possible indirect effect that was observed is

k² = κ̂² = âb̂ / (âb̂) = -.0281 / -.1961 = .143,    (44)

with bootstrap 95% CI [.100, .190].

R Tools

To encourage and facilitate the application of the methods we have advocated for communicating the effect size of mediation effects, we have developed a set of easy-to-use R functions, which are contained in the MBESS (Kelley & Lai, 2010; Kelley, 2007a, 2007b) R (R Development Core Team, 2010) package. The specific MBESS functions are mediation(), mediation.effect.bar.plot(), and mediation.effect.plot(), which implement the mediation model and all of the mediation effect sizes we have discussed, with or without bootstrap confidence intervals. The functions mediation.effect.bar.plot() and mediation.effect.plot() can be used to create effect bar plots and effect plots, respectively, two graphical methods of communicating mediation effects (discussed on the website). The mediation() function accepts either raw data or summary statistics (i.e., means and variances/covariances) for simple mediation models, as we have described. The mediation() function reports the results of the three separate regression models and all of the effect sizes, optionally with percentile and/or bias-corrected accelerated bootstrap confidence intervals. Documentation for the functions is contained within the MBESS package.

Discussion

Researchers should consider not only the statistical significance of indirect effects but also the effect size of a given effect. We reemphasize the growing consensus that reporting effect size is crucial to the advancement of psychological science. As Cumming et al. (2007) wrote,

    It is important and urgent that psychology change its emphasis from the dichotomous decision making of NHST to estimation of effect size . . . Effect sizes must always be reported—in an appropriate measure, and wherever possible with CIs—and then interpreted. To achieve this goal,
    researchers need further detailed guidance, examples of good practice, and editorial or institutional leadership. (pp. 231–232)

It is hoped that this discussion has been a step in the right direction in the context of reporting and interpreting mediation effects. This is an especially important type of statistical model in which to apply effect sizes, as mediation models are so widely used in research.

We have discussed many effect sizes with potential application in mediation analysis. The researcher may be at somewhat of a loss when choosing an appropriate effect size measure, given that there are so many choices. We offer two suggestions that may render the choice easier. First, there is no reason to report only one effect size. If circumstances permit, reporting multiple effect sizes can yield greater understanding of a given effect, with the added benefit that more effect size measures are available for possible use in meta-analysis. As an analogy, regression results are often reported in a table containing unstandardized regression coefficients, standardized regression coefficients, and ΔR² for each regressor, R², and R²_Adj: five different types of effect sizes, with the first three effect size measures being repeated for each regressor in the model. Each of these effect size measures communicates different information in different units. Additionally, a researcher desiring to communicate the meaning of an indirect effect in a mediation analysis might also report the unstandardized indirect effect (ab) and the residual-based index γ, or some other combination of effect sizes.

Second, earlier we presented three desiderata for good effect size measurement: A good effect size should be scaled on a meaningful, but not necessarily standardized, metric; it should be amenable to the construction of confidence intervals; and it should be independent, or nearly so, of sample size. The researcher should remain cognizant of these desiderata when selecting an appropriate effect size. We suggest that if the researcher wishes to use an effect size that does not fulfill all the desiderata we have outlined, it should be supplemented with additional effect sizes.

We encourage researchers to think about the most important aspects of the effects they wish to report and seek effect size measures that address those aspects. To aid researchers in deciding which effect size measure(s) to report, in Table 3 we note, for each effect size measure, whether it fulfills the three desiderata. More concretely, we recommend researchers report, at a minimum, the estimated value of κ², the ratio of the obtained indirect effect to the maximum possible indirect effect. The benefits of using κ² are that it is standardized, in the sense that its value is not wedded to the particular scale used in the mediation analysis; it is on an interpretable metric (0 to 1); it is insensitive to sample size; and with bootstrap methods, it allows for the construction of confidence intervals. We do not rule out an analytic method of confidence interval formation for κ², but for practical purposes, the bootstrap confidence interval is advantageous.

An obvious question regarding κ² is "what constitutes a large value?" As we have previously noted, a "large" value need not constitute an important value, and an important value need not be a "large" value. We also are very hesitant to put any qualitative descriptors on a quantitative value. However, if one were forced to attach such labels to κ², we believe it makes sense to interpret them in the same light as squared correlation coefficients are often interpreted, that is, with Cohen's (1988) guidelines. In particular, after some hesitation on the part of Cohen to define benchmarks for various effect sizes (1988, section 1.4), he ultimately concludes that such benchmarks can be beneficial. For the proportion of variance accounted for in one variable by another (i.e., r²_xy), Cohen defines small, medium, and large effect sizes as .01, .09, and .25
Table 3
Characteristics of 16 Effect Size Measures for Mediation Analysis

Effect size          Standardized?   Bounded?    Desideratum 1:          Desideratum 2:         Desideratum 3:
                                                 Interpretable scaling?  Confidence interval    Independent of
                                                                         available?             sample size?
Verbal descriptors   –               –           –                       –                      –
P_M                  ✓               –           –                       ✓                      ✓
R_M                  ✓               –           –                       ✓                      ✓
S_M                  ✓               –           –                       ✓                      –
ab                   –               –           ✓                       ✓                      ✓
ab_ps                Partially       –           ✓                       ✓                      ✓
ab_cs                ✓               –           ✓                       ✓                      ✓
R²_4.5               ✓               –           –                       ✓                      ✓
R²_4.6               ✓               ✓           –                       ✓                      ✓
R²_4.7               ✓               –           –                       ✓                      ✓
SOS                  ✓               –           –                       ✓                      ✓
ES                   ✓               –           –                       ✓                      ✓
Γ                    –               Partially   ✓                       ✓                      ✓
γ                    ✓               Partially   ✓                       ✓                      ✓
(ab)                 –               –           ✓                       ✓                      ✓
κ²                   ✓               ✓           ✓                       ✓                      ✓

Note. SOS = shared over simple effects; ES = effect size.
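The interval-based labeling sketched in the surrounding text (a κ² is at least medium when its confidence interval excludes Cohen's .09 benchmark, and smaller than large when it excludes .25) can be written as a small decision rule. The Python function below is an illustrative sketch of that rule; the function name and label strings are ours, not part of the article or of MBESS, and the text's caution against rigid benchmark use still applies:

```python
def label_kappa2(ci_lower, ci_upper, small=0.01, medium=0.09, large=0.25):
    """Tentative qualitative label for kappa-squared, judged by where its
    confidence interval falls relative to Cohen's r-squared benchmarks.
    Illustrative only; benchmarks should not be applied rigidly."""
    if ci_lower >= large:
        return "large"
    if ci_lower >= medium and ci_upper < large:
        return "at least medium, smaller than large"
    if ci_lower >= small and ci_upper < medium:
        return "at least small, smaller than medium"
    return "indeterminate from the interval alone"

# SPBY example: kappa-squared = .143 with 95% bootstrap CI [.100, .190]
print(label_kappa2(0.100, 0.190))  # at least medium, smaller than large
```

Applied to the SPBY interval, the rule reproduces the interpretation given in the text: the CI excludes .09 but also excludes .25, so the effect may be labeled as lying in the medium range.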
(pp. 79–81). Because of the similar properties of r²_xy and κ², we believe that the benchmarks for r²_xy are similarly applicable for κ². Recalling that in the SPBY data κ² = .143 with 95% CI [.100, .190], one could argue that the mediation effect in the SPBY data is at least medium (because the 95% confidence interval excludes .09) but smaller than large (because the confidence interval excludes .25). Thus, the size of the mediation effect in the SPBY data may be appropriately labeled as lying in the medium range. However, we emphasize that the best way to describe κ² is with its quantitative value, estimated to be .143 for the SPBY data.

To truly understand the value of κ² in a given context, comprehensive studies describing the typical values of κ² in well-defined research areas would be very useful. Further, such effect sizes could be treated as dependent variables with various regressors/explanatory variables in a meta-analytic context, where an explanation of various values of effect sizes is attempted.

Limitations and Cautions

It is appropriate at this point to identify several limitations and cautions in the application of effect sizes for mediation effects. First, as is the case with virtually any effect size, relatively small effect sizes may be substantively important, whereas relatively large ones may be trivial, depending on the research context. An objectively small effect in high-stakes research may be deemed very important by the scientific community, whereas an objectively large effect in other fields may not reach a noteworthy level. Because of this, we caution researchers to not rigidly interpret effect size measures against arbitrary benchmarks. Snyder and Lawson (1993) emphasized that using benchmarks to judge effect size estimates ignores judgments regarding clinical significance, the researcher's personal value system, the research questions posed, societal concerns, and the design of a particular study. Although we do not argue against setting benchmarks, it is important that the field of application to which the benchmarks apply should be clearly delineated. Further, a strong rationale should be given for why a particular value is given for a benchmark. Probably the safest route is to simply report the effect size without providing unnecessary and possibly misleading commentary about its size (Robinson, Whittaker, Williams, & Beretvas, 2003).

Second, we caution that it is a mistake to equate effect size with practical importance or clinical significance (Thompson, 2002). Certainly some values of some effect sizes can convey practical importance, but depending on the particular situation, what is and what is not practically important will vary. Fern and Monroe (1996, pp. 103–104) cautioned that importance or substantive significance should not be inferred solely on the basis of the magnitude of an effect size. Several features of the research context should be considered as well. Ultimately, the practical importance of an effect depends on the research context, the cost of data collection, the importance of the outcome variable, and the likely impact of the results. Consequently, researchers are cautioned to avoid generalizing beyond the particular research design employed. Effect sizes should serve only as guides to practical importance, not as replacements for it, and are at best imperfect shadows of the true practical importance of an effect.

Third, outliers and violations of assumptions of statistical methods compromise effect size estimates, p-values, and confidence intervals. Correspondingly, it is vitally important that researchers perform diagnostic checks to ensure that the assumptions of their inferential techniques are not obviously violated. It is well known that outliers can spuriously inflate or deflate statistical significance, Type II error rates (Wilcox, 2005), and confidence interval coverages, but they can also inflate or deflate estimates of effect size. Consequently, it is wise to determine the extent to which assumptions are met and to examine one's data for outliers. If problems are detected, remedial steps should be taken, or appropriate caveats should be included with the reported results.

Fourth, all of the effect sizes we have discussed have limitations. It is important to keep those limitations in mind when using them. For example, the effect sizes discussed have not yet been extended for use in models involving multiple mediators. No effect size is universally applicable or meaningful in all contexts. Correspondingly, researchers will need to decide which effect size most appropriately conveys the meaning of the results in the particular context.

Fifth, effect sizes can depend on variability. Brandstätter (1999) pointed out that the "degree of manipulation" can affect the value of the effect size. Cortina and Dunlap (1997) and Dooling and Danks (1975) made similar points. This realization is important for effect sizes in the context of mediation because X frequently is manipulated, yet the strength of the manipulation often is made arbitrarily large to maximize power for detecting an effect. The effect size for such effects does not imply that a "large" effect would be similarly astounding had X merely been observed rather than manipulated. In fact, McClelland (1997) and McClelland and Judd (1993) advocated an "extreme groups" approach for detecting effects, such that extremes are oversampled at the expense of central scores. Oversampling extreme groups is a worthwhile approach when the goal is to maximize power in order to infer that differences exist. However, trustworthy and generalizable estimates of standardized effect size require (a) random sampling or (b) manipulation strength that matches what one would expect to find in nature (see Cortina & DeShon, 1998, for a summary of some of these points).

Future Directions

The methods we have discussed here are hardly definitive. For example, results here are limited to the simple mediation model. Extension to more complex mediation models, such as those for panel data (Cole & Maxwell, 2003), moderated indirect effects (Edwards & Lambert, 2007; Preacher, Rucker, & Hayes, 2007), or multiple mediators (MacKinnon, 2000; Preacher & Hayes, 2008b) should be devised and investigated. As Maxwell and Cole (2007) pointed out, P̂_M is a biased estimate of effect size if one uses cross-sectional data when the effect of interest is one that takes time to unfold. They go on to show that when X has greater longitudinal stability than M, P̂_M will be biased downward relative to the corresponding longitudinal index. Conversely, when M is more stable than X, P̂_M will be biased upward. This criticism is valid, and similar criticisms apply to any effect size measure based on the analysis of cross-sectional data when the process under study is a longitudinal one. The lesson here is that any effect size estimate must be interpreted in the context of the specific research design used. The specific lags chosen to separate the measurement
REPORTING EFFECT SIZE FOR MEDIATION 109

of X, M, and Y are part of that context, so generalizing results confidence intervals. Specifically, observations should be inde-
beyond that context should be done with extreme caution. pendent and identically distributed, or the researcher risks ob-
Particularly useful would be studies conducted to establish taining confidence intervals with incorrect coverage. In addi-
defensible benchmarks for different effect size measures denot- tion, we have discussed limitations associated with each of the
ing small, medium, and large effects in particular research effect sizes we presented. It is important to explicitly consider
contexts. For example, a study could be conducted to establish the assumptions and limitations when reporting and interpreting
what values of abcs or ␬ 2 should be considered small, medium, effect size.
and large for alcoholism treatment studies to help determine 3. Report confidence intervals for population effect sizes.
what mechanisms are primarily responsible for explaining the Confidence intervals are necessary to communicate the degree of
effectiveness of intervention programs. The establishment of sampling uncertainty associated with estimates of effect size and
generally accepted benchmarks based on published research for are a valuable adjunct to any point estimate effect size measure.
different effect sizes in a variety of research contexts would Our most emphatic recommendation, however, is that meth-
facilitate meta-analysis. We believe ␬ 2, in addition to other odologists undertake more research to establish meaningful,
effect sizes, will be a useful measure in meta-analyses of trustworthy methods of communicating effect size and practical
mediation effects when the proportion of the maximum possible importance for mediation effects. Tests of mediation have pro-
indirect effect obtainable across different samples is an inter- liferated at an unprecedented rate in recent years, with a heavy
esting research question. ␬ 2 fulfills the desiderata for good emphasis on establishing statistical significance and very little
effect size estimates, and it is standardized (and therefore attention devoted to quantifying effect size and/or practical
independent of the scaling of variables) and bounded. importance. We fear that this lack of balance has led to a
We have not discussed sample size planning methods for proliferation of nonchance but trivially important mediation
mediation models, but it is an important issue. The power effects being reported in the literature. In addition, the lack of
analytic (e.g., Cohen, 1988) and the accuracy in parameter effect size reporting for mediation analyses has seriously lim-
estimation (AIPE) approaches to sample size planning (e.g., ited the accumulation of knowledge in some fields. Conse-
Kelley & Maxwell, 2003, 2008) should be considered. Theo- quently, we strongly urge researchers to consider not only
retically, for any effect of interest, sample size can be planned whether their effects are due to chance (i.e., is statistical sig-
so that there is a sufficiently high probability to reject a false nificance reached?) but also how large the effect sizes are and
null hypothesis (i.e., power analysis) and/or sample size can be how relevant they are to theory or practice.
planned so that the confidence interval is sufficiently narrow
(i.e., accuracy in parameter estimation; see Maxwell, Kelley, &
Rausch, 2008, for a review). “Whenever an estimate is of References
interest, so too should the corresponding confidence interval for
the population quantity” (Kelley, 2008, p. 553). The goal of Abelson, R. P. (1995). Statistics as principled argument. Hillsdale, NJ:
AIPE is to obtain a sufficiently narrow confidence interval that Erlbaum.
Achen, C. H. (1977). Measuring representation: Perils of the correlation
conveys the accuracy with which the population value has been
coefficient. American Journal of Political Science, 21, 805– 815. doi:
estimated by the point estimate of the effect size. If that 10.2307/2110737
confidence interval is wide for an effect size of interest, less is Albert, J. M. (2008). Mediation analysis via potential outcomes models.
known about the value of the population parameter than would Statistics in Medicine, 27, 1282–1304. doi:10.1002/sim.3016
be desirable. Moving forward, the power and AIPE approaches Alwin, D. F., & Hauser, R. M. (1975). The decomposition of effects in path
to sample size planning should be fully developed for effect analysis. American Sociological Review, 40, 37– 47. doi:10.2307/
sizes used in a mediation context. 2094445
American Educational Research Association. (2006). Standards for report-
Prescriptions for Research ing on empirical social science research in AERA publications. Wash-
ington, DC: Author.
American Psychological Association. (2010). Publication manual of the
Research on effect size for mediation effects is relatively new
American Psychological Association (6th ed.). Washington, DC:
and thus not fully developed. We nevertheless end by offering Author.
some concrete recommendations for researchers wanting to report Baguley, T. (2009). Standardized or simple effect size: What should be
effect size for mediation effects. We reiterate the “Three Reporting reported? British Journal of Psychology, 100, 603– 617. doi:10.1348/
Rules” suggested by Vacha-Haase and Thompson (2004) for re- 000712608X377117
porting effect size estimates; these rules are just as applicable in Balluerka, N., Gómez, J., & Hidalgo, D. (2005). The controversy over null
the mediation context as in many other contexts: hypothesis significance testing revisited. Methodology, 1, 55–70.
1. Be explicit about what effect size is being reported. Of- Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable
ten we see “effect size” reported with no indication as to whether distinction in social psychological research: Conceptual, strategic, and
the reported index is a correlation coefficient, mean difference, statistical considerations. Journal of Personality and Social Psychology,
51, 1173–1182. doi:10.1037/0022-3514.51.6.1173
Cohen’s d, ␩2, and so on, or why one measure was chosen over
Barreto, M., & Ellemers, N. (2005). The burden of benevolent sexism:
competing measures. The particular effect size cannot always be How it contributes to the maintenance of gender inequalities. European
accurately inferred from the context in which it was reported. Journal of Social Psychology, 35, 633– 642. doi:10.1002/ejsp.270
2. Interpret effect sizes considering both their assumptions Berry, K. J., & Mielke, P. W., Jr. (2002). Least sum of Euclidean regres-
and limitations. All of the effect sizes discussed here require sion residuals: Estimation of effect size. Psychological Reports, 91,
certain assumptions to be satisfied in order to obtain trustworthy 955–962. doi:10.2466/PR0.91.7.955-962
110 PREACHER AND KELLEY

Bird, K. D. (2002). Confidence intervals for effect sizes in analysis of variance. Educational and Psychological Measurement, 62, 197–226. doi:10.1177/0013164402062002001
Bobko, P., & Rieck, A. (1980). Large sample estimators for standard errors of functions of correlation coefficients. Applied Psychological Measurement, 4, 385–398. doi:10.1177/014662168000400309
Bond, C. F., Jr., Wiitala, W. L., & Richard, F. D. (2003). Meta-analysis of raw mean differences. Psychological Methods, 8, 406–418. doi:10.1037/1082-989X.8.4.406
Brandstätter, E. (1999). Confidence intervals as an alternative to significance testing. Methods of Psychological Research Online, 4, 33–46.
Breaugh, J. A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management, 29, 79–97. doi:10.1177/014920630302900106
Buyse, M., & Molenberghs, G. (1998). Criteria for the validation of surrogate endpoints in randomized experiments. Biometrics, 54, 1014–1029. doi:10.2307/2533853
Carroll, J. B. (1961). The nature of data, or how to choose a correlation coefficient. Psychometrika, 26, 347–372. doi:10.1007/BF02289768
Cheung, M. W.-L. (2009). Comparison of methods for constructing confidence intervals of standardized indirect effects. Behavior Research Methods, 41, 425–438. doi:10.3758/BRM.41.2.425
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York, NY: Academic Press.
Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112, 558–577. doi:10.1037/0021-843X.112.4.558
Cortina, J. M., & DeShon, R. P. (1998). Determining relative importance of predictors with the observational design. Journal of Applied Psychology, 83, 798–804. doi:10.1037/0021-9010.83.5.798
Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2, 161–172. doi:10.1037/1082-989X.2.2.161
Cumming, G., Fidler, F., Leonard, M., Kalinowski, P., Christiansen, A., Kleinig, A., . . . Wilson, S. (2007). Statistical reform in psychology: Is anything changing? Psychological Science, 18, 230–232. doi:10.1111/j.1467-9280.2007.01881.x
Cumming, G., & Finch, S. (2001). A primer on the understanding, use and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532–574.
Ditlevsen, S., Christensen, U., Lynch, J., Damsgaard, M. T., & Keiding, N. (2005). The mediation proportion: A structural equation approach for estimating the proportion of exposure effect on outcome explained by an intermediate variable. Epidemiology, 16, 114–120. doi:10.1097/01.ede.0000147107.76079.07
Dooling, D., & Danks, J. H. (1975). Going beyond tests of significance: Is psychology ready? Bulletin of the Psychonomic Society, 5, 15–17.
Edwards, J. R., & Lambert, L. S. (2007). Methods for integrating moderation and mediation: A general path analytic framework using moderated path analysis. Psychological Methods, 12, 1–22. doi:10.1037/1082-989X.12.1.1
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82, 171–185. doi:10.2307/2289144
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York, NY: Chapman & Hall.
Ezekiel, M. (1930). Methods of correlation analysis. New York, NY: Wiley.
Fairchild, A. J., MacKinnon, D. P., Taborga, M. P., & Taylor, A. B. (2009). R2 effect-size measures for mediation analysis. Behavior Research Methods, 41, 486–498. doi:10.3758/BRM.41.2.486
Fern, E. F., & Monroe, K. B. (1996). Effect-size estimates: Issues and problems in interpretation. The Journal of Consumer Research, 23, 89–105. doi:10.1086/209469
Fichman, M. (1999). Variance explained: Why size does not (always) matter. Research in Organizational Behavior, 21, 295–331.
Fidler, F., & Thompson, B. (2001). Computing correct confidence intervals for ANOVA fixed- and random-effects effect sizes. Educational and Psychological Measurement, 61, 575–604.
Freedman, L. S. (2001). Confidence intervals and statistical power of the "Validation" ratio for surrogate or intermediate endpoints. Journal of Statistical Planning and Inference, 96, 143–153. doi:10.1016/S0378-3758(00)00330-X
Frick, R. W. (1999). Defending the statistical status quo. Theory & Psychology, 9, 183–189. doi:10.1177/095935439992002
Greenland, S. (1998). Meta-analysis. In K. J. Rothman & S. Greenland (Eds.), Modern epidemiology (pp. 643–674). Philadelphia, PA: Lippincott-Raven.
Greenland, S., Schlesselman, J. J., & Criqui, M. H. (1986). The fallacy of employing standardized regression coefficients and correlations as measures of effect. American Journal of Epidemiology, 123, 203–208.
Grissom, R. J., & Kim, J. J. (2005). Effect size for research: A broad practical approach. Mahwah, NJ: Erlbaum.
Hansen, W. B., & McNeal, R. B. (1996). The law of maximum expected potential effect: Constraints placed on program effectiveness by mediator relationships. Health Education Research, 11, 501–507. doi:10.1093/her/11.4.501
Henson, R. K. (2006). Effect-size measures and meta-analytic thinking in counseling psychology research. The Counseling Psychologist, 34, 601–629. doi:10.1177/0011000005283558
Huang, B., Sivaganesan, S., Succop, P., & Goodman, E. (2004). Statistical assessment of mediational effects for logistic mediational models. Statistics in Medicine, 23, 2713–2728. doi:10.1002/sim.1847
Hubert, L. J. (1972). A note on the restriction of range for Pearson product-moment correlation coefficients. Educational and Psychological Measurement, 32, 767–770. doi:10.1177/001316447203200315
James, L. R., & Brett, J. M. (1984). Mediators, moderators, and tests for mediation. Journal of Applied Psychology, 69, 307–321. doi:10.1037/0021-9010.69.2.307
Jessor, R., & Jessor, S. L. (1977). Problem behavior and psychosocial development: A longitudinal study of youth. New York, NY: Academic Press.
Jessor, R., & Jessor, S. L. (1991). Socialization of problem behavior in youth [Data file and code book]. Henry A. Murray Research Archive (http://www.murray.harvard.edu/), Cambridge, MA.
Kelley, K. (2007a). Confidence intervals for standardized effect sizes: Theory, application, and implementation. Journal of Statistical Software, 20, 1–24.
Kelley, K. (2007b). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research Methods, 39, 979–984.
Kelley, K. (2008). Sample size planning for the squared multiple correlation coefficient: Accuracy in parameter estimation via narrow confidence intervals. Multivariate Behavioral Research, 43, 524–555. doi:10.1080/00273170802490632
Kelley, K., & Lai, K. (2010). MBESS (Version 3.2.0) [Computer software and manual]. Retrieved from http://www.cran.r-project.org/
Kelley, K., & Maxwell, S. E. (2003). Sample size for multiple regression: Obtaining regression coefficients that are accurate, not simply significant. Psychological Methods, 8, 305–321. doi:10.1037/1082-989X.8.3.305
Kelley, K., & Maxwell, S. E. (2008). Power and accuracy for omnibus and targeted effects: Issues of sample size planning with applications to multiple regression. In P. Alasuuta, L. Bickman, & J. Brannen (Eds.), The Sage handbook of social research methods (pp. 166–192). Newbury Park, CA: Sage.
Kim, J.-O., & Ferree, G. D., Jr. (1981). Standardization in causal analysis. Sociological Methods & Research, 10, 187–210.
King, G. (1986). How not to lie with statistics: Avoiding common mistakes in quantitative political science. American Journal of Political Science, 30, 666–687. doi:10.2307/2111095
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759. doi:10.1177/0013164496056005002
Lin, D. Y., Fleming, T. R., & De Gruttola, V. (1997). Estimating the proportion of treatment effect explained by a surrogate marker. Statistics in Medicine, 16, 1515–1527. doi:10.1002/(SICI)1097-0258(19970715)16:13::AID-SIM5723.0.CO;2-1
Lindenberger, U., & Pötter, U. (1998). The complex nature of unique and shared effects in hierarchical linear regression: Implications for developmental psychology. Psychological Methods, 3, 218–230. doi:10.1037/1082-989X.3.2.218
MacKinnon, D. P. (1994). Analysis of mediating variables in prevention intervention studies. In A. Cazares & L. A. Beatty (Eds.), Scientific methods for prevention intervention research (pp. 127–153; DHHS Publication 94–3631). NIDA Research Monograph, 139.
MacKinnon, D. P. (2000). Contrasts in multiple mediator models. In J. S. Rose, L. Chassin, C. C. Presson, & S. J. Sherman (Eds.), Multivariate applications in substance use research (pp. 141–160). Mahwah, NJ: Erlbaum.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. Mahwah, NJ: Erlbaum.
MacKinnon, D. P., & Dwyer, J. H. (1993). Estimating mediated effects in prevention studies. Evaluation Review, 17, 144–158. doi:10.1177/0193841X9301700202
MacKinnon, D. P., Fairchild, A. J., & Fritz, M. S. (2007). Mediation analysis. Annual Review of Psychology, 58, 593–614. doi:10.1146/annurev.psych.58.110405.085542
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., & Sheets, V. (2002). A comparison of methods to test mediation and other intervening variable effects. Psychological Methods, 7, 83–104. doi:10.1037/1082-989X.7.1.83
MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A simulation study of mediated effect measures. Multivariate Behavioral Research, 30, 41–62. doi:10.1207/s15327906mbr3001_3
Mathieu, J. E., & Taylor, S. R. (2006). Clarifying conditions and decision points for mediational type inferences in organizational behavior. Journal of Organizational Behavior, 27, 1031–1056. doi:10.1002/job.406
Maxwell, S. E., & Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal mediation. Psychological Methods, 12, 23–44. doi:10.1037/1082-989X.12.1.23
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537–563. doi:10.1146/annurev.psych.59.103006.093735
McClelland, G. H. (1997). Optimal design in psychological research. Psychological Methods, 2, 3–19. doi:10.1037/1082-989X.2.1.3
McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376–390. doi:10.1037/0033-2909.114.2.376
Moher, D., Hopewell, S., Schulz, K., Montori, V., Gøtzsche, P. C., Devereaux, P. J., . . . Altman, M. G. (2010). Consort 2010 explanation and elaboration: Updated guidelines for reporting parallel group randomized trials. British Medical Journal, 340, 698–702. doi:10.1136/bmj.c869
Nakagawa, S., & Cuthill, I. C. (2007). Effect size, confidence interval and statistical significance: A practical guide for biologists. Biological Reviews, 82, 591–605. doi:10.1111/j.1469-185X.2007.00027.x
National Center for Education Statistics. (2003). NCES statistical standards. Washington, DC: Department of Education.
O'Grady, K. E. (1982). Measures of explained variance: Cautions and limitations. Psychological Bulletin, 92, 766–777. doi:10.1037/0033-2909.92.3.766
Ozer, D. (1985). Correlation and the coefficient of determination. Psychological Bulletin, 97, 307–315. doi:10.1037/0033-2909.97.2.307
Ozer, D. J. (2007). Evaluating effect size in personality research. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 495–501). New York, NY: The Guilford Press.
Preacher, K. J. (2006). Quantifying parsimony in structural equation modeling. Multivariate Behavioral Research, 41, 227–259. doi:10.1207/s15327906mbr4103_1
Preacher, K. J., & Hayes, A. F. (2008a). Contemporary approaches to assessing mediation in communication research. In A. F. Hayes, M. D. Slater, & L. B. Snyder (Eds.), The Sage sourcebook of advanced data analysis methods for communication research (pp. 13–54). Thousand Oaks, CA: Sage.
Preacher, K. J., & Hayes, A. F. (2008b). Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models. Behavior Research Methods, 40, 879–891. doi:10.3758/BRM.40.3.879
Preacher, K. J., Rucker, D. D., & Hayes, A. F. (2007). Assessing moderated mediation hypotheses: Theory, methods, and prescriptions. Multivariate Behavioral Research, 42, 185–227.
Raykov, T., Brennan, M., Reinhardt, J. P., & Horowitz, A. (2008). Comparison of mediated effects: A correlation structure modeling approach. Structural Equation Modeling, 15, 603–626. doi:10.1080/10705510802339015
R Development Core Team. (2010). R: A language and environment for statistical computing. Vienna, Austria: Author.
Robinson, D. H., Whittaker, T. A., Williams, N. J., & Beretvas, S. N. (2003). It's not effect sizes so much as comments about their magnitude that mislead readers. The Journal of Experimental Education, 72, 51–64. doi:10.1080/00220970309600879
Sechrest, L., & Yeaton, W. H. (1982). Magnitudes of experimental effects in social science research. Evaluation Review, 6, 579–600. doi:10.1177/0193841X8200600501
Shrout, P. E., & Bolger, N. (2002). Mediation in experimental and nonexperimental studies: New procedures and recommendations. Psychological Methods, 7, 422–445. doi:10.1037/1082-989X.7.4.422
Smithson, M. (2001). Correct confidence intervals for various regression effect sizes and parameters: The importance of noncentral distributions in computing intervals. Educational and Psychological Measurement, 61, 605–632. doi:10.1177/00131640121971392
Snyder, P., & Lawson, S. (1993). Evaluating results using corrected and uncorrected effect size estimates. Journal of Experimental Education, 61, 334–349.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological methodology 1982 (pp. 290–312). Washington, DC: American Sociological Association.
Tatsuoka, M. M. (1973). Multivariate analysis in education research. Review of Research in Education, 1, 273–319.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25–32. doi:10.3102/0013189X031003025
Thompson, B. (2007). Effect sizes, confidence intervals, and confidence intervals for effect sizes. Psychology in the Schools, 44, 423–432. doi:10.1002/pits.20234
Tofighi, D., MacKinnon, D. P., & Yoon, M. (2009). Covariances between regression coefficient estimates in a single mediator model. British Journal of Mathematical and Statistical Psychology, 62, 457–484. doi:10.1348/000711008X331024
Vacha-Haase, T., Nilsson, J. E., Reetz, D. R., Lance, T. S., & Thompson, B. (2000). Reporting practices and APA editorial policies regarding statistical significance and effect size. Theory & Psychology, 10, 413–425. doi:10.1177/0959354300103006
Vacha-Haase, T., & Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481. doi:10.1037/0022-0167.51.4.473
Wang, Y., & Taylor, J. M. G. (2002). A measure of the proportion of treatment effect explained by a surrogate marker. Biometrics, 58, 803–812. doi:10.1111/j.0006-341X.2002.00803.x
Wang, Z., & Thompson, B. (2007). Is the Pearson r2 biased, and if so, what is the best correction formula? The Journal of Experimental Education, 75, 109–125. doi:10.3200/JEXE.75.2.109-125
Wilcox, R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). San Diego, CA: Academic Press.
Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals. American Psychologist, 54, 594–604. doi:10.1037/0003-066X.54.8.594

Appendix A

Derivation of Boundaries for Maximum Possible Indirect Effect

Correlations within a correlation matrix set limits on the ranges of the remaining correlations because of the necessity to maintain positive definiteness. These range restrictions, in turn, imply range restrictions on unstandardized regression weights subject to the variables' variances. Beginning with correlations in a 3 × 3 matrix,

$$\rho_{21}\rho_{32} - \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{32}^{2}} \leq \rho_{31} \leq \rho_{21}\rho_{32} + \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{32}^{2}}, \tag{A1}$$

$$\rho_{31}\rho_{32} - \sqrt{1-\rho_{31}^{2}}\sqrt{1-\rho_{32}^{2}} \leq \rho_{21} \leq \rho_{31}\rho_{32} + \sqrt{1-\rho_{31}^{2}}\sqrt{1-\rho_{32}^{2}}, \tag{A2}$$

$$\rho_{21}\rho_{31} - \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}} \leq \rho_{32} \leq \rho_{21}\rho_{31} + \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}}. \tag{A3}$$
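The bounds in Equations A1–A3 mark exactly where the correlation matrix stops being positive semidefinite. A quick numerical sketch of this fact (ours, not the authors'; the article's companion tools are in R, but Python is used here for illustration, with arbitrary values):

```python
import numpy as np

def rho31_bounds(r21, r32):
    """Admissible range of rho_31 from Equation A1, given rho_21 and rho_32."""
    center = r21 * r32
    half_width = np.sqrt(1 - r21**2) * np.sqrt(1 - r32**2)
    return center - half_width, center + half_width

def is_psd(r21, r31, r32):
    """Is the 3 x 3 correlation matrix of (X, M, Y) positive semidefinite?"""
    P = np.array([[1.0, r21, r31],
                  [r21, 1.0, r32],
                  [r31, r32, 1.0]])
    return np.linalg.eigvalsh(P).min() >= -1e-10

lo, hi = rho31_bounds(0.5, 0.4)      # arbitrary illustrative correlations
print(is_psd(0.5, hi - 1e-6, 0.4))   # just inside the bound: True
print(is_psd(0.5, hi + 1e-3, 0.4))   # just outside the bound: False
```

Values of ρ31 inside the interval yield an admissible matrix; values outside it produce a negative eigenvalue.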

For the simple mediation model considered in this article, in which X, M, and Y are variables 1, 2, and 3,
the corresponding standardized regression weights are

$$a = \rho_{21}, \tag{A4}$$

$$b = \frac{\rho_{32} - \rho_{21}\rho_{31}}{1 - \rho_{21}^{2}}, \tag{A5}$$

$$c' = \frac{\rho_{31} - \rho_{21}\rho_{32}}{1 - \rho_{21}^{2}}. \tag{A6}$$

The unstandardized regression weights are

$$a = \rho_{21}\frac{\sigma_M}{\sigma_X}, \tag{A7}$$

$$b = \frac{\rho_{32} - \rho_{21}\rho_{31}}{1 - \rho_{21}^{2}}\,\frac{\sigma_Y}{\sigma_M}, \tag{A8}$$

$$c' = \frac{\rho_{31} - \rho_{21}\rho_{32}}{1 - \rho_{21}^{2}}\,\frac{\sigma_Y}{\sigma_X}. \tag{A9}$$

The unstandardized indirect effect is therefore

$$ab = \rho_{21}\,\frac{\rho_{32} - \rho_{21}\rho_{31}}{1 - \rho_{21}^{2}}\,\frac{\sigma_Y}{\sigma_X} = \frac{\sigma_{XM}\left(\sigma_X^{2}\sigma_{MY} - \sigma_{XM}\sigma_{XY}\right)}{\sigma_M^{2}\left(\sigma_X^{2}\right)^{2}\left(1 - \dfrac{\sigma_{XM}^{2}}{\sigma_M^{2}\sigma_X^{2}}\right)}. \tag{A10}$$
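As an arithmetic check of Equation A10 (our illustration, with made-up moments rather than any data from the article), the covariance-metric expression should reproduce the product of the two regression slopes:

```python
# Illustrative variances of X and M and covariances among X, M, Y.
s_X2, s_M2 = 4.0, 9.0
s_XM, s_XY, s_MY = 2.4, 3.2, 6.0

a = s_XM / s_X2                                              # Equation A7 in covariance form
b = (s_X2 * s_MY - s_XM * s_XY) / (s_X2 * s_M2 - s_XM**2)    # Equation A8 in covariance form

# Equation A10: the indirect effect written directly in covariance terms.
ab_A10 = s_XM * (s_X2 * s_MY - s_XM * s_XY) / (
    s_M2 * s_X2**2 * (1 - s_XM**2 / (s_M2 * s_X2)))

print(a * b)     # 0.3238095...
print(ab_A10)    # same value
```

Both routes give the same number, confirming that the denominator of A10 is just σX²(σX²σM² − σXM²) rewritten.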


Now, consider the partitioned matrix:

$$\boldsymbol{\Sigma} = \begin{bmatrix} \mathbf{A} & \mathbf{G} \\ \mathbf{G}' & \operatorname{var}(Y) \end{bmatrix}. \tag{A11}$$

Σ is nonnegative definite if and only if G′A⁻¹G ≤ var(Y). Hubert (1972) showed the special case where Σ = P, a correlation matrix:

$$\mathbf{P} = \begin{bmatrix} \mathbf{A} & \mathbf{G} \\ \mathbf{G}' & \operatorname{var}(Y) \end{bmatrix} = \begin{bmatrix} 1 & \rho_{21} & \rho_{31} \\ \rho_{21} & 1 & \rho_{32} \\ \rho_{31} & \rho_{32} & 1 \end{bmatrix}. \tag{A12}$$

In this special case, the theorem implies

$$\frac{1}{1-\rho_{21}^{2}} \begin{bmatrix} \rho_{31} & \rho_{32} \end{bmatrix} \begin{bmatrix} 1 & -\rho_{21} \\ -\rho_{21} & 1 \end{bmatrix} \begin{bmatrix} \rho_{31} \\ \rho_{32} \end{bmatrix} \leq 1,$$

$$\frac{1}{1-\rho_{21}^{2}} \begin{bmatrix} \rho_{31}-\rho_{21}\rho_{32} & \rho_{32}-\rho_{21}\rho_{31} \end{bmatrix} \begin{bmatrix} \rho_{31} \\ \rho_{32} \end{bmatrix} \leq 1,$$

$$\frac{\rho_{31}\left(\rho_{31}-\rho_{21}\rho_{32}\right) + \rho_{32}\left(\rho_{32}-\rho_{21}\rho_{31}\right)}{1-\rho_{21}^{2}} \leq 1,$$

$$\rho_{21}^{2} + \rho_{31}^{2} + \rho_{32}^{2} - 2\rho_{21}\rho_{31}\rho_{32} - 1 \leq 0, \tag{A13}$$
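Equation A13 is the familiar admissibility condition for a 3 × 3 correlation matrix. A brief numerical check (our illustration, with arbitrary triples):

```python
import numpy as np

def a13_lhs(r21, r31, r32):
    """Left-hand side of Equation A13; nonpositive for an admissible triple."""
    return r21**2 + r31**2 + r32**2 - 2 * r21 * r31 * r32 - 1

def min_eigenvalue(r21, r31, r32):
    """Smallest eigenvalue of the 3 x 3 correlation matrix P."""
    P = np.array([[1.0, r21, r31],
                  [r21, 1.0, r32],
                  [r31, r32, 1.0]])
    return np.linalg.eigvalsh(P).min()

# A valid triple satisfies A13 and yields a PSD matrix ...
print(a13_lhs(0.5, 0.4, 0.3) <= 0, min_eigenvalue(0.5, 0.4, 0.3) >= 0)    # True True
# ... an inadmissible triple violates A13 and has a negative eigenvalue.
print(a13_lhs(0.9, -0.9, 0.9) <= 0, min_eigenvalue(0.9, -0.9, 0.9) >= 0)  # False False
```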

which can be solved algebraically (by completing the square) to obtain any of the three ranges from above (Equations A1, A2, and A3). In the more general case of Σ, we can obtain bounds for, say, σMX:

$$\boldsymbol{\Sigma} = \begin{bmatrix} \mathbf{A} & \mathbf{G} \\ \mathbf{G}' & \operatorname{var}(Y) \end{bmatrix} = \begin{bmatrix} \sigma_X^{2} & \sigma_{MX} & \sigma_{YX} \\ \sigma_{MX} & \sigma_M^{2} & \sigma_{YM} \\ \sigma_{YX} & \sigma_{YM} & \sigma_Y^{2} \end{bmatrix}, \tag{A14}$$

implying

$$\frac{1}{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}} \begin{bmatrix} \sigma_{YX} & \sigma_{YM} \end{bmatrix} \begin{bmatrix} \sigma_M^{2} & -\sigma_{MX} \\ -\sigma_{MX} & \sigma_X^{2} \end{bmatrix} \begin{bmatrix} \sigma_{YX} \\ \sigma_{YM} \end{bmatrix} \leq \sigma_Y^{2},$$

$$\frac{1}{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}} \begin{bmatrix} \sigma_{YX}\sigma_M^{2}-\sigma_{MX}\sigma_{YM} & -\sigma_{MX}\sigma_{YX}+\sigma_{YM}\sigma_X^{2} \end{bmatrix} \begin{bmatrix} \sigma_{YX} \\ \sigma_{YM} \end{bmatrix} \leq \sigma_Y^{2},$$

$$\sigma_{MX}^{2}\sigma_Y^{2} - 2\sigma_{MX}\sigma_{YM}\sigma_{YX} \leq \sigma_X^{2}\sigma_M^{2}\sigma_Y^{2} - \sigma_{YM}^{2}\sigma_X^{2} - \sigma_{YX}^{2}\sigma_M^{2},$$

$$\sigma_{MX}^{2} - \frac{2\sigma_{MX}\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}} \leq \frac{\sigma_X^{2}\sigma_M^{2}\sigma_Y^{2} - \sigma_{YM}^{2}\sigma_X^{2} - \sigma_{YX}^{2}\sigma_M^{2}}{\sigma_Y^{2}},$$

$$\sigma_{MX}^{2} - \frac{2\sigma_{MX}\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}} + \left(\frac{\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}}\right)^{2} \leq \frac{\sigma_X^{2}\sigma_M^{2}\sigma_Y^{2} - \sigma_{YM}^{2}\sigma_X^{2} - \sigma_{YX}^{2}\sigma_M^{2}}{\sigma_Y^{2}} + \left(\frac{\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}}\right)^{2},$$

$$\left(\sigma_{MX} - \frac{\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}}\right)^{2} \leq \frac{\sigma_X^{2}\sigma_M^{2}\sigma_Y^{2} - \sigma_{YM}^{2}\sigma_X^{2} - \sigma_{YX}^{2}\sigma_M^{2}}{\sigma_Y^{2}} + \left(\frac{\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}}\right)^{2},$$

$$\sigma_{MX} \in \left\{\frac{\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}} \pm \sqrt{\frac{\sigma_X^{2}\sigma_M^{2}\sigma_Y^{2} - \sigma_{YM}^{2}\sigma_X^{2} - \sigma_{YX}^{2}\sigma_M^{2}}{\sigma_Y^{2}} + \left(\frac{\sigma_{YM}\sigma_{YX}}{\sigma_Y^{2}}\right)^{2}}\right\},$$

$$\sigma_{MX} \in \left\{\frac{\sigma_{YM}\sigma_{YX} \pm \sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sigma_Y^{2}}\right\}, \tag{A15}$$
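At either endpoint of the interval in Equation A15 the covariance matrix Σ becomes exactly singular, and strictly inside it is positive definite. A numeric sketch (ours, with made-up moments; the article's own tools are in R):

```python
import numpy as np

s_X2, s_M2, s_Y2 = 4.0, 9.0, 16.0   # illustrative variances of X, M, Y
s_YX, s_YM = 3.2, 6.0               # illustrative covariances with Y

# Equation A15: admissible range of Cov(M, X) given the other moments.
center = s_YM * s_YX / s_Y2
half = np.sqrt(s_M2 * s_Y2 - s_YM**2) * np.sqrt(s_X2 * s_Y2 - s_YX**2) / s_Y2
lo, hi = center - half, center + half

def det_sigma(s_MX):
    """Determinant of the full covariance matrix of (X, M, Y)."""
    S = np.array([[s_X2, s_MX, s_YX],
                  [s_MX, s_M2, s_YM],
                  [s_YX, s_YM, s_Y2]])
    return np.linalg.det(S)

print(abs(det_sigma(lo)) < 1e-8)   # singular at the lower bound: True
print(abs(det_sigma(hi)) < 1e-8)   # singular at the upper bound: True
print(det_sigma(center) > 0)       # positive definite inside: True
```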


with ∈ meaning "is contained in." Ranges for the other two covariances are of similar form. The correlation case is a special case of this more general treatment for covariances.

The bounds implied for regression coefficient a can be derived from the above result by simply isolating a using its expression in covariance metric:

$$\sigma_{MX} \in \left\{\frac{\sigma_{YM}\sigma_{YX} \pm \sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sigma_Y^{2}}\right\},$$

$$\frac{\sigma_{MX}}{\sigma_X^{2}} \in \left\{\frac{\sigma_{YM}\sigma_{YX} \pm \sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sigma_X^{2}\sigma_Y^{2}}\right\},$$

$$a \in \left\{\frac{\sigma_{YM}\sigma_{YX} \pm \sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sigma_X^{2}\sigma_Y^{2}}\right\}. \tag{A16}$$

Another method for obtaining the bounds for a, using its correlation metric expression and altering the central
term until it equals the formula for a and simplifying, is

$$\rho_{31}\rho_{32} - \sqrt{1-\rho_{31}^{2}}\sqrt{1-\rho_{32}^{2}} \leq \rho_{21} \leq \rho_{31}\rho_{32} + \sqrt{1-\rho_{31}^{2}}\sqrt{1-\rho_{32}^{2}},$$

$$\rho_{31}\rho_{32}\frac{\sigma_M}{\sigma_X} - \sqrt{1-\rho_{31}^{2}}\sqrt{1-\rho_{32}^{2}}\,\frac{\sigma_M}{\sigma_X} \leq \rho_{21}\frac{\sigma_M}{\sigma_X} \leq \rho_{31}\rho_{32}\frac{\sigma_M}{\sigma_X} + \sqrt{1-\rho_{31}^{2}}\sqrt{1-\rho_{32}^{2}}\,\frac{\sigma_M}{\sigma_X},$$

$$\frac{\sigma_{YX}\sigma_{YM}}{\sigma_X^{2}\sigma_Y^{2}} - \sqrt{1-\left(\frac{\sigma_{YX}}{\sigma_X\sigma_Y}\right)^{2}}\sqrt{1-\left(\frac{\sigma_{YM}}{\sigma_M\sigma_Y}\right)^{2}}\,\frac{\sigma_M\sigma_X\sigma_Y^{2}}{\sigma_X^{2}\sigma_Y^{2}} \leq a \leq \frac{\sigma_{YX}\sigma_{YM}}{\sigma_X^{2}\sigma_Y^{2}} + \sqrt{1-\left(\frac{\sigma_{YX}}{\sigma_X\sigma_Y}\right)^{2}}\sqrt{1-\left(\frac{\sigma_{YM}}{\sigma_M\sigma_Y}\right)^{2}}\,\frac{\sigma_M\sigma_X\sigma_Y^{2}}{\sigma_X^{2}\sigma_Y^{2}},$$

$$\frac{\sigma_{YX}\sigma_{YM} - \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}} \leq a \leq \frac{\sigma_{YX}\sigma_{YM} + \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}},$$

$$a \in \left\{\frac{\sigma_{YX}\sigma_{YM} \pm \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}}\right\}. \tag{A17}$$

For b, a similar procedure could be followed:

$$\rho_{21}\rho_{31} - \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}} \leq \rho_{32} \leq \rho_{21}\rho_{31} + \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}},$$

$$-\sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}} \leq \rho_{32}-\rho_{21}\rho_{31} \leq \sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}},$$

$$-\frac{\sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}}}{1-\rho_{21}^{2}} \leq \frac{\rho_{32}-\rho_{21}\rho_{31}}{1-\rho_{21}^{2}} \leq \frac{\sqrt{1-\rho_{21}^{2}}\sqrt{1-\rho_{31}^{2}}}{1-\rho_{21}^{2}},$$

$$-\frac{\sigma_Y\sqrt{1-\rho_{31}^{2}}}{\sigma_M\sqrt{1-\rho_{21}^{2}}} \leq b \leq \frac{\sigma_Y\sqrt{1-\rho_{31}^{2}}}{\sigma_M\sqrt{1-\rho_{21}^{2}}}, \tag{A18}$$



$$b \in \pm\left\{\frac{\sigma_Y\sqrt{1-\dfrac{\sigma_{YX}^{2}}{\sigma_Y^{2}\sigma_X^{2}}}}{\sigma_M\sqrt{1-\dfrac{\sigma_{MX}^{2}}{\sigma_M^{2}\sigma_X^{2}}}}\right\},$$

$$b \in \pm\left\{\frac{\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sqrt{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}}}\right\}.$$
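The correlation-metric bound on b in Equation A18 and the covariance-metric form just above are algebraically identical, as a quick numerical check confirms (our illustration, with arbitrary values):

```python
import math

s_X2, s_M2, s_Y2 = 4.0, 9.0, 16.0   # illustrative variances
s_YX, s_MX = 3.2, 2.4               # illustrative covariances

s_X, s_M, s_Y = map(math.sqrt, (s_X2, s_M2, s_Y2))
r31 = s_YX / (s_X * s_Y)            # rho_YX
r21 = s_MX / (s_X * s_M)            # rho_MX

# Equation A18, correlation metric ...
bound_corr = s_Y * math.sqrt(1 - r31**2) / (s_M * math.sqrt(1 - r21**2))
# ... and the equivalent covariance-metric form.
bound_cov = math.sqrt(s_X2 * s_Y2 - s_YX**2) / math.sqrt(s_X2 * s_M2 - s_MX**2)

print(abs(bound_corr - bound_cov) < 1e-12)   # True
```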

Now that bounds are known for b (given a and c) and for a (given b and c), the bounds for ab can be determined. For given a and c, the bounds on ab can be derived by beginning with the bounds implied for b and multiplying all terms by the conditional value ã, the most extreme possible observable value of a with the same sign as â (from Equation A16 or A17):

$$-\frac{\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sqrt{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}}} < b < \frac{\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sqrt{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}}},$$

$$-\tilde{a}\,\frac{\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sqrt{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}}} < b\tilde{a} < \tilde{a}\,\frac{\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sqrt{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}}},$$

$$ab \in \pm\left\{\tilde{a}\,\frac{\sqrt{\sigma_X^{2}\sigma_Y^{2}-\sigma_{YX}^{2}}}{\sqrt{\sigma_X^{2}\sigma_M^{2}-\sigma_{MX}^{2}}}\right\}. \tag{A19}$$

For given b and c, the bounds on ab can be derived by beginning with the bounds implied for a and multiplying all terms by the conditional value b̃:

$$\frac{\sigma_{YX}\sigma_{YM} - \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}} \leq a \leq \frac{\sigma_{YX}\sigma_{YM} + \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}},$$

$$\tilde{b}\,\frac{\sigma_{YX}\sigma_{YM} - \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}} \leq a\tilde{b} \leq \tilde{b}\,\frac{\sigma_{YX}\sigma_{YM} + \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}},$$

$$ab \in \left\{\tilde{b}\,\frac{\sigma_{YX}\sigma_{YM} \pm \sqrt{\sigma_Y^{2}\sigma_X^{2}-\sigma_{YX}^{2}}\sqrt{\sigma_M^{2}\sigma_Y^{2}-\sigma_{YM}^{2}}}{\sigma_X^{2}\sigma_Y^{2}}\right\}. \tag{A20}$$

The maximum possible indirect effect is obtained by the product of ã and b̃:

$$\widetilde{ab} = \tilde{a}\tilde{b}. \tag{A21}$$
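Tying Equations A16–A21 together: ã and b̃ are the most extreme admissible coefficients sharing the signs of the estimates, and their product ãb̃ is the largest indirect effect the variable scales permit, i.e., the denominator of the article's proportion-of-maximum effect size. A hypothetical numeric example (Python here for illustration; the article supplies R tools, and all moments below are made up):

```python
import math

# Illustrative (made-up) variances and covariances for X, M, Y.
s_X2, s_M2, s_Y2 = 4.0, 9.0, 16.0
s_YX, s_YM, s_MX = 3.2, 6.0, 2.4

a_hat = s_MX / s_X2                                            # observed a
b_hat = (s_X2 * s_YM - s_MX * s_YX) / (s_X2 * s_M2 - s_MX**2)  # observed b

# a-tilde (Equations A16/A17): most extreme admissible a with the sign of a-hat.
center_a = s_YX * s_YM / (s_X2 * s_Y2)
half_a = math.sqrt(s_Y2 * s_X2 - s_YX**2) * math.sqrt(s_M2 * s_Y2 - s_YM**2) / (s_X2 * s_Y2)
a_tilde = center_a + half_a if a_hat >= 0 else center_a - half_a

# b-tilde (Equation A18): most extreme admissible b with the sign of b-hat.
b_bound = math.sqrt(s_X2 * s_Y2 - s_YX**2) / math.sqrt(s_X2 * s_M2 - s_MX**2)
b_tilde = b_bound if b_hat >= 0 else -b_bound

ab_max = a_tilde * b_tilde                                     # Equation A21
proportion = (a_hat * b_hat) / ab_max                          # observed ab as share of maximum
print(a_hat * b_hat, ab_max)
print(0 <= abs(proportion) <= 1)   # True: the observed effect cannot exceed the maximum
```

With these made-up moments the observed indirect effect is about 16% of the maximum the scales allow.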

Received September 20, 2009
Revision received August 20, 2010
Accepted August 30, 2010
