
THIS ARTICLE HAS BEEN CORRECTED. SEE LAST PAGE

Replication

Investigating Variation in Replicability
A "Many Labs" Replication Project
Richard A. Klein,1 Kate A. Ratliff,1 Michelangelo Vianello,2 Reginald B. Adams Jr.,3 Štěpán Bahník,4 Michael J. Bernstein,5 Konrad Bocian,6 Mark J. Brandt,7 Beach Brooks,1 Claudia Chloe Brumbaugh,8 Zeynep Cemalcilar,9 Jesse Chandler,10,36 Winnee Cheong,11 William E. Davis,12 Thierry Devos,13 Matthew Eisner,10 Natalia Frankowska,6 David Furrow,15 Elisa Maria Galliani,2 Fred Hasselman,16,37 Joshua A. Hicks,12 James F. Hovermale,17 S. Jane Hunt,18 Jeffrey R. Huntsinger,19 Hans IJzerman,7 Melissa-Sue John,20 Jennifer A. Joy-Gaba,17 Heather Barry Kappes,21 Lacy E. Krueger,18 Jaime Kurtz,22 Carmel A. Levitan,23 Robyn K. Mallett,19 Wendy L. Morris,24 Anthony J. Nelson,3 Jason A. Nier,25 Grant Packard,26 Ronaldo Pilati,27 Abraham M. Rutchick,28 Kathleen Schmidt,29 Jeanine L. Skorinko,20 Robert Smith,14 Troy G. Steiner,3 Justin Storbeck,8 Lyn M. Van Swol,30 Donna Thompson,15 A. E. van 't Veer,7 Leigh Ann Vaughn,31 Marek Vranka,32 Aaron L. Wichman,33 Julie A. Woodzicka,34 and Brian A. Nosek29,35

1University of Florida, Gainesville, FL, USA, 2University of Padua, Italy, 3The Pennsylvania State University, University Park, PA, USA, 4University of Würzburg, Germany, 5Pennsylvania State University Abington, PA, USA, 6University of Social Sciences and Humanities Campus Sopot, Poland, 7Tilburg University, The Netherlands, 8City University of New York, USA, 9Koç University, Istanbul, Turkey, 10University of Michigan, Ann Arbor, MI, USA, 11HELP University, Kuala Lumpur, Malaysia, 12Texas A&M University, College Station, TX, USA, 13San Diego State University, CA, USA, 14Ohio State University, Columbus, OH, USA, 15Mount Saint Vincent University, Nova Scotia, Canada, 16Radboud University Nijmegen, The Netherlands, 17Virginia Commonwealth University, Richmond, VA, USA, 18Texas A&M University-Commerce, TX, USA, 19Loyola University Chicago, IL, USA, 20Worcester Polytechnic Institute, MA, USA, 21London School of Economics and Political Science, London, UK, 22James Madison University, Harrisonburg, VA, USA, 23Occidental College, Los Angeles, CA, USA, 24McDaniel College, Westminster, MD, USA, 25Connecticut College, New London, CT, USA, 26Wilfrid Laurier University, Waterloo, ON, Canada, 27University of Brasilia, DF, Brazil, 28California State University, Northridge, CA, USA, 29University of Virginia, Charlottesville, VA, USA, 30University of Wisconsin-Madison, WI, USA, 31Ithaca College, NY, USA, 32Charles University, Prague, Czech Republic, 33Western Kentucky University, Bowling Green, KY, USA, 34Washington and Lee University, Lexington, VA, USA, 35Center for Open Science, Charlottesville, VA, USA, 36PRIME Research, Ann Arbor, MI, USA, 37University Nijmegen, The Netherlands

Abstract. Although replication is a central tenet of science, direct replications are rare in psychology. This research tested variation in the replicability of 13 classic and contemporary effects across 36 independent samples totaling 6,344 participants. In the aggregate, 10 effects replicated consistently. One effect – imagined contact reducing prejudice – showed weak support for replicability. And two effects – flag priming influencing conservatism and currency priming influencing system justification – did not replicate. We compared whether conditions such as lab versus online administration or US versus international sample predicted effect magnitudes; by and large they did not. The results of this small sample of effects suggest that replicability is more dependent on the effect itself than on the sample and setting used to investigate the effect.

Keywords: replication, reproducibility, generalizability, cross-cultural, variation

Replication is a central tenet of science; its purpose is to confirm the accuracy of empirical findings, clarify the conditions under which an effect can be observed, and estimate the true effect size (Brandt et al., 2013; Open Science Collaboration, 2012, 2014). Successful replication of an experiment requires the recreation of the essential conditions of the initial experiment. This is often easier said than done. There may be an enormous number of variables influencing experimental results, and yet only a few are tested. In the behavioral sciences, many effects have been observed in one cultural context, but not observed in others. Likewise, individuals within the same society, or even the same individual at different times (Bodenhausen, 1990), may differ in ways that moderate any particular result.

Direct replication is infrequent, resulting in a published literature that sustains spurious findings (Ioannidis, 2005) and a lack of identification of the eliciting conditions for an effect. While there are good epistemological reasons for assuming that observed phenomena generalize across individuals and contexts in the absence of contrary evidence, the failure to directly replicate findings is problematic for theoretical and practical reasons. Failure to identify moderators and boundary conditions of an effect may result in overly broad generalizations of true effects across situations (Cesario, 2014) or across individuals (Henrich, Heine, & Norenzayan, 2010). Similarly, overgeneralization may lead observations made under laboratory conditions to be inappropriately extended to ecological contexts that differ in important ways (Henry, MacLeod, Phillips, & Crawford, 2004). Practically, attempts to closely replicate research findings can reveal important differences in what is considered a direct replication (Schmidt, 2009), thus leading to refinements of the initial theory (e.g., Aronson, 1992; Greenwald, Pratkanis, Leippe, & Baumgardner, 1986). Close replication can also lead to the clarification of tacit methodological knowledge that is necessary to elicit the effect of interest (Collins, 1974).

Overview of the Present Research

Little attempt has been made to assess the variation in replicability of findings across samples and research contexts. This project examines the variation in replicability of 13 classic and contemporary psychological effects across 36 samples and settings. Some of the selected effects are known to be highly replicable; for others, replicability is unknown. Some may depend on social context or participant sample, others may not. We bundled the selected studies together into a brief, easy-to-administer experiment that was delivered to each participating sample through a single infrastructure (http://projectimplicit.net/).

There are many factors that can influence the replicability of an effect, such as sample, setting, statistical power, and procedural variations. The present design standardizes procedural characteristics and ensures appropriate statistical power in order to examine the effects of sample and setting on replicability. At one extreme, sample and situational characteristics might have little effect on the tested effects – variation in effect magnitudes may not exceed expected random error. At the other extreme, effects might be highly contextualized – for example, replicating only with sample and situational characteristics that are highly consistent with the original circumstances. The primary contribution of this investigation is to establish a paradigm for testing replicability across samples and settings and provide a rich data set that allows the determinants of replicability to be explored. A secondary purpose is to demonstrate support for replicability for the 13 chosen effects. Ideally, the results will stimulate theoretical developments about the conditions under which replication will be robust to the inevitable variation in circumstances of data collection.

Method

Researcher Recruitment and Data Collection Sites

Project leads posted a call for collaborators to the online forum of the Open Science Collaboration on February 21, 2013 and to the SPSP Discussion List on July 13, 2013. Other colleagues were contacted personally. For inclusion, each replication team had to: (1) follow local ethical procedures, (2) administer the protocol as specified, (3) collect data from at least 80 participants,¹ (4) post a video simulation of the setting and administration procedure, and (5) document key features of recruiting, sample, and any changes to the standard protocol. In total, there were 36 samples and settings that collected data from a total of 6,344 participants (27 data collections in a laboratory and 9 conducted online; 25 from the US, 11 from other countries; see Table 1 for a brief description of sites and for full descriptions of sites, site characteristics, and participant characteristics by site).

¹ One sample fell short of this requirement (N = 79) but was still included in the analysis. All sites were encouraged to collect as many participants as possible beyond the required 80, but the decision to end data collection was determined independently by each site. Researchers had no access to the data prior to completing data collection.

Selection of Replication Studies

Twelve studies producing 13 effects were chosen based on the following criteria:

1. Suitability for online presentation. Our primary concern was to give each study a "fair" replication that was true to the original design. By administering the study through a web browser, we were able to ensure procedural consistency across sites.
2. Length of study. We selected studies that could be administered quickly so that we could examine many of them in a single study session.
3. Simple design. With the exception of one correlational study, we selected studies that featured a simple, two-condition design.
4. Diversity of effects. We sought to diversify the sample of effects by topic, time period of original investigation, and differing levels of certainty and existing impact. Justification for study inclusion is described in the registered proposal (http://osf.io/project/aBEsQ/).

Table 1. Data collection sites


Columns: site identifier; location; N; online (O) or laboratory (L); US or international (I).
Abington Penn State Abington, Abington, PA 84 L US
Brasilia University of Brasilia, Brasilia, Brazil 120 L I
Charles Charles University, Prague, Czech Republic 84 L I
Conncoll Connecticut College, New London, CT 95 L US
CSUN California State University, Northridge, LA, CA 96 O US
Help HELP University, Malaysia 102 L I
Ithaca Ithaca College, Ithaca, NY 90 L US
JMU James Madison University, Harrisonburg, VA 174 O US
KU Koç University, Istanbul, Turkey 113 O I
Laurier Wilfrid Laurier University, Waterloo, Ontario, Canada 112 L I
LSE London School of Economics and Political Science, London, UK 277 L I



Luc Loyola University Chicago, Chicago, IL 146 L US


McDaniel McDaniel College, Westminster, MD 98 O US
MSVU Mount Saint Vincent University, Halifax, Nova Scotia, Canada 85 L I
MTURK Amazon Mechanical Turk (US workers only) 1,000 O US
OSU Ohio State University, Columbus, OH 107 L US
Oxy Occidental College, LA, CA 123 L US
PI Project Implicit Volunteers (US citizens/residents only) 1,329 O US
PSU Penn State University, University Park, PA 95 L US
QCCUNY Queens College, City University of New York, NY 103 L US
QCCUNY2 Queens College, City University of New York, NY 86 L US
SDSU San Diego State University, San Diego, CA 162 L US
SWPS University of Social Sciences and Humanities Campus Sopot, Sopot, Poland 79 L I
SWPSON Volunteers visiting www.badania.net 169 O I
TAMU Texas A&M University, College Station, TX 187 L US
TAMUC Texas A&M University-Commerce, Commerce, TX 87 L US
TAMUON Texas A&M University, College Station, TX (Online participants) 225 O US
Tilburg Tilburg University, Tilburg, Netherlands 80 L I
UFL University of Florida, Gainesville, FL 127 L US
UNIPD University of Padua, Padua, Italy 144 O I
UVA University of Virginia, Charlottesville, VA 81 L US
VCU Virginia Commonwealth University, Richmond, VA 108 L US
Wisc University of Wisconsin-Madison, Madison, WI 96 L US
WKU Western Kentucky University, Bowling Green, KY 103 L US
WL Washington & Lee University, Lexington, VA 90 L US
WPI Worcester Polytechnic Institute, Worcester, MA 87 L US

The Replication Studies

All replication studies were translated into the dominant language of the country of data collection (N = 7 languages total; 3/6 translations from English were back-translated). Next, we provide a brief description of each experiment, the original finding, and known differences between the original and replication studies. Most original studies were conducted with paper and pencil; all replications were conducted via computer. Exact wording for each study, including a link to the study, can be found in the supplementary materials. The relevant findings from the original studies can be found in the original proposal.

1. Sunk costs (Oppenheimer, Meyvis, & Davidenko, 2009). Sunk costs are those that have already been incurred and cannot be recovered (Knox & Inkster, 1968). Oppenheimer et al. (2009; adapted from Thaler, 1985) asked participants to imagine that they have tickets to see their favorite football team play an important game, but that it is freezing cold on the day of the game. Participants rated their likelihood of attending the game on a 9-point scale (1 = definitely stay at home, 9 = definitely go to the game). Participants were marginally more likely to go to the game if they had paid for the ticket than if the ticket had been free.
2. Gain versus loss framing (Tversky & Kahneman, 1981). The original research showed that changing the focus from losses to gains decreases participants' willingness to take risks – that is, to gamble to get a better outcome rather than take a guaranteed result. Participants imagined that the US was preparing for the outbreak of an unusual Asian disease, which is expected to kill 600 people. Participants were then asked to select a course of action to combat the disease from logically identical sets of alternatives framed in terms of gains as follows: Program A will save 200 people (400 people will die), or Program B, which has a 1/3 probability that 600 people will be saved (nobody will die) and a 2/3 probability that no people will be saved (600 people will die). In the "gain" framing condition, participants are more likely to adopt Program A, while this effect reverses in the loss framing condition. The replication replaced the phrase "the United States" with the country of data collection, and the word "Asian" was omitted from "an unusual Asian disease."

3. Anchoring (Jacowitz & Kahneman, 1995). Jacowitz and Kahneman (1995) presented a number of scenarios in which participants estimated size or distance after first receiving a number that was clearly too large or too small. In the original study, participants answered 3 questions about each of 15 topics for which they estimated a quantity. First, they indicated if the quantity was greater or less than an anchor value. Second, they estimated the quantity. Third, they indicated their confidence in their estimate. The original number served as an anchor, biasing estimates to be closer to it. For the purposes of the replication, we provided anchoring information before asking just for the estimated quantity for four of the topics from the original study – distance from San Francisco to New York City, population of Chicago, height of Mt. Everest, and babies born per day in the US. For countries that use the metric system, we converted anchors to metric units and rounded them.

4. Retrospective gambler's fallacy (Oppenheimer & Monin, 2009). Oppenheimer and Monin (2009) investigated whether the rarity of an independent, chance observation influenced beliefs about what occurred before that event. Participants imagined that they saw a man rolling dice in a casino. In one condition, participants imagined witnessing three dice being rolled and all came up 6's. In a second condition, two came up 6's and one came up 3. In a third condition, two dice were rolled and both came up 6's. All participants then estimated, in an open-ended format, how many times the man had rolled the dice before they entered the room to watch him. Participants estimated that the man rolled dice more times when they had seen him roll three 6's than when they had seen him roll two 6's or two 6's and a 3. For the replication, the condition in which the man rolls two 6's was removed, leaving two conditions.

5. Low-versus-high category scales (Schwarz, Hippler, Deutsch, & Strack, 1985). Schwarz and colleagues (1985) demonstrated that people infer from response options what are low and high frequencies of a behavior, and self-assess accordingly. In the original demonstration, participants were asked how much TV they watch daily on a low-frequency scale ranging from "up to half an hour" to "more than two and a half hours," or a high-frequency scale ranging from "up to two and a half hours" to "more than four and a half hours." In the low-frequency condition, fewer participants reported watching TV for more than two and a half hours than in the high-frequency condition.

6. Norm of reciprocity (Hyman & Sheatsley, 1950). When confronted with a decision about allowing or denying the same behavior to an ingroup and outgroup, people may feel an obligation to reciprocity, or consistency, in their evaluation of the behaviors (Hyman & Sheatsley, 1950). In the original study, American participants answered two questions: whether communist countries should allow American reporters in and allow them to report the news back to American papers, and whether America should allow communist reporters into the United States and allow them to report back to their papers. Participants reported more support for allowing communist reporters into America when that question was asked after the question about allowing American reporters into the communist countries. In the replication, we changed the question slightly to ensure the "other country" was a suitable, modern target (North Korea). For international replications, the target country was determined by the researcher heading that replication to ensure suitability (see supplementary materials).

7. Allowed/Forbidden (Rugg, 1941). Question phrasing can influence responses. Rugg (1941) found that respondents were less likely to endorse forbidding speeches against democracy than they were to not endorse allowing speeches against democracy. Respondents in the United States were asked, in one condition, if the US should allow speeches against democracy or, in another condition, whether the US should forbid speeches against democracy. Sixty-two percent of participants indicated "No" when asked if speeches against democracy should be allowed, but only 46% indicated "Yes" when asked if these speeches should be forbidden. In the replication, the words "The United States" were replaced with the name of the country the study was administered in.
8. Quote Attribution (Lorge & Curtiss, 1936). The source of information has a great impact on how that information is perceived and evaluated. Lorge and Curtiss (1936) examined how an identical quote would be perceived if it was attributed to a liked or disliked individual. Participants were asked to rate their agreement with a list of quotations. The quotation of interest was, "I hold it that a little rebellion, now and then, is a good thing, and as necessary in the political world as storms are in the physical world." In one condition the quote was attributed to Thomas Jefferson, a liked individual, and in the other it was attributed to Vladimir Lenin, a disliked individual. More agreement was observed when the quote was attributed to Jefferson than to Lenin (reported in Moskowitz, 2004). In the replication, we used a quote attributed to either George Washington (liked individual) or Osama Bin Laden (disliked individual).

9. Flag Priming (Carter, Ferguson, & Hassin, 2011; Study 2). The American flag is a powerful symbol in American culture. Carter et al. (2011) examined how subtle exposure to the flag may increase conservatism among US participants. Participants were presented with four photos and asked to estimate the time of day at which they were taken. In the flag-prime condition, the American flag appeared in two of these photos. In the control condition, the same photos were presented without flags. Following the manipulation, participants completed an 8-item questionnaire assessing views toward various political issues (e.g., abortion, gun control, affirmative action). Participants in the flag-primed condition indicated significantly more conservative positions than those in the control condition. The priming stimuli used to replicate this finding were obtained from the authors and were identical to those used in the original study. Because it was impractical to edit the images with unique national flags, the American flag was always used as a prime. As a consequence, the replications in the United States were the only ones considered as direct replications. For international replications, the survey questions were adapted slightly to ensure they were appropriate for the political climate of the country, as judged by the researcher heading that particular replication (see supplementary materials). Further, the original authors suggested possible moderators that they have considered since publication of the original study. We included three items at the very end of the replication study to test these moderators: (1) How much do you identify with being American? (1 = not at all; 11 = very much), (2) To what extent do you think the typical American is a Republican or Democrat? (1 = Democrat; 7 = Republican), (3) To what extent do you think the typical American is conservative or liberal? (1 = Liberal; 7 = Conservative).

10. Currency priming (Caruso, Vohs, Baxter, & Waytz, 2013). Money is a powerful symbol. Caruso et al. (2013) provide evidence that merely exposing participants to money increases their endorsement of the current social system. Participants were first presented with demographic questions, with the background of the page manipulated between subjects. In one condition the background showed a faint picture of US$100 bills; in the other condition the background was a blurred, unidentifiable version of the same picture. Next, participants completed an 8-question "system justification scale" (Kay & Jost, 2003). Participants in the money-prime condition scored higher on the system justification scale than those in the control condition. The authors provided the original materials, allowing us to construct a near identical replication for US participants. However, the stimuli were modified for international replications in two ways: first, the US dollar was usually replaced with the relevant country's currency (see supplementary materials); second, the system justification questions were adapted to reflect the name of the relevant country.

11. Imagined contact (Husnu & Crisp, 2010; Study 1). Recent evidence suggests that merely imagining contact with members of ethnic outgroups is sufficient to reduce prejudice toward those groups (Turner, Crisp, & Lambert, 2007). In Husnu and Crisp (2010), British non-Muslim participants were assigned to either imagine interacting with a British Muslim stranger or to imagine that they were walking outdoors (control condition). Participants imagined the scene for one minute, and then described their thoughts for an additional minute before indicating their interest in and willingness to interact with British Muslims on a four-item scale. Participants in the "imagined contact" group had significantly higher contact intentions than participants in the control group. In the replication, the word "British" was removed from all references to "British Muslims." Additionally, for the predominately Muslim sample from Turkey, the items were adapted so that Christians were the outgroup target.

12. Sex differences in implicit math attitudes (Nosek, Banaji, & Greenwald, 2002). As a possible account for the sex gap in participation in science and math, Nosek and colleagues (2002) found that women had more negative implicit attitudes toward math compared to arts than men did in two studies of Yale undergraduates. Participants completed four Implicit Association Tests (IATs) in random order, one of which measured associations of math and arts with positivity and negativity. The replication simplified the design for length to be just a single IAT.

13. Implicit math attitudes relations with self-reported attitudes (Nosek et al., 2002). In the same study as Effect 12, self-reported math attitudes were measured with a composite of feeling thermometers and semantic differential ratings, and the composite was positively related with the implicit measure. The replication used a subset of the explicit items (see supplementary materials).
Figure 1. Replication results organized by effect. "X" indicates the effect size obtained in the original study. Large circles represent the aggregate effect size obtained across all participants. Error bars represent 99% noncentral confidence intervals around the effects. Small circles represent the effect sizes obtained within each site (black and white circles for US and international replications, respectively).

Procedure

The experiments were implemented on the Project Implicit infrastructure and all data were automatically recorded in a central database with a code identifying the sample source. After a paragraph of introduction, the studies were presented in a randomized order, except that the math IAT and associated explicit measures were always the final study. After the studies, participants completed an instructional manipulation check (IMC; Oppenheimer et al., 2009), a short demographic questionnaire, and then the moderator measures for flag priming. See Table S1² for IMC and summary demographic information by site. The IMC was not analyzed further for this report. Each replication team had a private link for their participants, and they coordinated their own data collection. Experimenters in laboratory studies were not aware of participant condition for each task, and did not interact with participants during data collection unless participants had questions. Investigators who led replications at specific sites completed a questionnaire about the experimental setting (responses summarized in Table S1), and details and videos of each setting along with the actual materials, links to run the study, supplemental tables, datasets, and original proposal are available at https://osf.io/ydpbf/.

² Table names that begin with the prefix "S" (e.g., Table S1) refer to tables that can be found in the supplementary materials. Tables with no prefix are in this paper.

Confirmatory Analysis Plan

Prior to data collection we specified a confirmatory analysis plan. All confirmatory analyses are reported either in text or in supplementary materials. A few of the tasks produced highly erratic distributions (particularly anchoring), requiring revisions to those analysis plans. A summary of differences between the original plans and the actual analysis is reported in the supplementary materials.

Results

Summary Results

Figure 1 presents an aggregate summary of replications of the 13 effects, presenting each of the four anchoring effects separately. Table 2 presents the original effect size, the median effect size, the weighted and unweighted effect sizes with 99% confidence intervals, and the proportion of samples that rejected the null hypothesis in the expected and unexpected direction.

Table 2. Summary confirmatory results for original and replicated effects


Columns: effect; original study ES with 95% CI [lower, upper]; median replication ES; unweighted replication ES with 99% CI [lower, upper]; weighted replication ES with 99% CI [lower, upper]; proportion of the 36 samples with p < .05 in the opposite direction, p < .05 in the same direction, and ns; key statistic, df, N, and p for the null hypothesis significance test of the aggregate.

Anchoring – babies born: 0.93 [.51, 1.33]; 2.43; 2.60 [2.41, 2.79]; 2.42 [2.33, 2.51]; 0.00/1.00/0.00; t = 90.49, df = 5,607, N = 5,609, p < .001
Anchoring – Mt. Everest: 0.93 [.51, 1.33]; 2.00; 2.45 [2.12, 2.77]; 2.23 [2.14, 2.32]; 0.00/1.00/0.00; t = 83.66, df = 5,625, N = 5,627, p < .001
Allowed/forbidden: 0.65 [.57, .73]; 1.88; 1.87 [1.58, 2.16]; 1.96 [1.88, 2.04]; 0.00/0.97/0.03; χ² = 3,088.7, df = 1, N = 6,292, p < .001
Anchoring – Chicago: 0.93 [.51, 1.33]; 1.88; 2.05 [1.84, 2.25]; 1.79 [1.71, 1.87]; 0.00/1.00/0.00; t = 65.00, df = 5,282, N = 5,284, p < .001
Anchoring – distance to NYC: 0.93 [.51, 1.33]; 1.18; 1.27 [1.13, 1.40]; 1.17 [1.09, 1.25]; 0.00/1.00/0.00; t = 42.86, df = 5,360, N = 5,362, p < .001
Relations between I and E math attitudes: 0.93 [.77, 1.08]; 0.84; 0.79 [0.63, 0.96]; 0.79 [0.75, 0.83]; 0.00/0.94/0.06; r = .38, N = 5,623, p < .001
Retrospective gambler fallacy: 0.69 [.16, 1.21]; 0.61; 0.59 [0.49, 0.70]; 0.61 [0.54, 0.68]; 0.00/0.83/0.17; t = 24.01, df = 5,940, N = 5,942, p < .001
Gain vs. loss framing: 1.13 [.89, 1.37]; 0.58; 0.62 [0.52, 0.71]; 0.60 [0.53, 0.67]; 0.00/0.86/0.14; χ² = 516.4, df = 1, N = 6,271, p < .001
Sex differences in implicit math attitudes: 1.01 [.54, 1.48]; 0.59; 0.56 [0.45, 0.68]; 0.53 [0.46, 0.60]; 0.00/0.71/0.29; t = 19.28, df = 5,840, N = 5,842, p < .001
Low vs. high category scales: 0.50 [.15, .84]; 0.50; 0.51 [0.42, 0.61]; 0.49 [0.40, 0.58]; 0.00/0.67/0.33; χ² = 342.4, df = 1, N = 5,899, p < .001
Quote attribution: original ES not available; 0.30; 0.31 [0.19, 0.42]; 0.32 [0.25, 0.39]; 0.00/0.47/0.53; t = 12.79, df = 6,323, N = 6,325, p < .001
Norm of reciprocity: 0.16 [.06, .27]; 0.27; 0.27 [0.18, 0.36]; 0.30 [0.23, 0.37]; 0.00/0.36/0.64; χ² = 135.3, df = 1, N = 6,276, p < .001
Sunk costs: 0.23 [.04, .50]; 0.32; 0.31 [0.22, 0.39]; 0.27 [0.20, 0.34]; 0.00/0.50/0.50; t = 10.83, df = 6,328, N = 6,330, p < .001
Imagined contact: 0.86 [.14, 1.57]; 0.12; 0.10 [0.00, 0.19]; 0.13 [0.07, 0.19]; 0.03/0.11/0.86; t = 5.05, df = 6,334, N = 6,336, p < .001
Flag priming: 0.50 [.01, .99]; 0.02; 0.01 [−0.07, 0.08]; 0.03 [−0.04, 0.10]; 0.04/0.00/0.96; t = 0.88, df = 4,894, N = 4,896, p = .38
Currency priming: 0.80 [.05, 1.54]; 0.00; 0.01 [−0.06, 0.09]; 0.02 [−0.08, 0.04]; 0.00/0.03/0.97; t = 0.79, df = 6,331, N = 6,333, p = .83
Notes. All effect sizes (ES) are presented in Cohen's d units. Weighted statistics are computed on the whole aggregated dataset (N > 6,000); unweighted statistics are computed on the disaggregated dataset (N = 36). 95% CIs for original effect sizes used cell sample sizes when available and assumed equal distribution across conditions when not available. The original anchoring article did not provide sufficient information to calculate effect sizes for individual scenarios; therefore an overall effect size is reported. The anchoring original effect size is a mean point-biserial correlation computed across 15 different questions in a test-retest design, whereas the present replication adopted a between-subjects design with random assignment. One sample was removed from the sex differences and the relations between implicit and explicit math attitudes analyses because of a systemic error in that laboratory's recording of reaction times. Flag priming includes only US samples. Confidence intervals around the unweighted mean are based on the central normal distribution. Confidence intervals around the weighted effect size are based on noncentral distributions.
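The per-site and aggregate effect sizes summarized in Table 2 can be reproduced conceptually with a few lines of R. The sketch below is illustrative only and is not the authors' analysis script; the data frame `dat` and its columns (`site`, `condition`, `dv`) are hypothetical stand-ins for one two-condition task.

```r
# Illustrative sketch, not the authors' analysis script. Per the table notes,
# "weighted" statistics come from the pooled dataset and "unweighted" statistics
# from the 36 per-site effect sizes; this mirrors that logic.
# `dat` and its columns (site, condition, dv) are hypothetical.

cohens_d <- function(x, g) {
  m <- tapply(x, g, mean); v <- tapply(x, g, var); n <- tapply(x, g, length)
  sp <- sqrt(((n[1] - 1) * v[1] + (n[2] - 1) * v[2]) / (sum(n) - 2))  # pooled SD
  unname((m[1] - m[2]) / sp)
}

per_site <- sapply(split(dat, dat$site), function(s) cohens_d(s$dv, s$condition))

median(per_site)                 # median replication effect size
mean(per_site)                   # unweighted mean: every site counts equally
cohens_d(dat$dv, dat$condition)  # "weighted" estimate: d computed on the pooled data
```

The 99% noncentral confidence intervals reported for the weighted effect sizes could then be obtained from the corresponding t statistic and group sizes, for example via the MBESS package.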
Figure 2. Replication results organized by site. Gray circles represent the effect size obtained for each effect within a site. Black circles represent the mean effect size obtained within a site. Error bars represent the 95% confidence interval around the mean.

In the aggregate, 10 of the 13 studies replicated the original results with varying distance from the original effect size. One study, imagined contact, showed a significant effect in the expected direction in just 4 of the 36 samples (and once in the wrong direction), but the confidence intervals for the aggregate effect size suggest that it is slightly different than zero. Two studies – flag priming and currency priming – did not replicate the original effects. Each of these had just one p-value < .05, and it was in the wrong direction for flag priming. The aggregate effect size was near zero whether using the median, weighted mean, or unweighted mean. All confidence intervals included zero. Figure 1 presents all 36 samples for flag priming, but only US data collections were counted for the confirmatory analysis (see Table 2). International samples also did not show a flag priming effect (weighted mean d = .03, 99% CI [−.04, .10]). To rule out the possibility that the priming effects were contaminated by the contents of other experimental materials, we reexamined only those participants who completed these tasks first. Again, there was no effect (flag priming: t(431) = 0.33, p = .75, 95% CI [−.171, .240], Cohen's d = .03; currency priming: t(605) = −0.56, p = .57, 95% CI [−.201, .112], Cohen's d = −.05).³

When an effect size for the original study could be calculated, it is presented as an "X" in Figure 1. For three effects (contact, flag priming, and currency priming), the original effect is larger than for any sample in the present study, with the observed median or mean effect at or below the lower bound of the 95% confidence interval for the original effect.⁴ Though the sex difference in implicit math attitudes effect was within the 95% confidence interval of the original result, the replication estimate combined with another large-scale replication (Nosek & Smyth, 2011) suggests that the original effect was an overestimate.

Variation Across Samples and Settings

Figure 1 demonstrates substantial variation for some of the observed effects. That variation could be a function of the true effect size, random error, sample differences, or setting differences. Comparing the intra-class correlation of samples across effects (ICC = .005; F(35, 385) = 1.06, p = .38, 95% CI [−.027, .065]) with the intra-class correlation of effects across samples (ICC = .75; F(12, 420) = 110.62, p < .001, 95% CI [.60, .89]) suggests that very little of the variability in effect sizes can be attributed to the samples, and that substantial variability is attributable to the effect under investigation. To illustrate, Figure 2 shows the same data as Figure 1 organized by sample rather than by effect. There is almost no variation in the average effect size across samples.

However, it is possible that particular samples would elicit larger magnitudes for some effects and smaller magnitudes for others. That might be missed by the aggregate analyses. Table 3 presents tests of whether the heterogeneity of effect sizes for each effect exceeds what is expected by measurement error.

³ None of the effects was moderated by the position in the study procedure at which it was administered.
⁴ The original anchoring report did not distinguish between topics, so the aggregate effect size is reported.
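A minimal sketch of the intraclass-correlation comparison described above follows, under the assumption of a long-format table with one standardized effect size per site and effect. This is not the authors' script (the article reports F-based ICCs with confidence intervals); variance components from simple random-intercept models give a comparable decomposition. The data frame `es` and its columns (`site`, `effect`, `d`) are hypothetical.

```r
# Sketch only: share of effect-size variance attributable to site vs. effect.
# `es` is assumed to hold one row per site x effect with columns site, effect, d.
library(lme4)

icc_of <- function(grouping) {
  f  <- reformulate(sprintf("(1 | %s)", grouping), response = "d")
  m  <- lmer(f, data = es)
  vc <- as.data.frame(VarCorr(m))             # variance components
  vc$vcov[vc$grp == grouping] / sum(vc$vcov)  # between / (between + residual)
}

icc_of("site")    # variance attributable to the sample and setting
icc_of("effect")  # variance attributable to which effect is being studied
```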

Table 3. Tests of effect size heterogeneity


Columns: effect; heterogeneity statistics (Q, df, p, I²); moderation by US versus international sample (F, p, ηp²); moderation by laboratory versus online administration (F, p, ηp²).
Anchoring – babies born 59.71 35 0.01 0.402 0.16 0.69 0.00 16.14 <0.01 0.00
Anchoring – Mt. Everest 152.34 35 <.0001 0.754 94.33 <0.01 0.02 119.56 <0.01 0.02
Allowed/forbidden 180.40 35 <.0001 0.756 70.37 <0.01 0.01 0.55 0.46 0.00
Anchoring – Chicago 312.75 35 <.0001 0.913 0.62 0.43 0.00 32.95 <0.01 0.01
Anchoring – distance to NYC 88.16 35 <.0001 0.643 9.35 <0.01 0.00 15.74 <0.01 0.00
Relations between I and E math attitudes 54.84 34 <.0001 0.401 0.41* 0.52 <.001* 2.80* 0.09 <.001*
Retrospective gambler fallacy 50.83 35 0.04 0.229 0.40 0.53 0.00 0.34 0.56 0.00
Gain vs. loss framing 37.01 35 0.37 0.0001 0.09 0.76 0.00 1.11 0.29 0.00
Sex differences in implicit math attitudes 47.60 34 0.06 0.201 0.82 0.37 0.00 1.07 0.30 0.00
Low vs. high category scales 36.02 35 0.42 0.192 0.16 0.69 0.00 0.02 0.88 0.00
Quote attribution 67.69 35 <.001 0.521 8.81 <0.01 0.001 0.50 0.48 0.00
Norm of reciprocity 38.89 35 0.30 0.172 5.76 0.02 0.00 0.64 0.43 0.00
Sunk costs 35.55 35 0.44 0.092 0.58 0.45 0.00 0.25 0.62 0.00
Imagined contact 45.87 35 0.10 0.206 0.53 0.47 0.00 4.88 0.03 0.00
Flag priming 30.33 35 0.69 0 0.53 0.47 0.00 1.85 0.17 0.00
Currency priming 28.41 35 0.78 0 1.00 0.32 0.00 0.11 0.74 0.00
Notes. Tasks are ordered from largest to smallest observed effect size (see Table 2). Heterogeneity tests were conducted with the R package metafor; REML was used for estimation in all tests. One sample was removed from the sex differences and the relations between implicit and explicit math attitudes analyses because of a systemic error in that laboratory's recording of reaction times.
*Moderator statistics are the F values for the interaction of condition and the moderator from an ANOVA with condition, country, and location as independent variables, with the exception of relations between implicit and explicit math attitudes, for which the reported value is the F associated with the change in R² when the product term between the independent variable and the moderator is added in a hierarchical linear regression model. Details of all analyses are available in the supplement.
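The notes above state that the heterogeneity tests were run with the metafor package using REML estimation. A minimal sketch of such a test for a single effect follows; the data frame `per_site`, with per-site effect sizes `yi` and sampling variances `vi`, is a hypothetical input rather than the authors' actual object.

```r
# Minimal heterogeneity test for one effect with metafor (REML estimation),
# assuming per-site effect sizes `yi` and sampling variances `vi`.
library(metafor)

res <- rma(yi, vi, data = per_site, method = "REML")
res$QE    # Cochran's Q
res$QEp   # p value of the heterogeneity test
res$I2    # I^2: share of total variability due to between-site heterogeneity
```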

Cochran's Q and I² statistics revealed that heterogeneity of effect sizes was largely observed among the very large effects – anchoring, allowed-forbidden, and relations between implicit and explicit attitudes. Only one other effect – quote attribution – showed substantial heterogeneity. This appears to be partly attributable to this effect occurring more strongly in US samples and to a lesser degree in international samples.

To test for moderation by key characteristics of the setting, we conducted a Condition × Country (US or other) × Location (lab or online) ANOVA for each effect. Table 3 presents the essential Condition × Country and Condition × Location effects. Full model results are available in the supplementary materials. A total of 10 of the 32 moderation tests were significant, and seven of those were among the largest effects – anchoring and allowed-forbidden. Even including those, none of the moderation effect sizes exceeded an ηp² of .022. The heterogeneity in anchoring effects may be attributable to differences between the samples in knowledge of the height of Mt. Everest, the distance to NYC, or the population of Chicago. Overall, whether the sample was collected in the US or elsewhere, or whether data collection occurred online or in the laboratory, had little systematic effect on the observed results.
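A sketch of the Condition × Country × Location ANOVA described above, run separately for each effect, is shown below. The data frame `dat_one_effect` and its column names are hypothetical, and the effectsize package is offered as one option for computing partial eta squared rather than as the authors' actual tooling.

```r
# Illustrative moderation test for one effect: Condition x Country x Location.
# Column names (dv, condition, us_or_intl, lab_or_online) are hypothetical.
library(effectsize)

fit <- aov(dv ~ condition * us_or_intl * lab_or_online, data = dat_one_effect)
summary(fit)                      # F tests, incl. Condition x Country and
                                  # Condition x Location interactions
eta_squared(fit, partial = TRUE)  # partial eta squared for each term
```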
Additional possible moderators of the flag priming effect were suggested by the original authors. On the US participants only (N ≈ 4,670), with five hierarchical regression models, we tested whether the items moderated the effect of the manipulation. They did not (ps = .48, .80, .62, .07, .05, all ΔR² < .001). Details are available in the online supplement.

Discussion

A large-scale replication with 36 samples successfully replicated eleven of 13 classic and contemporary effects in psychological science, some of which are well-known to be robust, and others that have been replicated infrequently or not at all. The original studies produced underestimates of some effects (e.g., anchoring-and-adjustment and allowed versus forbidden message framing), and overestimates of other effects (e.g., imagined contact producing willingness to interact with outgroups in the future). Two effects – flag priming influencing conservatism and currency priming influencing system justification – did not replicate.

A primary goal of this investigation was to examine the heterogeneity of effect sizes across the wide variety of samples and settings, and to provide an example of a paradigm for testing such variation. Some studies were conducted online, others in the laboratory. Some studies were conducted in the United States, others elsewhere. And a wide variety of educational institutions took part. Surprisingly, these factors did not produce highly heterogeneous effect sizes.
Intraclass correlations suggested that most of the variation in effects was due to the effect under investigation and almost none to the particular sample used. Focused tests of moderating influences elicited sporadic and small effects of the setting, while tests of heterogeneity suggested that most of the variation in effects is attributable to measurement error. Further, heterogeneity was mostly restricted to the largest effects in the sample – counter to an intuition that small effects would be the most likely to be variable across sample and setting. Further, the lack of heterogeneity is particularly interesting considering that there is substantial interest and commentary about the contingency of effects on our two moderators, lab versus online (Gosling, Vazire, Srivastava, & John, 2004; Paolacci, Chandler, & Ipeirotis, 2010), and cultural variation across nations (Henrich et al., 2010).

All told, the main conclusion from this small sample of studies is that, to predict effect size, it is much more important to know what effect is being studied than to know the sample or setting in which it is being studied. The key virtue of the present investigation is that the study procedure was highly standardized across data collection settings. This minimized the likelihood that factors other than sample and setting contributed to systematic variation in effects. At the same time, this conclusion is surely constrained by the small, nonrandom sample of studies represented here. Additionally, the replication sites included in this project cannot capture all possible cultural variation, and most societies sampled were relatively Western, Educated, Industrialized, Rich, and Democratic (WEIRD; Henrich et al., 2010). Nonetheless, the present investigation suggests that we should not necessarily assume that there are differences between samples; indeed, even when moderation was observed in this sample, the effects were still quite robust in each setting.

The present investigation provides a summary analysis of a very large, rich dataset. This dataset will be useful for additional exploratory analysis about replicability in general, and about these effects in particular. The data are available for download at the Open Science Framework (https://osf.io/ydpbf/).

Conclusion

This investigation offered novel insights into variation in the replicability of psychological effects, and specific information about the replicability of 13 effects. This methodology – crowdsourcing dozens of laboratories running an identical procedure – can be adapted for a variety of investigations. It allows for increased confidence in the existence of an effect and for the investigation of an effect's dependence on the particular circumstances of data collection (Open Science Collaboration, 2014). Further, a consortium of laboratories could provide mutual support for each other by conducting similar large-scale investigations on original research questions, not just replications. Thus, collective effort could accelerate the identification and verification of extant and novel psychological effects.

Note From the Editors

Commentaries and a rejoinder on this paper are available (Crisp, Miles, & Husnu, 2014; Ferguson, Carter, & Hassin, 2014; Kahneman, 2014; Klein et al., 2014; Monin & Oppenheimer, 2014; Schwarz & Strack, 2014; doi: 10.1027/1864-9335/a000202).

Acknowledgments

We thank Eugene Caruso, Melissa Ferguson, Daniel Oppenheimer, and Norbert Schwarz for their feedback on the design of the materials. This project was supported by grants to the second and fifty-first authors from Project Implicit. Ratliff and Nosek are consultants of Project Implicit, Inc., a nonprofit organization that includes in its mission "to develop and deliver methods for investigating and applying phenomena of implicit social cognition, including especially phenomena of implicit bias based on age, race, gender or other factors." Author contributions: Designed research: R. K., B. N., K. R.; Translated materials: S. B., K. B., M. Brandt, B. B., Z. C., N. F., E. G., F. H., H. I., R. K., R. P., A. V., M. Vianello, M. Vranka; Performed research: R. A., S. B., M. Bernstein, K. B., M. Brandt, C. B., Z. C., J. C., W. C., W. D., T. D., M. E., N. F., D. F., E. G., J. A. H., J. F. H., S. J. H., J. H., H. I., M. J., J. J., H. K., R. K., L. K., J. K., C. L., R. M., W. M., A. N., J. N., G. P., R. P., K. R., A. R., K. S., J. L. S., R. S., T. S., J. S., L. V., D. T., A. V., L. V., M. Vranka, A. L. W., J. W.; Analyzed data: M. Vianello, F. H., R. K.; Wrote paper: B. N., K. R., R. K., M. Vianello, J. C. We report all data exclusions, manipulations, measures, and how we determined our sample sizes either in text or in the online supplement. All materials, data, videos of the procedure, and the original preregistered design are available at the project page https://osf.io/ydpbf/.

References

Aronson, E. (1992). The return of the repressed: Dissonance theory makes a comeback. Psychological Inquiry, 3, 303–311.
Bodenhausen, G. V. (1990). Stereotypes as judgmental heuristics: Evidence of circadian variations in discrimination. Psychological Science, 1, 319–322.
Brandt, M. J., IJzerman, H., Dijksterhuis, A., Farach, F. J., Geller, J., Giner-Sorolla, R., . . . van 't Veer, A. (2013). The replication recipe: What makes for a convincing replication? Journal of Experimental Social Psychology, 50, 217–224.
Carter, T. J., Ferguson, M. J., & Hassin, R. R. (2011). A single exposure to the American flag shifts support toward Republicanism up to 8 months later. Psychological Science, 22, 1011–1018.
152 R. A. Klein et al.: Many Labs Replication Project

Caruso, E. M., Vohs, K. D., Baxter, B., & Waytz, A. (2013). Moskowitz, G. B. (2004). Social cognition: Understanding self
Mere exposure to money increases endorsement of free- and others. New York: Guilford Press.
market systems and social inequality. Journal of Experi- Nosek, B. A., Banaji, M. R., & Greenwald, A. G. (2002).
mental Psychology: General, 142, 301–306. Math = male, Me = female, therefore math 5 Me. Journal
Cesario, J. (2014). Priming, replication, and the hardest science. of Personality and Social Psychology, 83, 44–59.
Perspectives on Psychological Science, 9, 40–48. Nosek, B. A., & Smyth, F. L. (2011). Implicit social cognitions
Collins, H. M. (1974). The TEA set: Tacit knowledge and predict sex differences in math engagement and achieve-
scientific networks. Science Studies, 4, 165–185. ment. American Educational Research Journal, 48, 1125–
Crisp, R. J., Miles, E., & Husnu, S. (2014). Support for the 1156.
replicability of imagined contact effects. Commentaries and Open Science Collaboration. (2012). An open, large-scale,
rejoinder on Klein et al. (2014). Social Psychology. Advance collaborative effort to estimate the reproducibility of psy-
online publication. doi: 10.1027/1864-9335/1000202 chological science. Perspectives on Psychological Science,
Ferguson, M. J., Carter, T. J., & Hassin, R. R. (2014). Com- 7, 657–660.
mentary on the attempt to replicate the effect of the American flag on increased Republican attitudes. Commentaries and rejoinder on Klein et al. (2014). Social Psychology. Advance online publication. doi: 10.1027/1864-9335/a000202
Gosling, S. D., Vazire, S., Srivastava, S., & John, O. P. (2004). Should we trust web-based studies? A comparative analysis of six preconceptions about Internet questionnaires. American Psychologist, 59, 93.
Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review, 93, 216–229.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). Most people are not WEIRD. Nature, 466, 29.
Henry, J. D., MacLeod, M. S., Phillips, L. H., & Crawford, J. R. (2004). A meta-analytic review of prospective memory and aging. Psychology and Aging, 19, 27.
Husnu, S., & Crisp, R. J. (2010). Elaboration enhances the imagined contact effect. Journal of Experimental Social Psychology, 46, 943–950.
Hyman, H. H., & Sheatsley, P. B. (1950). The current status of American public opinion. In J. C. Payne (Ed.), The teaching of contemporary affairs: 21st yearbook of the National Council of Social Studies (pp. 11–34). New York, NY: National Council of Social Studies.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2, e124.
Jacowitz, K. E., & Kahneman, D. (1995). Measures of anchoring in estimation tasks. Personality and Social Psychology Bulletin, 21, 1161–1166.
Kahneman, D. (2014). A new etiquette for replication. Commentaries and rejoinder on Klein et al. (2014). Social Psychology. Advance online publication. doi: 10.1027/1864-9335/a000202
Kay, A. C., & Jost, J. T. (2003). Complementary justice: Effects of "poor but happy" and "poor but honest" stereotype exemplars on system justification and implicit activation of the justice motive. Journal of Personality and Social Psychology, 85, 823–837.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B. Jr., Bahník, Š., Bernstein, M. J., . . . Nosek, B. A. (2014). Theory building through replication: Response to commentaries on the "Many Labs" replication project. Commentaries and rejoinder on Klein et al. (2014). Social Psychology. Advance online publication. doi: 10.1027/1864-9335/a000202
Knox, R. E., & Inkster, J. A. (1968). Postdecision dissonance at post time. Journal of Personality and Social Psychology, 8, 319.
Lorge, I., & Curtiss, C. C. (1936). Prestige, suggestion, and attitudes. The Journal of Social Psychology, 7, 386–402.
Monin, B., & Oppenheimer, D. M. (2014). The limits of direct replications and the virtues of stimulus sampling. Commentaries and rejoinder on Klein et al. (2014). Social Psychology. Advance online publication. doi: 10.1027/1864-9335/a000202
Open Science Collaboration. (2014). The reproducibility project: A model of large-scale collaboration for empirical research on reproducibility. In V. Stodden, F. Leisch, & R. Peng (Eds.), Implementing reproducible computational research (A volume in the R series) (pp. 299–323). New York, NY: Taylor & Francis.
Oppenheimer, D. M., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology, 45, 867–872.
Oppenheimer, D. M., & Monin, B. (2009). The retrospective gambler's fallacy: Unlikely events, constructing the past, and multiple universes. Judgment and Decision Making, 4, 326–334.
Paolacci, G., Chandler, J., & Ipeirotis, P. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.
Rugg, D. (1941). Experiments in wording questions: II. Public Opinion Quarterly, 5, 91–92.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100.
Schwarz, N., Hippler, H. J., Deutsch, B., & Strack, F. (1985). Response scales: Effects of category range on reported behavior and comparative judgments. Public Opinion Quarterly, 49, 388–395.
Schwarz, N., & Strack, F. (2014). Does merely going through the same moves make for a "direct" replication? Concepts, contexts, and operationalizations. Commentaries and rejoinder on Klein et al. (2014). Social Psychology. Advance online publication. doi: 10.1027/1864-9335/a000202
Thaler, R. (1985). Mental accounting and consumer choice. Marketing Science, 4, 199–214.
Turner, R. N., Crisp, R. J., & Lambert, E. (2007). Imagining intergroup contact can improve intergroup attitudes. Group Processes and Intergroup Relations, 10, 427–441.
Tversky, A., & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211, 453–458.

Received March 4, 2013
Accepted November 26, 2013
Published online May 19, 2014

Richard A. Klein
Department of Psychology
University of Florida
Gainesville, FL 32611
USA
E-mail [email protected]

Erratum
Correction to Klein et al. (2014)
https://doi.org/10.1027/1864-9335/a000178

The article entitled "Investigating Variation in Replicability: A 'Many Labs' Replication Project" by Klein, R. A., et al. (2014, Social Psychology, 45(3), 142–152, https://doi.org/10.1027/1864-9335/a000178) contained some errors.

One line of code was incorrect in the script that generated the results for Rugg (1941; see Figure 1). Effectively, the authors failed to correctly invert two of the columns (see Tables 2 and 3). The revised statistics do not alter the substantive conclusions for this effect (e.g., it remains a successful replication); however, the correct effect size is much smaller and closer to the result reported in the original study.

In addition, Table 2 contained a typo that caused the df and N reported for one of the anchoring studies to be slightly off.

Please find below the corrections that the authors suggest, in the order of their appearance. The authors regret any inconvenience or confusion these errors may have caused.

Page 147, Figure 1

Figure 1. Replication results organized by effect. “X” indicates the effect size obtained in the original study. Large circles represent the aggregate
effect size obtained across all participants. Error bars represent 99% noncentral confidence intervals around the effects. Small circles represent
the effect sizes obtained within each site (black and white circles for US and international replications, respectively).


Table 2. Summary confirmatory results for original and replicated effects

Columns: Effect | Original study ES [95% CI] | Unweighted: median replication ES; replication ES [99% CI] | Weighted: replication ES [99% CI] | Null hypothesis significance tests by sample (N = 36): proportion p < .05 opposite direction; proportion p < .05 same direction; proportion ns | Null hypothesis significance tests of aggregate: key statistic; df; N; p

Anchoring – babies born | 0.93 [.51, 1.33] | 2.43; 2.60 [2.41, 2.79] | 2.42 [2.33, 2.51] | 0.00; 1.00; 0.00 | t = 90.49; df = 5,607; N = 5,609; p < .001
Anchoring – Mt. Everest | 0.93 [.51, 1.33] | 2.00; 2.45 [2.12, 2.77] | 2.23 [2.14, 2.32] | 0.00; 1.00; 0.00 | t = 83.66; df = 5,625; N = 5,627; p < .001
Anchoring – Chicago | 0.93 [.51, 1.33] | 1.88; 2.05 [1.84, 2.25] | 1.79 [1.71, 1.87] | 0.00; 1.00; 0.00 | t = 65.00; df = 5,282; N = 5,284; p < .001
Anchoring – distance to NYC | 0.93 [.51, 1.33] | 1.18; 1.27 [1.13, 1.40] | 1.17 [1.09, 1.25] | 0.00; 1.00; 0.00 | t = 42.86; df = 5,370; N = 5,372; p < .001
Relations between I and E math attitudes | 0.93 [.77, 1.08] | 0.84; 0.79 [0.63, 0.96] | 0.79 [0.75, 0.83] | 0.00; 0.94; 0.06 | r = .38; N = 5,623; p < .001
Allowed/forbidden | 0.46 [.38, .54] | 0.84; 0.87 [0.84, 0.89] | 0.76 [0.70, 0.83] | 0.00; 0.81; 0.19 | χ² = 328.83; df = 1; N = 6,292; p < .001
Retrospective gambler fallacy | 0.69 [.16, 1.21] | 0.61; 0.59 [0.49, 0.70] | 0.61 [0.54, 0.68] | 0.00; 0.83; 0.17 | t = 24.01; df = 5,940; N = 5,942; p < .001
Gain vs. loss framing | 1.13 [.89, 1.37] | 0.58; 0.62 [0.52, 0.71] | 0.60 [0.53, 0.67] | 0.00; 0.86; 0.14 | χ² = 516.4; df = 1; N = 6,271; p < .001
Sex differences in implicit math attitudes | 1.01 [.54, 1.48] | 0.59; 0.56 [0.45, 0.68] | 0.53 [0.46, 0.60] | 0.00; 0.71; 0.29 | t = 19.28; df = 5,840; N = 5,842; p < .001
Low vs. high category scales | 0.50 [.15, .84] | 0.50; 0.51 [0.42, 0.61] | 0.49 [0.40, 0.58] | 0.00; 0.67; 0.33 | χ² = 342.4; df = 1; N = 5,899; p < .001
Quote attribution | na | 0.30; 0.31 [0.19, 0.42] | 0.32 [0.25, 0.39] | 0.00; 0.47; 0.53 | t = 12.79; df = 6,323; N = 6,325; p < .001
Norm of reciprocity | 0.16 [.06, .27] | 0.27; 0.27 [0.18, 0.36] | 0.30 [0.23, 0.37] | 0.00; 0.36; 0.64 | χ² = 135.3; df = 1; N = 6,276; p < .001
Sunk costs | 0.23 [.04, .50] | 0.32; 0.31 [0.22, 0.39] | 0.27 [0.20, 0.34] | 0.00; 0.50; 0.50 | t = 10.83; df = 6,328; N = 6,330; p < .001
Imagined contact | 0.86 [.14, 1.57] | 0.12; 0.10 [0.00, 0.19] | 0.13 [0.07, 0.19] | 0.03; 0.11; 0.86 | t = 5.05; df = 6,334; N = 6,336; p < .001
Flag priming | 0.50 [.01, .99] | 0.02; 0.01 [-0.07, 0.08] | 0.03 [-0.04, 0.10] | 0.04; 0.00; 0.96 | t = 0.88; df = 4,894; N = 4,896; p = .38
Currency priming | 0.80 [.05, 1.54] | 0.00; 0.01 [-0.06, 0.09] | -0.02 [-0.08, 0.04] | 0.00; 0.03; 0.97 | t = 0.79; df = 6,331; N = 6,333; p = .83

Notes. All effect sizes (ES) are presented in Cohen's d units. Weighted statistics are computed on the whole aggregated dataset (N > 6,000); unweighted statistics are computed on the disaggregated dataset (N = 36 samples). 95% CIs for original effect sizes used cell sample sizes when available and assumed equal distribution across conditions when not available. The original anchoring article did not provide sufficient information to calculate effect sizes for individual scenarios; therefore, an overall effect size is reported. The anchoring original effect size is a mean point-biserial correlation computed across 15 different questions in a test-retest design, whereas the present replication adopted a between-subjects design with random assignment. One sample was removed from sex differences in implicit math attitudes and relations between implicit and explicit math attitudes because of a systemic error in that laboratory's recording of reaction times. Flag priming includes only US samples. Confidence intervals around the unweighted mean are based on the central normal distribution. Confidence intervals around the weighted effect size are based on noncentral distributions.
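The last point in the notes, that confidence intervals around the weighted effect sizes are based on noncentral distributions, can be illustrated with a short R sketch. This is not the authors' analysis code; the observed t value and the two group sizes below are placeholders roughly in the range of the table.

# Illustrative sketch (not the authors' script): a 99% confidence interval for
# Cohen's d derived from the noncentral t distribution, given an observed t
# statistic and two group sizes.
ci_d_noncentral <- function(t_obs, n1, n2, conf = 0.99) {
  alpha <- (1 - conf) / 2
  df <- n1 + n2 - 2
  # Find the noncentrality parameters that place the observed t at the
  # required tail probabilities of the noncentral t distribution.
  ncp_low  <- uniroot(function(ncp) pt(t_obs, df, ncp) - (1 - alpha),
                      interval = c(-50, t_obs + 50))$root
  ncp_high <- uniroot(function(ncp) pt(t_obs, df, ncp) - alpha,
                      interval = c(-50, t_obs + 50))$root
  # Convert the noncentrality limits to Cohen's d units.
  c(lower = ncp_low, upper = ncp_high) * sqrt(1 / n1 + 1 / n2)
}

# Placeholder values loosely matching the imagined contact row above:
ci_d_noncentral(t_obs = 5.05, n1 = 3168, n2 = 3168)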


Page 148, Table 2

The correct version of Table 2 is provided above.

Page 149, Paragraph "Variation Across Samples and Settings"

In lines 4–11 from the beginning of the paragraph, substitute the relevant sentences with the following:

Comparing the intra-class correlation of samples across effects (ICC = .006; F(35, 385) = 1.07, p = .37, 95% CI [-.027, .066]) with the intra-class correlation of effects across samples (ICC = .67; F(12, 420) = 74.11, p < .001, 95% CI [.50, .85]) suggests that very little in the variability of effect sizes can be attributed to the samples, and substantial variability is attributable to the effect under investigation.
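As a purely illustrative aside, the comparison quoted above amounts to computing a one-way intraclass correlation twice, once grouping effect sizes by sample and once grouping them by effect. A minimal base-R sketch on simulated placeholder data (this is not the authors' analysis script) could look like this:

# Illustrative sketch (not the authors' script): one-way ICC(1) computed by
# grouping simulated effect sizes by sample and then by effect.
set.seed(1)
dat <- expand.grid(sample = factor(1:36), effect = factor(1:13))
dat$es <- rnorm(nrow(dat),
                mean = rep(seq(0, 2, length.out = 13), each = 36),  # effects differ in size
                sd = 0.2)                                            # samples do not

icc1 <- function(values, groups) {
  # ICC(1) from the between- and within-group mean squares of a one-way ANOVA.
  ms <- anova(aov(values ~ groups))[["Mean Sq"]]
  k <- length(values) / nlevels(groups)  # observations per group (balanced design)
  (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])
}

icc1(dat$es, dat$sample)  # ICC of samples across effects: near zero in this simulation
icc1(dat$es, dat$effect)  # ICC of effects across samples: large in this simulation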
Page 150, Table 3

The correct version of Table 3 is provided below.

In lines 1–4 from the beginning of the paragraph, substitute the sentence with the following:

. . . revealed that heterogeneity of effect sizes was largely observed among the very large effects – anchoring and relations between implicit and explicit attitudes.

Reference

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B. Jr., Bahník, Š., Bernstein, M. J., . . . Nosek, B. A. (2014). Investigating variation in replicability: A "Many Labs" replication project. Social Psychology, 45(3), 142–152. https://doi.org/10.1027/1864-9335/a000178

Richard A. Klein
Department of Psychology
Université Grenoble Alpes
LIP/PC2S
38040 GRENOBLE Cedex 9
France
[email protected]

Table 3. Tests of effect size heterogeneity

Columns: Effect | Heterogeneity statistics: Q | df | p | I² | Moderation tests: US or international F | p | ηp² | Laboratory or online F | p | ηp²

Anchoring – babies born | 59.71 | 35 | 0.01 | 0.402 | 0.16 | 0.69 | 0.00 | 16.14 | <0.01 | 0.00
Anchoring – Mt. Everest | 152.34 | 35 | <.0001 | 0.754 | 94.33 | <0.01 | 0.02 | 119.56 | <0.01 | 0.02
Anchoring – Chicago | 312.75 | 35 | <.0001 | 0.913 | 0.62 | 0.43 | 0.00 | 32.95 | <0.01 | 0.01
Anchoring – distance to NYC | 88.16 | 35 | <.0001 | 0.643 | 9.35 | <0.01 | 0.00 | 15.74 | <0.01 | 0.00
Relations between I and E math attitudes | 54.84 | 34 | <.0001 | 0.401 | 0.41* | 0.52 | <.001* | 2.80* | 0.09 | <.001*
Retrospective gambler fallacy | 50.83 | 35 | 0.04 | 0.229 | 0.40 | 0.53 | 0.00 | 0.34 | 0.56 | 0.00
Gain vs. loss framing | 37.01 | 35 | 0.37 | 0.0001 | 0.09 | 0.76 | 0.00 | 1.11 | 0.29 | 0.00
Sex differences in implicit math attitudes | 47.60 | 34 | 0.06 | 0.201 | 0.82 | 0.37 | 0.00 | 1.07 | 0.30 | 0.00
Low vs. high category scales | 36.02 | 35 | 0.42 | 0.192 | 0.16 | 0.69 | 0.00 | 0.02 | 0.88 | 0.00
Allowed/forbidden | 28.96 | 35 | 0.75 | 0.00 | 70.37 | <.01 | 0.01 | 0.55 | 0.46 | 0.00
Quote attribution | 67.69 | 35 | <.001 | 0.521 | 8.81 | <0.01 | 0.001 | 0.50 | 0.48 | 0.00
Norm of reciprocity | 38.89 | 35 | 0.30 | 0.172 | 5.76 | 0.02 | 0.00 | 0.64 | 0.43 | 0.00
Sunk costs | 35.55 | 35 | 0.44 | 0.092 | 0.58 | 0.45 | 0.00 | 0.25 | 0.62 | 0.00
Imagined contact | 45.87 | 35 | 0.10 | 0.206 | 0.53 | 0.47 | 0.00 | 4.88 | 0.03 | 0.00
Flag priming | 30.33 | 35 | 0.69 | 0 | 0.53 | 0.47 | 0.00 | 1.85 | 0.17 | 0.00
Currency priming | 28.41 | 35 | 0.78 | 0 | 1.00 | 0.32 | 0.00 | 0.11 | 0.74 | 0.00

Notes. Tasks are ordered from largest to smallest observed effect size (see Table 2). Heterogeneity tests were conducted with the R package metafor; REML was used for estimation in all tests. One sample was removed from sex differences in implicit math attitudes and relations between implicit and explicit math attitudes because of a systemic error in that laboratory's recording of reaction times.
*Moderator statistics are the F value of the interaction of condition and the moderator from an ANOVA with condition, country, and location as independent variables, with the exception of relations between implicit and explicit math attitudes, for which the reported value is the F associated with the change in R² after the product term between the independent variable and the moderator is added in a hierarchical linear regression model. Details of all analyses are available in the supplement.
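The heterogeneity tests described in the notes can be reproduced in spirit with a short sketch using the metafor package and REML estimation. This is not the authors' script; the per-site effect sizes and sampling variances below are hypothetical.

# Illustrative sketch (not the authors' script): random-effects heterogeneity
# test with the R package metafor, using REML estimation.
library(metafor)

site_effects <- data.frame(
  d = c(0.55, 0.61, 0.48, 0.70, 0.52, 0.64),        # hypothetical per-site Cohen's d
  v = c(0.021, 0.025, 0.019, 0.030, 0.022, 0.026)   # hypothetical sampling variances
)

fit <- rma(yi = d, vi = v, data = site_effects, method = "REML")
summary(fit)  # reports the Q statistic with its df and p value, I^2, tau^2, and the pooled effect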
