Luzi Hail
The Wharton School,
University of Pennsylvania
Mark Lang
University of North Carolina
Christian Leuz
Booth School of Business,
University of Chicago & NBER
March 2020
Abstract
We have little knowledge about the prevalence of irreproducibility in the accounting literature. To
narrow this gap, we conducted a survey among the participants of the 2019 JAR Conference on
their perceptions of the frequency, causes and consequences of irreproducible research published
in accounting journals. A majority of respondents believe that irreproducibility is common in the
literature, constitutes a major problem and receives too little attention. Most have encountered
irreproducibility in the work of others (although not in their own work) but chose not to pursue
their failed reproduction attempts to publication. Respondents believe irreproducibility results
chiefly from career or publication incentives as well as from selective reporting of results. They
also believe that practices like sharing code and data combined with stronger incentives to replicate
the work of others would enhance reproducibility. The views of accounting researchers are
remarkably similar to those expressed in a survey by the scientific journal Nature. We conclude
by discussing the implications of our findings and by proposing several potential paths forward for the
accounting research community.
“When scientists cannot confirm the results from a published study, to some it is an indication of a problem, and to others, it is a natural part of the scientific process that can […]”
“Reproducibility is like brushing your teeth. […] It is good for you, but it takes time and
effort. Once you learn it, it becomes a habit.” Baker [2016, p. 454]
1. Introduction
In recent years, many concerns have been raised about the reliability of scientific publications
in both the natural and the social sciences (e.g., Begley [2013], Begley and Ioannidis [2015],
National Academies of Sciences [2016], Christensen and Miguel [2018]). The Reproducibility
Project in Psychology, for instance, finds that the original results could be replicated for only 39 of 100 experimental and correlational studies (Open Science Collaboration [2015]). Moreover, even the replicated effects were, on average, about half the magnitude of the original effects (see also Gilbert et al. [2016]). A similar project in experimental economics finds that only 61% of 18 replicated studies show significant effects in the original direction, well below what the reported p-values would imply, and again the replicated effect sizes are substantially smaller (Camerer et al. [2016]).1 These
concerns and the perception of a reproducibility crisis led the renowned scientific journal Nature
to conduct an online survey asking 1,576 researchers about their views on reproducibility in
research (Baker [2016]). Slightly more than half of the respondents agreed that there is a “significant crisis,” and an additional 38% saw a “slight crisis.” Over 70% of the researchers reported that they had tried and failed to reproduce another scientist’s experiments.

1 Another project focusing on social science experiments published in Nature or Science finds a significant effect in the same direction as the original study for 13 (62%) out of 21 studies, and the effect size of the replications is on average about 50% of the original effect size (Camerer et al. [2018]).

Against this backdrop, we survey accounting researchers about their perceptions of the frequency, causes and consequences of irreproducible research results published in accounting
journals. We believe that our findings are important for at least two reasons. First, we are not aware of evidence on whether irreproducibility is common in the accounting literature. Absent such direct evidence, a survey of researchers’ beliefs is the next-best choice. Second, perceptions are likely to matter in themselves. They can potentially affect or change attitudes, incentives, and behavior in the profession. Interestingly, Camerer et al. [2018] find that peer beliefs about replicability are strongly related to actual replicability, suggesting that the research community can predict which results are likely to replicate.
For our study, we use the Nature survey as a template. We surveyed the invitees of the 2019 JAR Conference about their views on reproducibility in accounting research. We compare their answers to those of researchers in other fields, as expressed in the Nature survey. We also synthesize some of the comments and concerns raised during the conference discussion. Throughout the paper, we define reproducibility as the ability of a researcher to confirm the findings of a peer-reviewed and published study in a similar setting that may include slight but reasonable variations of method, sample and/or time period.
We received 136 replies (81% response rate) from the 167 invitees to the conference, 12 of
which were partial responses.2 We highlight several key takeaways from our survey. First, similar
to the results in Nature, there is a widespread perception among accounting researchers that a
significant proportion of prior research (i.e., 50% or more of published results) would not (exactly) replicate. Second, a majority of respondents (69%) have encountered situations in which they were unable to reproduce the
work of others, although only 6% tried and could not replicate their own findings.3 Third,
respondents indicate that the inability to reproduce results significantly detracts from the
usefulness of research findings, but they seldom attempt to publish irreproducible results when
they detect them. Fourth, irreproducible results are most often thought to be the outcome of
pressure to publish and of researchers’ tendencies to selectively report results. Poor statistical
analysis and weak research designs as well as proprietary data are also seen as contributing factors.
Fifth, respondents believe that ways to improve reproducibility include creating (professional)
incentives to reproduce others’ work and enhanced standards to facilitate replication. Examples
are the mandatory publishing of code and data, the reporting of sensitivity analyses, and additional robustness tests.
One of the striking findings from our survey is that, although accounting researchers’
perceptions of incidence, causes and implications of irreproducibility are similar to those in the
Nature survey, our respondents are more sanguine about the seriousness of the issue. Specifically,
while 90% of the respondents to the Nature survey felt that science faces some level of “crisis” created by irreproducibility, only 52% of our respondents felt that irreproducibility is a “major problem” in accounting research.

2 The number of invitees exceeds the number of attendees (134).

3 In the Nature survey, which covers areas such as physics, biology or medicine, more than half of the respondents have failed to reproduce their own experiments.
We conclude our paper by pointing to several potentially troubling implications of our survey
results, for instance, that research progress and policymaking are hampered by unreliable research
findings and that the documented perceptions of irreproducibility can shape behavior, even if they
are inaccurate. In light of these issues, we suggest potential paths forward. These paths include
the creation of public repositories for code and data as well as the strengthening of professional
incentives for replications. Much of the discussion among conference participants centered on
these issues as well, so our implications section in this paper reflects views offered during the
conference debate.
2. Survey
We modeled our survey instrument after a 2016 survey conducted by Nature, which reported
responses to similar questions from 1,576 scientists across a range of disciplines (Baker [2016]).4
This approach permits us to compare results for accounting researchers with those for scientists
more generally.5 Our survey consisted of 15 questions about different aspects of reproducibility, plus two demographic questions (job title and primary area of interest). Many of the survey questions have multiple components and provide an opportunity for free-form responses.
The survey was anonymous, and we estimated that it would take 10-15 minutes to complete. We administered the survey online through Qualtrics in the weeks leading up to the 2019 JAR Conference.

4 Because we view the results as of descriptive interest, we did not randomize the order of (most) answers and did not ask similar questions with different wordings (largely consistent with the approach in the Nature survey). We acknowledge that this may limit internal validity.

5 While we closely follow the Nature survey, we deleted some questions and edited others, so they are more applicable to accounting research. For comparison, the complete set of questions and the raw data from the Nature survey are available here: https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970.
As the terms reproducibility and replicability are often used differently across fields and in
some cases are even used interchangeably, we provide a definition of the two terms at the
beginning of the survey instrument. We define reproducibility and replicability as follows: “we
consider a study to be reproduced (or “reproducibility”) when its findings are confirmed in similar
settings (these may include slight but reasonable variations of method, sample or time period). By
contrast, a study is replicated (or “replicability”) when it is repeated exactly, using the same dataset and empirical approach described in the paper.”
We note that our definitions closely follow the Nature survey and therefore comport with
terms commonly used in experimental laboratory research. However, other definitions and terms
exist. For instance, some distinguish between the ability to replicate in a “narrow sense” (what we
call replicability) and in a “wide sense” (what we call reproducibility) and sometimes the terms
are used with the exact opposite meaning (e.g., Clemens [2017], p. 330–332; Duvendack, Palmer-
Jones, and Reed [2017], p. 46–47; Christensen and Miguel [2018], p. 942–945; National Academies of Sciences [2019]).6

6 For instance, Clemens [2017] proposes a terminology for the field of economics in which he separates replications into verifications (everything is the same as reported in the original) and reproductions (everything is the same except for a different sample). He uses the term robustness for a reanalysis of the same population but with different specifications and for extensions in which the same specifications are applied to different populations and samples.
We sent out the survey to the 167 invitees to the 2019 JAR Conference, of which 136
responded (81% response rate). Twelve responses were only partially completed. Of the 127 […] associate professors, 36 assistant professors, and 12 PhD students. The majority (77%) of the participants self-identified as financial archival researchers. Fifteen individuals (12%) indicated that they are doing archival research in the areas of auditing, managerial, or tax. The rest identified with other research methods or areas.
In the discussion that follows, we present the primary findings for a subset of questions and
attempt to weave them together into a consistent narrative. For ease of exposition, we indicate the
respective question (Q) from the survey instrument when discussing the results. The complete
survey instrument together with the raw data generated by our survey are available in the Online Appendix.
3. Findings
The first question was designed to gauge the respondents’ overall impressions of reproducibility
in the literature. Figure 1, Panel A, graphs the responses to Q1. Results strongly suggest that
reproducibility is perceived as not receiving sufficient attention. 56% said that reproducibility is
receiving “not enough” attention while 3% indicated that it is receiving “too much.” When asked
about the extent to which they agreed with the statement that “the lack of reproducibility of
accounting research findings is a major problem” (Q5), 52% agreed, and 25% disagreed (Figure
1, Panel B).7
7 For parsimony in our discussion, we refer to the combination of “Agree” and “Strongly agree” as “Agree,” and “Disagree” and “Strongly disagree” as “Disagree” (the subcategories are reported in the figures).
For comparison, the Nature survey asked whether there was a “crisis of reproducibility.” 90%
of respondents indicated there was some level of reproducibility crisis, and 3% indicated that there
was no crisis.8 Although the two surveys are not directly comparable, they suggest that both
samples perceive lack of reproducibility to be an important issue in their respective literatures and
research communities. However, the Nature respondents appear to perceive the threat as more acute.
When asked to benchmark accounting research to other business school disciplines (Q4),
respondents generally compared accounting to finance, economics, and marketing. 41% thought
reproducibility in accounting is on par with other business school disciplines. 13% thought
reproducibility in accounting research is better than it is for other business school disciplines.
Several individuals pointed out that some of the other business disciplines rely more commonly
on small sample experiments or surveys that are costly to reproduce and may not generalize to
other samples, while much of accounting research relies on large, generally accessible databases.9
Others mentioned the plethora of robustness tests in accounting research and that accounting
researchers tend to do a better job of detailing their sample selection. However, an almost equal proportion thought that reproducibility in accounting research is worse than in other business school disciplines. Individuals pointed to finance and economics as having higher […]. Respondents were of two minds on whether archival research is more or less likely to replicate than behavioral or experimental research, weighing the small samples of behavioral and experimental research against […].

8 We did not use the word “crisis” in our survey (although it was used in the Nature survey) since it seems emotive and lacks a clear definition in our context. It is striking that the Nature respondents indicated a higher degree of concern relative to our respondents even though the question in the Nature survey was worded more strongly than in our survey (i.e., “crisis” versus “major problem”).

9 Narrative responses should be viewed with caution since the question was optional and asked, “Please use the box below to tell us more about your comparison group for answering this question.” 42 participants provided a narrative response.
Our next question directly addressed the perceived frequency of irreproducibility in peer-
reviewed and published accounting studies. We asked, “In your opinion, what proportion of
published results in accounting research are reproducible (i.e., similar results would be obtained
with slight but reasonable variations of method, sample, time period, etc.)?” (Q3). Figure 2, Panel
A, indicates that perceptions are widely dispersed. The most common response, given by 31% of the respondents, was that 80-100% of published results are reproducible. On average, respondents to our survey estimated that 59% of results were reproducible, very similar to the rate in the Nature survey.11 Notably, 38% of accounting researchers believe that half or more of published work is
not reproducible.
Because the perceptions about reproducibility potentially differ from replicability, we also
asked the question for replicability (“i.e., the results could be replicated exactly given the dataset
and empirical approach described in the paper;” Q2). Replicability rates were lower than for
reproducibility, as one would expect, with an average of 46% of results expected to replicate
(Figure 2, Panel B).12 53% of respondents believe that 50% or more of results would not exactly
10
P-hacking refers to selectively reporting significant results that support a preferred conclusion.
11
The wording was slightly different for the Nature survey, in which reproducibility was defined as, “the results of a
given study could be replicated exactly or reproduced in multiple similar experimental systems with variations of
experimental settings such as materials and experimental model.” Respondents to the Nature survey estimated that,
on average, 58% of published results were reproducible under that definition.
12
One would expect replicability (as we defined it) to be lower than reproducibility, as the former can also result
from coding errors or difficulties in simply following what the authors have done in their study.
8
replicate. These relatively low expected success rates for reproductions and replications are
consistent with the earlier finding that irreproducibility and irreplicability are perceived to be a
The dispersion in estimates and their relative frequencies are interesting because they suggest
that there is only limited consensus among respondents about the expected reproducibility rates
and, hence, the credibility of published accounting research. In line with this interpretation, Figure
2, Panel B, for example, shows that the two most common responses are (i) that only 20-39% of
published accounting studies would replicate (24% of respondents) and (ii) that 80-100% would
successfully replicate (20% of respondents). Coupled with the significant number of individuals
who responded with “Don’t know” (13%), the results suggest that there is substantial uncertainty
about the replicability or reproducibility of accounting research, but the issue is perceived as important.
An issue with the preceding findings is that they are based on perceptions, and we do not know
how informed they are.13 We therefore asked whether respondents had personal experience with
irreproducibility. We asked, “Have you ever tried and failed to reproduce someone else’s results?”
(Q14.2). Figure 3, Panel A, shows that 69% of respondents have encountered situations in which
they could not reproduce results from someone else’s research. This finding is similar to the
response in the Nature survey (72%) and adds credence to the perceived rates of irreproducibility
from Figure 2. That is, researchers’ responses are likely informed by their own experiences in attempting to reproduce the work of others.
13 That said, Camerer et al. [2018] find that peer beliefs of replicability are strongly correlated with actual replicability, suggesting that collective knowledge regarding the ability to reproduce or replicate a study’s findings exists.
INSERT FIGURE 3 ABOUT HERE
The biggest difference relative to the Nature survey was the response to the question, “Have
you ever tried and failed to reproduce one of your own results?” (Q14.1). Figure 3, Panel B, shows
that only 6% of respondents from our survey acknowledged a failure to reproduce their own results,
compared to almost 56% of respondents to the Nature survey. The difference in failure rates to
replicate one’s own results across surveys (6% for our survey versus 56% for the Nature survey)
is striking given both the similarity in failure rates to replicate others’ work across surveys (69%
versus 72%) and the overall high reported rate of failure to reproduce the work of others in both
surveys. It would have been interesting to ask whether our respondents perceive themselves to be
particularly careful relative to others in the research community, whether they less often try to
replicate their own work, or whether norms or practices in accounting research limit (self-)
reproductions. Finally, only 13% of respondents reported that they have ever been approached by
others inquiring about problems to reproduce one of their own papers (Q15).
The survey answers suggest that the majority of researchers expect that many results in the
accounting literature are unlikely to reproduce. But a key question is whether this outcome matters
and reduces the usefulness of published findings. We thus asked for the degree of agreement with
the following statement: “I think that a failure to reproduce rarely detracts from the validity of the
original finding” (Q11.2). Figure 4 shows that 55% disagree (i.e., believe that failure to reproduce
detracts) while only 22% agree (i.e., believe that failure to reproduce rarely detracts). In the Nature
survey, 49% believe that failure to reproduce detracts, while 22% do not.
However, when we asked whether they believe that “a failure to reproduce a result most often
means that the original finding is wrong” (Q11.1), the proportions flip. 41% of our respondents
disagree (i.e., do not believe the original finding is wrong), while 21% agree (i.e., think the original
finding is wrong). In the Nature survey, 31% think that failure to reproduce the result most often
means that the original finding is wrong. Taken together, the majority of researchers feels that
irreproducibility detracts from the validity of the original findings, but many are hesitant to
conclude that failure to reproduce typically means the original findings are wrong.
Turning to perceived causes, we asked how often “[respondents] believe each of the following is an important contributing factor in cases in which
published results are not reproducible,” with eleven potential choices (Q12). Results are reported
in Figure 5, Panel A. Several points are worth noting. Few respondents thought that outright fraud
(“fabricated or falsified results”) was common (see also Bailey, Hasselback, and Karcher [2001]),
but few also thought that bad luck (random chance) was a major cause. Instead, the leading
explanations were based on researchers’ incentives, with “selective reporting of results” (74%)
and “pressure to publish for career advancement” (55%) thought to always or very often contribute
to the irreproducibility of extant findings. The Nature survey lists the same two responses as the most common ones. Thus, respondents view irreproducibility as primarily driven by incentives to report favorable results (e.g., through “Hypothesizing After Results are
Known” or HARKing, p-hacking, or selective sampling; see also Duvendack, Palmer-Jones, and
Reed [2017]) rather than by factors such as weak experimental designs (33%), proprietary data
(38%), or poor statistical analysis (42%). Interestingly, 37% of the respondents believe that
protocols or code not being publicly available contributes always or very often to irreproducible
results, suggesting that journal policies could play a role in reducing the frequency of
irreproducibility.
Next, we asked “How likely [do] you think the following factors would be to improve the
reproducibility of research?” (Q13). Results for a range of potential answers are reported in Figure
5, Panel B. The two most common responses were “professional incentives for formally
reproducing the work of others” and “professional incentives for adopting practices that enhance
reproducibility.” These answers reinforce the importance of incentives and, hence, the message
from Panel A.14 Respondents also suggested more emphasis on independent validation within
teams and independent replication as important factors. Additional answers provided in the free-
form text box suggest the publishing of code and data, random audits/replications of published
work (maybe conducted by PhD students as part of their summer projects) as well as registered
reports as potential remedies. In sum, the responses indicate that incentives are seen as the primary
drivers of irreproducible research and that, in turn, solutions likely include changes to researchers’ incentives.

14 Here, our results differ from those in the Nature survey, in which the two predominant answers were “better understanding of statistics” and “more robust experimental design.”
To assess how respondents deal with reproducibility in their own work, we asked: “Have you
or your coauthors established any procedures to ensure reproducibility in your work?” (Q8). 75%
indicated that they have established such procedures. Among them, 49% had put procedures in
place within the last five years and 24% had done so within the last two years (Q9). Thus, there
is a trend in the direction of greater efforts toward reproducibility. In the free-form answers,
several individuals indicated that they are careful to clearly document each step of the analysis,
conduct numerous robustness tests to ensure that results are not sensitive to design choices, or have
coauthors (or PhD students) independently replicate the analysis. We note that these steps are
We also asked whether respondents had “identified any barriers to implementing changes that
would improve reproducibility of [their] research” (Q10). 35% indicated that they had identified
such barriers. The most common was proprietary data (or other issues related to data sharing).
Several respondents expressed reservations about intellectual property with respect to sharing code
or data as well as concerns that others attempting to reproduce someone else’s work could “twist” the results.
We then asked: “To what extent do you agree/disagree with three statements about publishers’
and editors’ efforts on reproducibility?” (Q7). 38% of respondents agreed that efforts by journal
publishers have been helpful to their work. 57% agreed that efforts made by journal publishers
have had a positive effect on the reproducibility of accounting research, while 12% disagreed.
Notably, respondents clearly believed that journals should play a more active role, with 70%
agreeing that journals should do more to enforce or encourage reproducibility and only 4%
disagreeing. Among the 27 respondents that provided free-form answers, most pointed to policies
requiring code and data sharing as helpful, because they facilitate replication and force researchers
to more carefully vet their code and analyses. We acknowledge that our questions focus on the
perceived benefits of mitigating irreproducibility and not the costs to journals and editors in
implementing and monitoring these systems. Related to this caveat, a few respondents pointed to
the limitations of code and data sharing policies (e.g., that they require enforcement or checking).15
15 Findings in Chang and Li [2018] suggest that replication is often difficult even when authors are required to post their data and code.

Failed reproduction attempts are arguably most useful when they are published, so that future researchers and policy makers become aware that the original results may not hold or be
robust. We asked: “Have you ever published a failed attempt to reproduce someone else’s work?”
(Q14.4). Figure 6, Panel A, shows that only 7% of respondents have pursued a failed reproduction
through to publication. Results are slightly higher for the Nature survey (13%). The low
publication rate of failed reproductions (or replications) is striking given that most researchers indicate that they have encountered irreproducible results (Figure 3, Panel A). One possible explanation is that journals are unwilling to publish failed reproductions. We therefore also asked: “Have you tried and failed to publish an unsuccessful reproduction?” (Q14.6). Figure 6, Panel B,
shows that only 6% of our respondents have even tried. This number is just slightly lower than
the 7% rate for being able to publish such a failed reproduction. Again, the results are fairly similar
to those for the Nature survey in which 10% had tried and failed to publish an unsuccessful
reproduction. It is interesting that, across both surveys, the rate at which researchers have been successful in publishing failed reproductions slightly exceeds the rate at which researchers tried but failed to publish one. This finding suggests that the low rate of published reproductions likely does not simply reflect an unwillingness of journals to publish such work, but rather that researchers rarely pursue failed reproductions to publication in the first place.
The findings in Section 3.7 pose the question of why researchers do not attempt more often to
pursue failed reproductions through to publication. We did not ask this question. However,
Dewald, Thursby and Anderson [1986] discuss potential reasons, which are driven by incentives
and fall loosely into two categories. First, there are career concerns. In particular, the profession
rewards researchers who develop reputations for innovation and creativity. Replication, by its
very nature, follows prior work and needs to be prescribed, meticulous and careful. Similarly,
papers that attract citations tend to be those that are novel and extend the literature in new
directions. Second, there are cultural issues. Replications and reproductions have the potential to
be viewed as confrontational, because typically they discredit a specific study by specific authors.
Given that researchers operate in a closely-knit community, it is uncomfortable and can be career-
limiting to call into question the work of fellow researchers who are likely to become referees, editors, or evaluators of one’s own work.
4.1. IMPLICATIONS
Our survey documents a widespread perception that irreproducibility is common in accounting research, which in our view has several implications. First, research progress is hampered if papers are built on findings that do not hold up. If failures to reproduce were made public, subsequent researchers would have more information that
they could consider when allocating their time and effort across different lines of inquiry.
Second, if a significant proportion of results in the literature lack validity, it hampers the
credibility of studies, and potentially the entire field, which in turn reduces its practical relevance. For instance, regulatory bodies such as the Financial Accounting Standards Board or the U.S. Securities and
Exchange Commission often seek input from academic research on very specific questions and
topics for which there is only a small number of relevant studies. The specificity of a topic limits
the number of relevant papers and, hence, the extent to which irreproducible results are likely to
be uncovered. More generally, a relatively small number of studies and the resulting concern about
the reliability of the results are a significant impediment for policy makers when pursuing evidence-based policies.
Third, even if the documented perceptions do not accurately reflect the incidence of irreproducibility, they can still shape behavior. If researchers believe (as our survey suggests) that irreproducibility is pervasive; that the odds of being detected
and reported are low; that incentives to publish for promotion are high (and require novel results);
and that other researchers (including those in the comparison set for promotion standards) are
“pushing the envelope,” then such beliefs create or reinforce the detrimental incentive issues that
our survey highlights as being central to irreproducibility. Such beliefs can have significant consequences for attitudes, incentives, and behavior in the profession.
Given the above implications, it is only natural to think about balanced approaches for
potential policy changes or paths forward, some of which were suggested by participants during
the conference. We highlight various tradeoffs involved (see also Bailey, Hasselback, and Karcher [2001]).
4.2.1. Business as Usual. Recall that 52% of survey respondents agreed that lack of
reproducibility is a major problem in accounting, and 56% felt that it should receive more attention
(Figure 1). Said another way, a substantial minority of respondents expressed a more sanguine
view on the state of irreproducibility in accounting research. A potential rationale for this view is
that the status quo reflects market forces, weighing the costs and benefits of replication for authors, journals, and universities.
There are several reasons why low levels of explicit replication could be optimal. First, the
incremental knowledge gained from a replication, all else equal, will likely be lower than for
original research. To the extent that the costs in terms of author, reviewer and editor time and
effort, as well as journal space, are not commensurately lower, the net benefit to replication is lower as well.
Second, important results are often replicated indirectly as a byproduct of later work building
on the original findings. The classic example is the seminal Ball and Brown [1968] study. As
noted in a recent follow-up commentary by Ball and Brown [2019], there have been innumerable
replications of the primary results in Ball and Brown [1968], including by PhD students as an
assignment in many doctoral courses. Similar arguments could be made for many other major
results in the accounting literature such as the post earnings announcement drift and the accrual anomaly.
On the other hand, it is also possible that market imperfections limit the optimal allocation of
effort to replication. Markets involving faculty, journals and universities could contain more
frictions than other markets. Moreover, credibility of research involves externalities. For example,
the benefits to replication accrue broadly in the form of trust in the research literature, but the costs are borne privately by those conducting the replications. Similarly, the veracity of research on narrower settings or policies (e.g., studies on specific accounting standards or regulations) matters for policy makers but may not be of sufficient interest to the broader research community to generate many follow-on studies. As a result, self-interest on the part of researchers, universities and journals could result in too little replication.16

16 In theory, market imperfections could also result in too much attention being paid to replication. However, as Figure 1, Panel A, suggests, reproducibility is receiving rather too little than too much attention in the research community.
Overall, one important challenge in deciding whether to proceed with business as usual is that
we do not yet have a good sense for the base rate of reproducibility in accounting research.
4.2.2. Public Repositories of Code and Data. Many journals in accounting, finance, and
economics have policies on posting code and data for published papers to facilitate replication.
We provide a stylized overview of these policies in tabular form in the Appendix.17 The table
documents substantial variation in journal policies across accounting and finance as well as within
the two fields. In economics, there is more coordination. For instance, the Journal of Political
Economy and the Quarterly Journal of Economics adopted (in 2006 and 2016, respectively) the
data policy of the American Economic Review (AER), “in an effort to promote consistent standards
and requirements among general-interest journals in the field of economics.” Since 2019, the AER and the other American Economic Association journals have gone further, requiring that data and code be deposited in a repository and verified by the AEA’s Data Editor before final acceptance.

17 See Christensen and Miguel [2018], Table 3, for other journals in economics.
Sharing code and data in public repositories should, in principle, lower the costs of replication.
It could also create incentives for authors of original research to adopt procedures that ensure their
results are replicable. Thus, a requirement to post code and data by major accounting journals
could mitigate the incentive problems for reproducibility that we identified in our survey, especially when the posted code not only generates a study’s final results and tables but also covers the creation of the final dataset from commonly available databases such as Compustat, CRSP or Datastream. Again, accounting, finance and economics journals differ widely in how extensive their data and code posting requirements are (see the Appendix).
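To illustrate what such a posting requirement can look like in practice, the sketch below shows a minimal, purely hypothetical replication-kit driver: a single script that first rebuilds the final sample from raw database extracts and then reruns all tables and robustness checks, so that the posted code covers dataset creation rather than only the final analyses. The file names and folder layout are illustrative assumptions, not a requirement of any particular journal.

    # Hypothetical replication-kit driver (illustrative sketch only; file names are assumptions).
    # Step 1 rebuilds the final sample from raw database extracts; steps 2 and 3 rerun the analyses.
    import subprocess
    import sys

    STEPS = [
        "code/01_build_sample.py",   # e.g., merge raw Compustat/CRSP extracts into data/final_sample.csv
        "code/02_main_tables.py",    # regenerate every table and figure in output/
        "code/03_robustness.py",     # rerun the sensitivity analyses reported in the paper
    ]

    for script in STEPS:
        print("Running", script)
        subprocess.run([sys.executable, script], check=True)  # abort if any step fails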
However, code and data sharing, while potentially useful, is unlikely to be a panacea (see also
Harvey [2019]). Many datasets are proprietary or require licensing agreements and cannot be
publicly shared. Einav and Levin [2014] show that about half of the papers in AER received an
exemption under the data policy in 2013 and 2014 because they used private or non-public
administrative data. An alternate approach could be to use an interface that allows researchers to
upload their data to a protected website (although doing so may not be possible for some
proprietary datasets). Other researchers could then access the website to perform sensitivity
analyses with the data (e.g., transform the data, include additional variables in the model, test
alternate model specifications, etc.), but they would not be able to download the data or access
them directly. During the conference, Joachim Gassen (Humboldt University) presented such a
tool that permits researchers to upload and analyze datasets.18 A large scale adoption of these tools
likely requires substantial upfront investment and ongoing maintenance, and it would be essential […]

18 See also Gassen [2018] and: https://joachim-gassen.github.io/rdfanalysis/.
Code sharing also has limitations. Chang and Li [2018] attempt to replicate 67 economics
papers that include replication kits (i.e., data and code) and find that only 33% could be replicated
without help from the authors and, even with the help from authors, still less than 50% could be
replicated. Perhaps this is not surprising; it takes time and effort to provide code that contains many comments and is easy to understand. Some could even be tempted to make replication
difficult to avoid more scrutiny of the reported results. Less nefariously, authors with ongoing
research agendas could be hesitant to provide clean code and data to avoid helping potential
competitors. Finally, if code and data sharing make it straightforward to tweak specifications,
there could be incentives for other researchers to engage in “reverse p-hacking,” that is, testing
many specifications and selectively reporting those that are not robust while ignoring others that
are (Harvey [2017]). This behavior could be quite costly to authors of original research. In
addition, it is time consuming for authors of published papers to respond to and refute erroneous
claims (or respond to requests for assistance) by researchers attempting to replicate their work. As
Harvey [2019] notes, these costs fall disproportionately on the most productive authors.
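To make the selective-reporting mechanism concrete, the following stylized simulation (an illustrative sketch added for exposition, not part of the survey) shows how trying many specifications of a true-null effect and reporting only the most significant one inflates the rate of false positives; a replicator who runs the same search in reverse, looking for at least one non-robust specification, exploits the identical logic.

    # Illustrative simulation (assumption-laden sketch, not from the paper): selective reporting
    # across many specifications of a true-null effect inflates the share of "significant" results.
    import random
    import statistics

    random.seed(1)

    def smallest_p_value(n_specs: int = 20, n_obs: int = 100) -> float:
        """Return the smallest two-sided p-value across n_specs independent tries of a null effect."""
        p_values = []
        for _ in range(n_specs):
            sample = [random.gauss(0.0, 1.0) for _ in range(n_obs)]  # pure noise, no true effect
            z = statistics.mean(sample) / (statistics.pstdev(sample) / n_obs ** 0.5)
            p_values.append(2 * (1 - statistics.NormalDist().cdf(abs(z))))  # two-sided p-value
        return min(p_values)  # "p-hacking": keep only the most favorable specification

    n_studies = 2000
    share_significant = sum(smallest_p_value() < 0.05 for _ in range(n_studies)) / n_studies
    print(f"Share of null studies reported as significant: {share_significant:.0%}")  # roughly 60-65%

Under these assumptions, roughly 1 - 0.95^20, or about 64%, of purely null “studies” end up with a reportable p-value below 0.05, which illustrates why selective reporting, rather than fraud or bad luck, can produce high rates of irreproducible results.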
4.2.3. Increased Incentives for Replication. One solution might be for leading journals to
dedicate journal space and publish more reproductions or replications. On the surface, this solution
appears to be consistent with our survey results indicating that providing professional incentives
for reproduction is likely to be the most effective mechanism for ensuring reproducibility (Figure
5, Panel B). But we believe that generating incentives is more complicated. On one hand, the
prospect of a publication could provide incentives for better (or more insightful) reproductions.
On the other hand, the fact that replications are published in leading journals alone does not imply
that they will necessarily provide the same career rewards as publications of original studies. At
this point, it is unclear how replications will or should be judged for publication and career
progression. Moreover, it seems infeasible to publish replications on all published work in terms
of referee and editor capacity as well as journal space.19 Finally, authors of replications could have incentives to bias replications in favor of overturning existing results to increase the odds of publication.

19 In this discussion, we assume that replications are subject to the usual journal refereeing and editing process.
To mitigate these concerns, it could be worthwhile to debate several models. One approach
is for teams of researchers to replicate multiple studies using pre-registered criteria to increase
efficiency and reduce potential bias. For example, the Reproducibility Project in psychology was
a collaboration of 270 contributing authors replicating 100 studies published in 2008 under careful
pre-specified criteria (Open Science Collaboration [2015]). They found that only 36% of studies
replicated under their definition, which helped to establish a base rate of irreproducibility in
psychology. An advantage of this approach is that it spreads the effort across a large number of
researchers and produces a cohesive set of results to address a specific question (base rate of
irreproducibility in a particular literature). The fact that paper and author pairings were randomly
assigned not only created fewer incentives for researchers to overturn previous results, but likely also reduced potential biases in how the replications were conducted.
A narrower approach is to focus on a series of papers on a specific event or topic. For example,
Black et al. [2019] have an ongoing preregistered plan to replicate three leading studies on
Regulation SHO (that concerns short-sale practices). An advantage of this narrower approach is
that the replications all rely on the same event, which increases efficiencies in performing the
analysis and publishing results, as well as providing overall takeaways for researchers and policy
makers going forward. A potential concern is that the papers to be replicated were chosen because
the replication team viewed the respective results as “implausible” and, hence, this approach is
less informative about the base rate. Other incentive concerns about replications are mitigated by
the fact that Black et al. [2019] have pre-registered their planned research protocol.
Finally, some conference participants expressed the opinion that publication of replications
could be outsourced to more specialized outlets, be it independent journals with a specific focus, dedicated journal sections, or (mediated) post-publication review platforms.20 For example, the Critical Finance Review includes, as part of its primary mission, calls for replications of major papers in specific areas and has recently issued such calls for replications on liquidity, volatility and higher moments, or regional finance. Similarly, the Journal of Financial Reporting explicitly encourages replication studies. The Journal of Finance (JF) includes a section for “Replications and Corrigenda” that provides
space for “short papers that document material sensitivities of central results in papers published
in the JF.” It remains to be seen whether allocating special journal space provides sufficient
incentives for replications. It is also not clear whether a replication published in a different journal
from the original publication has the same impact as one published in the original outlet or whether
journals should feel obliged to publish failed replications of papers they have originally published.

20 Examples for (mediated) post-publication review platforms are PubMedCommons or Open Review, which have moderators and/or require the full name of contributors. According to Christensen and Miguel [2018], Table 3, the American Economic Journals allow for post-publication peer review on their websites.
5. Conclusion
We believe that the discussion at the JAR conference and the results in our paper suggest the
issue of reproducibility in accounting research should be more widely debated. The overlap in our
results with those of the Nature survey is striking but seems incongruous with the different
reactions to the results and their implications. In the sciences, the majority of researchers think
there is an irreproducibility crisis. In accounting, this does not seem to be the case, despite the concerns expressed by many. One reason is perhaps the various tradeoffs that accounting researchers
highlighted during the JAR conference discussion. For instance, several conference participants
expressed the concern that greater efforts towards reproducibility could reduce or crowd out more
innovative research, although some argued that our field is large enough that we should be able to
engage in attempts to reproduce prior studies as well as pursue new and innovative research.
In the end, we acknowledge that there is no obvious path forward, except that the survey
responses and conference discussions about the causes of irreproducibility suggest that changing
incentives is likely to be a primary component of any solution. Our hope is that, by highlighting
concerns about irreproducibility within the accounting research community and suggesting
potential paths forward, we can spur further debate on this important topic.
REFERENCES
NATIONAL ACADEMIES OF SCIENCES, ENGINEERING, AND MEDICINE. Statistical Challenges in
Assessing and Fostering the Reproducibility of Scientific Results: Summary of a Workshop.
Washington, DC: The National Academies Press, 2016.
NATIONAL ACADEMIES OF SCIENCES, ENGINEERING, AND MEDICINE. Reproducibility and
Replicability in Science. Washington, DC: The National Academies Press, 2019.
OPEN SCIENCE COLLABORATION. “Estimating the Reproducibility of Psychological Science.”
Science 349 (2015): aac4716-1–8.
Panel A: To what extent do you feel that reproducibility is receiving sufficient attention in the research community? (Q1)
Too much: 4 (3%); A reasonable amount: 46 (34%); Not enough: 77 (56%); I am unsure: 9 (7%); Total: 136.

Panel B: Is the lack of reproducibility in accounting research findings a major problem? (Q5)
Statement rated: “The lack of reproducibility of accounting research findings is a major problem.”

FIG. 1.—The figure plots respondents’ answers to questions whether they believe that there is a reproducibility crisis
in accounting research. Each graph comprises the responses from the invitees to the 2019 JAR Conference.
Panel A: In your opinion, what proportion of published results are reproducible? (Q3)
0-19%: 6; 20-39%: 22; 40-59%: 23; 60-79%: 31; 80-100%: 42; Don't know: 11; Total: 135. Mean: 59%; Median: 65%; Std. Dev.: 24%.

Panel B: In your opinion, what proportion of published results are exactly replicable (i.e., the results could be replicated exactly given the dataset and empirical approach described in the paper)? (Q2)
0-19%: 20; 20-39%: 33; 40-59%: 18; 60-79%: 20; 80-100%: 27; Don't know: 17; Total: 135. Mean: 46%; Median: 50%; Std. Dev.: 28%.

FIG. 2.—The figure plots respondents’ answers to questions about the proportion of reproducible and replicable results
in accounting research. Each graph comprises the responses from the invitees to the 2019 JAR Conference.
Panel A: Have you tried and failed to reproduce someone else’s results? (Q14.2)
Yes: 87 (69%); No: 39 (31%); Total: 126.

Panel B: Have you tried and failed to reproduce one of your own results? (Q14.1)
Yes: 7 (6%); No: 117 (94%); Total: 124.

FIG. 3.—The figure plots respondents’ answers to questions whether they have tried and failed to reproduce someone
else’s (their own) work published in accounting research. Each graph comprises the responses from the invitees to the
2019 JAR Conference.
Strongly agree: 4 (3%); Agree: 24 (18%); Neither agree nor disagree: 26 (20%); Disagree: 61 (47%); Strongly disagree: 10 (8%); No opinion: 5 (4%); Total: 130.

FIG. 4.—The figure plots respondents’ assessment of the following statement: “I think that a failure to reproduce [the
results] rarely detracts from the validity of the original finding” (Q11.2). The graph comprises the responses from the
invitees to the 2019 JAR Conference.
Panel A: Percentage of respondents who believe that the indicated factor always or very often contributes to irreproducible results (Q12).
Factors shown: fraud; pressure to publish for career advancement (55%); insufficient oversight by coauthors; insufficient peer review of research; selective reporting of results (74%); poor statistical analysis (42%); proprietary data (38%); protocols or code not publicly posted (37%); methods require technical expertise; poor experimental design (33%); bad luck. Reported values range from 4% to 74%.

Panel B: How likely are each of these factors to improve the reproducibility of research? (Q13).
Factors shown: professional incentives for formally reproducing others’ work; professional incentives for adopting practices that enhance reproducibility; better teaching/mentoring of PhD students; better understanding of statistics; more robust empirical or experimental design; more emphasis on independent validation within teams; more emphasis on independent replication; journal editors enforcing reproducibility. Reported values range from 16% to 84%, with the two professional-incentives factors rated highest.

FIG. 5.—The figure plots respondents’ answers to questions about which factors they believe (i) contribute to published
accounting research not being reproducible and (ii) would render accounting research more reproducible. Each graph
comprises the responses from the invitees to the 2019 JAR Conference.
Panel A: Have you ever published a failed attempt to reproduce someone else’s work? (Q14.4)
Yes: 7%; No: 93%.

Panel B: Have you tried and failed to publish an unsuccessful reproduction? (Q14.6)
Yes: 8 (6%); No: 117 (94%); Total: 125.

FIG. 6.—The figure plots respondents’ answers to questions about their attempts to publish reproductions in
accounting research. Each graph comprises the responses from the invitees to the 2019 JAR Conference.
APPENDIX
Data and Code Sharing Policies of Leading Journals in Accounting, Finance and Economics
Panel A: Accounting journals
Journal: Contemporary Accounting Research
Data and code sharing policy (excerpt): No explicit data or code sharing policy. But authors must take responsibility for the content of the paper, all methods used, all data sources, and all means of data collection and data manipulation and describe them in a way that diligent readers could replicate the results reported in the paper.
Data sharing: Encouraged.
Data repository: None available. Data and programs should be maintained for at least 5 years.
Data conversion code: Encouraged.
Final analysis code: Encouraged.
Implementation date: Between 10/28/2016 and 2/15/2017 (based on Wayback Machine).

Journal: Journal of Accounting and Economics
Data sharing policy (excerpt): Encourages and enables authors to share data that support their research publication where appropriate and enables them to interlink the data with the published article.
Code sharing policy (excerpt): To facilitate reproducibility and data reuse, the journal also encourages authors to share software, code, models, algorithms, protocols, methods and other useful materials related to the project.
Data sharing: Encouraged.
Data repository: Through publisher (e.g., Mendeley Data).
Data conversion code: Encouraged.
Final analysis code: Encouraged.
Implementation date: Fall 2016.

Journal: Journal of Accounting Research
Data sharing policy (excerpt): 1. Data Description Sheet required at initial submission. 2. Complete description of steps to collect and process the data used in final analyses. 3. Whenever feasible, identifiers for final sample. 4. Sharing of all data encouraged.
Code sharing policy (excerpt): Upon acceptance of paper and prior to publication: code to convert raw data into final dataset plus a brief description that enables other researchers to use the program. In case of proprietary data, researchers can request an exemption at initial submission and provide a detailed step-by-step description of the code or the relevant parts of the code.
Data sharing: Required, but only for identifiers to determine the sample (sharing of other data encouraged).
Data repository: Upload on journal website. Authors should maintain code and data for at least 6 years.
Data conversion code: Required (starting from raw data).
Final analysis code: Not required, but encouraged.
Implementation date: Submissions after 1/1/2015.

Journal: Review of Accounting Studies
Data sharing policy (excerpt): Journal encourages authors, where possible and applicable, to deposit data that support the findings of their research in a public repository. Springer Nature’s list of repositories and research data policy serves as backdrop for reference.
Code sharing policy (excerpt): No explicit code policy.
Data sharing: Encouraged.
Data repository: None available, but reference list available from Springer Nature.
Data conversion code: No information available.
Final analysis code: No information available.
Implementation date: Not found.

Journal: The Accounting Review
Data sharing policy (excerpt): Submission includes positive assurance from the author(s) of the integrity of the data underlying the research. Authors are responsible for responding promptly and fully to an editor’s request related to the integrity of the data used in a submitted or published paper. Sharing data requested by other researchers is encouraged but left to the discretion of individual author teams.
Code sharing policy (excerpt): A non-exhaustive list of examples used by authors to confirm the authenticity of their data includes providing access to data files and the computer code used to perform the analysis.
Data sharing: Required for inquiries by the editor (otherwise encouraged).
Data repository: None available. Data and research notes should be maintained for 6 years.
Data conversion code: Required for inquiries by the editor (otherwise encouraged).
Final analysis code: Required for inquiries by the editor (otherwise encouraged).
Implementation date: 3/22/2015.