
Memory

ISSN: 0965-8211 (Print) 1464-0686 (Online) Journal homepage: www.tandfonline.com/journals/pmem20

Do differences in topic knowledge matter? An experimental investigation into topic knowledge as a possible moderator of the testing effect

Jessica A. Macaluso & Scott H. Fraundorf

To cite this article: Jessica A. Macaluso & Scott H. Fraundorf (13 May 2025): Do differences
in topic knowledge matter? An experimental investigation into topic knowledge as a possible
moderator of the testing effect, Memory, DOI: 10.1080/09658211.2025.2500538

To link to this article: https://doi.org/10.1080/09658211.2025.2500538

© 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

Published online: 13 May 2025.

This article has been awarded the Centre for Open Science 'Open Data', 'Open Materials', and 'Preregistered' badges.


Do differences in topic knowledge matter? An experimental investigation into topic knowledge as a possible moderator of the testing effect

Jessica A. Macaluso and Scott H. Fraundorf

Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA; Learning Research and Development Center, Pittsburgh, PA, USA

ABSTRACT

A large body of research indicates that testing results in better long-term retention compared to restudying. Given the relevance of such effects for education, there is interest in the conditions and learner differences that may moderate the utility of testing, like background knowledge. It is possible that the testing effect is stronger for those who are more novice, stronger for those who are more experienced, or works equally well for everyone. In four experiments, college students read texts and were tested on them one week later. In Experiments 1, 2A, and 2B, we orthogonally manipulated study strategy (testing versus restudying via reading sentence facts) and availability of background material for a given topic. In Experiment 2B only, participants received feedback when studying via retrieval practice. Experiment 3 employed a mixed design in which each participant used only one strategy or another. Contrary to many past studies, we found an overall testing effect only when feedback was provided. Critically, background topic material benefited overall retention, but we found no evidence that background knowledge moderated the degree of testing benefits. Together, these results suggest that any learning benefits of testing do not depend on having particular levels of existing domain knowledge.

ARTICLE HISTORY
Received 17 October 2024; Accepted 25 April 2025

KEYWORDS
Learning; memory; testing effect; background knowledge; feedback

Introduction

In recent decades, a large body of research has been conducted concerning the testing effect – the fact that intermittent testing of information (e.g., quizzing oneself on previously studied material) enhances learning and long-term retention of that material (Bjork & Bjork, 1992; Roediger & Karpicke, 2006a; Rowland, 2014). The use of testing to enhance learning has led it to be identified as an educationally relevant application of cognitive psychology (e.g., Roediger & Karpicke, 2006b; Rohrer & Pashler, 2010). However, applying principles like the testing effect in education requires greater understanding of when it works best – for what groups of learners under what circumstances?

In the present studies, we examined whether the testing effect is moderated by learners' prior topic knowledge, which we experimentally manipulated in four studies. We first review some basic research on the testing effect before introducing the need to identify key moderators of this effect, and why background topic knowledge may be one of them.

The testing effect

Typically, experiments regarding the testing effect compare two study strategies: restudying and retrieval practice (e.g., Carpenter et al., 2008; Karpicke & Roediger, 2008). Restudying is when one studies the provided material and then studies that same material again, such as by rereading the same passage. Retrieval practice is when one studies the necessary material and then is required to remember the information in some way, such as by taking a multiple-choice or short-answer quiz.

As of 2025, there are at least three published meta-analyses on the testing effect (Adesope et al., 2017; Rowland, 2014; Yang et al., 2021). The results of these meta-analyses establish that the testing effect is a robust phenomenon with medium-to-large effect sizes for the difference between retrieval practice and restudy, according to the criteria established by Gignac and Szodorai (2016b), in both lab studies (g = 0.51, Adesope et al., 2017; g = 0.50, Rowland, 2014) and authentic educational settings (g = 0.33; Yang et al., 2021).
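Concretely, the Hedges' g values cited above are standardised mean differences (Cohen's d computed with a pooled standard deviation) scaled by a small-sample correction factor. The sketch below applies the standard formula to hypothetical group statistics; it is illustrative only and does not use data from any of the studies cited here:

```python
import math

def hedges_g(mean1, mean2, sd1, sd2, n1, n2):
    """Hedges' g: Cohen's d with the small-sample correction J applied."""
    # Pooled standard deviation across the two independent groups
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n1 + n2 - 2))
    d = (mean1 - mean2) / pooled_sd
    # Approximate correction factor J = 1 - 3 / (4 * df - 1), df = n1 + n2 - 2
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)
    return d * j

# Hypothetical final-test accuracy for retrieval practice vs. restudy groups
g = hedges_g(mean1=0.70, mean2=0.62, sd1=0.16, sd2=0.16, n1=60, n2=60)
print(round(g, 2))  # prints 0.5
```

With samples of this size the correction J is close to 1, so g barely differs from d; the correction matters mainly for very small samples.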

CONTACT Jessica A. Macaluso [email protected] Department of Psychology, University of Pittsburgh, Pittsburgh, PA 15260, USA; Learning
Research and Development Center, Pittsburgh, PA 15213, USA
Supplemental data for this article can be accessed online at https://doi.org/10.1080/09658211.2025.2500538
© 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by the author(s) or with their consent.
Cognitive mechanisms underlying the testing effect

Why is testing so effective for learning? Practice testing may benefit learning through both direct and indirect pathways; though our present studies are not intended to discriminate between these mechanisms, we briefly summarise each to provide theoretical context.

The Direct Testing Effect. The direct testing effect occurs when the experience of taking a test itself improves future retrieval of information from long-term memory. Several mechanistic accounts of such direct testing effects have been proposed. The elaborative retrieval hypothesis proposes that testing enhances learning because retrieving the answer for a question in a practice test activates other related information and thus strengthens the integration of the information being tested (Carpenter, 2009). This proposal aligns with the broader levels-of-processing framework, which suggests that the strength and duration of a memory trace depends on how deeply that information is processed (Craik & Lockhart, 1972). For example, rather than just memorising terms regarding animal biology, one could try to relate these terms to something personal like their pet. This elaboration of information during testing makes learners rely on their previous knowledge and increases their overall memory retention. Additionally, the episodic context account describes retrieval practice as a type of context reinstatement, with continuous updating and narrowing of the search set as additional information is assimilated (Karpicke et al., 2014). Similarly, the mediator effectiveness hypothesis (Pyc & Rawson, 2010) suggests that practice testing creates improved mediators that link information to cues. Mediators are mental processes that occur between the onset of a stimulus and of a corresponding response. These cues, when recalled, allow the target information to be activated and recalled.

Although these accounts propose somewhat different mechanisms of the testing effect, one theme common to all of them is that the benefits of testing are not based on rote memorisation; rather, practice testing activates elaborative processes during learning and consequently creates memory structures that aid in later retrieval of necessary information. Thus, testing requires more active engagement because effortful retrieval of the material (e.g., remembering the right answer when answering a question) is necessary. Likewise, these theories support the idea that retrieval itself modifies one's knowledge; it is not simply a check on what information is already known.

More broadly, the testing effect is consistent with the principle of transfer appropriate processing (Morris et al., 1977) – the principle that the conditions that are most effective for learning are those that most resemble the conditions under which knowledge will eventually need to be deployed. Practice testing, compared to restudying, creates conditions during learning that are more comparable to a later test.

The Indirect Testing Effect. Practice testing can also support learning through an indirect route, especially when feedback is given (as we discuss below). Through this indirect testing effect, practice tests allow learners to assess their current performance and prevent the illusion of knowing (Glenberg et al., 1982) – when an individual feels confident in their understanding even though their comprehension of the material has failed. Practice testing wards off this illusion because if one takes an exam and struggles, the learner can realise they actually know less than they thought.

The Forward Testing Effect. The above phenomena characterise how retrieval can benefit learning and retention of material that was previously encountered and now tested (a backwards testing effect). However, a forward testing effect can also occur whereby the experience of testing on prior material enhances the subsequent learning of other, new information, potentially because testing helps to cognitively separate the material or enhance learners' motivation (Gupta et al., 2024; Wissman et al., 2011; Yang et al., 2019).

While the forward testing effect speaks to the learning benefits of testing, it also presents a potential methodological confound: if a participant gets a retrieval practice condition, then the restudy condition, the restudy condition may also be benefitting from the prior use of retrieval practice, consequently eliminating the apparent benefits of retrieval practice. We return to this issue in Experiment 3.

Feedback and the testing effect

In the traditional testing effect experimental paradigm, participants do not get feedback on whether they got the answer correct when practicing retrieval. This is typically done to equate the number of exposures to the to-be-learned material: all participants practice the material once with a single study strategy, either by retrieval practice or restudying. By comparison, when feedback is provided during retrieval practice, the material is encountered twice: once via retrieval and once when getting feedback.

Nevertheless, while testing has been found to be beneficial without feedback, testing with feedback can lead to even better learning outcomes (Rowland, 2014; Yang et al., 2021; c.f., Adesope et al., 2017). Prior research has shown that feedback, both immediate and delayed, can lead to better memory retention of material when learning via a multiple-choice test (Butler et al., 2007).

Feedback is advantageous because it helps alleviate intrusions (i.e., the false "recollection" of incorrect information; Butler & Roediger, 2008). Feedback is also helpful if the learner has forgotten or failed to retrieve information during a recall attempt (Kornell et al., 2009). Additionally, feedback increases the likelihood a correct response will be retained and recalled when a learner is
correct but has low confidence (Agarwal et al., 2012; Butler et al., 2008; Fazio et al., 2010; c.f., Pashler et al., 2005).

Given the relevance of feedback as a moderator of the testing effect, here we will examine our effects of interest both with and without feedback. In Experiments 1, 2A, and 3, we focus on the moderating effect of background topic knowledge on testing exclusively in the case without feedback. In Experiment 2B, in addition to exploring the moderating effect of background topic knowledge on testing, we also investigated the role of immediate feedback.

Interim summary and role of moderators

Overall, prior work has established the testing effect as a potent learning tool. Indeed, the use of retrieval practice has been suggested as a great example of how cognitive science can inform education (Roediger et al., 2011; Rohrer & Pashler, 2010). For example, Roediger and colleagues (2011) proposed ten benefits of testing with respect to education, like filling in gaps in knowledge and aiding in later retention of material.

While the benefits of retrieval practice are well-established and educationally relevant, effectively deploying retrieval practice in authentic educational settings requires answering additional practical questions, like which conditions best support retrieval practice and who would benefit the most from it. It is thus necessary to explore possible moderators of the testing effect that may impact its effectiveness. However, Smith-Peirce and Butler (2025) note that while ample research in the past few decades has explored how the testing effect generalises across characteristics of the to-be-learned material and of the test itself, far fewer studies have explored learner characteristics that might moderate the testing effect.

Here, we consider one potential moderator with high practical relevance: background knowledge. Educational activities often must reach a broad range of students with varying levels of knowledge or expertise, so it is critical to establish whether testing is beneficial for learners across a range of prior knowledge. We first describe the relevance of background knowledge to learning before reviewing previous studies that have examined it as a potential moderator of the testing effect.

Background topic knowledge and learning

The importance of background topic knowledge

Background topic knowledge is a plausible moderator of the testing effect because there is broad evidence that, across a variety of tasks and domains, prior topic knowledge exerts a powerful influence on learning (Chase & Simon, 1973; Chi et al., 1981; Huet & Mariné, 2005; Kalakoski & Saariluoma, 2001; for a review, see Macaluso & Fraundorf, under review; Vicente & Wang, 1998). Experienced individuals often produce representations that go beyond what is explicitly shown to the learner, resulting in more elaborative understanding. In contrast, less experienced individuals concentrate on the surface-level features of a problem that are clearly presented. For instance, Chi and colleagues (1981) found that experts in physics (i.e., physics Ph.D. students) focused mainly on rules and equations governing a problem while novices (i.e., undergraduate physics students) relied on surface-level aspects of a problem (e.g., only the information shown in a given problem rather than tying it into a bigger concept).

Prior knowledge also has been shown to predict learning of new but related information. Witherby and Carpenter (2022) first had participants complete a prior knowledge test for the domains of cooking and football. Next, the researchers had their participants learn new information regarding the two topics. Lastly, Witherby and Carpenter presented a final test to their participants to assess learning. Witherby and Carpenter found evidence for a rich-get-richer effect where prior domain knowledge helped participants learn new information about the same topic (e.g., having prior knowledge of football helped participants learn new information about football).

The bifurcation model

The relevance of prior knowledge to the testing effect is implied by the bifurcation model of Kornell et al. (2011). This model proposes that testing produces one of two differential outcomes for each item. If a learner recalls an item correctly during retrieval, this item gets a strong boost to memory strength, and testing clearly outperforms restudy. Conversely, if a learner does not recall an item during retrieval, this item will not get a memory boost and is more prone to being forgotten. Therefore, there is a "bifurcation" where some items do get a benefit from retrieval, but others do not. By contrast, restudying provides a weaker boost, but it applies to all items, regardless of how strong initial knowledge is.

Thus, how much retrieval practice could be expected to benefit one's memory, relative to restudying, may depend upon how well a learner can perform on the initial test, which in turn is highly likely to depend on one's level of existing knowledge about a topic. For example, a true novice who does not know anything about a topic might not be able to do meaningful retrieval practice and thereby not derive any learning benefits from it. This is further substantiated by the meta-analytic evidence (Rowland, 2014) that the testing effect is larger the greater a learner's initial performance on retrieval practice.

Background topic knowledge and the testing effect

While the robustness of the testing effect is well-established, and the literature review above suggests background topic knowledge could plausibly moderate the testing effect, there is not a large body of work that empirically explores this phenomenon. Overoye et al. (2021) found that an experimental provision of background material increased the benefit of a related phenomenon – the benefits of pretesting before encountering a topic –
but did not examine retrieval practice. Additionally, a number of experiments have investigated how the differences between retrieval and restudy accumulate over multiple encounters with identical material (e.g., Karpicke & Roediger, 2008; Soderstrom et al., 2015), often with the conclusion that there are some but diminishing returns from multiple test cycles. However, it is less clear how related but distinct background knowledge would interface with the testing effect.

Some studies have explored this relationship by examining the correlation between learners' prior topic knowledge and the magnitude of the testing effect. For example, in one study with undergraduates in an educational psychology class, retrieval practice benefitted everyone regardless of their previous topic knowledge, and the benefits were the greatest when students studied unfamiliar content (Cogliano et al., 2019). In contrast, two other studies found that retrieval practice is more beneficial for learners with high knowledge. In a biology course, Carpenter et al. (2016) compared student performance when students either retrieved information (e.g., remembering vocabulary definitions) with feedback or copied the necessary information down without retrieval. Carpenter and colleagues found that higher-performing learners benefited the most from retrieving, but middle-performing and lower-performing learners benefited more from copying. Similarly, Francis et al. (2020) found that, among undergraduate students in a psychology course, only students high in prior knowledge, and not those low in prior knowledge, benefited from retrieval, though this comparison was to no activity at all rather than to restudy. Lastly, another study found that undergraduates in a psychology course performed better when they studied via a practice test with corrective feedback compared to restudying, regardless of prior knowledge (Glaser & Richter, 2023; for null results using a pre-experimental measure of prior knowledge, see also Xiaofeng et al., 2016). These studies have conflicting results – two find testing most helpful for high-performing learners, another finds a compensatory benefit for low-performing learners, and another finds testing is equally helpful for everyone.

Is a potential compensatory effect as simple as the fact that more-knowledgeable individuals have less to learn, or is there something about the nature of testing itself that would benefit a less-knowledgeable learner compared to a more-knowledgeable learner? One challenge in assessing the effects of prior knowledge on new learning is to establish that the learning is indeed new. It is quite possible that higher-knowledge participants perform better on an end-of-experiment test of putative learning not because they were actually better at acquiring that knowledge within the context of the experiment but because they simply already knew more of the material before the experiment ever began. We argue that the nature of testing itself could be compensatory, where it would benefit a less-knowledgeable learner more than a more-knowledgeable learner.

Smith-Peirce and Butler (2025) summarise this state of affairs by noting that there is a lack of consensus as to whether there are individual differences in the testing effect but that more research is necessary to complement the limited number of such studies to date. Further, a limitation common to all of these studies is that they are correlational, so one cannot infer causation from these findings. It is possible that apparent effects of prior knowledge might in fact be driven by some other confounding variable that is correlated with prior knowledge (e.g., motivation to study the presented topic, working memory, or general intelligence) or that such confounders might mask an effect of prior knowledge.

We know of only one study where background topic knowledge was manipulated experimentally to study its possible moderating influence on the testing effect (Buchin & Mulligan, 2023). Buchin and Mulligan trained participants on one of two domains: either historical geology (e.g., geologic time and isotopic dating) or sensation and perception (e.g., colour perception and chemical senses). In the three-day training phase, participants were randomly assigned to learn about three of the four subtopics within the domain, one subtopic per day for about forty-five minutes each day. Next, participants immediately entered the learning phase where they read four short passages; three topics participants were already trained on, and one topic was novel. Two days later, participants, regardless of condition assignment, took a final test both on historical geology and on sensation and perception. The final test had questions that were nearly identical to the information shown in the learning phase ("retention"), similar to what was studied in the learning phase ("near-transfer"), and about information that was initially studied in the training phase but not retrieved in the learning phase ("far-transfer"). Buchin and Mulligan found that participants performed significantly better on the test questions when they studied via retrieval practice, compared to restudying. Critically, retrieval practice was equally effective for both levels of prior knowledge on a topic. However, given that only one known experimental study has explored this phenomenon, more research is necessary.

The present study

In the current study, we present four experiments examining background topic knowledge as a potential moderator of the testing effect among college students reading expository science passages. For the first three experiments, both study strategy (retrieval practice versus restudying via reading sentence facts) and availability of background topic material were orthogonally manipulated. For Experiment 2B, participants also received immediate feedback when studying via retrieval practice. Experiments 2A and 2B were also examined together post-hoc, to assess the role of feedback as a between-subjects variable. Lastly, Experiment 3 used a between-participants
MEMORY 5

manipulation to control for the possible confound of a received background passages for two passages that they
forward testing effect, as we detail below. practiced using retrieval and did not receive background
One drawback of Experiments 1, 2A, and 2B is that they passages for two passages that they restudied. In the
could be subject to the forward testing effect: if a partici­ retrieve – no background condition, participants conversely
pant gets a retrieval practice condition followed by the received background passages for two passages that they
restudy condition, the material studied in the restudy con­ restudied and did not receive background passages for
dition could also benefit from the previous retrieval prac­ two passages that they practiced using retrieval. If back­
tice use, thus eliminating the apparent benefits of ground topic knowledge strengthens the testing effect,
retrieval practice. Such a forward testing effect could we would expect the testing effect to exist only or be
obscure any benefits of the (backwards) testing effect of larger in the retrieve – background condition. Conversely,
interest. To account for possible effects of the forward if retrieval practice has a compensatory effect in the face of
testing effect and rule out this potential confound, Exper­ low domain knowledge, we would expect the testing
iment 3 will use a between-person design in which partici­ effect to be only present or larger for participants in the
pants only restudy or only use retrieval practice, not both. retrieve-no background condition.
We present three competing hypotheses for these
studies. First, it is possible that participants who have
already acquired additional background knowledge Method
about a topic will show a larger testing effect. This Participants
would be in line with the bifurcation model: benefits of
retrieval practice are enhanced when learners can success­ Our sample size was somewhat limited by the number of
fully remember the to-be-retrieved information, which participants we could recruit. To determine whether this
should be more likely with supportive background infor­ sample provided sufficient statistical power, we conducted
mation. By contrast, true novices may have an insufficient a sensitivity analysis in G*Power (Faul et al., 2007). We
knowledge base to support meaningful retrieval practice. planned a mixed-effects model, which is not currently sup­
Second, it is conversely feasible that retrieval practice is ported by G*Power, but for designs with crossed random
particularly beneficial for those who are more novice. For effects of participants and items, such as this one, our
novice learners, retrieval practice could be compensatory planned mixed-effects models should have greater power
and help them catch up to expert learners. This would than the corresponding repeated-measures ANOVA
align with one of the correlational studies mentioned because the mixed-effects model additionally accounts for
above (Cogliano et al., 2019) as well as the analogous item-level variation that is treated as unexplained error in
finding that individuals with lower working memory the RM-ANOVA (Quene & van den Bergh, 2004). Thus, we
capacity showed a greater compensatory benefit from conducted a sensitivity analysis for a repeated-measures
retrieval compared to those with higher working ANOVA in G*Power with the understanding that, if any­
memory capacities (Agarwal et al., 2017). thing, this likely underestimated our statistical power.
A third possibility is that the testing effect works equally The sensitivity analysis revealed a sample size of 129
well for everyone – as suggested by the correlational participants would be sufficient detect a small-to-
results of Glaser and Richter (2023) and the experiment medium effect size (R2 = .02, power = .80) for a main
by Buchin and Mulligan (2023). Perhaps the testing effect is so robust that all learners can benefit from it, regardless of their level of background knowledge.

Experiment 1

All of the experiments reported here consisted of two sessions separated by one week. In the initial learning session, participants sequentially read a series of four expository passages and completed a learning activity; they then completed a final test session a week later. We manipulated (a) whether or not each passage in the learning session was preceded by an additional passage containing background information and (b) whether the learning activity following a passage was retrieval practice or restudying.

In Experiment 1, this was done by randomly assigning each participant to one of two between-participant conditions corresponding to the hypotheses discussed above. In the retrieve – background condition, participants

effect of retrieval practice, and we recruited more participants than that. All participants were University of Pittsburgh undergraduate students enrolled in an introductory psychology course and were recruited via Sona Systems in exchange for partial course credit for completion of both study sessions. Participants were required to speak fluent English, be eighteen years or older, and have access to a computer. There was approximately 23% attrition between session one (N = 223) and session two (N = 172).

6 J. A. MACALUSO AND S. H. FRAUNDORF

Figure 1. Flow of activities for session one in Experiments 1, 2A, 2B, and 3. In Experiments 1, 2A, and 3, participants did not receive immediate feedback. In Experiment 2B, participants did receive immediate feedback in the retrieval practice study strategy condition for session one.

Materials

Session one background and main passages

Learners read about four different science topics: dinosaurs, comets, the Great Barrier Reef, and acupuncture. The dinosaurs, comets, and acupuncture passages were from Norberg (2022), and the Great Barrier Reef passages were created from texts from The Great Barrier Reef, Eighth Grade Reading Passage (2013), Clownfish Facts
(2021), Humpback Whale (2022), and the Types of Coral Reefs (2021). Within each topic, there were three main passages and three background passages, one per subtopic. Each main passage was approximately 200 words (see Appendix A), and each background reading was about 400 words (see Appendix B).

Session one retrieval practice questions and reading sentence facts

During session one, participants studied two topics via retrieval practice and two by reading sentence facts, depending on condition assignment. For each topic, we created twelve items for participants to learn (forty-eight questions total). Three items corresponded to each subtopic, and three to the passage as a whole.

Across the retrieval practice and reading sentence facts conditions, we sought to closely control the presented content and vary only how participants interacted with it. For the retrieval practice conditions, each item was encountered in the form of a three-alternative multiple-choice practice question, such as How has the perception of dinosaurs changed? (a) It was once believed that they were slow and not likely to survive, but we now believe they could move more quickly and easily. (b) It was once believed that they only existed for 300,000 years, but now we know they lasted 160 million years. (c) It was once believed that dinosaurs could swim, but now we know they were too big to swim, where (a) is the correct answer. (See Appendix C for all retrieval practice items.) For the reading sentence facts condition, participants read a sentence that mimicked the information shown in the multiple-choice question and answers but was instead in sentence form. The "answer" portion of the sentence was bolded to emphasise the important part of the sentence, as one would focus on the answer options in a retrieval practice question, such as It was once believed that they were slow and not likely to survive, but we now believe they could move quickly and easily. (See Appendix D for all restudy items.)

Session two materials

Session two consisted of a final test containing the same forty-eight questions as the session one retrieval practice questions.

Procedure

All study methodology was approved by the University of Pittsburgh's Institutional Review Board (IRB). All sessions of the experiments took place online on Qualtrics (https://www.qualtrics.com/).

Learners participated in session one at any time; session two was made available to each participant one week after they completed the first session and was available for up to twenty-four hours. Session one consisted of the background topic knowledge intervention, reading the main texts for all four topics, and the study strategy manipulation (see Figure 1). Session two consisted of a final retrieval practice test for all four topics (see Figure 2).

Figure 2. Flow of activities for session two in Experiments 1, 2A, 2B, and 3.

Session one

First, participants encountered the background topic knowledge intervention. Participants covered each of the four topics in sequence. For each topic, participants

MEMORY 7
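The tight yoking of the two study activities described above can be sketched in code. This is purely illustrative (the Item structure and helper names are ours, not the authors' materials), using the dinosaur item quoted in the text:

```python
from dataclasses import dataclass

@dataclass
class Item:
    """One of the twelve learning items per topic (illustrative structure)."""
    question: str
    options: tuple  # three alternatives; options[0] is the correct one
    fact: str       # the same content expressed as a declarative sentence

def as_retrieval_practice(item: Item):
    """Render the item as a three-alternative multiple-choice question."""
    return item.question, item.options

def as_sentence_fact(item: Item):
    """Render the same content as a restudy sentence instead."""
    return item.fact

dino = Item(
    question="How has the perception of dinosaurs changed?",
    options=(
        "It was once believed that they were slow and not likely to survive, "
        "but we now believe they could move more quickly and easily.",
        "It was once believed that they only existed for 300,000 years, "
        "but now we know they lasted 160 million years.",
        "It was once believed that dinosaurs could swim, "
        "but now we know they were too big to swim.",
    ),
    fact="It was once believed that they were slow and not likely to survive, "
         "but we now believe they could move quickly and easily.",
)
```

Both renderings carry the same content; only the way participants interact with it differs, which is exactly the control the authors describe.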

received background material (potentially), then the main passage, and then one of the study activities (read sentence facts or retrieval practice). Then, the procedure repeated for the next topic.

Two of the four topics – randomly varied across participants – were first introduced with a background knowledge passage. Participants had 115 s to read each background passage before Qualtrics auto-advanced. This timing was determined by prior norming conducted by Norberg (2022) as well as participant feedback during our initial piloting. For the remaining two topics, participants did not receive background material.

Following the background topic knowledge intervention, a main text passage was presented. Participants had 60 s to read the main passage before Qualtrics automatically proceeded to the next page of the experiment. As with the background passages, this timing was determined by the timing used in Norberg (2022) as well as participant feedback during initial piloting.

The study strategy manipulation followed the main passages. For two of the topics, learners were randomly assigned to use retrieval practice to study, and for the other two topics, learners read sentence facts. The retrieval practice condition was self-paced, but participants needed to answer all twelve of the topic questions to proceed.1 The order of both the twelve practice questions and the three response options within each question was randomised. Participants in the reading sentence facts condition had 60 s to read all of the twelve sentence facts affiliated with that topic before Qualtrics automatically advanced. This timing was used because the number of words read during the reading sentence condition was comparable to the number of words in a main passage, so participants were allocated the same amount of time. Further, the fixed presentation time prevented participants from clicking through the restudy items as quickly as possible. The order of presentation of these twelve sentence facts was also randomised.

Critically, participants were only tested on information provided in the main passage; prior norming by Norberg (2022) established that each of these test questions could be answered above chance even without exposure to the background material. Thus, none of the information provided during the study strategy intervention phase of the experiment required the background passage2; rather, the information in the background passage was given to provide additional support for the participants' learning of the main passage.

Participants repeated this flow of activities for all four topics. Figure 3 shows an example of four possible list options for a given topic for session one. The topics and condition assignments were counterbalanced across eight presentation lists so that, across participants, each topic appeared in each serial position and in each condition of the background topic knowledge and study strategy manipulations.

Figure 3. Four possible list options for each topic for session one in Experiments 1, 2A, and 2B. Topic (i.e., "Topic 1") is dependent on a given participant's condition assignment for session one.

Session two

Session two occurred one week after session one, and participants had twenty-four hours to complete it. In session two, participants took a final multiple-choice test regarding the passages all the participants read (i.e., the main passages, not the background passages). The final test included twelve questions per topic, resulting in forty-eight total questions. Participants were tested on each of the four topics one at a time, with all twelve questions for that topic appearing simultaneously. The final quiz was self-paced, but participants needed to answer all the presented questions in order to proceed. These questions were identical to the questions that would be presented in the session one retrieval practice condition. The presentation order of the topics was counterbalanced across

stimulus lists. The question order and question answers were presented in a re-randomised order (i.e., not necessarily the same order as in session one).

Summary of design

Experiment 1 employed a 2 × 2 design in which we manipulated study strategy (retrieval practice3 or reading sentence facts) and background topic knowledge (i.e., receiving or not receiving a background topic passage). In Experiment 1, each participant encountered two of the four cells of the design; participants randomly assigned to the retrieve – background condition saw (1a) background and retrieval practice and (1b) no background and restudying, whereas participants assigned to the retrieve – no background condition saw (2a) background and restudying and (2b) no background and retrieval practice. (To preview, Experiment 2A was run with a fully within-subjects design.) The assignment of specific topics to conditions was counterbalanced across presentation lists, resulting in a total of eight different presentation lists for both sessions.4

Results

Data were analysed using mixed-effects logit models using the R packages lme4 for frequentist analysis (Bates et al., 2015) and brms for Bayesian analysis (Bürkner, 2017). The dependent variable was the accuracy of each item on the session two test, scored on a binary basis as either 1 (correct) or 0 (incorrect). There were forty-eight questions total, twelve per topic.

Mixed-effects logit models

Fixed effects reflected experimental condition assignment and were contrast-coded in analyses to obtain main effects analogous to those of an ANOVA. For Experiment 1, there were two fixed effects: study strategy and background topic knowledge (i.e., receiving or not receiving a background topic passage). The study strategy variable was contrast-coded with retrieval practice coded as 0.5 and reading sentence facts as −0.5, and the background topic knowledge variable was contrast-coded as 0.5 for receiving a background topic passage and −0.5 for not receiving a background topic passage.

The random effects of the model initially included random intercepts for participant, the forty-eight final test questions, and the broader passage topic. The maximal model with all possible random slopes did not converge; following Matuschek et al. (2017), we obtained the maximal random effects structure justified by the data by removing random effects that did not contribute significantly to the model. The final model only contained random intercepts for participant (variance = .13) and test question (variance = .72).5 The final model had an adjusted ICC of .21 for all the random effects and a marginal R2 of .002.

Table 1. Results from the mixed-effects model for Experiment 1.

                                                Estimate    SE    z-value   p-value
  Study Strategy                                  −0.09     .09    −0.96     .338
  Background Topic Knowledge                       0.16     .05     3.09     .002**
  Study Strategy × Background Topic Knowledge      0.04     .15     0.24     .812

Note: "Study Strategy" refers to retrieval practice relative to reading sentence facts. Coefficients = the estimate of each effect of each variable on accuracy on the session two final test, as reported in log odds; SE = standard error of the estimate; **p < .01.

Table 1 presents the results from the main model. There was a significant main effect of background topic material (p = .002), confirming the efficacy of our manipulation of background topic knowledge: when participants received background topic material, they performed significantly better on the session two test compared to when they did not receive background topic material. However, there was not a significant main effect of study strategy (p = .34); that is, we did not find that retrieval practice resulted in better test performance than restudying. Nor was there a significant two-way interaction between study strategy and background topic material (p = .81). Figure 4 presents a graphical representation of the mean accuracies.6

Figure 4. Mean accuracy on the session two test of Experiment 1 based on session one condition assignments. Error bars signify the 95% confidence interval.

Bayesian analyses

The analysis reported above did not find that background topic knowledge significantly moderated whether a testing effect occurred – indeed, no overall testing effect emerged at all. However, a limitation of such a frequentist analysis is that it cannot directly provide evidence for the absence of a testing effect or a testing × background interaction.

Thus, we conducted an additional Bayesian analysis using the default priors in the brms package to compare Bayes Factors of the various competing models.7

BF12 = P(data | M1) / P(data | M2)

We used the guidelines of Kass and Raftery (1995) to define weak versus strong evidence for the various models (see Table 2). Broadly, for values greater than one, the evidence favours the model in the numerator (typically the null hypothesis; M1) over the model in the denominator (typically the alternative hypothesis; M2). In contrast, for values less than one, the evidence favours the model in the denominator over the model in the numerator.

The Bayes Factors were compared using the bridgesampling package (Gronau et al., 2020). Four models were compared. Each model had four chains, each with 4000 iterations and 2000 iterations for warm-up.

The full model contained both main effects (study strategy and background topic material), an interaction term, and the random intercepts of participant and question. The 95% credible intervals for each fixed effect of the full
model were: study strategy [−0.27, 0.09], background [−0.04, 0.32], and interaction term [−0.26, 0.33].

The no interaction model was identical to that of the full model, except without the interaction term. The 95% credible intervals for each fixed effect of the no interaction model were: study strategy [−0.17, 0.03] and background [0.06, 0.26]. Given that the credible interval for the fixed effect of background did not contain zero, there was meaningful evidence of a positive effect of background knowledge.

The only study strategy model contained only the main effect of study strategy and the two random intercepts. The 95% credible interval for the fixed effect of study strategy was [−0.19, 0.00]. Given that the credible interval for the fixed effect of study strategy was at zero, there was weak evidence of a possible negative effect of retrieval.

The only background model contained only the main effect of background topic material and the two random intercepts. The 95% credible interval for the fixed effect of background was [0.07, 0.27]. Given that the credible interval for the fixed effect of background did not contain zero, there was meaningful evidence of a positive effect of background knowledge.

Table 2. Bayes Factor guidelines by Kass and Raftery (1995) for constituting weaker versus stronger evidence for various models.

  Bayes factor (BF)     Interpretation of evidence
  BF < 0.007            Very strong evidence favouring M2 over M1
  0.007 ≤ BF < 0.05     Strong evidence favouring M2 over M1
  0.05 ≤ BF < 0.33      Moderate evidence favouring M2 over M1
  0.33 ≤ BF < 1         Weak evidence favouring M2 over M1
  1 ≤ BF < 3            Weak evidence favouring M1 over M2
  3 ≤ BF < 10           Moderate evidence favouring M1 over M2
  10 ≤ BF < 30          Strong evidence favouring M1 over M2
  30 ≤ BF < 100         Very strong evidence favouring M1 over M2

To test the retrieval practice effect × background interaction, we compared the no interaction model (M1) to the full model (M2). This yielded a Bayes Factor of 2.45, suggesting weak evidence favouring the no interaction model, which contained only main effects, over the model with both the main effects and the interaction term. To test the effect of background knowledge, we compared the no interaction (M1) model, which contained both main effects, to the model that contained only the main effect of study strategy (M2). This comparison yielded a Bayes Factor of 15.81 in favour of the no interaction model, suggesting strong evidence for the effect of background knowledge. Lastly, to test the effect of study strategy, we compared the no interaction model (M1), which contained both main effects, to the only background model (M2), which contained only the effect of background knowledge. This comparison yielded a Bayes Factor of 0.31 in favour of the only background model, suggesting moderate evidence against an effect of study strategy.

In summary, the Bayesian analysis provided strong evidence for a main effect of background topic knowledge, which aligns with our mixed-effects analyses above. In addition, the Bayesian analysis provided moderate evidence against the presence of a testing effect in Experiment 1.

Discussion

In Experiment 1, we orthogonally manipulated both study strategy (retrieval practice versus reading sentence facts) and availability of background material for a given topic to explore the possible moderating effect of background topic knowledge on the testing effect. In our frequentist analysis, we found a significant main effect of background topic knowledge where participants performed significantly better on the session two test if they had received
background topic material during session one, and Bayesian analysis indicated strong evidence in favour of this effect. This confirms that our manipulation was successful in creating meaningful differences in knowledge, prior to retrieval practice, that could assist in later learning and retention. Nevertheless, even before removing participants answering at or below chance8, participants who did not receive the background topic material still performed above chance, demonstrating that the background topic material was not required to understand the texts (consistent with the norming by Norberg, 2022).

Critically, however, we did not find a significant main effect of study strategy, nor did we find a significant interaction between our two variables of interest. This conclusion was further supported by our Bayesian analyses, which found moderate evidence against the testing effect and weak evidence against the interaction.

In sum, while we created meaningful differences in background knowledge via our lab manipulation, we found that such differences did not moderate the existence of a retrieval practice effect. One major caveat, however, was that we did not find an overall testing effect. Another is that each participant did not experience all four cells of the 2 × 2 design.

In Experiment 2, we sought to address both of these questions. We ran two versions of Experiment 2: Experiment 2A was a near replication of Experiment 1 that was fully within-subjects, while Experiment 2B tested our question under conditions expected to yield a stronger testing effect.

Experiment 2A

Experiment 2A largely replicated the design of Experiment 1 in that it employed a 2 × 2 design: a within-subjects study strategy intervention (i.e., testing or reading sentence facts) and a within-subjects background topic knowledge intervention (i.e., receiving a background passage for a given topic, or not). The difference was that the full factorial design was presented within each participant; that is, each participant encountered one topic in each of the four cells of the design. Also, unlike Experiment 1, Experiment 2A was pre-registered.

The assignment of texts to conditions was counterbalanced across lists, resulting in four different presentation lists for session one (compared to the eight presentation lists in Experiment 1 session one) and eight different presentation lists for session two (as with Experiment 1). Session one of Experiment 2A had fewer presentation lists compared to Experiment 1 due to its within-subjects design.

Method

Participants

A sensitivity analysis conducted in G*Power revealed a sample size of 105 participants to detect a medium effect size (R2 = .04, power = .80) for a main effect of retrieval practice. The effect size detection was adjusted due to online participant recruitment availability. Participants were recruited via the same methods as Experiment 1 with the same inclusion criteria. There was approximately 14% attrition between session one (N = 138) and session two (N = 119).

Procedure

As with Experiment 1, both sessions of Experiment 2A took place online via Qualtrics. The materials and procedure for Experiment 2A were identical to Experiment 1.

Results

Mixed-effects logit models

For Experiment 2A, there were two fixed effects: study strategy and background topic knowledge. Study strategy and background topic knowledge were coded the same as in Experiment 1. Random effects of participant ID (variance = .14) and question (variance = .52) were included in the model, as with Experiment 1. The final model had an adjusted ICC of .17 for all the random effects and a marginal R2 of .002.

Table 3 presents the results from the main model. As in Experiment 1, there was a significant main effect of background topic material (p = .046): when participants received background topic material, they performed significantly better on the session two test compared to when they did not receive background topic material. There was neither a significant main effect of study strategy (p = .16) nor a significant interaction between study strategy and background topic material (p = .25). Figure 5 presents a graphical representation of the mean accuracies.

Figure 5. Mean accuracy on the session two test of Experiment 2A based on session one condition assignments. Error bars signify the 95% confidence interval.

Bayesian analyses

The models for the Bayesian analyses were the same as those in Experiment 1. The 95% credible intervals for each fixed effect of the full model were: study strategy [−0.32, 0.01], background [−0.12, 0.22], and the interaction term [−0.10, 0.37]. The 95% credible intervals for each fixed effect of the no interaction model were: study strategy [−0.20, 0.03] and background [0.00, 0.24]. Given that the credible interval for the fixed effect of background was at zero, there was weak

Table 3. Results from the mixed-effects model for Experiment 2A.

                                                Estimate    SE    z-value   p-value
  Study Strategy                                  −0.08     .06    −1.40     .162
  Background Topic Knowledge                       0.12     .06     2.00     .046*
  Study Strategy × Background Topic Knowledge      0.14     .12     1.16     .248

Note: "Study Strategy" refers to retrieval practice relative to reading sentence facts. Coefficients = the estimate of each effect of each variable on accuracy on the session two final test, reported in log odds; SE = standard error of the estimate; *p < .05.
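The coefficients in Tables 1 and 3 are reported in log odds. As a quick illustrative aid to reading them (the helper functions are ours; 0.12 is the background topic knowledge estimate from Table 3):

```python
import math

def odds_ratio(log_odds: float) -> float:
    """Convert a logistic-regression coefficient (log odds) into an odds ratio."""
    return math.exp(log_odds)

def inverse_logit(logit: float) -> float:
    """Map a value on the log-odds scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

# Table 3's background estimate of 0.12 log odds corresponds to an odds
# ratio of about 1.13: receiving a background passage multiplied the odds
# of answering a final-test question correctly by roughly 1.13.
print(round(odds_ratio(0.12), 2))  # 1.13

# A logit of 0 corresponds to a probability of .5.
print(inverse_logit(0.0))  # 0.5
```

This is only a reading aid for the reported effect sizes; the models themselves were fit in lme4 and brms as described above.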

evidence of a possible positive effect of background knowledge. The 95% credible interval for the fixed effect of study strategy for the only study strategy model was [−0.21, 0.03]. The 95% credible interval for the fixed effect of background for the only background model was [0.00, 0.24]. Given that the credible interval for the fixed effect of background was at zero, there was weak evidence of a possible positive effect of background knowledge.

When comparing the no interaction model (M1) and the full model (M2), we found a Bayes Factor of 1.74, suggesting weak evidence against an interaction of study strategy and background knowledge. We then tested the effect of background knowledge by comparing the no interaction (M1) model to the model with only the study strategy main effect (M2). The Bayes Factor of 1.80 suggests weak evidence in favour of the main effect of background knowledge. Lastly, to test the effect of retrieval practice, we compared the no interaction model (M1) to the only background model (M2) and found a Bayes Factor of 0.42, suggesting weak evidence against a retrieval practice effect.

In sum, as with Experiment 1, we find weak (but significant in the frequentist analysis) evidence in favour of an effect of background knowledge, but no effects of retrieval practice – and, if anything, evidence against such an effect.

Discussion

Experiment 2A compared the same conditions as Experiment 1 in a fully within-participants design. We replicated the overall conclusions: a significant main effect of background topic knowledge where participants performed significantly better on the session two test if they received background topic material during session one, no significant main effect of study strategy, and no significant interaction between our two variables of interest. We also found, with Bayesian analyses, weak evidence for the effect of background knowledge and weak evidence against a main effect of retrieval practice and its interaction with background knowledge.

Although we did not find any evidence that the testing effect was moderated by background knowledge, both Experiments 1 and 2A could be viewed as relatively uninformative grounds for testing this question because no overall testing effect was observed. To better produce a testing effect and thereby test for its potential moderators, in Experiment 2B we included another manipulation known to strengthen the testing effect: feedback.

Experiment 2B

Experiment 2B largely replicated the design of Experiment 2A, but participants also received feedback in the session one retrieval practice study strategy condition. There was a 2 × 2 design: a within-subjects study strategy intervention (i.e., testing and receiving immediate feedback or reading sentence facts) and a within-subjects background topic knowledge intervention (i.e., receiving a background passage for a given topic, or not). All variables were counterbalanced, resulting in four different presentation lists for session one and eight different presentation lists for session two (as in Experiment 2A).

Method

Participants

Given the sample size we expected to be able to recruit, a sensitivity analysis conducted in G*Power revealed a sample size of 105 participants to detect a medium effect size (R2 = .04, power = .80) for a main effect of retrieval practice. Participants were recruited via the same methods as Experiments 1 and 2A with the same inclusion
criteria. There was approximately 21% attrition between session one (N = 148) and session two (N = 117).

Procedure

As with Experiments 1 and 2A, both sessions of Experiment 2B took place online via Qualtrics. The materials used in Experiment 2B were nearly the same as Experiments 1 and 2A, with the exception that, in the retrieval practice condition, participants received immediate feedback following each session one retrieval practice item. The feedback would restate the question, state "Correct answer:", and then state the correct answer to the question. This feedback block was shown for 10 s before Qualtrics auto-advanced to the next question. The restudying condition in session one in Experiment 2B was the same as in Experiment 1 and Experiment 2A.

Results

Mixed-effects logit models

For Experiment 2B, there were two fixed effects: study strategy (retrieval practice with immediate feedback versus restudying) and background topic knowledge (receiving versus not receiving background knowledge). Study strategy and background topic knowledge were coded the same as in Experiments 1 and 2A. Random effects remained the same as those in Experiments 1 and 2A: participant ID (variance = .25) and question (variance = .49). The final model had an adjusted ICC of .18 for all the random effects and a marginal R2 of .015.

Table 4. Results from the mixed-effects model for Experiment 2B.

                                                Estimate    SE    z-value   p-value
  Study Strategy                                   0.48     .06     8.07     <.001***
  Background Topic Knowledge                       0.05     .06     0.82     .413
  Study Strategy × Background Topic Knowledge     −0.09     .12    −0.79     .431

Note: "Study Strategy" refers to retrieval practice relative to reading sentence facts. Coefficients = the estimate of each effect of each variable on accuracy on the session two final test, reported in log odds; SE = standard error of the estimate; ***p < .001.

Table 4 presents the results from the main model. There was a significant main effect of study strategy (p < .001): when participants studied via retrieval practice with immediate feedback, they performed significantly better on the session two test compared to when they studied by reading sentence facts. There was not a significant main effect of background topic material (p = .41), and there was not a significant two-way interaction between study strategy and background topic material (p = .43). Figure 6 presents a graphical representation of the mean accuracies.

Bayesian analyses

The models for the Bayesian analyses were the same as those in Experiments 1 and 2A. The 95% credible intervals for each fixed effect of the full model were: study strategy [0.37, 0.70], background [−0.07, 0.26], and the interaction term [−0.33, 0.14]. The 95% credible intervals for each fixed effect of the no interaction model were: study strategy [0.36, 0.60] and background [−0.07, 0.17]. The 95% credible interval for the fixed effect of study strategy for the only study strategy model was [0.37, 0.60]. The 95% credible interval for the fixed effect of background for the only background model was [−0.07, 0.16]. All the models that included the fixed effect of study strategy

Figure 6. Mean accuracy on the session two test of Experiment 2B based on session one condition assignments. Error bars signify the 95% confidence interval. *Indicates that participants received immediate feedback following the retrieval practice study strategy condition in session one.
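The verbal labels attached to the Bayes Factors throughout these analyses follow the bands in Table 2, which amount to a simple lookup. A sketch (the function is ours; the thresholds are those printed in Table 2):

```python
def interpret_bf(bf: float) -> str:
    """Map a Bayes Factor BF12 = P(data|M1) / P(data|M2) onto the
    Kass and Raftery (1995) guideline labels as laid out in Table 2."""
    bands = [
        (0.007, "Very strong evidence favouring M2 over M1"),
        (0.05,  "Strong evidence favouring M2 over M1"),
        (0.33,  "Moderate evidence favouring M2 over M1"),
        (1.0,   "Weak evidence favouring M2 over M1"),
        (3.0,   "Weak evidence favouring M1 over M2"),
        (10.0,  "Moderate evidence favouring M1 over M2"),
        (30.0,  "Strong evidence favouring M1 over M2"),
        (100.0, "Very strong evidence favouring M1 over M2"),
    ]
    for upper, label in bands:
        if bf < upper:
            return label
    # Table 2 tops out at BF < 100; larger values are treated the same here.
    return "Very strong evidence favouring M1 over M2"
```

Applied to Experiment 1's comparisons, for example, BF = 15.81 lands in the strong band favouring M1 and BF = 0.31 in the moderate band favouring M2, matching the labels used in the text.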

had credible intervals for study strategy that did not between-subjects feedback experimental condition (i.e.,
contain zero, indicating there was meaningful evidence receiving immediate feedback on the session one retrieval
of a positive effect of retrieval. practice questions, or not).
When comparing the no interaction model (M1) and the
full model (M2), we found a Bayes Factor of 2.55,
Method
suggesting weak evidence favouring against an inter­
action of study strategy and retrieval practice. When com­ Participants
paring the no interaction (M1) model and the only study
Experiment 2A had a sample size of N = 119 and Exper­
strategy model (M2), we found a Bayes Factor of 0.20,
iment 2B had a sample size of N = 117, resulting in a
suggesting weak evidence against an effect of background
total sample size of N = 236 for our post-hoc analyses.
knowledge. When comparing the no interaction model
(M1) and the only background model (M2), we found a
Bayes Factor greater than 1000, suggesting very strong Results
evidence in favour of a main effect of study strategy.
Mixed-effects logit models
That is, aligning with our significant main effect of retrieval
in our mixed-effects analyses, we now do see clear evidence for a testing effect.

Discussion

After introducing immediate feedback to our retrieval practice condition, we now found a significant testing effect: when participants studied via retrieval practice and received immediate feedback, they performed significantly better on the session two test than when they studied by reading sentence facts. However, this testing effect was not moderated by background knowledge. Further, unlike Experiments 1 and 2A, we did not find a significant main effect of background topic knowledge.

Even though Experiment 2B yielded a significant testing effect, we still found no evidence that the testing effect was moderated by background knowledge. Indeed, our Bayesian analysis found evidence (albeit weak) against a significant interaction between the testing effect and background knowledge.

Experiments 2A and 2B

Experiments 2A and 2B were run concurrently during the same academic term, and participants were randomly assigned to one or the other. Further, these two experiments varied only in the presence of feedback: in Experiment 2A, participants did not receive any feedback on the retrieval practice questions in session one (as with Experiment 1), and in Experiment 2B, immediate feedback was provided. Thus, we combined the data from the two experiments post hoc into a single analysis and treated feedback as a between-subjects experiment variable that captured the impact of either having or not having immediate feedback on the session one retrieval practice questions.

Combining the data of Experiments 2A and 2B yields a 2 × 2 × 2 mixed design: a within-subjects study strategy intervention (i.e., testing or restudying), a within-subjects background topic knowledge intervention (i.e., receiving a background passage for a given topic, or not), and a between-subjects feedback manipulation (i.e., Experiment 2A without feedback or Experiment 2B with feedback).

For this set of analyses, there were three fixed effects: study strategy, background topic knowledge, and an experimental variable of feedback (feedback or no feedback). Study strategy and background topic knowledge were contrast-coded the same as for previous experiments. The between-subjects feedback experiment variable was contrast-coded to compare Experiment 2B with feedback (0.5) to Experiment 2A without feedback (−0.5). Random effects remained the same as with previous experiments: participant ID (variance = .18) and question (variance = .50). The final model had an adjusted ICC of .17 for all the random effects and a marginal R² of .011.

Table 5 presents the results from the main model. There was a significant main effect of study strategy (p < .001): participants performed significantly better on the session two test when they studied via retrieval practice compared to restudying, replicating the classic testing effect. Inspection of the means in Figures 5 and 6 indicates this difference was driven entirely by Experiment 2B, in which feedback was provided; indeed, there was a significant two-way interaction between study strategy and feedback (p < .001). Similarly, there was also a significant main effect for the between-subjects experimental feedback variable (p < .001): participants performed better in Experiment 2B, where they received feedback on their session one retrieval quiz questions, than in Experiment 2A. This difference was driven entirely by the feedback by study strategy interaction (since the restudy conditions did not differ across experiments).

There was a significant main effect of background topic knowledge (p = .049): participants' learning of the main text was improved if they had access to background material. Critically, however, background topic knowledge did not significantly interact with any other variables; that is, while background knowledge enhanced overall learning, it did not moderate the testing effect.

Bayesian analyses

The models for the Bayesian analyses were the same as those in Experiments 1, 2A, and 2B. We did not include an experimental feedback fixed effect in our Bayesian models, given that the effect of feedback was captured by the retrieval practice condition of the study strategy intervention in our mixed-effects models.
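The adjusted ICC reported above can be reproduced from the two random-effect variances. The sketch below is our illustration, not the authors' analysis code; it assumes the standard latent-scale formulation for logit models, in which the level-one residual variance is fixed at π²/3.

```python
import math

def adjusted_icc_logit(*random_effect_variances):
    """Latent-scale ICC for a mixed-effects logit model: the summed
    random-effect variances divided by the total variance, where the
    level-one (residual) variance is fixed at pi^2 / 3."""
    v = sum(random_effect_variances)
    return v / (v + math.pi ** 2 / 3)

# Participant (.18) and question (.50) variances from the combined model:
icc = adjusted_icc_logit(0.18, 0.50)
assert round(icc, 2) == 0.17  # matches the reported adjusted ICC of .17
```

The same formula recovers the adjusted ICC of .18 reported later for Experiment 3 from its participant (.12) and question (.61) variances.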
14 J. A. MACALUSO AND S. H. FRAUNDORF

Table 5. Results from the mixed-effects model for Experiment 2.


Estimate SE z-value p-value
Study Strategy 0.20 .04 4.68 <.001***
Background Topic Knowledge 0.08 .04 1.97 .049*
Experiment Feedback Manipulation 0.22 .06 3.51 <.001***
Study Strategy × Background Topic Knowledge 0.02 .08 0.27 .786
Study Strategy × Experiment Feedback Manipulation 0.57 .08 6.66 < .001***
Background Topic Knowledge × Experiment Feedback Manipulation −0.07 .08 −0.84 .403
Study Strategy × Background Topic Knowledge × Experiment Feedback Manipulation −0.23 .17 −1.36 .174
Note: “Study Strategy” refers to retrieval practice relative to reading sentence facts. Coefficients = the estimate of each effect of each variable on accuracy
on session two final test reported in log odds; SE = standard error of the estimate; *p < .05, ***p < .001.
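Because the coefficients in Table 5 are reported in log odds, a reader can exponentiate them to obtain odds ratios, which are often easier to interpret. This is an illustrative sketch, not part of the original analyses.

```python
import math

def log_odds_to_odds_ratio(estimate):
    """Exponentiate a log-odds coefficient to obtain an odds ratio."""
    return math.exp(estimate)

# Study strategy estimate from Table 5 (0.20 log odds): roughly 22% higher
# odds of a correct final-test answer after retrieval practice vs. restudy.
assert round(log_odds_to_odds_ratio(0.20), 2) == 1.22
# Study Strategy x Experiment Feedback Manipulation interaction (0.57 log odds):
assert round(log_odds_to_odds_ratio(0.57), 2) == 1.77
```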

The 95% credible intervals for each fixed effect of the full model were: study strategy [0.07, 0.31], background [−0.04, 0.19], and the interaction term [−0.15, 0.19]. Given that the credible interval for the fixed effect of study strategy did not contain zero, there was meaningful evidence of a positive effect of retrieval.

The 95% credible intervals for each fixed effect of the no interaction model were: study strategy [0.12, 0.19] and background [0.00, 0.17]. Again, the credible interval for the fixed effect of study strategy did not contain zero, indicating meaningful evidence of a positive effect of retrieval; because the credible interval for the fixed effect of background had an endpoint at zero, there was only weak evidence of a possible positive effect of background knowledge.

The 95% credible interval for the fixed effect of study strategy in the only study strategy model was [0.12, 0.28], again excluding zero and indicating meaningful evidence of a positive effect of retrieval. The 95% credible interval for the fixed effect of background in the only background model was [0.00, 0.17]; with an endpoint at zero, this again constituted weak evidence of a possible positive effect of background knowledge.

When comparing the no interaction model (M1) and the full model (M2), we found a Bayes Factor of 5.10, suggesting moderate evidence against an interaction between retrieval practice and background knowledge. When comparing the no interaction model (M1) and the model with only a main effect of study strategy (M2), we found a Bayes Factor of 0.74, suggesting weak evidence against a main effect of background knowledge. When comparing the no interaction model (M1) and the only background model (M2), we found a Bayes Factor greater than 1000, suggesting very strong evidence favouring a main effect of retrieval practice.

Discussion

When looking at Experiments 2A and 2B together post hoc, we found a significant testing effect: participants performed significantly better on the session two test when they studied via retrieval practice compared to when they restudied. However, this was driven entirely by the cases where participants received feedback on their retrieval practice (Experiment 2B), a point we return to in the General Discussion.

Similarly, we also found a main effect of experiment: performance was higher overall in Experiment 2B (where feedback was provided) than in Experiment 2A (where no feedback was provided). This main effect was presumably driven by the retrieval practice condition, which was the only condition that differed across experiments, and indeed, we found a significant two-way interaction whereby the testing effect was amplified if participants received feedback.

Most critically for our present purposes, there was no interaction of the testing effect with background topic knowledge, consistent with the findings of Buchin and Mulligan (2023); indeed, our Bayesian analysis found moderate evidence against such an effect. Is this simply because the background material was irrelevant to learners' understanding of the text? No: we found a significant main effect of background topic knowledge, such that participants' learning of the main text was improved if they had access to background material. Rather, background topic knowledge did not significantly interact with the retrieval practice or feedback manipulations. That is, we observed retrieval practice with feedback to be equally beneficial across a range of prior knowledge.

However, one caveat of Experiments 1, 2A, and 2B is that we observed a testing effect only when feedback was provided. Why did we not observe the testing effect more robustly? One possibility is that these experiments were subject to a confound known as the forward testing effect, whereby testing on prior material can also enhance the learning of new information (Gupta et al., 2024; Wissman et al., 2011; Yang et al., 2019). In particular, if a participant completes the retrieval practice condition before the restudy condition, the restudy condition could also benefit from the previous use of retrieval practice, thus eliminating the apparent benefits of retrieval practice. As a result, it is possible that the forward testing effect is concealing the benefits of testing on retention in Experiments 1, 2A, and 2B.

Experiment 3

To rule out the potential confounding of the forward testing effect, Experiment 3 used a 2 × 2 mixed design. Background topic knowledge (i.e., receiving a background

passage for a given topic, or not) was manipulated within-subjects, as in previous experiments, but the study strategy manipulation was now conducted between-subjects: participants either only restudied by reading the sentence facts or only used retrieval practice, not both. In Experiment 3, participants did not receive immediate feedback when studying via retrieval practice. Since each participant experienced only a single study strategy, this design eliminates any potential carryover effects from retrieval practice onto a subsequent restudy condition.

There were eight different presentation lists for session one, and eight different presentation lists for session two. To maintain varied presentation of the different topics, there were four different presentation lists for participants who only studied via retrieval practice and four different presentation lists for participants who only studied via reading sentence facts, resulting in the eight different presentation lists for session one.

Method

Participants

Given the sample size we expected to recruit, a sensitivity analysis conducted in G*Power revealed that a sample size of 129 participants would be needed to detect a small-to-medium effect size (R² = .02, power = .80) for a main effect of retrieval practice. Participants were recruited via the same methods as previous experiments, with the same inclusion criteria. There was approximately 26% attrition between session one (N = 201) and session two (N = 152).

Procedure

As with previous experiments, both sessions of Experiment 3 took place online via Qualtrics. The materials and procedure for Experiment 3 were comparable to those of previous experiments, except that participants either only studied via retrieval practice (n = 73) or only studied via restudying of the sentence facts (n = 79). The background passages remained the same as in previous studies, and no feedback was included in Experiment 3.

Results

Mixed-effects logit models

For Experiment 3, there were two fixed effects: study strategy and background topic knowledge. Study strategy and background topic knowledge were coded the same as in Experiments 1, 2A, and 2B. Random effects remained the same as those in Experiments 1, 2A, and 2B: participant ID (variance = .12) and question (variance = .61). The final model had an adjusted ICC of .18 for all the random effects and a marginal R² of .001.

Table 6 presents the results from the main model. We replicated the significant main effect of background topic material (p = .008) seen in Experiments 1, 2A, and the combination of Experiments 2A and 2B: if individuals had more background topic knowledge, they performed better on the session two test compared to when they had less background topic knowledge. However, even after eliminating potential forward testing effects, there was neither a significant main effect of study strategy (p = .62) nor a significant two-way interaction between study strategy and background topic material (p = .25). Figure 7 presents a graphical representation of the mean accuracies.

Table 6. Results from the mixed-effects model for Experiment 3.

Estimate SE z-value p-value
Study Strategy −0.04 .07 −0.50 .620
Background Topic Knowledge 0.14 .05 2.64 .008**
Study Strategy × Background Topic Knowledge −0.12 .10 −1.14 .254
Note: “Study Strategy” refers to retrieval practice relative to reading sentence facts. Coefficients = the estimate of each effect of each variable on accuracy on session two final test reported in log odds; SE = standard error of the estimate; **p < .01.

Bayesian analyses

The models for the Bayesian analyses were the same as those in Experiments 1, 2A, and 2B. The 95% credible intervals for each fixed effect of the full model were: study strategy [−0.16, 0.20], background [0.06, 0.33], and the interaction term [−0.31, 0.08]. The 95% credible intervals for each fixed effect of the no interaction model were: study strategy [−0.19, 0.11] and background [0.04, 0.24]. The 95% credible interval for the fixed effect of study strategy in the only study strategy model was [−0.18, 0.10], and the 95% credible interval for the fixed effect of background in the only background model was [0.04, 0.24]. All the models that included the fixed effect of background had credible intervals for background that did not contain zero, indicating meaningful evidence of a positive effect of background knowledge.

When comparing the no interaction model (M1) and the full model (M2), we found a Bayes Factor of 2.17, suggesting weak evidence against an interaction. When comparing the no interaction model (M1) and the only study strategy model (M2), we found a Bayes Factor of 4.73, suggesting moderate evidence in favour of a main effect of background knowledge. When comparing the no interaction model (M1) and the only background model (M2), we found a Bayes Factor of 0.20, suggesting weak evidence against the main effect of study strategy. That is, we again find evidence for an effect of background knowledge, but some evidence against a retrieval practice effect (in the absence of feedback), which aligns with our frequentist mixed-effects analyses.

Discussion

As with Experiments 1, 2A, and the combination of Experiments 2A and 2B, we found a significant main effect of

Figure 7. Mean accuracy on the session two test of Experiment 3 based on session one condition assignments. Error bars signify the 95% confidence interval.

background topic knowledge: participants performed significantly better on the session two test when they received background topic material during session one, confirming the relevance of our manipulation of background knowledge.

Experiment 3 was conducted in addition to Experiments 1, 2A, and 2B to control for the possible confound of the forward testing effect. A potential problem with the design of the previous experiments is that if a participant initially experiences the retrieval practice condition followed by the restudy condition, the retrieval practice condition may create a forward testing effect that benefits the subsequent restudied material, eliminating the apparent benefits of testing. By having participants either only study via reading sentence facts or only via testing, this confound was removed. However, even after eliminating the potential forward testing effect confound with our between-subjects design, we found neither a significant overall effect of retrieval practice nor any indication that the potential benefits of testing were moderated by prior knowledge. Indeed, our Bayesian analysis provided evidence (albeit weak) against both such effects.

General discussion

In four experiments, we tested college students' learning of expository science texts and their memory retention one week later to experimentally examine how the testing effect may be moderated by background topic knowledge. We did not find any such moderation in our experiments but did find informative results regarding our main effects of study strategy and background topic knowledge. We discuss each of those effects in turn (see Table 7 for a summary of each experiment and their findings).

We found significant main effects of background topic knowledge in all experiments except 2B. Participants performed significantly better on the session two test when they received background topic knowledge during session one versus when they did not. This finding implies that background topic material, or perhaps expertise more broadly, helps learning and retention. This is consistent with the rich-get-richer phenomenon from Witherby and Carpenter (2022), where prior domain knowledge helped participants learn new information about that same topic. It also confirms the efficacy and relevance of our manipulation of background topic knowledge.

In Experiments 1, 2A, and 3, we did not find an overall testing effect whereby retrieval practice led to superior learning and retention relative to restudying intact information. This was true even in Experiment 3, where we controlled for a potential confounding of a forward testing effect using a between-subjects design. We found a significant testing effect only in Experiment 2B, where we added feedback, given that prior work (Rowland, 2014; Yang et al., 2021; c.f., Adesope et al., 2017) has shown that feedback enhances the effect of testing. This had a significant effect on session two performance: learners performed significantly better on the session two test if they received feedback on their session one retrieval practice quiz questions compared to not.

Our critical question was whether background knowledge would moderate the testing effect. While we did not find an overall testing effect in Experiments 1, 2A, and 3, it is possible that the testing effect might have emerged specifically for participants with more or less prior knowledge. Recall, for instance, that the bifurcation model suggests that benefits of retrieval practice are conditioned on successful remembering during retrieval

Table 7. Summary of experimental designs and findings for Experiments 1, 2A, 2B, and 3.

Exp. 1: Mixed; 4 possible lists:
  1. retrieve – background: (1a) background and RP; (1b) no background and RS
  2. retrieve – no background: (2a) background and RS; (2b) no background and RP
  Main effect of study strategy: not significant. Main effect of background topic knowledge: significant, p < .01**. Interaction: not significant.

Exp. 2A: Within-subjects; 4 possible lists:
  1. background and RP; 2. no background and RP; 3. background and RS; 4. no background and RS
  Main effect of study strategy: not significant. Main effect of background topic knowledge: significant, p < .05*. Interaction: not significant.

Exp. 2B: Within-subjects; 4 possible lists:
  1. background and RP with FB; 2. no background and RP with FB; 3. background and RS; 4. no background and RS
  Main effect of study strategy: significant, p < .001***. Main effect of background topic knowledge: not significant. Interaction: not significant.

Exp. 2A and Exp. 2B: Mixed; 8 possible lists:
  1. Exp. 2A, RP without FB: (1a) background and RP; (1b) no background and RP; (1c) background and RS; (1d) no background and RS
  2. Exp. 2B, RP with FB: (2a) background and RP with FB; (2b) no background and RP with FB; (2c) background and RS; (2d) no background and RS
  Main effect of study strategy: significant, p < .001***. Main effect of background topic knowledge: significant, p < .05*. Interaction: not significant.

Exp. 3: Mixed; 4 possible lists:
  1. retrieval only: (1a) background and RP; (1b) no background and RP
  2. restudying only: (2a) background and RS; (2b) no background and RS
  Main effect of study strategy: not significant. Main effect of background topic knowledge: significant, p < .01**. Interaction: not significant.

Note: RP refers to retrieval practice, RS refers to restudying via reading sentence facts, and FB refers to immediate feedback. Immediate feedback during retrieval practice only occurred in Experiment 2B.
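The pattern of significance in Table 7 can be captured in a small data structure, which makes the two central regularities easy to verify mechanically. This encoding is our paraphrase of the table, not material from the original article; the "feedback" flag simply marks whether an analysis included a feedback condition.

```python
# Significance pattern paraphrased from Table 7 (True = significant effect).
summary = {
    "Exp. 1":        {"strategy": False, "background": True,  "interaction": False, "feedback": False},
    "Exp. 2A":       {"strategy": False, "background": True,  "interaction": False, "feedback": False},
    "Exp. 2B":       {"strategy": True,  "background": False, "interaction": False, "feedback": True},
    "Exps. 2A + 2B": {"strategy": True,  "background": True,  "interaction": False, "feedback": True},
    "Exp. 3":        {"strategy": False, "background": True,  "interaction": False, "feedback": False},
}

# The testing effect never interacted with background knowledge...
assert not any(row["interaction"] for row in summary.values())
# ...and a significant study strategy effect appeared only in analyses with feedback.
assert all(row["feedback"] for row in summary.values() if row["strategy"])
```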

practice; thus, testing effects might only emerge for participants equipped with the background knowledge. Indeed, such possibilities are exactly what we sought out to test. However, we found no interaction of study strategy with background knowledge. Further, even in Experiment 2B, where there was a significant main effect of testing, it was not moderated by background knowledge. Thus, regardless of whether there was or was not an overall testing effect, in no circumstances did we find the use of retrieval practice to interact with background knowledge. Indeed, Bayesian analyses indicated weak (Experiment 3) to moderate (Experiments 1 and 2) evidence against an interaction between retrieval practice and background knowledge.

The absence of a testing effect

It is somewhat surprising that we found a significant testing effect only in Experiment 2B, given that the testing effect is generally considered a robust phenomenon (Adesope et al., 2017; Rowland, 2014; Yang et al., 2021). One explanation is that our experiments contained additional controls that are not present in the traditional testing effect paradigm, in which participants in a restudy condition most commonly reread the same passage they studied initially. Instead, we had participants read sentences that mimicked the information shown in the multiple-choice questions and answers, but in sentence form. That is, in the retrieval practice condition, participants practiced the multiple-choice questions they would later be tested on, without certainty about what the correct answer was (with the exception of Experiment 2B); in the reading sentence facts condition, participants saw the same items presented as intact sentences with the correct answer filled in. This procedure more closely controlled the presented content and varied only how participants interacted with it. It is possible that by instituting this additional level of control, we in fact controlled some of the mechanisms underlying the traditional testing effect. For example, when testing is compared against rereading of an entire text, it is possible that some of the putative benefits of retrieval in fact stem from identification of key ideas and materials, which our current procedure controlled.

Another relevant element of our design may be that the retrieval practice used a multiple-choice format.9 This practice format was chosen to match the final test, but multiple-choice tests also expose learners to incorrect lure

statements that could be misremembered as the correct information on the final test (Butler & Roediger, 2008; but see also Little et al., 2012). Although the testing effect can certainly occur even with multiple-choice and other recognition tests (for meta-analytic evidence, see Adesope et al., 2017; Rowland, 2014; Yang et al., 2021), it is possible that the effect was diminished in this case by intrusions from the lures. This is consistent with the fact that, once feedback was provided in Experiment 2B, a robust testing effect emerged. Both this possibility and the one above suggest a need to more closely examine which elements of the classic testing-effect paradigm undergird the effect.

A final possible explanation for the absence of a testing effect is that, in Experiments 1, 2A, and 3, participants rushed through the retrieval practice questions, since there was no minimum time required to complete the retrieval practice. By comparison, the restudying condition forced participants to spend at least a full minute before advancing (although there was of course no guarantee that participants spent any or all of this time reading the material; they could have not read the sentence facts at all, or perhaps read the sentences only once before waiting for the experiment to proceed). In Experiment 2B, where feedback was provided, we did see a significant testing effect. Here, perhaps participants needed to slow down a bit to read the feedback for each question and answer each question individually. Nevertheless, it seems unlikely that our pattern of results reflects participants completely ignoring the to-be-learned material; overall performance was consistently above chance, and we removed those individual participants who answered at or below chance before comparing conditions.

Comparison to prior work

These findings extend the existing literature exploring prior knowledge and the testing effect. Although some studies have varied prior knowledge of the exact to-be-learned materials by manipulating the number of study-test cycles or training participants to a criterion level of performance (e.g., Karpicke & Roediger, 2008; Soderstrom et al., 2015), only more recently have researchers examined how the testing effect might be moderated by broader background knowledge that is related, but not identical, to the to-be-learned information.

As reviewed in the introduction, some correlational studies have explored these variables, resulting in mixed findings. For instance, Cogliano and colleagues (2019) found that the testing effect was compensatory for learners with low knowledge, whereas Carpenter and colleagues (2016) conversely found that retrieval practice is more beneficial for learners with high knowledge. Our results aligned with neither of these studies in that we found that background topic knowledge did not moderate the testing effect at all. Rather, our results were more similar to those of Glaser and Richter (2023), who found that their participants performed better when they studied via a practice test with corrective feedback compared to restudying, regardless of prior knowledge. These results align with our Experiment 2 findings, where we found that retrieval practice with feedback was most beneficial and was not affected by background material accessibility.

Buchin and Mulligan (2023) produced the only other study we know of in which prior knowledge was manipulated experimentally to study its moderating effect on the testing effect. They also found that prior knowledge does not moderate the testing effect. One contrast with Buchin and Mulligan (2023) is that we found a significant testing effect only when participants received immediate feedback following their session one retrieval questions (Experiment 2B), whereas Buchin and Mulligan found that participants always had greater performance in the retrieval practice condition compared to restudying. Another difference is the manipulation of topic knowledge: Buchin and Mulligan trained people to a criterion level of performance (to guarantee they were experts), whereas we gave everyone the same text for the same amount of time (e.g., what you might get in a class). It is noteworthy that, despite these differences in methodology and other findings, our studies converge on the conclusion that background topic knowledge does not moderate the testing effect.

This conclusion also aligns with the tentative findings of the review by Smith-Peirce and Butler (2025), who found that, in the limited literature thus far, the testing effect appears to generalise across multiple learner characteristics, including prior knowledge as well as working memory and personality variables, like grit or need for cognition.

Theories regarding prior knowledge and testing

Our findings are somewhat in contrast with the elaborative retrieval account (Carpenter, 2009), which proposes that retrieval of information increases a learner's retention of material because the retrieval process produces semantic mediators that connect cues to targeted information. Instead, we did not find a significant testing effect in Experiments 1, 2A, or 3. On the other hand, when feedback was provided during retrieval practice in session one, we saw better retention in session two compared to not receiving feedback. Perhaps feedback is necessary to facilitate more elaborative activation of material, resulting in the superior performance on the session two test in Experiment 2B. It seems possible that feedback leads to deeper and more focused processing of retrieval practice information, and prior research has shown that deeper processing of material results in better retention of the information compared to more shallow processing (Adesope et al., 2017; Rowland, 2014; Yang et al., 2021). This rationale is also in line with the bifurcation model, whereby in the absence of feedback, retrieval practice largely boosts

one's memory only for correctly retrieved information (Kornell et al., 2011). Feedback helps to close this gap by allowing testing to benefit memory even for material that was not correctly retrieved.

Limitations and future directions

One limitation of our study is that we tested participants' learning only of the exact material they practiced, and we did not assess transfer to related knowledge. Transfer could have been assessed via questions that are similar to those found in session one, but not quite the same. It is important to note that transfer is often difficult to achieve (Gick & Holyoak, 1987; Haskell, 2001; Singley & Anderson, 1989), and its existence has been debated (e.g., Barnett & Ceci, 2002; Detterman, 1993; Singley & Anderson, 1989). In a meta-analysis, Pan and Rickard (2018) found that positive transfer from the testing effect is dependent on (a) response congruency between initial and final testing (i.e., having the same or similar responses at the first and last tests), (b) whether there was elaborative retrieval practice (e.g., elaborative feedback that explains why a selected retrieval practice answer is right or wrong), and (c) initial test performance. Given that we did not include more elaborative feedback in any of our experiments and that evidence for transfer is mixed, we decided not to test for transfer.

Another limitation is that we deliberately picked topics for which our participants were likely to have little or no background topic knowledge. Although this was beneficial in allowing us to experimentally manipulate people's level of knowledge via the additional texts, it also limits the ecological validity insofar as people often come into a given learning task with at least some baseline knowledge. For instance, when taking a class on genetics, the majority of learners will already have some prior knowledge about biology (e.g., rudimentary knowledge about DNA replication based on introductory biology coursework that is typically required before taking higher-level coursework).

An additional limitation is that in our experiments, participants engaged in a single round of retrieval practice, rather than multiple rounds. Previous work has found that repeated retrieval benefits memory (for a review, see Carpenter et al., 2022). Engaging in multiple sessions of retrieval practice allows participants to revisit the material repeatedly, promoting additional opportunities to strengthen learning (the spacing effect or distributed practice effect, e.g., Dempster, 1988). In future work, we could conduct another experiment in which participants engage in several rounds of retrieval practice. Perhaps more retrieval practice rounds would yield a significant testing effect in this paradigm.

Finally, in the present experiments, we experimentally manipulated access to background knowledge of a given topic. A clear advantage of this design is that random assignment allowed us to make causal inferences about the effect of background knowledge. On the other hand, the level of background topic knowledge that could be created in these laboratory manipulations is small relative to some people's out-of-laboratory expertise. In our ongoing work, we are examining whether the testing effect is moderated by pre-existing expertise in topics that some people tend to know a lot about while others know less (e.g., Marvel superheroes and Harry Potter). Indeed, prior research has found that some undergraduate students are substantially more knowledgeable about some of these topics than other students (Troyer et al., 2020), which we can leverage as a more ecologically valid comparison of untrained expertise.

Educational relevance

By better understanding the effect of background materials on the testing effect, educators can enhance students' educational success. Educational activities usually need to reach a large range of students with varying levels of expertise, so it is crucial to determine whether testing is helpful for everyone or just for some. Thus, it is important to explore possible moderators of the testing effect that may impact its effectiveness, like background topic knowledge. Here, however, we found that in situations where retrieval practice enhances learning – that is, when feedback is included – it does so across a range of background knowledge.

However, we did find one other important constraint on using testing to enhance learning: in our study, retrieval practice only significantly enhanced learning when feedback was provided. Consequently, it could be beneficial to have students take intermittent smaller practice quizzes to test their knowledge and enhance learning, but feedback should always be provided on their responses.

Lastly, our findings indicate that background knowledge can lead to better learning of information regardless of the study strategy used. It is possible that participants in these studies were able to draw on background material and make connections with the information in the main text, leading to better comprehension and retention. Perhaps additional readings should be encouraged for students who have less knowledge on a given topic.

Conclusion

Across four experiments, in our mixed-effects analyses, we found no evidence that an experimental manipulation of background topic knowledge moderates the testing effect when applied to reading expository science texts. In our Bayesian analyses, we indeed found evidence against such an effect. Further, this effect cannot be attributed to the irrelevancy of the background topic knowledge, because having more background information on a given topic did lead to better learning and memory retention overall. In conclusion, the testing effect – at least when feedback is provided – is robust across even meaningful differences in background knowledge, and that may
20 J. A. MACALUSO AND S. H. FRAUNDORF

be one reason why testing with feedback is such a powerful effect in learning.

Disclosure statement

No potential conflict of interest was reported by the author(s).
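The Conclusion above reports Bayesian evidence against a moderating effect, and the Notes below mention that simpler mixed-effects models were preferred according to BIC. As a minimal, hypothetical illustration of how such model-comparison evidence is read (this is not the article's actual analysis, which fit models with the brms package and computed Bayes factors via bridgesampling), the sketch below converts a BIC difference between two invented models into an approximate Bayes factor and labels it with the verbal categories of Kass and Raftery (1995); the BIC values are made up for illustration.

```python
import math

def bf01_from_bic(bic_null: float, bic_alt: float) -> float:
    """Approximate Bayes factor favoring the null model (BF01) from the
    BIC of a null and an alternative model: BF01 ~ exp((BIC_alt - BIC_null) / 2).
    This is a common rough approximation, not a substitute for bridge sampling."""
    return math.exp((bic_alt - bic_null) / 2.0)

def kass_raftery_label(bf: float) -> str:
    """Verbal strength-of-evidence label for a Bayes factor, following the
    cutoffs suggested by Kass and Raftery (1995): 3, 20, and 150."""
    if bf < 1:
        return "evidence favors the other model"
    if bf < 3:
        return "not worth more than a bare mention"
    if bf < 20:
        return "positive"
    if bf < 150:
        return "strong"
    return "very strong"

# Hypothetical BICs: a model without the knowledge x strategy interaction
# (null) versus one including it (alternative). A 6-point BIC advantage
# for the null corresponds to BF01 ~ exp(3) ~ 20.09.
bf01 = bf01_from_bic(bic_null=2400.0, bic_alt=2406.0)
print(round(bf01, 2), kass_raftery_label(bf01))  # prints: 20.09 strong
```

On this scale, a Bayes factor near 1 is uninformative, while larger values of BF01 quantify graded support for the simpler (null) model rather than a mere failure to reject it.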

Notes

1. There was no feedback following each retrieval practice question in Experiments 1, 2A, or 3. As discussed above, this controls the number of activities and exposures to the material and is standard in the retrieval practice literature. Experiment 2B will explore the role of feedback on accuracy with testing.
2. By contrast, however, participants could not answer the retrieval practice questions with the background passage alone; norming by Norberg (2022) established that participants performed at chance if only the background text were presented and needed the main passage in order to accurately answer the presented questions.
3. In all experiments, during session one in the retrieval practice condition, participants used retrieval practice to study only one time.
4. Session two consisted of the final test questions for all four topics, which factorially would require only four possible presentation lists. Additional exploratory measures were collected in session two (e.g., the Motivated Strategies for Learning Questionnaire and the Big 5 Questionnaire), resulting in additional presentation list options. Results regarding these exploratory measures can be found in the supplemental materials.
5. The findings from the more complex models that included random slopes remained consistent with the findings of the simpler models, although the simpler model was always preferred according to BIC.
6. Mean performance on session one for all experiments can be found in the supplemental materials.
7. The brms package is designed to handle complex hierarchical structures, such as the mixed-effects models fit here. Default priors are normal distributions with a large standard deviation, which assumes that most effects are small to moderate in size but still allows for a wide range of plausible values.
8. Participants in all experiments who performed at or below chance on the session two test (i.e., an overall mean accuracy of 33% or lower) were removed from our final analyses. The removal of these participants did not change the significance of any of the main effects or interactions for any experiment.
9. We thank an anonymous reviewer for this point.

Open Scholarship

This article has earned the Center for Open Science badges for Open Data, Open Materials, and Preregistered. The data and materials are openly accessible at https://osf.io/pg4ku/ (DOI 10.17605/OSF.IO/PG4KU) and https://osf.io/eq67w.

Author contributions

Conceptualisation: J.A.M. & S.H.F.; Data curation: J.A.M. & S.H.F.; Formal analysis: J.A.M.; Funding acquisition: S.H.F.; Investigation: J.A.M. & S.H.F.; Methodology: J.A.M. & S.H.F.; Project administration: S.H.F.; Resources: J.A.M. & S.H.F.; Software: J.A.M. & S.H.F.; Supervision: S.H.F.; Validation: J.A.M. & S.H.F.; Visualisation: J.A.M.; Roles/Writing – original draft: J.A.M. & S.H.F.; Writing – review & editing: J.A.M. & S.H.F.

Funding

This project did not receive external funding.

Open practices data availability statement

The appendices, supplemental materials, and data for all experiments are available at https://osf.io/pg4ku/. Experiments 1 and 3 were not pre-registered. Experiment 2A (https://osf.io/eq67w) and Experiment 2B (https://osf.io/w6mjr) were pre-registered.

Informed consent

Informed consent was obtained from all subjects in the study.

ORCID

Jessica A. Macaluso http://orcid.org/0000-0003-0527-4142
Scott H. Fraundorf http://orcid.org/0000-0002-0738-476X

References

Adesope, O. O., Trevisan, D. A., & Sundararajan, N. (2017). Rethinking the use of tests: A meta-analysis of practice testing. Review of Educational Research, 87(3), 659–701. https://doi.org/10.3102/0034654316689306
Agarwal, P. K., Bain, P. M., & Chamberlain, R. W. (2012). The value of applied research: Retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist. Educational Psychology Review, 24(3), 437–448. https://doi.org/10.1007/s10648-012-9210-2
Agarwal, P. K., Finley, J. R., Rose, N. S., & Roediger, H. L. (2017). Benefits from retrieval practice are greater for students with lower working memory capacity. Memory, 25(6), 764–771. https://doi.org/10.1080/09658211.2016.1220579
Barnett, S. M., & Ceci, S. J. (2002). When and where do we apply what we learn?: A taxonomy for far transfer. Psychological Bulletin, 128(4), 612–637. https://doi.org/10.1037/0033-2909.128.4.612
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35–67). Erlbaum.
Buchin, Z. L., & Mulligan, N. W. (2023). Retrieval-based learning and prior knowledge. Journal of Educational Psychology, 115(1), 22–35. https://doi.org/10.1037/edu0000773
Bürkner, P. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13(4), 273–281. https://doi.org/10.1037/1076-898X.13.4.273
Butler, A. C., Karpicke, J. D., & Roediger, H. L. (2008). Correcting a metacognitive error: Feedback increases retention of low-confidence correct responses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(4), 918–928. https://doi.org/10.1037/0278-7393.34.4.918
MEMORY 21

Butler, A. C., & Roediger, H. L. (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36(3), 604–616. https://doi.org/10.3758/MC.36.3.604
Carpenter, S. K. (2009). Cue strength as a moderator of the testing effect: The benefits of elaborative retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(6), 1563–1569. https://doi.org/10.1037/a0017021
Carpenter, S. K., Lund, T. J. S., Coffman, C. R., Armstrong, P. I., Lamm, M. H., & Reason, R. D. (2016). A classroom study on the relationship between student achievement and retrieval-enhanced learning. Educational Psychology Review, 28(2), 353–375. https://doi.org/10.1007/s10648-015-9311-9
Carpenter, S. K., Pan, S. C., & Butler, A. C. (2022). The science of effective learning with spacing and retrieval practice. Nature Reviews Psychology, 1(9), 496–511. https://doi.org/10.1038/s44159-022-00089-1
Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The effects of tests on learning and forgetting. Memory & Cognition, 36(2), 438–448. https://doi.org/10.3758/MC.36.2.438
Chase, W. G., & Simon, H. A. (1973). Perception in chess. Cognitive Psychology, 4(1), 55–81. https://doi.org/10.1016/0010-0285(73)90004-2
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5(2), 121–152. https://doi.org/10.1207/s15516709cog0502_2
Clownfish Facts. (2021). Great Barrier Reef Foundation. Retrieved April 5, 2022, from https://www.barrierreef.org/the-reef/animals/clownfish
Cogliano, M., Kardash, C. M., & Bernacki, M. L. (2019). The effects of retrieval practice and prior topic knowledge on test performance and confidence judgments. Contemporary Educational Psychology, 56, 117–129. https://doi.org/10.1016/j.cedpsych.2018.12.001
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behavior, 11(6), 671–684. https://doi.org/10.1016/S0022-5371(72)80001-X
Dempster, F. N. (1988). The spacing effect: A case study in the failure to apply the results of psychological research. American Psychologist, 43(8), 627–634. https://doi.org/10.1037/0003-066X.43.8.627
Detterman, D. (1993). The case for the prosecution: Transfer as an epiphenomenon. In D. K. Detterman & R. J. Sternberg (Eds.), Transfer on trial: Intelligence, cognition, and instruction (pp. 1–24). Ablex Publishing.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Fazio, L. K., Huelser, B. J., Johnson, A., & Marsh, E. J. (2010). Receiving right/wrong feedback: Consequences for learning. Memory, 18(3), 335–350. https://doi.org/10.1080/09658211003652491
Francis, A. P., Wieth, M. B., Zabel, K. L., & Carr, T. H. (2020). A classroom study on the role of prior knowledge and retrieval tool in the testing effect. Psychology Learning & Teaching, 19(3), 258–274. https://doi.org/10.1177/1475725720924872
Gick, M. L., & Holyoak, K. J. (1987). The cognitive basis of knowledge transfer. In S. M. Cormier & J. D. Hagman (Eds.), Transfer of learning: Contemporary research and applications (pp. 9–46). Academic Press.
Gignac, G. E., & Szodorai, E. T. (2016b). Effect size guidelines for individual differences researchers. Personality and Individual Differences, 102, 74–78. https://doi.org/10.1016/j.paid.2016.06.069
Glaser, J., & Richter, T. (2023). The testing effect in the lecture hall: Does it depend on learner prerequisites? Psychology Learning & Teaching, 22(2), 159–178. https://doi.org/10.1177/14757257221136660
Glenberg, A. M., Wilkinson, A. C., & Epstein, W. (1982). The illusion of knowing: Failure in the self-assessment of comprehension. Memory & Cognition, 10(6), 597–602. https://doi.org/10.3758/BF03202442
The Great Barrier Reef, Eighth Grade Reading Passage. (2013). Retrieved April 5, 2022, from https://www.readworks.org
Gronau, Q. F., Singmann, H., & Wagenmakers, E. (2020). bridgesampling: An R package for estimating normalizing constants. Journal of Statistical Software, 92(10), 1–29. https://doi.org/10.18637/jss.v092.i10
Gupta, M. W., Pan, S. C., & Rickard, T. C. (2024). Interaction between the testing and forward testing effects in the case of cued recall: Implications for theory, individual difference studies, and application. Journal of Memory and Language, 134, 1–14. https://doi.org/10.1016/j.jml.2023.104476
Haskell, R. E. (2001). Transfer of learning: What it is and why it's important. In R. E. Haskell (Ed.), Transfer of learning (pp. 23–39). Academic Press. https://doi.org/10.1016/B978-012330595-4/50003-2
Huet, N., & Mariné, C. (2005). Clustering and expertise in a recall task: The effect of item organization criteria. Learning and Instruction, 15(4), 297–311. https://doi.org/10.1016/j.learninstruc.2005.07.005
Humpback Whale. (2022). Great Barrier Reef Foundation. Retrieved April 5, 2022, from https://www.barrierreef.org/the-reef/animals/humpback-whale
Kalakoski, V., & Saariluoma, P. (2001). Taxi drivers' exceptional memory of street names. Memory & Cognition, 29(4), 634–638. https://doi.org/10.3758/BF03200464
Karpicke, J. D., Lehman, M., & Aue, W. R. (2014). Retrieval-based learning: An episodic context account. In B. H. Ross (Ed.), The psychology of learning and motivation (pp. 237–284). Elsevier Academic Press.
Karpicke, J. D., & Roediger, H. L., III. (2008). The critical importance of retrieval for learning. Science, 319(5865), 966–968. https://doi.org/10.1126/science.1152408
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795. https://doi.org/10.1080/01621459.1995.10476572
Kornell, N., Bjork, R. A., & Garcia, M. A. (2011). Why tests appear to prevent forgetting: A distribution-based bifurcation model. Journal of Memory and Language, 65(2), 85–97. https://doi.org/10.1016/j.jml.2011.04.002
Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35(4), 989–998. https://doi.org/10.1037/a0015729
Little, J. L., Bjork, E. L., Bjork, R. A., & Angello, G. (2012). Multiple-choice tests exonerated, at least of some charges: Fostering test-induced learning and avoiding test-induced forgetting. Psychological Science, 23(11), 1337–1344. https://doi.org/10.1177/095679761244337
Macaluso, J. A., & Fraundorf, S. H. (under review). The educational implications of research exploring prior knowledge and new learning: What we know, what we do not know, and where we go from here.
Matuschek, H., Kliegl, R., Vasishth, S., Baayen, H., & Bates, D. (2017). Balancing Type I error and power in linear mixed models. Journal of Memory and Language, 94, 305–315. https://doi.org/10.1016/j.jml.2017.01.001
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learning and Verbal Behavior, 16(5), 519–533. https://doi.org/10.1016/S0022-5371(77)80016-9
Norberg, K. A. (2022). Avoiding miscomprehension: A metacognitive perspective for how readers identify and overcome comprehension failure [Unpublished doctoral dissertation]. University of Pittsburgh.
Overoye, A. L., James, K. K., & Storm, B. C. (2021). A little can go a long way: Giving learners some context can enhance the benefits of
pretesting. Memory, 29(9), 1206–1215. https://doi.org/10.1080/09658211.2021.1974048
Pan, S. C., & Rickard, T. C. (2018). Transfer of test-enhanced learning: Meta-analytic review and synthesis. Psychological Bulletin, 144(7), 710–756. https://doi.org/10.1037/bul0000151
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(1), 3–8. https://doi.org/10.1037/0278-7393.31.1.3
Pyc, M. A., & Rawson, K. A. (2010). Why testing improves memory: Mediator effectiveness hypothesis. Science, 330(6002), 335. https://doi.org/10.1126/science.1191465
Quené, H., & van den Bergh, H. (2004). On multi-level modeling of data from repeated measures designs: A tutorial. Speech Communication, 43(1–2), 103–121. https://doi.org/10.1016/j.specom.2004.02.004
Roediger, H. L., & Karpicke, J. D. (2006a). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255. https://doi.org/10.1111/j.1467-9280.2006.01693.x
Roediger, H. L., & Karpicke, J. D. (2006b). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1(3), 181–210. https://doi.org/10.1111/j.1745-6916.2006.00012.x
Roediger, H. L., III, Putnam, A. L., & Smith, M. A. (2011). Ten benefits of testing and their applications to educational practice. In J. P. Mestre & B. H. Ross (Eds.), The psychology of learning and motivation: Cognition in education (pp. 1–36). Elsevier Academic Press. https://doi.org/10.1016/B978-0-12-387691-1.00001-6
Rohrer, D., & Pashler, H. (2010). Recent research on human learning challenges conventional instructional strategies. Educational Researcher, 39(5), 406–412. https://doi.org/10.3102/0013189X10374770
Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432–1463. https://doi.org/10.1037/a0037559
Singley, M. K., & Anderson, J. R. (1989). The transfer of cognitive skill. Harvard University Press.
Smith-Peirce, R. N., & Butler, A. C. (2025). A scoping review of research on individual differences in the testing effect paradigm. Learning and Individual Differences, 118, 102602. https://doi.org/10.1016/j.lindif.2024.102602
Soderstrom, N. C., Kerr, T. Y., & Bjork, R. A. (2015). The critical importance of retrieval – and spacing – for learning. Psychological Science, 27(2), 223–230. https://doi.org/10.1177/0956797615617778
Troyer, M., Urbach, T. P., & Kutas, M. (2020). Lumos!: Electrophysiological tracking of (wizarding) world knowledge use during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(3), 476–486. https://doi.org/10.1037/xlm0000737
Types of Coral Reefs. (2021). Coral Reef Alliance. Retrieved April 5, 2022, from https://coral.org/en/coral-reefs-101/types-of-coral-reef-formations/
Vicente, K. J., & Wang, J. H. (1998). An ecological theory of expertise effects in memory recall. Psychological Review, 105(1), 33–57. https://doi.org/10.1037/0033-295X.105.1.33
Wissman, K. T., Rawson, K. A., & Pyc, M. A. (2011). The interim test effect: Testing prior material can facilitate the learning of new material. Psychonomic Bulletin & Review, 18, 1140–1147. https://doi.org/10.3758/s13423-011-0140-7
Witherby, A. E., & Carpenter, S. K. (2022). The rich-get-richer effect: Prior knowledge predicts new learning of domain-relevant information. Journal of Experimental Psychology: Learning, Memory, and Cognition, 48(4), 483–498. https://doi.org/10.1037/xlm0000996
Xiaofeng, M., Xiao-e, Y., Yanru, L., & AiBao, Z. (2016). Prior knowledge level dissociates effects of retrieval practice and elaboration. Learning and Individual Differences, 51, 210–214. https://doi.org/10.1016/j.lindif.2016.09.012
Yang, C., Chew, S.-J., Sun, B., & Shanks, D. R. (2019). The forward effects of testing transfer to different domains of learning. Journal of Educational Psychology, 111(5), 809–826. https://doi.org/10.1037/edu0000320
Yang, C., Luo, L., Vadillo, M. A., Yu, R., & Shanks, D. R. (2021). Testing (quizzing) boosts classroom learning: A systematic and meta-analytic review. Psychological Bulletin, 147(4), 399–435. https://doi.org/10.1037/bul0000309