Cognitive Science of Learning: The Testing Effect (Retrieval Practice)

by Justin Skycak

The testing effect (or the retrieval practice effect) emphasizes that recalling information from memory, rather than repeated reading, enhances learning. It can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice.

This post is part of the book The Math Academy Way: Using the Power of Science to Supercharge Student Learning (Working Draft, Jan 2024). Suggested citation: Skycak, J., advised by Roberts, J. (2024). Cognitive Science of Learning: The Testing Effect (Retrieval Practice). In The Math Academy Way: Using the Power of Science to Supercharge Student Learning (Working Draft, Jan 2024). https://justinmath.com/cognitive-science-of-learning-the-testing-effect/


Retrieval is the Most Effective Method of Review

To maximize the amount by which your memory is extended when solving review problems, it’s necessary to avoid looking back at reference material unless you are totally stuck and cannot remember how to proceed. This is called the testing effect (also known as the retrieval practice effect): the best way to review material is to test yourself on it. As Yang et al. (2023b) summarize:

  • "...[P]ractice testing (i.e., practice retrieval) is one of the most effective strategies to consolidate long-term retention of studied information and facilitate subsequent learning of new information, a phenomenon labeled the testing effect, the retrieval practice effect, or test-enhanced learning (Carpenter et al., 2022; Pan & Rickard, 2018; Roediger & Butler, 2011; Shanks et al., 2023; Yang et al., 2021).

    It has been firmly established that retrieval practice is more beneficial by comparison with many other learning strategies, such as restudying (Roediger & Karpicke, 2006b), note-taking (Heitmann et al., 2018; Rummer et al., 2017), concept-mapping (Karpicke & Blunt, 2011) and other elaborative strategies (Larsen et al., 2013)."

In other words, the testing effect shows that “following along” is not the same as learning. Students often mistakenly believe that if they can follow along with a video, book, lecture, or any other resource without feeling confused, then they’re learning. However, if you define learning as a positive change in long-term memory, then you haven’t learned unless you’re able to consistently reproduce the information you consumed and use it to solve problems.

This doesn’t happen when you just “follow along,” even if you understand perfectly. It’s the act of retrieving information from memory that transfers the information to long-term memory. If you don’t practice retrieval, then the information quickly dissipates. It stays with you only briefly – just long enough to trick you into thinking it’ll stick with you, when it’s really on the way out the door.

Amusingly, the testing effect is one of the oldest cognitive learning strategies known to humankind – records date back as far as 1620, when Francis Bacon noted (p. 76) the following:

  • "...[Y]ou won't learn a passage as well by reading it straight through twenty times as you will by reading it only ten times and trying each time to recite it from memory and looking at the text only when your memory fails."

Since the early 1900s, this observation has been experimentally supported by hundreds of studies across widely different memory tasks, content domains, and experimental methodologies, which have indicated that the benefits of retrieval practice are caused by increased cognitive effort (Rowland, 2014). In particular, the testing effect has been shown to carry over to classroom settings, where frequent quizzing (with feedback) promotes greater learning on both tested and non-tested material (McDaniel et al., 2007). Its reliability has even been explicitly demonstrated across individual cognitive differences like working memory capacity (Pastötter & Frings, 2019). As Yang et al. (2023b) summarize:

  • "The classroom testing effect generalizes to students across different educational levels (including elementary school, middle school, high school, and university/college), and across 18 subject categories (e.g., Education, Medicine, Psychology, etc.). More importantly, the results showed that classroom quizzes not only benefit retention of factual knowledge, but also promote concept comprehension and facilitate knowledge transfer in the service of solving applied problems. Test-enhanced knowledge transfer has also been observed in many other studies (for a review, see Carpenter, 2012)."

Spaced Retrieval Practice

What’s more, as Kang (2016) notes, the testing effect can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice:

  • "Testing or spaced practice, each on its own, confers considerable advantages for learning. But, even better, the two strategies can be combined to amplify the benefits: Reviewing previously studied material can be accomplished through testing (often followed by corrective feedback) instead of rereading.

    In fact, many studies of the spacing effect compared spaced against massed retrieval practice, not just rereading (e.g., Bahrick, 1979; Cepeda, Vul, Rohrer, Wixted, & Pashler, 2008). Spaced retrieval practice (with feedback) leads to better retention than spaced rereading.

    One study examined how type of review (reread vs. test with feedback), along with timing of review (massed vs. spaced), affected eighth-grade students' retention of history facts (Carpenter, Pashler, & Cepeda, 2009). On a final test 9 months later, spaced retrieval practice yielded the highest performance (higher than spaced rereading)."

As Halpern & Hakel (2003) elaborate:

  • "The single most important variable in promoting long-term retention and transfer is 'practice at retrieval.' This principle means that learners need to generate responses, with minimal cues, repeatedly over time with varied applications so that recall becomes fluent and is more likely to occur across different contexts and content domains. Simply stated, information that is frequently retrieved becomes more retrievable.
    ...
    The effects of practice at retrieval are necessarily tied to a second robust finding in the learning literature -- spaced practice is preferable to massed practice. For example, Bjork and his colleagues recommend spacing the intervals between instances of retrieval so that the time between them becomes increasingly longer -- but not so long that retrieval accuracy suffers."
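
The expanding-interval idea in the passage above can be sketched in a few lines of Python. This is only an illustration, not a schedule prescribed by any of the cited authors: the function names, the doubling factor, and the rule of halving the gap after a failed retrieval are all assumptions made for the example.

```python
from typing import List


def next_interval(current_days: float, recalled: bool,
                  growth: float = 2.0, shrink: float = 0.5,
                  min_days: float = 1.0) -> float:
    """Return how many days to wait before the next retrieval attempt.

    Assumed rule (hypothetical, for illustration only):
    - successful recall -> lengthen the gap by `growth`
    - failed recall     -> shorten the gap by `shrink`, never below `min_days`
    """
    if recalled:
        return current_days * growth
    return max(min_days, current_days * shrink)


def schedule(outcomes: List[bool], first_interval: float = 1.0) -> List[float]:
    """Gap (in days) that preceded each review, given each review's outcome."""
    intervals = []
    interval = first_interval
    for recalled in outcomes:
        intervals.append(interval)
        interval = next_interval(interval, recalled)
    return intervals


# Three successful retrievals, one failure, then another success.
print(schedule([True, True, True, False, True]))
# prints [1.0, 2.0, 4.0, 8.0, 4.0]
```

The gaps grow while retrieval keeps succeeding and shrink once accuracy suffers, which is the qualitative behavior the quote describes; the specific multipliers would need to be tuned in any real system.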

And as Yang et al. (2023a, p. 257) emphasize, frequent tests are ideal:

  • "Although it has been widely documented that a single test is sufficient to enhance memory compared to restudying, many laboratory studies have observed that repeated tests (i.e., with studied content tested repeatedly) produce a larger enhancing effect on knowledge retention and transfer than a single test (e.g., Butler, 2010; Dunlosky et al., this volume; Roediger & Karpicke, 2006b).

    The enhancing effect of repeated tests has been re-confirmed by many classroom studies. Moreover, Yang et al.'s (2021) metaanalysis coded the number of test repetitions (i.e., how many times the studied information was tested), and conducted analyses to quantify the relation between the magnitude of test-enhanced learning and the number of test (quiz) repetitions. The results showed a clear trend that the more occasions on which class content is quizzed, the more effectively quizzing aids exam performance."

Reducing Anxiety and Promoting a Growth Mindset

Unfortunately, the testing effect remains underused in traditional classrooms, where usually only a handful of tests are given throughout the entire duration of a course. As McDaniel et al. (2007) lament:

  • "...[D]espite this impressive body of evidence, the implications of the testing effect literature for educational practice have been virtually ignored by the educational community and educational research."

Why might this be? Perhaps the most obvious reason is that many people view tests, especially timed tests, as anxiety-inducing and consequently something to be avoided.

However, it is important to realize that frequent, low-stakes quizzes on skills that a student is ready to be tested on need not provoke test anxiety, and often even reduce it.

Appropriate vs Inappropriate Usage of Timed Tests

Often, negative feelings toward timed tests are the result of inappropriate usage of the timed test, such as introducing it too early in the student’s skill development process. A prerequisite for timed testing is that the student should be able to perform the tested skills successfully in an untimed setting. Timed testing demands a high level of proficiency, and anxiety can be produced if there is a mismatch between a student’s level of proficiency and the performance expectations that are placed on them.

As Codding, Peltier, & Campbell (2023) summarize:

  • "Learners may benefit more or less from various instructional strategies or tactics, depending on the learners' stage of skill development (Burns et al., 2010). That is, are learners working on acquiring a math skill or concept, building skill fluency, generalizing or transferring a skill or concept, or using known skills and concepts to solve novel problems?

    Just because timed practice opportunities have been proven effective to build fluency, for example, does not mean that timed trials always benefit learners (Fuchs et al., 2021). Using timed trials with students who are working to acquire new knowledge or skills is an instructional mismatch; rather, students need to display accuracy with skills and concepts before building fluency. It is not the fault of the strategy; it is an issue with when to implement the strategy."

Desirable vs Undesirable Difficulties

More generally, while desirable difficulties are a necessary component of effective practice, they are only effective insofar as the learner is able to overcome them. Introducing an insurmountable difficulty is never desirable, even if that type of difficulty may be desirable later on in the learning process once the student has increased their proficiency. It is the act of overcoming a desirable difficulty that leads to greater learning. As echoed by Brown, Roediger, & McDaniel (2014, pp. 98-99):

  • "Elizabeth and Robert Bjork, who coined the phrase 'desirable difficulties,' write that difficulties are desirable because 'they trigger encoding and retrieval processes that support learning, comprehension, and remembering. If, however, the learner does not have the background knowledge or skills to respond to them successfully, they become undesirable difficulties.'
    ...
    Clearly, impediments that you cannot overcome are not desirable. ... To be desirable, a difficulty must be something learners can overcome through increased effort."

As Bjork & Bjork (2023, p. 22) elaborate:

  • "...[I]t is necessary to consider what level of difficulty is appropriate in order for that level to enhance a given student's learning, and the appropriate level that is optimal may vary considerably based on a student's background and prior level of knowledge.

    To illustrate, while it is typically desirable to have learners generate a skill or some knowledge from memory, rather than simply showing them that skill or presenting that knowledge to them, a given learner needs to be equipped via prior learning to succeed at the generation task -- or at least succeed in activating relevant aspects of the necessary skill or knowledge -- for the act of generating to then potentiate their subsequent practice or study (e.g., Little & Bjork, 2016; Richland, Kornell, & Kao, 2009)."

Indeed, Codding, VanDerHeyden, & Chehayeb (2023) found that when the type of instruction is mismatched against a student’s level of proficiency, the instruction will not only be ineffective, but can also lead to anxiety:

  • "This study illustrated that when instructional strategies are misaligned with students' stage of skill development, even when the instructional target is appropriate, students' math performance will not improve. Furthermore, as suggested in this study, students may exhibit higher levels of anxiety and lower acceptability of misaligned instructional practices."

Appropriate Timed Testing Can Reduce Math Anxiety

However, when used appropriately, timed testing can be a valuable tool for overcoming math anxiety by building fluency and automaticity. According to VanDerHeyden & Codding (2020), who have extensive experience researching academic intervention in mathematics, the relationship between math anxiety and timed testing is unclear, but there is a clear relationship between math anxiety and math proficiency (lower proficiency promotes anxiety, which further hinders skill development), and timed tests are useful for building proficiency:

  • "Teachers and parents worry about math anxiety, and some math education experts caution against tactics used in math class, such as timed tasks and tests, that might theoretically stoke anxiety (Boaler, 2012). First, the evidence does not support that people are naturally anxious or not anxious in the context of math assessment and instruction (Hart & Ganley, 2019). Second, simply avoiding math or certain math tactics should not be expected to ameliorate anxiety in the long term. Third, preventing a student from full exposure to math assessment and intervention costs the student the opportunity to develop adaptive coping mechanisms to deal with possible anxiety in the face of challenging academic content.
    ...
    Gunderson, Park, Maloney, Bellock, and Levine (2018) found a reciprocal relationship between skill proficiency and anxiety, such that weak skill reliably preceded anxiety and anxiety further contributed to weak skill development. They found that anxiety could be attenuated by two strategies: improving skill proficiency (this cannot be done by avoiding challenging math work and timed assessment) and promoting a growth mindset (as opposed to a fixed ability mindset) using specific language and instructional arrangements to promote the idea that I, as a student, can work hard and beat my score; I can grow today; my brain is like a muscle that gets stronger when I work it with challenging math content.
    ...
    There is very little empirical evidence examining whether timed tests have a causal impact on anxiety, and the existing few studies that include school-age participants do not support the idea (Grays, Rhymer, & Swartzmiller, 2017; Tsui & Mazzocco, 2006). What is clear is there is a modest, negative bidirectional relationship between math anxiety and math performance (Namkung et al., 2019). These correlational data suggest that poor mathematics performance can lead to high math anxiety and that high math anxiety can lead to poor mathematics performance. The remedy that school psychologists can advocate for is to identify, through effective and efficient screening, the presence of high math anxiety and determine which students would benefit from supplemental and targeted mathematics supports. Intervention approaches should target math skill deficits, address high anxiety, and promote a growth mindset as well as monitor progress toward clearly defined objectives using tools that are brief (often timed), reliable, and valid."

These sentiments are echoed by the U.S. Department of Education (Fuchs et al., 2021, p. 58), which recommends regularly using timed review activities to promote automatic retrieval of previously-learned material, since students who lack that automaticity will struggle to learn more advanced material:

  • "Regularly include timed activities as one way to build students' fluency in mathematics. ... [However,] Do not use timed activities to introduce and teach mathematics concepts and operations.
    ...
    Quickly retrieving basic arithmetic facts (addition, subtraction, multiplication, and division) is not easy for students who experience difficulties in mathematics. Without such retrieval, students will struggle to follow their teachers' explanations of new mathematical ideas. Automatic retrieval gives students more mental energy to understand relatively complex mathematical tasks and execute multistep mathematical procedures.

    Thus, building automatic fact retrieval in students is one (of many) important goals of intervention. In addition to basic facts, timed activities may address other mathematical subtasks important for solving complex problems."

As summarized by Yang et al. (2023b), quizzes can increase students’ skill proficiency and familiarity with the format of assessment, which can reduce their test anxiety:

  • "...[I]t is well-known that tests motivate students to study harder (Yang et al., 2017a), encourage them to read the assigned textbook materials to prepare for the lecture (Heiner et al., 2014), reduce mind wandering while learning (Szpunar et al., 2013), and increase class attendance (Schrank, 2016).

    These beneficial effects of practice tests [i.e., quizzes] may make students more prepared for tests and reduce their worry about poor test performance, therefore alleviating TA [test anxiety] (Brown & Tallon, 2015; Yusefzadeh et al., 2019). Furthermore, tests may inform students about the formats and contents of future assessments, hence reducing uncertainty (i.e., uncertainty about how and what content will later be assessed) and mitigating anxiety (Jerrell & Betty, 2005)."

What’s more, as Hattie & Yates (2013, p. 59) explain, performing well on a timed test has been shown to build confidence and promote positive feelings:

  • "...[S]tudies conducted under laboratory conditions show that, for both adults and children, speed of access in memory functions strongly predicts two other attributes: confidence and positive feelings. Whenever people are able to recall important information quickly there is an inherent sense in that the information is correct, together with a momentary flush of pleasure."

Indeed, in a study of thousands of middle and high schoolers’ reactions to frequent (at least weekly), low-stakes, immediate-feedback quizzes during class, Agarwal et al. (2014) found that most students felt the quizzes made them less nervous for higher-stakes tests, and students were more likely to report a decrease in overall test anxiety than an increase:

  • "We asked students whether clicker quizzes (i.e., retrieval practice) made them more or less nervous for unit tests and exams ... Remarkably, 72% of students reported that retrieval practice made them less nervous for tests and exams, 22% said they experienced about the same level of nervousness, and only 6% of students said clickers made them more nervous.

    Next, we asked students whether they experienced more, less, or about the same level of test anxiety for the class with retrieval practice compared to other classes in which they did not have retrieval practice ... [O]nly 19% of students reported experiencing more anxiety, while 81% of students said they experienced about the same level of test anxiety or less in the class with retrieval practice compared to their other classes (33% reported less nervousness).
    ...
    [T]he use of clicker response systems reduced self-reported test anxiety. ... We hypothesize that students became familiar with taking quizzes, knew the course material better, and hence were less anxious when facing the unit test on which they would receive a grade."

As echoed by Yang et al. (2023a):

  • "...[F]requent testing has little impact on or even reduces (rather than increases) test anxiety. For instance, in a large sample study (over 1,000 college participants), Yang et al. (2020) observed that interpolating tests across a study phase has minimal influence on participants' test anxiety. Szpunar et al. (2013) found that frequent tests significantly relieve test anxiety (for related findings, see Khanna, 2015). Furthermore, in a large-scale survey conducted by Agarwal et al. (2014), 72% of 1,306 middle and 102 high school students reported that frequent quizzes made them less anxious about exams, with only 8% reporting the opposite."

In a separate meta-analysis, Yang et al. (2023b) summarized some other empirical studies observing that quizzes reduced test anxiety:

  • "...[I]n a quasi-experimental study conducted by Piroozmanesh and Imanipour (2018), two classes of nursing undergraduates took a coronary care course, with the experimental class taking class quizzes across the semester, whereas the control class did not take these quizzes. For both classes, students' TA was measured at the beginning of the semester (pretest) and one week before the final exam (posttest). The results showed that although there was minimal difference in TA during the pretest between the two classes, students in the experimental class were much less anxious before the final exam than those in the control class.
    ...
    Szpunar et al. (2013) obtained consistent findings. Specifically, after both the test and the no-test group completed the interim test on Segment 4, both groups were told that they would take a cumulative test on all four segments and were instructed to report how anxious they were about the cumulative test. Consistent with the findings from Piroozmanesh and Imanipour (2018) and Brown and Tallon (2015), Szpunar et al. (2013) observed that participants in the test group were much less anxious than those in the no-test group."

The meta-analysis, which included 24 studies across thousands of participants, ultimately concluded that quizzes reduce test anxiety about as much as they increase academic performance (in both cases, a medium effect size of about 0.5):

  • "The current review integrates results across 24 studies (i.e., 25 effects based on 3,374 participants) to determine the effect of practice tests (quizzes) on test anxiety (TA) and explore potential moderators of the effect. The results show strong Bayesian evidence (BF10>25,000) that practice tests appreciably reduce TA to a medium extent (Hedges' g=-0.52), with minimal evidence of publication bias.
    ...
    In a recent meta-analytic review, Yang et al. (2021) integrated data from over 48,000 students, extracted from 222 classroom studies, to determine whether class quizzes improve students' academic performance. The answer is affirmative: Class quizzes enhance students' academic attainment to a medium extent (Hedges' g=0.50)."
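
For readers unfamiliar with the effect-size measure quoted above, here is a minimal sketch of how Hedges' g is computed for a single two-group comparison: a standardized mean difference with a small-sample correction. The function name and the numbers in the usage example are made up for illustration and are not taken from any of the cited studies; a meta-analysis additionally pools such values across many studies, which is not shown here.

```python
import math


def hedges_g(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Standardized mean difference (group A minus group B) with small-sample correction."""
    # Pooled standard deviation of the two groups.
    pooled_sd = math.sqrt(
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )
    cohens_d = (mean_a - mean_b) / pooled_sd
    # Common approximation to the small-sample correction factor.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return correction * cohens_d


# Hypothetical anxiety scores: quizzed class vs. non-quizzed class.
g = hedges_g(mean_a=45.0, sd_a=10.0, n_a=30, mean_b=50.0, sd_b=10.0, n_b=30)
print(round(g, 2))
```

A negative value means the first group (here, the hypothetical quizzed class) scored lower on the measured quantity, which is why a reduction in test anxiety appears as g = -0.52 while an improvement in academic performance appears as g = 0.50.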

Implementing Appropriate Timed Testing

Granted, in a traditional classroom, it is difficult to keep instructional practices aligned to student proficiencies because each student develops their skills at a different rate. For any given skill, at any given time, some students may be ready for timed testing while others may need additional practice – but the teacher generally does not have enough bandwidth to manage different learning tasks for different students on different skills, and the best they can do is provide learning tasks that feel appropriate for the class “on average.” Of course, those learning tasks will be inappropriate for some students and may lead to decreased learning and increased anxiety.

Advances in educational technology, however, should aim to better adapt the level of instruction to each individual student on each individual skill. Students should initially learn skills during highly-scaffolded lessons, where they are given as much practice as they need to master the skills. Only after they demonstrate their ability to perform the skills should they begin seeing those skills on higher-intensity forms of practice like timed quizzes.

The quizzes should be low-stakes and frequent, and should be structured in a way that promotes an “I can do this” growth mindset. Whenever a student misses a question on a quiz, they should receive a remedial review on the corresponding topic so that they can increase their proficiency in that area. If a student does less than “well” on a quiz, then they should also be given the opportunity to retake the quiz to demonstrate their improved proficiency. The goal should be not only to give students realistic feedback about their skill proficiency, but also to demonstrate to students that they can improve their proficiency by putting forth effort on their learning tasks.
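
The policy described in the last two paragraphs can be sketched as follows. This is a hypothetical illustration rather than a description of any actual system: the names `timed_quiz_topics`, `quiz_follow_up`, and `FollowUp` are invented for the example, and the 80% cutoff for doing "well" is an assumption, since the text does not specify one.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

PASS_THRESHOLD = 0.8  # assumption: the text does not define what counts as doing "well"


def timed_quiz_topics(candidates: List[str], mastered_untimed: Set[str]) -> List[str]:
    """Keep only topics the student has already performed successfully untimed."""
    return [topic for topic in candidates if topic in mastered_untimed]


@dataclass
class FollowUp:
    score: float              # fraction of questions answered correctly
    review_topics: List[str]  # topics that should receive a remedial review
    offer_retake: bool        # whether the student may retake the quiz


def quiz_follow_up(results: Dict[str, bool],
                   threshold: float = PASS_THRESHOLD) -> FollowUp:
    """Decide what happens after a quiz, given topic -> answered correctly."""
    if not results:
        raise ValueError("quiz has no questions")
    missed = [topic for topic, correct in results.items() if not correct]
    score = sum(results.values()) / len(results)
    return FollowUp(score=score, review_topics=missed, offer_retake=score < threshold)


# A student has mastered two of three candidate topics in untimed lessons.
print(timed_quiz_topics(["fractions", "decimals", "ratios"], {"fractions", "ratios"}))

# One missed question out of four: review that topic and offer a retake.
print(quiz_follow_up({"fractions": True, "decimals": False,
                      "ratios": True, "percents": True}))
```

Every missed question triggers a review regardless of the overall score, while the retake is offered only when the score falls below the assumed threshold, mirroring the two separate conditions in the text.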

References

Agarwal, P. K., D’Antonio, L., Roediger III, H. L., McDermott, K. B., & McDaniel, M. A. (2014). Classroom-based programs of retrieval practice reduce middle school and high school students’ test anxiety. Journal of Applied Research in Memory and Cognition, 3(3), 131-139.

Bacon, F. (1620). The new organon: Or true directions concerning the interpretation of nature.

Bjork, E. L., & Bjork, R. A. (2023). Introducing Desirable Difficulties Into Practice and Instruction: Obstacles and Opportunities. In C. Overson, C. M. Hakala, L. L. Kordonowy, & V. A. Benassi (Eds.), In Their Own Words: What Scholars and Teachers Want You to Know About Why and How to Apply the Science of Learning in Your Academic Setting (pp. 111-21). Society for the Teaching of Psychology.

Brown, P. C., Roediger III, H. L., & McDaniel, M. A. (2014). Make it stick: The science of successful learning. Harvard University Press.

Codding, R. S., Peltier, C., & Campbell, J. (2023). Introducing the Science of Math. TEACHING Exceptional Children, 00400599221121721.

Codding, R. S., VanDerHeyden, A., & Chehayeb, R. (2023). Using Data to Intensify Math Instruction: An Evaluation of the Instructional Hierarchy. Remedial and Special Education, 07419325231194354.

Fuchs, L. S., Bucka, N., Clarke, B., Dougherty, B., Jordan, N. C., Karp, K. S., … & Morgan, S. (2021). Assisting Students Struggling with Mathematics: Intervention in the Elementary Grades. Educator’s Practice Guide. WWC 2021006. What Works Clearinghouse.

Halpern, D. F., & Hakel, M. D. (2003). Applying the Science of Learning. Change, 37.

Hattie, J., & Yates, G. C. (2013). Visible learning and the science of how we learn. Routledge.

Kang, S. H. (2016). Spaced repetition promotes efficient and effective learning: Policy implications for instruction. Policy Insights from the Behavioral and Brain Sciences, 3(1), 12-19.

McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19(4-5), 494-513.

Pastötter, B., & Frings, C. (2019). The forward testing effect is reliable and independent of learners’ working memory capacity. Journal of Cognition, 2(1).

Rowland, C. A. (2014). The effect of testing versus restudy on retention: a meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432.

VanDerHeyden, A. M., & Codding, R. S. (2020). Belief-Based versus Evidence-Based Math Assessment and Instruction. Communique, 48(5).

Yang, C., Shanks, D. R., Zhao, W., Fan, T., & Luo, L. (2023a). Frequent Quizzing Accelerates Classroom Learning. In C. Overson, C. M. Hakala, L. L. Kordonowy, & V. A. Benassi (Eds.), In Their Own Words: What Scholars and Teachers Want You to Know About Why and How to Apply the Science of Learning in Your Academic Setting (pp. 252-62). Society for the Teaching of Psychology.

Yang, C., Li, J., Zhao, W., Luo, L., & Shanks, D. R. (2023b). Do practice tests (quizzes) reduce or provoke test anxiety? A meta-analytic review. Educational Psychology Review, 35(3), 87.

