# Precision Spaced Repetition in Hierarchical Knowledge Structures

*Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should "trickle down" to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a proprietary model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit "trickle-down" repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions "knock out" other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors).*

*This post is part of the book The Math Academy Way: Using the Power of Science to Supercharge Student Learning (Working Draft, Oct 2023).* *Suggested citation:* Skycak, J., advised by Roberts, J. (2023). Precision Spaced Repetition in Hierarchical Knowledge Structures. In *The Math Academy Way: Using the Power of Science to Supercharge Student Learning (Working Draft, Oct 2023).* https://justinmath.com/precision-spaced-repetition-in-hierarchical-knowledge-structures/

To the best of my knowledge, the current literature on mathematical models for determining optimal repetition spacing is limited to the setting of independent flashcard-like tasks. Many subjects, in contrast, are highly connected bodies of knowledge. This introduces significant modeling complexities: for instance, repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition).

To overcome this hurdle at Math Academy, I developed a proprietary model of **Fractional Implicit Repetition (FIRe)** that not only accounts for implicit “trickle-down” repetitions but also

- minimizes the number of reviews by choosing reviews whose implicit repetitions "knock out" other due reviews (like dominos), and
- calibrates the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors -- high student ability speeds up the overall student-topic learning speed, while high topic difficulty slows it down).

While the specific implementation of the model is proprietary, I can talk about the high-level ideas here.

## Repetition Compression

A common criticism of spaced repetition is that it requires an overwhelming number of reviews. While this can be true if spaced repetition is used to learn unrelated flashcards, there is something special about the subject of mathematics that can be used to avoid this issue.

Unlike independent flashcards, mathematics is a hierarchical and highly connected body of knowledge. Whenever a student solves an advanced mathematical problem, there are many simpler mathematical skills that they practice implicitly. In other words, in mathematics, advanced skills tend to **encompass** many simpler skills.

As a result, whenever a student has due reviews, they can typically be compressed into a much smaller set of learning tasks that implicitly covers (i.e. provides repetitions on) all of the due reviews. I call this process **repetition compression**.

To illustrate, consider the following multiplication problem, in which we multiply the two-digit number 39 by the one-digit number 6:
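In column form, the computation looks like this:

```latex
\[
\begin{array}{r}
39 \\
\times \; 6 \\ \hline
234
\end{array}
\]
```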

In order to perform the multiplication above, we have to multiply one-digit numbers and add a one-digit number to a two-digit number:

- First, we multiply 6 × 9 = 54. We carry the 5 and write the 4 at the bottom.
- Then, we multiply 6 × 3 = 18 and add 18 + 5 = 23. We write 23 at the bottom.

In other words, *Multiplying a Two-Digit Number by a One-Digit Number* encompasses *Multiplying One-Digit Numbers* and *Adding a One-Digit Number to a Two-Digit Number*.

We can visualize this using an **encompassing graph** as shown below. The encompassing graph is similar to a prerequisite graph, except the arrows indicate that a simpler topic is encompassed by a more advanced topic. (Encompassed topics are usually prerequisites, but prerequisites are often not fully encompassed.)

Now, suppose that a student is due for reviews on all three of these topics. Because of the encompassings, the only review that they will actually have to do is *Multiplying a Two-Digit Number by a One-Digit Number*. When they complete this review, it will implicitly provide repetitions on the topics that it encompasses because the student has effectively practiced those skills as well.

In general, the more encompassings there are, the fewer reviews are actually required. And mathematics has *lots* of encompassings!
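The compression step can be sketched as a greedy cover over the encompassing graph: repeatedly pick the due topic whose implicit repetitions knock out the most other due reviews. This is a minimal illustration with a hand-built graph, not Math Academy's actual (proprietary) algorithm.

```python
# Sketch of repetition compression (illustrative, not the production algorithm).
# `encompasses` maps each topic to the simpler topics it fully encompasses.
encompasses = {
    "Multiplying a Two-Digit Number by a One-Digit Number": [
        "Multiplying One-Digit Numbers",
        "Adding a One-Digit Number to a Two-Digit Number",
    ],
}

def covered(topic, graph):
    """All topics implicitly practiced when `topic` is reviewed (transitive closure)."""
    seen, stack = set(), [topic]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def compress(due, graph):
    """Greedily pick due topics whose implicit repetitions knock out other due reviews."""
    chosen, remaining = [], set(due)
    while remaining:
        # Prefer the due topic that covers the most other due reviews.
        best = max(remaining, key=lambda t: len(covered(t, graph) & remaining))
        chosen.append(best)
        remaining -= {best} | covered(best, graph)
    return chosen

due = [
    "Multiplying a Two-Digit Number by a One-Digit Number",
    "Multiplying One-Digit Numbers",
    "Adding a One-Digit Number to a Two-Digit Number",
]
print(compress(due, encompasses))
# A single review of the advanced topic covers all three due reviews.
```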

## Learning Efficiency

In physics, nothing can travel faster than the speed of light. It is the theoretical maximum speed that any physical object can attain. A universal constant.

In the context of spaced repetition, there is an analogous concept: **theoretical maximum learning efficiency**. In theory, given a sufficiently encompassed body of knowledge, it is possible to complete all your spaced repetitions without ever having to explicitly review previously-learned material.

As a simple demonstration, consider a sequence of topics where the first topic is fully encompassed by the second, which is fully encompassed by the third, which is fully encompassed by the fourth, and so on.

Each time you learn the next topic, all the topics below receive full implicit repetitions. Assuming that you never run out of new topics to learn, the only reason you would ever need to do an explicit repetition is if you get stuck repeatedly attempting and failing to learn the next topic.

By contrast, there is also a concept of **theoretical minimum learning efficiency**. This is precisely the setting of independent flashcards – or equivalently, a set of topics without any encompassings.

In this setting, no topic can receive implicit repetitions from any other topic. Every single review must be done explicitly.

It’s important to realize that *a graph does not have to be fully encompassed, or even nearly fully encompassed, for its maximum learning efficiency to approach the theoretical limit.* Even if most relationships between topics are non-encompassing, a considerable minority of encompassings goes a long way.

## Fractional Implicit Repetition (FIRe)

Now that we’ve introduced the idea of repetition compression, let’s extend it to the general idea of **fractional implicit repetitions (FIRe)** flowing through an encompassing graph.

FIRe generalizes spaced repetition to hierarchical bodies of knowledge where

- repetitions on advanced topics "trickle down" implicitly to simpler topics through encompassing relationships, and
- implicit repetitions on simpler topics are discounted appropriately (since they often arrive too early to count for full credit towards the next repetition).

**Concrete Example**

As a concrete example, recall that *Multiplying a Two-Digit Number by a One-Digit Number* encompasses *Multiplying One-Digit Numbers* and *Adding a One-Digit Number to a Two-Digit Number*.

If you pass a review on *Multiplying a Two-Digit Number by a One-Digit Number*, then the repetition will also flow backward to reward *Multiplying One-Digit Numbers* and *Adding a One-Digit Number to a Two-Digit Number* because you’ve just shown evidence that you still know how to perform these skills.

On the other hand, if you fail a repetition on *Adding a One-Digit Number to a Two-Digit Number*, then the failed repetition will also flow forward to penalize *Multiplying a Two-Digit Number by a One-Digit Number*. If you can’t add a one-digit number to a two-digit number, then there’s no way you’re able to multiply a two-digit number by a one-digit number. The same thing happens if you fail a repetition on *Multiplying One-Digit Numbers*.
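These two flows can be sketched as graph traversals (illustrative only; the actual update rule is proprietary): a passed review sends credit backward to encompassed topics, while a failed review sends a penalty forward to every topic that encompasses the failed one.

```python
# Sketch of bidirectional repetition flow (illustrative only).
# `encompasses[t]` lists the simpler topics that topic t encompasses.
encompasses = {
    "Multiplying a Two-Digit Number by a One-Digit Number": [
        "Multiplying One-Digit Numbers",
        "Adding a One-Digit Number to a Two-Digit Number",
    ],
}

def flow_credit(passed_topic, graph):
    """A passed review flows backward: the encompassed skills were also demonstrated."""
    credited, stack = set(), [passed_topic]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in credited:
                credited.add(child)
                stack.append(child)
    return credited

def flow_penalty(failed_topic, graph):
    """A failed review flows forward: every topic built on the failed skill is penalized."""
    penalized, frontier = set(), {failed_topic}
    while frontier:
        frontier = {t for t, kids in graph.items()
                    if frontier & set(kids) and t not in penalized}
        penalized |= frontier
    return penalized

print(flow_credit("Multiplying a Two-Digit Number by a One-Digit Number", encompasses))
print(flow_penalty("Adding a One-Digit Number to a Two-Digit Number", encompasses))
```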

**Visualizing Repetition Flow**

Note that repetition flows can extend many layers deep – not just to directly encompassed topics, but also to “second-order” topics that are encompassed by the encompassed topics, and then to third-order topics that are encompassed by second-order topics, and so on.

Visually, credit travels downwards through the knowledge graph like lightning bolts.

Penalties travel upwards through the knowledge graph like growing trees.

**Partial Encompassings**

FIRe also naturally handles cases of **partial encompassings**, in which only some part of a simpler topic is practiced implicitly in an advanced topic. This occurs more frequently in higher-level math.

For instance, in calculus, advanced integration techniques like integration by parts require you to calculate integrals of a variety of mathematical functions such as polynomials, exponentials, and trigonometric functions. But some of those functions might only appear in a portion of the integration by parts problems. So, if you complete a repetition on integration by parts, you should only receive a fraction of a repetition towards each partially-encompassed topic.

In the diagram below, we label encompassings with numerical **weights** that represent what fraction of each simpler topic is practiced during the more advanced topic. You can loosely interpret each weight as representing the probability that a random problem from the advanced topic encompasses a random problem from the simpler topic.

FIRe extends repetition flows many layers deep through fractional encompassings as well. The end result is that repetitions

- travel unhindered along a "trunk" of full encompassings, and
- fade off along partial encompassings branching outwards from the trunk.
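One way to sketch this propagation (the multiplicative decay rule and the specific weights, including the *Multiplying Polynomials* edge, are assumptions for illustration): credit multiplies along encompassing weights as it trickles down, so it travels unhindered along weight-1 edges and fades along partial ones.

```python
# Sketch: credit trickles down an encompassing graph with fractional weights.
# Each edge weight is the fraction of the simpler topic practiced implicitly.
# Weights below are hypothetical, chosen only to illustrate the mechanics.
weighted_encompasses = {
    "Integration by Parts": {
        "Integrating Polynomials": 1.0,
        "Integrating Exponential Functions": 0.5,
        "Integrating Trigonometric Functions": 0.5,
    },
    "Integrating Polynomials": {"Multiplying Polynomials": 0.3},
}

def trickle_down(topic, graph, credit=1.0, out=None):
    """Multiply credit along edge weights: full (1.0) edges pass it unhindered,
    partial edges attenuate it as it branches outward from the trunk."""
    if out is None:
        out = {}
    for child, weight in graph.get(topic, {}).items():
        c = credit * weight
        out[child] = max(out.get(child, 0.0), c)  # keep the strongest path
        trickle_down(child, graph, c, out)
    return out

print(trickle_down("Integration by Parts", weighted_encompasses))
```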

**Setting Encompassing Weights Manually**

Because encompassing weights are set manually, it is not feasible to set an explicit weight between every pair of topics in the graph. We have thousands of topics, so the full pairwise weight matrix would contain tens of millions of entries. How do we set all those weights?

It turns out that it is not actually necessary to explicitly set every weight in the matrix. It suffices to set only the weights for topic pairs where

- the weight has a nontrivial value,
- the weight cannot otherwise be inferred using repetition flow, and
- the distance between the topics in the prerequisite graph is low,

and assume that all other weights not computed implicitly during repetition flow are zero. The reasoning behind these conditions is as follows:

- The magnitude of the weight represents the magnitude of the implicit repetition credit. In order for an implicit repetition to make an impact on staving off explicit reviews, it has to be associated with a nontrivial amount of credit.
- If repetition flow can infer a weight, then nothing will change if the weight is set manually (unless the manually-set weight is being used to correct a value that would otherwise be inferred by repetition flow).
- If two topics are far apart in the prerequisite graph, then their weight will not make much of an impact on staving off reviews, even if it is a full encompassing. In that case, by the time the student reaches the more advanced topic, they will already have done most of their explicit reviews on the simpler topic.

Conveniently, the weights that satisfy the above conditions tend to be those along direct and key prerequisite edges, the number of which scales linearly with the number of topics. This makes it feasible to set encompassing weights manually: one weight for each direct or key prerequisite edge.

Note that it is not unusual to find direct and key prerequisite edges with a weight as low as zero. This can happen when a topic requires some amount of conceptual familiarity with the prerequisite, but does not require the student to actually have mastered the prerequisite to the point of being able to solve problems in the prerequisite topic.

## Calibrating to Individual Students and Topics

The speed at which students learn (and remember what they’ve learned) varies from student to student. It has been shown that stronger students learn faster and remember longer, while weaker students learn slower and forget more quickly (e.g., Kyllonen & Tirre, 1988; Zerr et al., 2018; McDermott & Zerr, 2019). Similarly, learning speed also varies across topics: easier topics are learned faster and remembered longer, while harder topics take longer to learn and are forgotten more quickly.

So, for each student, each topic has a learning speed that depends on the student’s ability and the topic’s difficulty. Student ability and topic difficulty are competing factors – high student ability speeds up the overall student-topic learning speed, while high topic difficulty slows it down. In this view, a student-topic learning speed can be measured as a ratio between

- the speedup due to student ability, and
- the slowdown due to topic difficulty.

Student-topic learning speeds can be used to adjust the speed of the spaced review process.

- For instance, if a student does a review on a topic for which their learning speed is 2x, then that review counts as being worth 2 spaced repetitions.
- Likewise, if a student does a review on a topic for which their learning speed is 0.5x, then that review counts as being worth 0.5 spaced repetitions.
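These two rules can be sketched as follows (the inputs and their scales are illustrative assumptions; the text specifies only that speed is a ratio of an ability speedup to a difficulty slowdown):

```python
def learning_speed(ability_speedup: float, difficulty_slowdown: float) -> float:
    """Student-topic learning speed: the ratio of the speedup due to student
    ability to the slowdown due to topic difficulty (scales are hypothetical)."""
    return ability_speedup / difficulty_slowdown

def review_credit(speed: float) -> float:
    """A review on a 2x-speed topic counts as 2 spaced repetitions; 0.5x counts as 0.5."""
    return speed

print(review_credit(learning_speed(2.0, 1.0)))  # 2.0
print(review_credit(learning_speed(1.0, 2.0)))  # 0.5
```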

Student-topic learning speeds can also be considered within the fractional implicit repetition algorithm. Weaker students often have trouble absorbing implicit repetitions on difficult topics – they have a harder time generalizing that “what I learned earlier is a special case (or component) of what I’m learning now.”

So, whenever a topic’s spaced repetition process is being slowed down (i.e. whenever the student-topic learning speed is less than 1), it may make sense to discard all incoming implicit repetition credit, instead forcing explicit reviews. In other words, the topic would not receive any implicit repetition credit that would normally “trickle down” from more advanced topics that encompass it.

It makes sense to base the decision of whether to force explicit reviews on the student-topic learning speed: when a topic’s spaced repetition process is being slowed down, that topic is evidently rather difficult relative to the student’s ability.
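Expressed as a simple predicate (the threshold of 1 comes directly from the slowdown condition above; everything else is a sketch):

```python
def accepts_implicit_credit(learning_speed: float) -> bool:
    """When a topic's repetition process is slowed (speed < 1), the topic is hard
    relative to the student's ability, so discard incoming trickle-down credit
    and force explicit reviews instead."""
    return learning_speed >= 1.0

print(accepts_implicit_credit(2.0))  # True: implicit credit flows normally
print(accepts_implicit_credit(0.5))  # False: explicit reviews required
```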

## References

Kyllonen, P. C., & Tirre, W. C. (1988). Individual differences in associative learning and forgetting. *Intelligence, 12*(4), 393-421.

McDermott, K. B., & Zerr, C. L. (2019). Individual differences in learning efficiency. *Current Directions in Psychological Science, 28*(6), 607-613.

Zerr, C. L., Berg, J. J., Nelson, S. M., Fishell, A. K., Savalia, N. K., & McDermott, K. B. (2018). Learning efficiency: Identifying individual differences in learning rate and retention in healthy adults. *Psychological Science, 29*(9), 1436-1450.
