I’m Writing a Book on the Science of Learning
With the science of learning, it’s less about “keeping up” with what’s happening, and more about “catching up” with what’s already happened. Read more...
Most people can tell when their practice is too easy, but what about when their tasks are too hard? That’s often less obvious. Read more...
When you’re knowledgeable/skilled enough to grapple with problems in a more directly applicable field, math gives you the superpower of being able to compress those problem representations into an abstract space where they’re easier to solve. Read more...
A silly bug turned genius hack. Read more...
The only way to argue against the existence of learning loss and grade inflation is to argue against the very idea of measuring learning objectively (i.e., radical constructivism). Read more...
You haven’t learned unless you’re able to consistently reproduce the information you consumed and use it to solve problems. Read more...
When students are not given the opportunity to learn math seriously, and are instead presented with watered-down courses and told that they’re doing a great job, they’re being set up for failure later in life when it matters most. Read more...
834 XP = 834 minutes ≈ 14 hours of work in a single day. You’re probably wondering, what kind of person does that much math in a day? Time for a little story. Read more...
Research mathematicians are like professional athletes. Read more...
Long-term learning is represented by the creation of strategic electrical wiring between neurons. Read more...
Learning math with little computation is like learning basketball with little practice on dribbling & ball handling techniques. Read more...
Research indicates the best way to improve your problem-solving ability in any domain is simply by acquiring more foundational skills in that domain. The way you increase your ability to make mental leaps is not actually by jumping farther, but rather, by building bridges that reduce the distance you need to jump. Yet, higher math textbooks & courses seem to focus on trying to train jumping distance instead of bridge-building. Read more...
Some shortcomings I ran into while self-studying a bunch of math on MIT OpenCourseWare (OCW) in high school, which motivated me to help build Math Academy. These shortcomings are pretty general and would also apply to someone learning from miscellaneous textbooks or Khan Academy. Read more...
There are many, many studies that measure variation in working memory capacity (WMC) vs variation in other metrics. Read more...
Challenge problems are not a good use of time until you’ve developed the foundational skills that are necessary to grapple with these problems in a productive and timely fashion. Read more...
If you start to flail (or, more subtly, doubt yourself and lose interest) after jumping into ML without a baseline level of foundational knowledge, then you need to put your ego aside and re-allocate your time into shoring up your foundations. Read more...
Beginners benefit more from direct instruction. Read more...
… and they should be treated as such. Read more...
Our AI system is one of those things that sounds intuitive enough at a high level, but if you start trying to implement it yourself, you quickly run into a mountain of complexity, numerous edge cases, and lots of counterintuitive low-level phenomena that take a while to fully wrap your head around. Read more...
An idea for a paper that I don’t currently have the bandwidth to write. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
Learning is the incremental gain in your ability to perform a tangible, reproducible skill. Read more...
It’s actually the opposite – to get students actively retrieving information from memory, while minimizing their cognitive load. Read more...
Perform the desired transformation on the identity matrix to get a left-multiplier, and maybe transpose the output. Read more...
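A minimal sketch of the row-operation case, assuming plain Python lists as matrices. The helper names (`identity`, `matmul`, `swap_rows`, `add_multiple`) are illustrative and not taken from the post. Applying a row operation to the identity matrix gives a matrix E, and left-multiplying by E performs that same operation on any matrix with a matching number of rows.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def swap_rows(M, i, j):
    """Return a copy of M with rows i and j exchanged."""
    out = [row[:] for row in M]
    out[i], out[j] = out[j], out[i]
    return out

def add_multiple(M, src, dst, k):
    """Return a copy of M with k times row src added to row dst."""
    out = [row[:] for row in M]
    out[dst] = [d + k * s for d, s in zip(out[dst], out[src])]
    return out

A = [[1, 2], [3, 4], [5, 6]]

# Build each left-multiplier by applying the operation to the identity.
E_swap = swap_rows(identity(3), 0, 2)
E_add = add_multiple(identity(3), 0, 1, -3)

# Left-multiplying by E reproduces the row operation on A.
assert matmul(E_swap, A) == swap_rows(A, 0, 2)
assert matmul(E_add, A) == add_multiple(A, 0, 1, -3)
```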
First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field. Read more...
It highlights the aversion that people have to doing hard things. People will do unbelievable mental gymnastics to convince themselves that doing an easy, enjoyable thing that is unrelated to their supposed goal somehow moves the needle more than doing a hard, unpleasant thing that is directly related to said goal. Read more...
In general, when you feel yourself running up against a ceiling in life, the solution is typically to pivot into a direction where the ceiling is higher. Read more...
Loosely inspired by the German tank problem: several witnesses reported seeing a UFO during the given time intervals, and you want to quantify your certainty regarding when the UFO arrived and when it left. Read more...
No matter what skill is being trained, improving performance is always an effortful process. Read more...
By periodically revisiting content, a spiral curriculum periodically restores forgotten knowledge and leverages the spacing effect to slow the decay of that knowledge. Spaced repetition takes this line of thought to its fullest extent by fully optimizing the review process. Read more...
The strongest people lift weights heavy enough to make them feel weak. Read more...
There’s only so much you can hone your math skills by working on a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
While there is plenty of room for teachers to make better use of cognitive learning strategies in the classroom, teachers are victims of circumstance in a profession lacking effective accountability and incentive structures, and the end result is that students continue to receive mediocre educational experiences. Given a sufficient degree of accountability and incentives, there is no law of physics preventing a teacher from putting forth the work needed to deliver an optimal learning experience to a single student. However, in the absence of technology, it is impossible for a single human teacher to deliver an optimal learning experience to a classroom of many students with heterogeneous knowledge profiles, each of whom needs to work on different types of problems and receive immediate feedback on each of their attempts. This is why technology is necessary. Read more...
Gamification, integrating game-like elements into learning environments, proves effective in increasing student learning, engagement, and enjoyment. Read more...
The testing effect (or the retrieval practice effect) emphasizes that recalling information from memory, rather than repeated reading, enhances learning. It can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice. Read more...
Imitating without analyzing produces a robot / ape who can’t think critically; analyzing without imitating produces a critic who can’t act on their own advice. Read more...
Interleaving (or mixed practice) involves spreading minimal effective doses of practice across various skills, in contrast to blocked practice, which involves extensive consecutive repetition of a single skill. Blocked practice can give a false sense of mastery and fluency because it allows students to settle into a robotic rhythm of mindlessly applying one type of solution to one type of problem. Interleaving, on the other hand, creates a “desirable difficulty” that promotes vastly superior retention and generalization, making it a more effective review strategy. But despite its proven efficacy, interleaving faces resistance in classrooms due to a preference for practice that feels easier and appears to produce immediate performance gains, even if those performance gains quickly vanish afterwards and do not carry over to test performance. Read more...
When reviews are spaced out or distributed over multiple sessions (as opposed to being crammed or massed into a single session), memory is not only restored, but also further consolidated into long-term storage, which slows its decay. This is known as the spacing effect. A profound consequence of the spacing effect is that the more reviews are completed (with appropriate spacing), the longer the memory will be retained, and the longer one can wait until the next review is needed. This observation gives rise to a systematic method for reviewing previously-learned material called spaced repetition (or distributed practice). A repetition is a successful review at the appropriate time. Read more...
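As a rough illustration of how the wait between reviews can grow with each successful repetition, here is a toy scheduler. The function name `review_schedule`, the one-day starting interval, and the doubling factor are all assumptions chosen for the example. They are not parameters from the post or from any particular spaced repetition system.

```python
def review_schedule(first_interval_days, growth_factor, num_reviews):
    """Return the days (counted from initial learning) on which reviews fall.

    Assumes every review succeeds, so the interval is multiplied by
    growth_factor after each repetition.
    """
    day = 0
    interval = first_interval_days
    days = []
    for _ in range(num_reviews):
        day += interval            # wait out the current interval
        days.append(day)           # review happens here
        interval *= growth_factor  # next wait is longer
    return days

schedule = review_schedule(1, 2, 5)
print(schedule)  # [1, 3, 7, 15, 31]
```

With these assumed numbers, five reviews cover a month, and each gap is twice as long as the one before it.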
Layering is the act of continually building on top of existing knowledge – that is, continually acquiring new knowledge that exercises prerequisite or component knowledge. This causes existing knowledge to become more ingrained, organized, and deeply understood, thereby increasing the structural integrity of a student’s knowledge base and making it easier to assimilate new knowledge. Read more...
Associative interference occurs when related knowledge interferes with recall. It is more likely to occur when highly related pieces of knowledge are learned simultaneously or in close succession. However, the effects of interference can be mitigated by teaching dissimilar concepts simultaneously and spacing out related pieces of knowledge over time. Read more...
Automaticity is the ability to perform low-level skills without conscious effort. Analogous to a basketball player effortlessly dribbling while strategizing, automaticity allows individuals to avoid spending limited cognitive resources on low-level tasks and instead devote those cognitive resources to higher-order reasoning. In this way, automaticity is the gateway to expertise, creativity, and general academic success. However, insufficient automaticity, particularly in basic skills, inflates the cognitive load of tasks, making it exceedingly difficult for students to learn and perform. Read more...
Different students have different working memory capacities. When the cognitive load of a learning task exceeds a student’s working memory capacity, the student experiences cognitive overload and is not able to complete the task. Read more...
Mastery learning is a strategy in which students demonstrate proficiency on prerequisites before advancing. While even loose approximations of mastery learning have been shown to produce massive gains in student learning, mastery learning faces limited adoption due to clashing with traditional teaching methods and placing increased demands on educators. True mastery learning at a fully granular level requires fully individualized instruction and is only attainable through one-on-one tutoring. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be discomforting, but its long-term commitment compounds incremental improvements, leading to expertise. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
During practice, the elite skaters were over 6 times more active than passive, while non-competitive skaters were nearly as passive as they were active. Read more...
A startup spent months building a sophisticated lecture tool and raising over half a million dollars in investments – but after observing students in the lecture hall, they completely abandoned the product and called up their investors to return the money. Read more...
True active learning requires every individual student to be actively engaged on every piece of the material to be learned. Read more...
Six weeks of pure review and six official practice exams. Read more...
It’s easier to run into roadblocks, but also easier to maintain what you’ve learned. Read more...
Passive consumption. Lack of depth. Lack of rigorous assessments. Failing upwards. Lack of skill development. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
It’s like going to the gym without a solid workout plan in place. Read more...
If you know your single-variable calculus, then it’s about 70 hours on Math Academy. Read more...
… is to present a problem where known simpler techniques fail. Read more...
My training has been scattered and fuzzy until recently. Here’s the whole story. Read more...
An oval () fits inside a rectangle [ ] with the same width and height. Read more...
Not everybody can learn every level of math, but most people can learn the basics. In practice, however, few people actually reach their full mathematical potential because they get knocked off course early on by factors such as missing foundations, ineffective practice habits, inability or unwillingness to engage in additional practice, or lack of motivation. Read more...
Effortful processes like testing, repetition, and computation are essential parts of effective learning, and competition is often helpful. Read more...
The most effective learning techniques require substantial cognitive effort from students and typically do not emulate what experts do in the professional workplace. Direct instruction is necessary to maximize student learning, whereas unguided instruction and group projects are typically very inefficient. Read more...
Different people generally have different working memory capacities and learn at different rates, but people do not actually learn better in their preferred “learning style.” Instead, different people need the same form of practice but in different amounts. Read more...
Students and teachers are often not aligned with the goal of maximizing learning, which means that in the absence of accountability and incentives, classrooms are pulled towards a state of mediocrity. Accountability and incentives are typically absent in education, which leads to a “tragedy of the commons” situation where students pass courses (often with high grades) despite severely lacking knowledge of the content. Read more...
In terms of improving educational outcomes, science is not where the bottleneck is. The bottleneck is in practice. The science of learning has advanced significantly over the past century, yet the practice of education has barely changed. Read more...
The average tutored student performed better than 98% of students in the traditional class. Read more...
Many students who pattern-match will tend to prefer solutions requiring fewer and simpler operations, especially if those solutions yield ballpark-reasonable results. Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not true. Read more...
Is there a standard “order of operations” for parallel vs nested absolute value expressions, in the absence of clarifying notation? Read more...
Q: Draw a 10 x 10 square grid. How many squares are there in total? Not just 1 x 1 squares, but also 2 x 2 squares, 3 x 3 squares, and so on. A: The total number of square shapes is the total sum of square numbers 1 + 4 + 9 + 16 + … + 100. Read more...
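A quick brute-force check of this answer, as a sketch (the function names are illustrative). A k x k square is determined by its top-left corner, which can sit in any of (n - k + 1) rows and (n - k + 1) columns, so the count for each side length is itself a square number.

```python
def count_squares(n):
    """Count all axis-aligned k x k squares in an n x n grid by enumeration."""
    total = 0
    for k in range(1, n + 1):             # side length of the square
        for row in range(n - k + 1):      # row of the top-left corner
            for col in range(n - k + 1):  # column of the top-left corner
                total += 1
    return total

def sum_of_square_numbers(n):
    """Return 1 + 4 + 9 + ... + n^2."""
    return sum(m * m for m in range(1, n + 1))

print(count_squares(10))          # 385
print(sum_of_square_numbers(10))  # 385
```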
First, you want to form a habit. Second, you want to operate at peak productivity during your session. Third, you want to minimize the amount you forget between sessions. Read more...
Answer: It’s not very useful (not in practice, not in theory). Read more...
For many (but not all) students, the answer is yes. And for many of those students, automation can unlock life-changing educational outcomes. Read more...
Everyone has some level of abstraction beyond which they are incapable of engaging in first-principles reasoning. That level is different for everyone, and it’s not a hard threshold, but beyond it the time and mental effort required to perform first-principles reasoning skyrockets until first-principles reasoning becomes completely infeasible. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
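One standard illustration of the difference (a textbook example, not necessarily the one used in the post): for total derivatives, fraction-style cancellation gives the chain rule and the inverse rule,

```latex
\frac{dy}{dx}\cdot\frac{dx}{dt} = \frac{dy}{dt},
\qquad
\frac{dx}{dy} = \frac{1}{dy/dx}.
```

For partial derivatives, take three variables tied together by the constraint $x + y + z = 0$. Holding the third variable fixed each time, every partial derivative equals $-1$, so

```latex
\left(\frac{\partial x}{\partial y}\right)_{z}
\left(\frac{\partial y}{\partial z}\right)_{x}
\left(\frac{\partial z}{\partial x}\right)_{y}
= (-1)(-1)(-1) = -1,
```

whereas naive cancellation of the "fractions" would predict $+1$.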
Why it’s common for students to pass courses despite severely lacking knowledge of the content. Read more...
Drawing -> LaTeX commands -> ChatGPT summary -> Google more info. Read more...
While some may view Feynman-style pedagogy as supporting inclusive learning for all students across varying levels of ability, Feynman himself acknowledged that his methods only worked for the top 10% of his students. Read more...
Type I pairs with the variable that runs vertically in the usual representation of the coordinate system. The remaining types are paired with the rest of the variables in ascending order. Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
In order to justify using a more complex model, the increase in performance has to be worth the cost of integrating and maintaining the complexity. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Two subtypes of coders that I watched students grow into. Read more...
Effective learning strategies sometimes go against our human instincts about conversation. Read more...
A way to visualize some cognitive learning strategies. Read more...
… are summarized in the following table. Read more...
An aha moment with object-oriented programming. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
How to avoid some of the most common pitfalls leading to ugly LaTeX. Read more...
The behavior of a multivariable function can be highly specific to the path taken. Read more...
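A small numerical sketch of path dependence, using the textbook function f(x, y) = xy / (x² + y²). This function is an assumption chosen for illustration and may not be the example in the post. Along the line y = m·x the function is constant at m / (1 + m²), so approaching the origin along different lines gives different limiting values.

```python
def f(x, y):
    """Textbook example whose limit at the origin depends on the path."""
    return x * y / (x**2 + y**2)

def along_line(m, t):
    """Evaluate f at the point (t, m*t) on the line y = m*x."""
    return f(t, m * t)

for m in (0, 1, 2):
    # Move toward the origin along y = m*x and watch the values.
    values = [along_line(m, t) for t in (1e-1, 1e-3, 1e-6)]
    print(m, values)
```

Each line of output stays near a single value (0 for m = 0, 0.5 for m = 1, 0.4 for m = 2), and because those values disagree, the limit at the origin does not exist.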
Every inscribed triangle with a diameter as one of its sides is a right triangle, with that side as its hypotenuse. Read more...
A simple mnemonic trick for quickly differentiating complicated functions. Read more...
A prototype web app to automatically assist students in self-correcting small errors and minor misconceptions. Read more...
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
In a simplified problem framing, we investigate the (game-theoretical) usefulness of limiting the number of social connections per person. Read more...
Category theory provides a language for explicitly describing indirect relationships in graphs. Read more...
Framing complex systems in the language of category theory. Read more...
The main ideas behind computers can be understood by anyone. Read more...
The brain is a neuronal network integrating specialized subsystems that use local competition and thresholding to sparsify input, spike-timing dependent plasticity to learn inference, and layering to implement hierarchical predictive learning. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
Montaigne’s education, strictly dictated by his parents and university studies, resulted in an isolative work with scholarly impact but limited public reach. Conversely, Benjamin Franklin’s goal-oriented self-teaching led to influential creations and roles benefiting his community and nation. Read more...
Implementation notes for STDP learning in a network of Hodgkin-Huxley simulated neurons. Read more...
Many existing proofs are not accessible to young mathematicians or those without experience in the realm of dynamic systems. Read more...
A workbook I created to explain the math and physics behind an Iron Man suit to a student who was interested in the comics / movies. Read more...
A workbook I created to explain the math and physics behind an egg drop experiment to a student who was interested in Lord of the Rings and Star Wars. Read more...
And a proof via double induction. Read more...
A brief overview of sound waves and how they interact with things. Read more...
A brief overview of the experimental search for dark matter (XENON, CDMS, PICASSO, COUPP). Read more...
Mass discrepancies in galaxies and clusters, cosmic background radiation, the structure of the universe, and big bang nucleosynthesis’s impact on baryon density. Read more...
Accumulating mathematical knowledge gaps can lead students to reach a tipping point where further learning becomes overwhelming, ultimately causing them to abandon math entirely. Read more...
If you depend on a massive base of learners, most of whom are unserious, that puts hard constraints on how you teach. You have to employ ineffective learning strategies that do not repel unserious students. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Math gets hard for different students at different levels. If you don’t have worked examples to help carry you through once math becomes hard for you, then every problem basically blows up into a “research project” for you. Sometimes people advocate for unguided struggle as a way to improve general problem-solving ability, but this idea lacks empirical support. Worked examples won’t prevent you from developing deep understanding (actually, it’s the opposite: worked examples can help you quickly layer on more skills, which forces a structural integrity in the lower levels of your knowledge). Even if you decide against using worked examples for now, continually re-evaluate to make sure you’re getting enough productive training volume. Read more...
First, you need extensive and solid content knowledge. Then, you need to work through tons of practice exams for the specific exam you’re taking. This might sound simple, but every year, countless people manage to screw it up. Read more...
“…[D]eliberate practice requires effort and is not inherently enjoyable. Individuals are motivated to practice because practice improves performance.” Read more...
If you understand the interplay between working memory and long-term memory, then you can actually derive – from first principles – the methods of effective teaching. Read more...
Hard-coding explanations feels tedious, takes a lot of work, and isn’t “sexy” like an AI that generates responses from scratch – but at least it’s not a pipe dream. It’s a practical solution that lets us move on to other components of the AI that are just as important. Read more...
If all the knowledge you show up with is high school math and AP Calculus, then you’re going to get your ass handed to you. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
It’s the act of successfully retrieving fuzzy memory, not clear memory, that extends the memory duration. Read more...
To transfer information into long-term memory, you need to practice retrieving it without assistance. Read more...
Sure, accelerating via self-study is not as optimal as accelerating within teacher-managed courses, but it’s way better than not accelerating at all. Read more...
There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain. Read more...
Solving equations feels smooth when basic arithmetic is automatic – it’s like moving puzzle pieces around, and you just need to identify how they fit together. But without automaticity on basic arithmetic, each puzzle piece is a heavy weight. You struggle to move them at all, much less figure out where they’re supposed to go. Read more...
But in talent development, the optimization problem is clear: an individual’s performance is to be maximized, so the methods used during practice are those that most efficiently convert effort into performance improvements. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be discomforting, but its long-term commitment compounds incremental improvements, leading to expertise. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
During practice, the elite skaters were over 6 times more active than passive, while non-competitive skaters were nearly as passive as they were active. Read more...
A startup spent months building a sophisticated lecture tool and raising over half a million dollars in investments – but after observing students in the lecture hall, they completely abandoned the product and called up their investors to return the money. Read more...
True active learning requires every individual student to be actively engaged on every piece of the material to be learned. Read more...
Six weeks of pure review and six official practice exams. Read more...
It’s easier to run into roadblocks, but also easier to maintain what you’ve learned. Read more...
Passive consumption. Lack of depth. Lack of rigorous assessments. Failing upwards. Lack of skill development. Read more...
It’s like going to the gym without a solid workout plan in place. Read more...
If you know your single-variable calculus, then it’s about 70 hours on Math Academy. Read more...
Not everybody can learn every level of math, but most people can learn the basics. In practice, however, few people actually reach their full mathematical potential because they get knocked off course early on by factors such as missing foundations, ineffective practice habits, inability or unwillingness to engage in additional practice, or lack of motivation. Read more...
Learning math early guards you against numerous academic risks and opens all kinds of doors to career opportunities. Read more...
Effortful processes like testing, repetition, and computation are essential parts of effective learning, and competition is often helpful. Read more...
The most effective learning techniques require substantial cognitive effort from students and typically do not emulate what experts do in the professional workplace. Direct instruction is necessary to maximize student learning, whereas unguided instruction and group projects are typically very inefficient. Read more...
Different people generally have different working memory capacities and learn at different rates, but people do not actually learn better in their preferred “learning style.” Instead, different people need the same form of practice but in different amounts. Read more...
Students and teachers are often not aligned with the goal of maximizing learning, which means that in the absence of accountability and incentives, classrooms are pulled towards a state of mediocrity. Accountability and incentives are typically absent in education, which leads to a “tragedy of the commons” situation where students pass courses (often with high grades) despite severely lacking knowledge of the content. Read more...
In terms of improving educational outcomes, science is not where the bottleneck is. The bottleneck is in practice. The science of learning has advanced significantly over the past century, yet the practice of education has barely changed. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
Talent development is not only different from schooling, but in many cases completely orthogonal to schooling. Read more...
The average tutored student performed better than 98% of students in the traditional class. Read more...
Why it’s common for students to pass courses despite severely lacking knowledge of the content. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing master’s/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
In order to justify using a more complex model, the increase in performance has to be worth the cost of integrating and maintaining the complexity. Read more...
Two subtypes of coders that I watched students grow into. Read more...
An aha moment with object-oriented programming. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
Combining game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...
Repeatedly choosing the action with the best worst-case scenario. Read more...
Building data structures that represent all the possible outcomes of a game. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
Computing spatial relationships between nodes when edges no longer represent unit distances. Read more...
Using traversals to understand spatial relationships between nodes in graphs. Read more...
Graphs show up all the time in computer science, so it’s important to know how to work with them. Read more...
A simple classification algorithm grounded in Bayesian probability. Read more...
One of the simplest classifiers. Read more...
In many real-life situations, there is more than one input variable that controls the output variable. Read more...
Gradient descent can help us avoid pitfalls that occur when fitting nonlinear models using the pseudoinverse. Read more...
Just because a model appears to match closely with points in the data set does not necessarily mean it is a good model. Read more...
Transforming nonlinear functions so that we can fit them using the pseudoinverse. Read more...
Exploring the most general class of functions that can be fit using the pseudoinverse. Read more...
Using matrix algebra to fit simple functions to data sets. Read more...
A technique for maximizing linear expressions subject to linear constraints. Read more...
Under the hood, dictionaries are hash tables. Read more...
Implementing a differential equations model that won the Nobel prize. Read more...
A simple differential equations model that we can plot using multivariable Euler estimation. Read more...
Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. Read more...
One of the best ways to get practice with object-oriented programming is implementing games. Read more...
Guess some initial clusters in the data, and then repeatedly update the guesses to make the clusters more cohesive. Read more...
You can use the RREF algorithm to compute determinants much faster than with the recursive cofactor expansion method. Read more...
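As a rough sketch of the idea (not necessarily the implementation from the post), the determinant can be read off while row-reducing by tracking how each row operation changes it. The function name `determinant` and the use of `Fraction` for exact arithmetic are assumptions made for illustration, and the reduction stops at echelon form since that is already enough to recover the determinant.

```python
from fractions import Fraction

def determinant(matrix):
    """Determinant of a square matrix via row reduction."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n = len(rows)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot_row = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # no pivot in this column, so the matrix is singular
        if pivot_row != col:
            rows[col], rows[pivot_row] = rows[pivot_row], rows[col]
            det = -det  # swapping two rows flips the sign of the determinant
        pivot = rows[col][col]
        det *= pivot  # we are about to divide this row by the pivot, so factor it out
        rows[col] = [x / pivot for x in rows[col]]
        for r in range(col + 1, n):
            factor = rows[r][col]
            # Subtracting a multiple of another row leaves the determinant unchanged.
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return det
```

Row reduction takes on the order of n^3 arithmetic operations, whereas recursive cofactor expansion takes on the order of n! operations, which is where the speedup comes from.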
We can use arrays to implement matrices and their associated mathematical operations. Read more...
Merge sort and quicksort are generally faster than selection, bubble, and insertion sort. And unlike counting sort, they are not susceptible to blowup in the amount of memory required. Read more...
Some of the simplest methods for sorting items in arrays. Read more...
Just like single-variable gradient descent, except that we replace the derivative with the gradient vector. Read more...
We take an initial guess as to what the minimum is, and then repeatedly use the gradient to nudge that guess further and further “downhill” into an actual minimum. Read more...
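A minimal sketch of single-variable gradient descent, assuming we already know the derivative of the function. The names `gradient_descent`, `learning_rate`, and `num_steps`, along with their default values, are illustrative choices rather than anything prescribed by the post.

```python
def gradient_descent(derivative, guess, learning_rate=0.1, num_steps=100):
    """Repeatedly nudge the guess downhill, against the sign of the derivative."""
    for _ in range(num_steps):
        guess = guess - learning_rate * derivative(guess)
    return guess

# Example: f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), guess=0.0)
```

If the learning rate is too large, the guess can overshoot the minimum and diverge, so in practice it usually needs some tuning.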
Bisection search involves repeatedly moving one bound halfway to the other. The Newton-Raphson method involves repeatedly moving our guess to the root of the tangent line. Read more...
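Both root-finding methods can be sketched in a few lines. The function names, the tolerance, and the fixed number of Newton-Raphson steps below are assumptions for illustration. Bisection here assumes the function changes sign between the two bounds, and Newton-Raphson assumes the derivative is nonzero at every guess.

```python
def bisection(f, lower, upper, tolerance=1e-10):
    """Find a root of f between lower and upper, assuming f changes sign there."""
    if f(lower) * f(upper) > 0:
        raise ValueError("f must have opposite signs at the two bounds")
    while upper - lower > tolerance:
        midpoint = (lower + upper) / 2
        # Move one bound to the midpoint, keeping the half where the sign change occurs.
        if f(lower) * f(midpoint) <= 0:
            upper = midpoint
        else:
            lower = midpoint
    return (lower + upper) / 2

def newton_raphson(f, derivative, guess, num_steps=20):
    """Repeatedly move the guess to the root of the tangent line at the guess."""
    for _ in range(num_steps):
        guess = guess - f(guess) / derivative(guess)
    return guess

# Example: the positive root of x^2 - 2 is the square root of 2.
root_a = bisection(lambda x: x**2 - 2, 0, 2)
root_b = newton_raphson(lambda x: x**2 - 2, lambda x: 2 * x, guess=1.0)
```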
Backtracking can drastically cut down the number of possibilities that must be checked during brute force. Read more...
Brute force search involves trying every single possibility. Read more...
Implementing the Cartesian product provides good practice working with arrays. Read more...
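One possible way to build the Cartesian product with plain arrays is to extend every partial combination by every item of the next array. The function name `cartesian_product` and the list-of-lists output format are assumptions made for this sketch.

```python
def cartesian_product(arrays):
    """Return every way of choosing one item from each array, in order."""
    products = [[]]  # start with a single empty combination
    for array in arrays:
        # Extend each partial combination with each item of the current array.
        products = [partial + [item] for partial in products for item in array]
    return products

pairs = cartesian_product([[1, 2], ["a", "b"]])
# pairs == [[1, "a"], [1, "b"], [2, "a"], [2, "b"]]
```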
How to sample from a discrete probability distribution. Read more...
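One common approach (assumed here, and not necessarily the one used in the post) is to draw a single uniform random number and walk through the cumulative probabilities until that number is exceeded. The function name `sample` and the optional `rng` parameter are illustrative.

```python
import random

def sample(probabilities, rng=random):
    """Return index i with probability probabilities[i], using one uniform draw."""
    r = rng.random()  # uniform in [0, 1)
    cumulative = 0.0
    for index, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return index
    return len(probabilities) - 1  # guard against floating-point round-off

# Draw many samples and count how often each outcome appears.
rng = random.Random(0)
counts = [0, 0, 0]
for _ in range(20_000):
    counts[sample([0.2, 0.5, 0.3], rng)] += 1
```

The observed frequencies in `counts` should land close to the given probabilities, with some random fluctuation.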
Estimating probabilities by simulating a large number of random experiments. Read more...
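As a small illustration, here is a sketch that estimates the probability that two fair dice sum to seven, whose exact value is 1/6. The helper names, the number of trials, and the fixed seed are all assumptions chosen for this example.

```python
import random

def estimate_probability(experiment, num_trials=100_000, seed=0):
    """Run the experiment many times and return the fraction of successes."""
    rng = random.Random(seed)
    successes = sum(1 for _ in range(num_trials) if experiment(rng))
    return successes / num_trials

def two_dice_sum_to_seven(rng):
    """One random experiment: roll two fair dice and check whether they sum to 7."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7

estimate = estimate_probability(two_dice_sum_to_seven)
```

The estimate gets closer to the true probability as the number of trials grows, with an error that shrinks roughly like one over the square root of the number of trials.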
Sequences where each term is a function of the previous terms. Read more...
There are other number systems that use more or fewer than ten characters. Read more...
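To make this concrete, here is a sketch that writes a nonnegative integer in another base by repeatedly dividing by the base and collecting remainders. The function name `to_base` and the limit of base 16 (digits 0-9 followed by A-F) are assumptions for illustration.

```python
DIGITS = "0123456789ABCDEF"

def to_base(number, base):
    """Write a nonnegative integer in the given base (2 through 16)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 16")
    if number < 0:
        raise ValueError("number must be nonnegative")
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        # The remainder is the next digit, starting from the rightmost place.
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))

binary = to_base(10, 2)        # "1010"
hexadecimal = to_base(255, 16)  # "FF"
```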
It’s assumed that you’ve had some basic exposure to programming. Read more...
A prototype web app to automatically assist students in self-correcting small errors and minor misconceptions. Read more...
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
Rather than duplicating such code each time we want to use it, it is more efficient to store the code in a function. Read more...
We often wish to tell the computer instructions involving the words “if,” “while,” and “for.” Read more...
We can store many related pieces of data within a single variable called a data structure. Read more...
We can store and manipulate data in the form of variables. Read more...
Answer: It’s not very useful (not in practice, not in theory). Read more...
Hidden inside of every quadratic, there is a perfect square. Read more...
Equations involving compositions of trigonometric functions can create wild patterns in the plane. Read more...
Lissajous curves use sine functions to create interesting patterns in the plane. Read more...
Absolute value graphs can be rotated to draw stars. Read more...
Non-Euclidean ellipses can be used to draw starry-eye sunglasses. Read more...
Euclidean ellipses can be combined with sine wave shading to form three-dimensional shells. Read more...
High-frequency sine waves can be used to draw shaded regions. Read more...
Roots can be used to draw deer. Read more...
Sine waves can be used to draw scales on a fish. Read more...
Parabolas can be used to draw a fish. Read more...
Absolute value can be used to draw a person. Read more...
Slanted lines can be used to draw a spider web. Read more...
Horizontal and vertical lines can be used to draw a castle. Read more...
Compositions of functions consist of multiple functions linked together, where the output of one function becomes the input of another function. Read more...
Inverting a function entails reversing the outputs and inputs of the function. Read more...
When a function is reflected, it flips across one of the axes to become its mirror image. Read more...
When a function is rescaled, it is stretched or compressed along one of the axes, like a slinky. Read more...
When a function is shifted, all of its points move vertically and/or horizontally by the same amount. Read more...
A piecewise function is pieced together from multiple different functions. Read more...
Trigonometric functions represent the relationship between sides and angles in right triangles. Read more...
Absolute value represents the magnitude of a number, i.e. its distance from zero. Read more...
Exponential functions have variables as exponents. Logarithms cancel out exponentiation. Read more...
Radical functions involve roots: square roots, cube roots, or any kind of fractional exponent in general. Read more...
A slant asymptote is a slanted line that arises from a linear term in the proper form of a rational function. Read more...
If we choose one input on each side of an asymptote, we can tell which section of the plane the function will occupy. Read more...
Vertical asymptotes are vertical lines that a function approaches but never quite reaches. Read more...
Rational functions can have a form of end behavior in which they become flat, approaching (but never quite reaching) a horizontal line known as a horizontal asymptote. Read more...
Polynomial long division works the same way as the long division algorithm that’s familiar from simple arithmetic. Read more...
We can sketch the graph of a polynomial using its end behavior and zeros. Read more...
The rational roots theorem can help us find zeros of polynomials without blindly guessing. Read more...
The zeros of a polynomial are the inputs that cause it to evaluate to zero. Read more...
The end behavior of a polynomial refers to the type of output that is produced when we input extremely large positive or negative values. Read more...
To solve a system of inequalities, we need to solve each individual inequality and find where all their solutions overlap. Read more...
Quadratic inequalities are best visualized in the plane. Read more...
When a linear equation has two variables, the solution covers a section of the coordinate plane. Read more...
An inequality is similar to an equation, but instead of saying two quantities are equal, it says that one quantity is greater than or less than another. Read more...
Systems of quadratic equations can be solved via substitution. Read more...
To easily graph a quadratic equation, we can convert it to vertex form. Read more...
Completing the square helps us gain a better intuition for quadratic equations and understand where the quadratic formula comes from. Read more...
To solve hard-to-factor quadratic equations, it’s easiest to use the quadratic formula. Read more...
Factoring is a method for solving quadratic equations. Read more...
Quadratic equations are similar to linear equations, except that they contain squares of a single variable. Read more...
A linear system consists of multiple linear equations, and the solution of a linear system consists of the pairs that satisfy all of the equations. Read more...
Standard form makes it easy to see the intercepts of a line. Read more...
An easy way to write the equation of a line if we know the slope and a point on a line. Read more...
Introducing linear equations in two variables. Read more...
Loosely speaking, a linear equation is an equality statement containing only addition, subtraction, multiplication, and division. Read more...
A series is the sum of a sequence. Read more...
A sequence is a list of numbers that has some pattern. Read more...
A function is a scribble that crosses each vertical line only once. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
An intuitive derivation. Read more...
A simple mnemonic trick for quickly differentiating complicated functions. Read more...
Many differential equations don’t have solutions that can be expressed in terms of finite combinations of familiar functions. However, we can often solve for the Taylor series of the solution. Read more...
To find the Taylor series of complicated functions, it’s often easiest to manipulate the Taylor series of simpler functions. Read more...
Many non-polynomial functions can be represented by infinite polynomials. Read more...
Various tricks for determining whether a series converges or diverges. Read more...
A geometric series is a sum where each term is some constant times the previous term. Read more...
When we know the solutions of a linear differential equation with constant coefficients and right-hand side equal to zero, we can use variation of parameters to find a solution when the right-hand side is not equal to zero. Read more...
Integrating factors can be used to solve first-order differential equations with non-constant coefficients. Read more...
Undetermined coefficients can help us find a solution to a linear differential equation with constant coefficients when the right-hand side is not equal to zero. Read more...
Given a linear differential equation with constant coefficients and a right-hand side of zero, the roots of the characteristic polynomial correspond to solutions of the equation. Read more...
Non-separable differential equations can sometimes be converted into separable differential equations by way of substitution. Read more...
When faced with a differential equation that we don’t know how to solve, we can sometimes still approximate the solution. Read more...
The simplest differential equations can be solved by separation of variables, in which we move the derivative to one side of the equation and take the antiderivative. Read more...
Improper integrals have bounds or function values that extend to positive or negative infinity. Read more...
We can apply integration by parts whenever an integral would be made simpler by differentiating some expression within the integral, at the cost of anti-differentiating another expression within the integral. Read more...
Substitution involves condensing an expression into a single new variable, and then expressing the integral in terms of that new variable. Read more...
To evaluate a definite integral, we find the antiderivative, evaluate it at the indicated bounds, and then take the difference. Read more...
The antiderivative of a function is a second function whose derivative is the first function. Read more...
When a limit takes the indeterminate form of zero divided by zero or infinity divided by infinity, we can differentiate the numerator and denominator separately without changing the actual value of the limit. Read more...
We can interpret the derivative as an approximation for how a function’s output changes, when the function input is changed by a small amount. Read more...
Derivatives can be used to find a function’s local extreme values, its peaks and valleys. Read more...
There are convenient rules for the derivatives of exponential, logarithmic, trigonometric, and inverse trigonometric functions. Read more...
Given a sum, we can differentiate each term individually. But why are we able to do this? Does multiplication work the same way? What about division? Read more...
When taking derivatives of compositions of functions, we can ignore the inside of a function as long as we multiply by the derivative of the inside afterwards. Read more...
There are some patterns that allow us to compute derivatives without having to compute the limit of the difference quotient. Read more...
The derivative of a function is the function’s slope at a particular point, and can be computed as the limit of the difference quotient. Read more...
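A quick numerical sketch can make the limit of the difference quotient tangible. The function name `difference_quotient` and the choice of example function are assumptions for illustration.

```python
def difference_quotient(f, x, h):
    """Slope of the line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x ** 2

# For f(x) = x^2 at x = 3, the difference quotient simplifies to 6 + h,
# so it approaches the derivative 6 as h shrinks toward zero.
slopes = [difference_quotient(square, 3, h) for h in (0.1, 0.01, 0.001)]
```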
Various tricks for evaluating tricky limits. Read more...
The limit of a function, as the input approaches some value, is the output we would expect if we saw only the surrounding portion of the graph. Read more...
It comes out to roughly a fortieth of that of a truck. Read more...
String art works because the strings are tangent lines to a curve. Read more...
Calculus can show us how our intuition can fail us, a common theme in philosophy. Read more...
Nobody came out of the dispute well. Read more...
When Joseph Fourier first introduced Fourier series, they gave mathematicians nightmares. Read more...
Deriving the “Pert” formula. Read more...
If we know the revenue and costs associated with producing any number of units, then we can use calculus to figure out the number of units to produce for maximum profit. Read more...
Calculus can be used to find the parameters that minimize a function. Read more...
Physics engines use calculus to periodically update the locations of objects. Read more...
Introducing Kajiya’s rendering equation. Read more...
Deriving the ideal rocket equation. Read more...
Deriving the Gompertz function. Read more...
Understanding why even slight narrowing of arteries can pose such a big problem to blood flow. Read more...
Measuring the volume of blood the heart pumps out into the aorta per unit time. Read more...
A series is the sum of a sequence. Read more...
A sequence is a list of numbers that has some pattern. Read more...
Integrals give the area under a portion of a function. Read more...
The derivative tells the steepness of a function at a given point, kind of like a carpenter’s level. Read more...
The limit of a function is the height where it looks like the scribble is going to hit a particular vertical line. Read more...
Sure, accelerating via self-study is not as optimal as accelerating within teacher-managed courses, but it’s way better than not accelerating at all. Read more...
If you start to flail (or, more subtly, doubt yourself and lose interest) after jumping into ML without a baseline level of foundational knowledge, then you need to put your ego aside and re-allocate your time into shoring up your foundations. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
If you know your single-variable calculus, then it’s about 70 hours on Math Academy. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
A simple classification algorithm grounded in Bayesian probability. Read more...
One of the simplest classifiers. Read more...
In many real-life situations, there is more than one input variable that controls the output variable. Read more...
Gradient descent can help us avoid pitfalls that occur when fitting nonlinear models using the pseudoinverse. Read more...
Just because a model appears to match closely with points in the data set does not necessarily mean it is a good model. Read more...
Transforming nonlinear functions so that we can fit them using the pseudoinverse. Read more...
Exploring the most general class of functions that can be fit using the pseudoinverse. Read more...
Using matrix algebra to fit simple functions to data sets. Read more...
Guess some initial clusters in the data, and then repeatedly update the guesses to make the clusters more cohesive. Read more...
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
The type of ensemble model that wins most data science competitions is the stacked model, which consists of an ensemble of entirely different species of models together with some combiner algorithm. Read more...
Decision trees are able to model nonlinear data while remaining interpretable. Read more...
NNs are similar to SVMs in that they project the data to a higher-dimensional space and fit a hyperplane to the data in the projected space. However, whereas SVMs use a predetermined kernel to project the data, NNs automatically construct their own projection. Read more...
A Support Vector Machine (SVM) computes the “best” separation between classes as the maximum-margin hyperplane. Read more...
In linear regression, we model the target as a random variable whose expected value depends on a linear combination of the predictors (including a bias term). Read more...
To visualize the relationship between the MAP and MLE estimations, one can imagine starting at the MLE estimation, and then obtaining the MAP estimation by drifting a bit towards higher density in the prior distribution. Read more...
Naive Bayes classification naively assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Read more...
You haven’t learned unless you’re able to consistently reproduce the information you consumed and use it to solve problems. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Long-term learning is represented by the creation of strategic electrical wiring between neurons. Read more...
Research indicates the best way to improve your problem-solving ability in any domain is simply by acquiring more foundational skills in that domain. The way you increase your ability to make mental leaps is not actually by jumping farther, but rather, by building bridges that reduce the distance you need to jump. Yet, higher math textbooks & courses seem to focus on trying to train jumping distance instead of bridge-building. Read more...
There are many, many studies that measure variation in WMC vs variation in other metrics. Read more...
Challenge problems are not a good use of time until you’ve developed the foundational skills that are necessary to grapple with these problems in a productive and timely fashion. Read more...
Beginners benefit more from direct instruction. Read more...
If you understand the interplay between working memory and long-term memory, then you can actually derive – from first principles – the methods of effective teaching. Read more...
An idea for a paper that I don’t currently have the bandwidth to write. Read more...
It’s the act of successfully retrieving fuzzy memory, not clear memory, that extends the memory duration. Read more...
To transfer information into long-term memory, you need to practice retrieving it without assistance. Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
It’s actually the opposite – to get students actively retrieving information from memory, while minimizing their cognitive load. Read more...
There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain. Read more...
By periodically revisiting content, a spiral curriculum periodically restores forgotten knowledge and leverages the spacing effect to slow the decay of that knowledge. Spaced repetition takes this line of thought to its fullest extent by fully optimizing the review process. Read more...
While there is plenty of room for teachers to make better use of cognitive learning strategies in the classroom, teachers are victims of circumstance in a profession lacking effective accountability and incentive structures, and the end result is that students continue to receive mediocre educational experiences. Given a sufficient degree of accountability and incentives, there is no law of physics preventing a teacher from putting forth the work needed to deliver an optimal learning experience to a single student. However, in the absence of technology, it is impossible for a single human teacher to deliver an optimal learning experience to a classroom of many students with heterogeneous knowledge profiles, each of whom needs to work on different types of problems and receive immediate feedback on each of their attempts. This is why technology is necessary. Read more...
The testing effect (or the retrieval practice effect) emphasizes that recalling information from memory, rather than repeated reading, enhances learning. It can be combined with spaced repetition to produce an even more potent learning technique known as spaced retrieval practice. Read more...
Interleaving (or mixed practice) involves spreading minimal effective doses of practice across various skills, in contrast to blocked practice, which involves extensive consecutive repetition of a single skill. Blocked practice can give a false sense of mastery and fluency because it allows students to settle into a robotic rhythm of mindlessly applying one type of solution to one type of problem. Interleaving, on the other hand, creates a “desirable difficulty” that promotes vastly superior retention and generalization, making it a more effective review strategy. But despite its proven efficacy, interleaving faces resistance in classrooms due to a preference for practice that feels easier and appears to produce immediate performance gains, even if those performance gains quickly vanish afterwards and do not carry over to test performance. Read more...
When reviews are spaced out or distributed over multiple sessions (as opposed to being crammed or massed into a single session), memory is not only restored, but also further consolidated into long-term storage, which slows its decay. This is known as the spacing effect. A profound consequence of the spacing effect is that the more reviews are completed (with appropriate spacing), the longer the memory will be retained, and the longer one can wait until the next review is needed. This observation gives rise to a systematic method for reviewing previously-learned material called spaced repetition (or distributed practice). A repetition is a successful review at the appropriate time. Read more...
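As a rough illustration of how the wait between reviews can grow with each successful repetition, here is a minimal sketch. The one-day starting interval, the doubling factor, and the function name `review_intervals` are assumptions chosen for illustration; they are not taken from the post or from any particular spaced repetition system.

```python
def review_intervals(num_repetitions, first_interval_days=1.0, growth_factor=2.0):
    """Return the wait (in days) before each successive review.

    Assumption: every review is successful and on time, so each
    repetition multiplies the next wait by a fixed growth factor.
    """
    intervals = []
    interval = first_interval_days
    for _ in range(num_repetitions):
        intervals.append(interval)
        interval *= growth_factor
    return intervals


print(review_intervals(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

Real schedulers typically adjust the growth rate per student and per topic rather than using one fixed factor.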
Layering is the act of continually building on top of existing knowledge – that is, continually acquiring new knowledge that exercises prerequisite or component knowledge. This causes existing knowledge to become more ingrained, organized, and deeply understood, thereby increasing the structural integrity of a student’s knowledge base and making it easier to assimilate new knowledge. Read more...
Associative interference occurs when related knowledge interferes with recall. It is more likely to occur when highly related pieces of knowledge are learned simultaneously or in close succession. However, the effects of interference can be mitigated by teaching dissimilar concepts simultaneously and spacing out related pieces of knowledge over time. Read more...
Automaticity is the ability to perform low-level skills without conscious effort. Analogous to a basketball player effortlessly dribbling while strategizing, automaticity allows individuals to avoid spending limited cognitive resources on low-level tasks and instead devote those cognitive resources to higher-order reasoning. In this way, automaticity is the gateway to expertise, creativity, and general academic success. However, insufficient automaticity, particularly in basic skills, inflates the cognitive load of tasks, making it exceedingly difficult for students to learn and perform. Read more...
Different students have different working memory capacities. When the cognitive load of a learning task exceeds a student’s working memory capacity, the student experiences cognitive overload and is not able to complete the task. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
Effortful processes like testing, repetition, and computation are essential parts of effective learning, and competition is often helpful. Read more...
The most effective learning techniques require substantial cognitive effort from students and typically do not emulate what experts do in the professional workplace. Direct instruction is necessary to maximize student learning, whereas unguided instruction and group projects are typically very inefficient. Read more...
Different people generally have different working memory capacities and learn at different rates, but people do not actually learn better in their preferred “learning style.” Instead, different people need the same form of practice but in different amounts. Read more...
In terms of improving educational outcomes, science is not where the bottleneck is. The bottleneck is in practice. The science of learning has advanced significantly over the past century, yet the practice of education has barely changed. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
Effective learning strategies sometimes go against our human instincts about conversation. Read more...
A way to visualize some cognitive learning strategies. Read more...
There’s only so much you can hone your math skills by working on a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way. Read more...
My training has been scattered and fuzzy until recently. Here’s the whole story. Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
Solving problems, building on top of what you’ve learned, reviewing what you’ve learned, and quality, quantity, and spacing of practice. Read more...
An oval () fits inside a rectangle [ ] with the same width and height. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
Imitating without analyzing produces a robot / ape who can’t think critically; analyzing without imitating produces a critic who can’t act on their own advice. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
… is to present a problem where known simpler techniques fail. Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
For many (but not all) students, the answer is yes. And for many of those students, automation can unlock life-changing educational outcomes. Read more...
Everyone has some level of abstraction beyond which they are incapable of engaging in first-principles reasoning. That level is different for everyone, and it’s not a hard threshold, but beyond it the time and mental effort required to perform first-principles reasoning skyrockets until first-principles reasoning becomes completely infeasible. Read more...
Why it’s common for students to pass courses despite severely lacking knowledge of the content. Read more...
If you look at the kinds of math that most quantitative professionals use on a daily basis, competition math tricks don’t show up anywhere. But what does show up everywhere is university-level math subjects. Read more...
While some may view Feynman-style pedagogy as supporting inclusive learning for all students across varying levels of ability, Feynman himself acknowledged that his methods only worked for the top 10% of his students. Read more...
It’s centered around political ideology rather than the science of learning. Read more...
Two subtypes of coders that I watched students grow into. Read more...
Perform the desired transformation on the identity matrix to get a left-multiplier, and maybe transpose the output. Read more...
The matrix exponential can be defined as a power series and used to solve systems of linear differential equations. Read more...
Jordan form provides a guaranteed backup plan for exponentiating matrices that are non-diagonalizable. Read more...
Matrix diagonalization can be applied to construct closed-form expressions for recursive sequences. Read more...
The eigenvectors of a matrix are those vectors that the matrix simply rescales, and the factor by which an eigenvector is rescaled is called its eigenvalue. These concepts can be used to quickly calculate large powers of matrices. Read more...
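To make the "large powers" claim concrete, here is a small sketch using a hypothetical example matrix, A = [[2, 1], [1, 2]], which is not taken from the post. Its eigenvectors (1, 1) and (1, -1) are rescaled by 3 and 1 respectively, so writing A = P D P⁻¹ gives a closed form for Aⁿ that can be checked against repeated multiplication.

```python
def matmul2(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def power_by_repeated_multiplication(A, n):
    """Compute A^n the slow way, with n matrix multiplications."""
    result = [[1, 0], [0, 1]]
    for _ in range(n):
        result = matmul2(result, A)
    return result


def power_by_eigenvalues(n):
    """Closed form for A^n where A = [[2, 1], [1, 2]] (assumed example).

    Eigenvalues are 3 and 1, so A^n = P diag(3^n, 1^n) P^(-1), which
    works out to the entries below. 3^n is odd, so the halves are exact.
    """
    a = (3 ** n + 1) // 2
    b = (3 ** n - 1) // 2
    return [[a, b], [b, a]]


A = [[2, 1], [1, 2]]
print(power_by_eigenvalues(10) == power_by_repeated_multiplication(A, 10))  # True
```

The closed form needs only one exponentiation of each eigenvalue, no matter how large n is.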
The inverse of a matrix is a second matrix which undoes the transformation of the first matrix. Read more...
Every square matrix can be decomposed into a product of rescalings and shears. Read more...
How to multiply a matrix by another matrix. Read more...
Matrices are vectors whose components are themselves vectors. Read more...
Solving linear systems can sometimes be a necessary component of solving nonlinear systems. Read more...
Shearing can be used to express the solution of a linear system using ratios of volumes, and also to compute volumes themselves. Read more...
Rich intuition about why the number of solutions to a square linear system is governed by the volume of the parallelepiped formed by the coefficient vectors. Read more...
N-dimensional volume generalizes the idea of the space occupied by an object. We can think about N-dimensional volume as being enclosed by N-dimensional vectors. Read more...
If we interpret linear systems as sets of vectors, then elimination corresponds to vector reduction. Read more...
The span of a set of vectors consists of all vectors that can be made by adding multiples of vectors in the set. We can often reduce a set of vectors to a simpler set with the same span. Read more...
A line starts at an initial point and proceeds straight in a constant direction. A plane is a flat sheet that makes a right angle with some particular vector. Read more...
What does it mean to multiply a vector by another vector? Read more...
N-dimensional space consists of points that have N components. Read more...
If you depend on a massive base of learners, most of whom are unserious, that puts hard constraints on how you teach. You have to employ ineffective learning strategies that do not repel unserious students. Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Math gets hard for different students at different levels. If you don’t have worked examples to help carry you through once math becomes hard for you, then every problem basically blows up into a “research project” for you. Sometimes people advocate for unguided struggle as a way to improve general problem-solving ability, but this idea lacks empirical support. Worked examples won’t prevent you from developing deep understanding (actually, it’s the opposite: worked examples can help you quickly layer on more skills, which forces a structural integrity in the lower levels of your knowledge). Even if you decide against using worked examples for now, continually re-evaluate to make sure you’re getting enough productive training volume. Read more...
First, you need extensive and solid content knowledge. Then, you need to work through tons of practice exams for the specific exam you’re taking. This might sound simple, but every year, countless people manage to screw it up. Read more...
Many educators think that the makeup of every year in a student’s education should be balanced the same way across Bloom’s taxonomy, whereas Bloom’s 3-stage talent development process suggests that the time allocation should change drastically as a student progresses through their education. Read more...
Hard-coding explanations feels tedious, takes a lot of work, and isn’t “sexy” like an AI that generates responses from scratch – but at least it’s not a pipe dream. It’s a practical solution that lets us move on to other components of the AI that are just as important. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
It’s the act of successfully retrieving fuzzy memory, not clear memory, that extends the memory duration. Read more...
To transfer information into long-term memory, you need to practice retrieving it without assistance. Read more...
I’d start off with some introductory course that covers the very basics of coding in some language that is used by many professional programmers but where the syntax reads almost like plain English and lower-level details like memory management are abstracted away. Then, I’d jump right into building board games and strategic game-playing agents (so a human can play against the computer), starting with simple games (e.g. tic-tac-toe) and working upwards from there (maybe connect 4 next, then checkers, and so on). Read more...
Solving problems, building on top of what you’ve learned, reviewing what you’ve learned, and quality, quantity, and spacing of practice. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
Talent development is not only different from schooling, but in many cases completely orthogonal to schooling. Read more...
Won first place in a state-level competition by finding and exploiting a loophole in the points scoring logic. Read more...
If you look at the kinds of math that most quantitative professionals use on a daily basis, competition math tricks don’t show up anywhere. But what does show up everywhere is university-level math subjects. Read more...
The most important things I learned from competing in science fairs had nothing to do with physics or even academics. My main takeaways were actually related to business – in particular, sales and marketing. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
It’s centered around political ideology rather than the science of learning. Read more...
Good problem = intersection between your own interests/talents, the realm of what’s feasible, and the desires of the external world. Read more...
Stuff you don’t find in math textbooks. Read more...
An intuitive derivation. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
Combining game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...
Repeatedly choosing the action with the best worst-case scenario. Read more...
Building data structures that represent all the possible outcomes of a game. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
Computing spatial relationships between nodes when edges no longer represent unit distances. Read more...
Using traversals to understand spatial relationships between nodes in graphs. Read more...
Graphs show up all the time in computer science, so it’s important to know how to work with them. Read more...
It comes out to roughly a fortieth of that of a truck. Read more...
String art works because the strings are tangent lines to a curve. Read more...
Calculus can show us how our intuition can fail us, a common theme in philosophy. Read more...
Deriving the “Pert” formula. Read more...
If we know the revenue and costs associated with producing any number of units, then we can use calculus to figure out the number of units to produce for maximum profit. Read more...
Calculus can be used to find the parameters that minimize a function. Read more...
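One common way to do this numerically is gradient descent: repeatedly step against the derivative. The sketch below assumes a hypothetical function f(x) = (x - 3)^2, a learning rate of 0.1, and 100 steps; none of these come from the post.

```python
def gradient_descent(derivative, start, learning_rate=0.1, num_steps=100):
    """Minimize a one-variable function by stepping against its derivative."""
    x = start
    for _ in range(num_steps):
        x -= learning_rate * derivative(x)
    return x


# f(x) = (x - 3)^2 has derivative f'(x) = 2(x - 3) and its minimum at x = 3.
best_x = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(round(best_x, 6))  # 3.0
```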
Physics engines use calculus to periodically update the locations of objects. Read more...
Introducing Kajiya’s rendering equation. Read more...
Deriving the ideal rocket equation. Read more...
Deriving the Gompertz function. Read more...
Understanding why even slight narrowing of arteries can pose such a big problem to blood flow. Read more...
Measuring volume of blood the heart pumps out into the aorta per unit time. Read more...
Equations involving compositions of trigonometric functions can create wild patterns in the plane. Read more...
Lissajous curves use sine functions to create interesting patterns in the plane. Read more...
Absolute value graphs can be rotated to draw stars. Read more...
Non-euclidean ellipses can be used to draw starry-eye sunglasses. Read more...
Euclidean ellipses can be combined with sine wave shading to form three-dimensional shells. Read more...
High-frequency sine waves can be used to draw shaded regions. Read more...
Roots can be used to draw deer. Read more...
Sine waves can be used to draw scales on a fish. Read more...
Parabolas can be used to draw a fish. Read more...
Absolute value can be used to draw a person. Read more...
Slanted lines can be used to draw a spider web. Read more...
Horizontal and vertical lines can be used to draw a castle. Read more...
A silly bug turned genius hack. Read more...
The type of ensemble model that wins most data science competitions is the stacked model, which consists of an ensemble of entirely different species of models together with some combiner algorithm. Read more...
Decision trees are able to model nonlinear data while remaining interpretable. Read more...
NNs are similar to SVMs in that they project the data to a higher-dimensional space and fit a hyperplane to the data in the projected space. However, whereas SVMs use a predetermined kernel to project the data, NNs automatically construct their own projection. Read more...
A Support Vector Machine (SVM) computes the “best” separation between classes as the maximum-margin hyperplane. Read more...
In linear regression, we model the target as a random variable whose expected value depends on a linear combination of the predictors (including a bias term). Read more...
To visualize the relationship between the MAP and MLE estimations, one can imagine starting at the MLE estimation, and then obtaining the MAP estimation by drifting a bit towards higher density in the prior distribution. Read more...
Naive Bayes classification naively assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Read more...
An idea for a paper that I don’t currently have the bandwidth to write. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not true. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
In a simplified problem framing, we investigate the (game-theoretical) usefulness of limiting the number of social connections per person. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
Implementation notes for STDP learning in a network of Hodgkin-Huxley simulated neurons. Read more...
Many existing proofs are not accessible to young mathematicians or those without experience in the realm of dynamic systems. Read more...
And a proof via double induction. Read more...
When a limit takes the indeterminate form of zero divided by zero or infinity divided by infinity, we can differentiate the numerator and denominator separately without changing the actual value of the limit. Read more...
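A standard worked example of this rule (chosen here for illustration, not taken from the post) is the limit of sin x over x, which takes the form zero divided by zero at x = 0:

```latex
\lim_{x \to 0} \frac{\sin x}{x}
  = \lim_{x \to 0} \frac{\frac{d}{dx}\sin x}{\frac{d}{dx}\,x}
  = \lim_{x \to 0} \frac{\cos x}{1}
  = 1
```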
We can interpret the derivative as an approximation for how a function’s output changes, when the function input is changed by a small amount. Read more...
Derivatives can be used to find a function’s local extreme values, its peaks and valleys. Read more...
There are convenient rules for the derivatives of exponential, logarithmic, trigonometric, and inverse trigonometric functions. Read more...
Given a sum, we can differentiate each term individually. But why are we able to do this? Does multiplication work the same way? What about division? Read more...
When taking derivatives of compositions of functions, we can ignore the inside of a function as long as we multiply by the derivative of the inside afterwards. Read more...
There are some patterns that allow us to compute derivatives without having to compute the limit of the difference quotient. Read more...
The derivative of a function is the function’s slope at a particular point, and can be computed as the limit of the difference quotient. Read more...
Various tricks for evaluating tricky limits. Read more...
The limit of a function, as the input approaches some value, is the output we would expect if we saw only the surrounding portion of the graph. Read more...
A silly bug turned genius hack. Read more...
834 XP = 834 minutes = 14 hours of work in a single day. You’re probably wondering, what kind of person does that much math in a day? Time for a little story. Read more...
Won first place in a state-level competition by finding and exploiting a loophole in the points scoring logic. Read more...
The most important things I learned from competing in science fairs had nothing to do with physics or even academics. My main takeaways were actually related to business – in particular, sales and marketing. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
It’s centered around political ideology rather than the science of learning. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Two subtypes of coders that I watched students grow into. Read more...
An aha moment with object-oriented programming. Read more...
Accumulating mathematical knowledge gaps can lead students to reach a tipping point where further learning becomes overwhelming, ultimately causing them to abandon math entirely. Read more...
“…[D]eliberate practice requires effort and is not inherently enjoyable. Individuals are motivated to practice because practice improves performance.” Read more...
If you understand the interplay between working memory and long-term memory, then you can actually derive – from first principles – the methods of effective teaching. Read more...
If all the knowledge you show up with is high school math and AP Calculus, then you’re going to get your ass handed to you. Read more...
There are numerous cognitive learning strategies that 1) can be used to massively improve learning, 2) have been reproduced so many times they might as well be laws of physics, and 3) connect all the way down to the mechanics of what’s going on in the brain. Read more...
What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed? Read more...
Solving equations feels smooth when basic arithmetic is automatic – it’s like moving puzzle pieces around, and you just need to identify how they fit together. But without automaticity on basic arithmetic, each puzzle piece is a heavy weight. You struggle to move them at all, much less figure out where they’re supposed to go. Read more...
But in talent development, the optimization problem is clear: an individual’s performance is to be maximized, so the methods used during practice are those that most efficiently convert effort into performance improvements. Read more...
Learning math early guards you against numerous academic risks and opens all kinds of doors to career opportunities. Read more...
Spaced repetition is complicated in hierarchical bodies of knowledge, like mathematics, because repetitions on advanced topics should “trickle down” to update the repetition schedules of simpler topics that are implicitly practiced (while being discounted appropriately since these repetitions are often too early to count for full credit towards the next repetition). However, I developed a model of Fractional Implicit Repetition (FIRe) that not only accounts for implicit “trickle-down” repetitions but also minimizes the number of reviews by choosing reviews whose implicit repetitions “knock out” other due reviews (like dominos), and calibrates the speed of the spaced repetition process to each individual student on each individual topic (student ability and topic difficulty are competing factors). Read more...
The underlying principle that it all boils down to is deliberate practice. Read more...
Math gets hard for different students at different levels. If you don’t have worked examples to help carry you through once math becomes hard for you, then every problem basically blows up into a “research project” for you. Sometimes people advocate for unguided struggle as a way to improve general problem-solving ability, but this idea lacks empirical support. Worked examples won’t prevent you from developing deep understanding (actually, it’s the opposite: worked examples can help you quickly layer on more skills, which forces a structural integrity in the lower levels of your knowledge). Even if you decide against using worked examples for now, continually re-evaluate to make sure you’re getting enough productive training volume. Read more...
Research mathematicians are like professional athletes. Read more...
“…[D]eliberate practice requires effort and is not inherently enjoyable. Individuals are motivated to practice because practice improves performance.” Read more...
Many educators think that the makeup of every year in a student’s education should be balanced the same way across Bloom’s taxonomy, whereas Bloom’s 3-stage talent development process suggests that the time allocation should change drastically as a student progresses through their education. Read more...
… and they should be treated as such. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
First, fun and exciting playtime. Then, intense and strenuous skill development. Finally, developing one’s individual style while pushing the boundaries of the field. Read more...
Talent development is not only different from schooling, but in many cases completely orthogonal to schooling. Read more...
The average tutored student performed better than 98% of students in the traditional class. Read more...
A technique for maximizing linear expressions subject to linear constraints. Read more...
Under the hood, dictionaries are hash tables. Read more...
Implementing a differential equations model that won the Nobel prize. Read more...
A simple differential equations model that we can plot using multivariable Euler estimation. Read more...
Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. Read more...
One of the best ways to get practice with object-oriented programming is implementing games. Read more...
Guess some initial clusters in the data, and then repeatedly update the guesses to make the clusters more cohesive. Read more...
You can use the RREF algorithm to compute determinants much faster than with the recursive cofactor expansion method. Read more...
We can use arrays to implement matrices and their associated mathematical operations. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
In many real-life situations, there is more than one input variable that controls the output variable. Read more...
Gradient descent can help us avoid pitfalls that occur when fitting nonlinear models using the pseudoinverse. Read more...
Just because a model appears to closely match the points in a data set does not necessarily mean it is a good model. Read more...
Transforming nonlinear functions so that we can fit them using the pseudoinverse. Read more...
Exploring the most general class of functions that can be fit using the pseudoinverse. Read more...
Using matrix algebra to fit simple functions to data sets. Read more...
Bridging the communication gap between academia and industry in the field of TDA. Read more...
Demonstrating an open-source implementation of persistent homology techniques in the TDA package for R. Read more...
Persistent homology provides a way to quantify the topological features that persist over a data set’s full range of scale. Read more...
At Aunalytics, Mapper outperformed hierarchical clustering in providing granular insights. Read more...
Ayasdi developed commercial Mapper software and sells a subscription service to clients who wish to create topological network visualizations of their data. Read more...
Demonstrating an open-source implementation of Mapper in the TDAmapper package for R. Read more...
Representing a data space’s topology by converting it into a network. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
Nobody came out of the dispute well. Read more...
When Joseph Fourier first introduced Fourier series, they gave mathematicians nightmares. Read more...
When we know the solutions of a linear differential equation with constant coefficients and right hand side equal to zero, we can use variation of parameters to find a solution when the right hand side is not equal to zero. Read more...
Integrating factors can be used to solve first-order differential equations with non-constant coefficients. Read more...
Undetermined coefficients can help us find a solution to a linear differential equation with constant coefficients when the right hand side is not equal to zero. Read more...
Given a linear differential equation with constant coefficients and a right hand side of zero, the roots of the characteristic polynomial correspond to solutions of the equation. Read more...
Non-separable differential equations can sometimes be converted into separable differential equations by way of substitution. Read more...
When faced with a differential equation that we don’t know how to solve, we can sometimes still approximate the solution. Read more...
The simplest differential equations can be solved by separation of variables, in which we move the derivative to one side of the equation and take the antiderivative. Read more...
What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed? Read more...
A walkthrough of solving Tower of Hanoi using the approach of one of the earliest AI systems. Read more...
Media outlets often make the mistake of anthropomorphizing or attributing human-like characteristics to computer programs. Read more...
As computation power increased, neural networks began to take center stage in AI. Read more...
Expert systems stored “if-then” rules derived from the knowledge of experts. Read more...
Framing reasoning as searching through a maze of actions for a sequence that achieves the desired end goal. Read more...
Turing test, games, hype, narrow vs general AI. Read more...
834 XP = 834 minutes = 14 hours of work in a single day. You’re probably wondering, what kind of person does that much math in a day? Time for a little story. Read more...
Learning math with little computation is like learning basketball with little practice on dribbling & ball handling techniques. Read more...
Some shortcomings I ran into while self-studying a bunch of math on MIT OpenCourseWare (OCW) in high school, which motivated me to help build Math Academy. These shortcomings are pretty general and would also apply to someone learning from miscellaneous textbooks or Khan Academy. Read more...
Our AI system is one of those things that sounds intuitive enough at a high level, but if you start trying to implement it yourself, you quickly run into a mountain of complexity, numerous edge cases, and lots of counterintuitive low-level phenomena that take a while to fully wrap your head around. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Two subtypes of coders that I watched students grow into. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
A convenient technique for computing gradients in neural networks. Read more...
The deeper or more “hierarchical” a computational graph is, the more complex the model that it represents. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
A workbook I created to explain the math and physics behind an Iron Man suit to a student who was interested in the comics / movies. Read more...
A workbook I created to explain the math and physics behind an egg drop experiment to a student who was interested in Lord of the Rings and Star Wars. Read more...
A brief overview of sound waves and how they interact with things. Read more...
A brief overview of the experimental search for dark matter (XENON, CDMS, PICASSO, COUPP). Read more...
Mass discrepancies in galaxies and clusters, cosmic background radiation, the structure of the universe, and big bang nucleosynthesis’s impact on baryon density. Read more...
Improper integrals have bounds or function values that extend to positive or negative infinity. Read more...
We can apply integration by parts whenever an integral would be made simpler by differentiating some expression within the integral, at the cost of anti-differentiating another expression within the integral. Read more...
Substitution involves condensing an expression into a single new variable, and then expressing the integral in terms of that new variable. Read more...
To evaluate a definite integral, we find the antiderivative, evaluate it at the indicated bounds, and then take the difference. Read more...
The antiderivative of a function is a second function whose derivative is the first function. Read more...
Integrals give the area under a portion of a function. Read more...
Systems of quadratic equations can be solved via substitution. Read more...
To easily graph a quadratic equation, we can convert it to vertex form. Read more...
Completing the square helps us gain a better intuition for quadratic equations and understand where the quadratic formula comes from. Read more...
To solve hard-to-factor quadratic equations, it’s easiest to use the quadratic formula. Read more...
Factoring is a method for solving quadratic equations. Read more...
Quadratic equations are similar to linear equations, except that they contain squares of a single variable. Read more...
Many differential equations don’t have solutions that can be expressed in terms of finite combinations of familiar functions. However, we can often solve for the Taylor series of the solution. Read more...
To find the Taylor series of complicated functions, it’s often easiest to manipulate the Taylor series of simpler functions. Read more...
Many non-polynomial functions can be represented by infinite polynomials. Read more...
Various tricks for determining whether a series converges or diverges. Read more...
A geometric series is a sum where each term is some constant times the previous term. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
Combining game-specific human intelligence (heuristics) and generalizable artificial intelligence (minimax on a game tree). Read more...
One of the best ways to get practice with object-oriented programming is implementing games. Read more...
An oval () fits inside a rectangle [ ] with the same width and height. Read more...
Is there a standard “order of operations” for parallel vs nested absolute value expressions, in the absence of clarifying notation? Read more...
Drawing –> LaTeX commands –> ChatGPT summary –> Google more info. Read more...
There’s a cognitive principle behind this: associative interference, the phenomenon that conceptually related pieces of knowledge can interfere with each other’s recall. Read more...
Bridging the communication gap between academia and industry in the field of TDA. Read more...
At Aunalytics, Mapper outperformed hierarchical clustering in providing granular insights. Read more...
Ayasdi developed commercial Mapper software and sells a subscription service to clients who wish to create topological network visualizations of their data. Read more...
Demonstrating an open-source implementation of Mapper in the TDAmapper package for R. Read more...
Representing a data space’s topology by converting it into a network. Read more...
A linear system consists of multiple linear equations, and the solution of a linear system consists of the pairs that satisfy all of the equations. Read more...
Standard form makes it easy to see the intercepts of a line. Read more...
An easy way to write the equation of a line if we know the slope and a point on a line. Read more...
Introducing linear equations in two variables. Read more...
Loosely speaking, a linear equation is an equality statement containing only addition, subtraction, multiplication, and division. Read more...
A slant asymptote is a slanted line that arises from a linear term in the proper form of a rational function. Read more...
If we choose one input on each side of an asymptote, we can tell which section of the plane the function will occupy. Read more...
Vertical asymptotes are vertical lines that a function approaches but never quite reaches. Read more...
Rational functions can have a form of end behavior in which they become flat, approaching (but never quite reaching) a horizontal line known as a horizontal asymptote. Read more...
Polynomial long division works the same way as the long division algorithm that’s familiar from simple arithmetic. Read more...
A piecewise function is pieced together from multiple different functions. Read more...
Trigonometric functions represent the relationship between sides and angles in right triangles. Read more...
Absolute value represents the magnitude of a number, i.e. its distance from zero. Read more...
Exponential functions have variables as exponents. Logarithms cancel out exponentiation. Read more...
Radical functions involve roots: square roots, cube roots, or any kind of fractional exponent in general. Read more...
Compositions of functions consist of multiple functions linked together, where the output of one function becomes the input of another function. Read more...
Inverting a function entails reversing the outputs and inputs of the function. Read more...
When a function is reflected, it flips across one of the axes to become its mirror image. Read more...
When a function is rescaled, it is stretched or compressed along one of the axes, like a slinky. Read more...
When a function is shifted, all of its points move vertically and/or horizontally by the same amount. Read more...
If we interpret linear systems as sets of vectors, then elimination corresponds to vector reduction. Read more...
The span of a set of vectors consists of all vectors that can be made by adding multiples of vectors in the set. We can often reduce a set of vectors to a simpler set with the same span. Read more...
A line starts at an initial point and proceeds straight in a constant direction. A plane is a flat sheet that makes a right angle with some particular vector. Read more...
What does it mean to multiply a vector by another vector? Read more...
N-dimensional space consists of points that have N components. Read more...
The inverse of a matrix is a second matrix which undoes the transformation of the first matrix. Read more...
Every square matrix can be decomposed into a product of rescalings and shears. Read more...
How to multiply a matrix by another matrix. Read more...
Matrices are vectors whose components are themselves vectors. Read more...
Implementing a differential equations model that won the Nobel prize. Read more...
A simple differential equations model that we can plot using multivariable Euler estimation. Read more...
Arrays can be used to implement more than just matrices. We can also implement other mathematical procedures like Euler estimation. Read more...
How to sample from a discrete probability distribution. Read more...
Estimating probabilities by simulating a large number of random experiments. Read more...
Just like single-variable gradient descent, except that we replace the derivative with the gradient vector. Read more...
We take an initial guess as to what the minimum is, and then repeatedly use the gradient to nudge that guess further and further “downhill” into an actual minimum. Read more...
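As a rough illustration of the "nudge the guess downhill" idea, here is a minimal single-variable sketch. The function name `gradient_descent`, the learning rate, the step count, and the example function f(x) = (x - 3)^2 are all assumptions chosen for illustration, not taken from the linked post.

```python
def gradient_descent(df, x, learning_rate=0.1, steps=200):
    """Repeatedly move the guess x a small step against the derivative df(x)."""
    for _ in range(steps):
        x = x - learning_rate * df(x)  # step "downhill"
    return x


# Hypothetical example: f(x) = (x - 3)^2 has derivative 2(x - 3) and its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x=0.0)
print(minimum)
```

With a learning rate that is too large the guess can overshoot and diverge, so the rate usually needs tuning for the function at hand.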
Bisection search involves repeatedly moving one bound halfway to the other. The Newton-Raphson method involves repeatedly moving our guess to the root of the tangent line. Read more...
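The two root-finding ideas described above can be sketched in a few lines. This is only an illustrative version under stated assumptions: the function names, the tolerance, the fixed step count, and the example f(x) = x^2 - 2 are hypothetical choices, and the bisection routine assumes the function changes sign between the two bounds.

```python
def bisection(f, lo, hi, tol=1e-10):
    """Shrink [lo, hi] by moving one bound to the midpoint until the interval is tiny."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid  # sign change is in the left half
        else:
            lo = mid  # sign change is in the right half
    return (lo + hi) / 2


def newton_raphson(f, df, x, steps=20):
    """Repeatedly replace the guess x with the root of the tangent line at x."""
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x


# Hypothetical example: the positive root of x^2 - 2 is the square root of 2.
f = lambda x: x * x - 2
df = lambda x: 2 * x
print(bisection(f, 0.0, 2.0))
print(newton_raphson(f, df, 1.0))
```

Bisection only needs a sign change, while Newton-Raphson needs the derivative and a reasonable starting guess but typically converges in far fewer steps.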
Backtracking can drastically cut down the number of possibilities that must be checked during brute force. Read more...
Brute force search involves trying every single possibility. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed? Read more...
Many students who pattern-match will tend to prefer solutions requiring fewer and simpler operations, especially if those solutions yield ballpark-reasonable results. Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not true. Read more...
Students eat meals of information at similar bite rates when each spoonful fed to them is sized appropriately relative to the size of their mouth. (Note that equal bite rates do not imply equal rates of food volume intake.) Read more...
Solving problems, building on top of what you’ve learned, reviewing what you’ve learned, and quality, quantity, and spacing of practice. Read more...
It rests on a critical assumption that the amount of learning that occurs during initial instruction is zero or otherwise negligible, which is not true. Read more...
First, you want to form a habit. Second, you want to operate at peak productivity during your session. Third, you want to minimize the amount you forget between sessions. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be discomforting, but its long-term commitment compounds incremental improvements, leading to expertise. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
During practice, the elite skaters were over 6 times more active than passive, while non-competitive skaters were nearly as passive as they were active. Read more...
A startup spent months building a sophisticated lecture tool and raising over half a million dollars in investments – but after observing students in the lecture hall, they completely abandoned the product and called up their investors to return the money. Read more...
True active learning requires every individual student to be actively engaged on every piece of the material to be learned. Read more...
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
Cognition involves the flow of information through sensory, working, and long-term memory banks in the brain. Sensory memory temporarily holds raw data, working memory manipulates and organizes information, and long-term memory stores it indefinitely by creating strategic electrical wiring between neurons. Learning amounts to increasing the quantity, depth, retrievability, and generalizability of concepts and skills in a student’s long-term memory. Limited working memory capacity creates a bottleneck in the transfer of information into long-term memory, but cognitive learning strategies can be used to mitigate the effects of this bottleneck. Read more...
The brain is a neuronal network integrating specialized subsystems that use local competition and thresholding to sparsify input, spike-timing dependent plasticity to learn inference, and layering to implement hierarchical predictive learning. Read more...
We solve a special case of how to periodically stimulate a biological neural network to obtain a desired connectivity (in theory). Read more...
The limit of a function is the height where it looks like the scribble is going to hit a particular vertical line. Read more...
To solve a system of inequalities, we need to solve each individual inequality and find where all their solutions overlap. Read more...
Quadratic inequalities are best visualized in the plane. Read more...
When a linear inequality has two variables, the solution covers a section of the coordinate plane. Read more...
An inequality is similar to an equation, but instead of saying two quantities are equal, it says that one quantity is greater than or less than another. Read more...
We can sketch the graph of a polynomial using its end behavior and zeros. Read more...
The rational roots theorem can help us find zeros of polynomials without blindly guessing. Read more...
The zeros of a polynomial are the inputs that cause it to evaluate to zero. Read more...
The end behavior of a polynomial refers to the type of output that is produced when we input extremely large positive or negative values. Read more...
Rather than duplicating such code each time we want to use it, it is more efficient to store the code in a function. Read more...
We often wish to tell the computer instructions involving the words “if,” “while,” and “for.” Read more...
We can store many related pieces of data within a single variable called a data structure. Read more...
We can store and manipulate data in the form of variables. Read more...
Solving linear systems can sometimes be a necessary component of solving nonlinear systems. Read more...
Shearing can be used to express the solution of a linear system using ratios of volumes, and also to compute volumes themselves. Read more...
Rich intuition about why the number of solutions to a square linear system is governed by the volume of the parallelepiped formed by the coefficient vectors. Read more...
N-dimensional volume generalizes the idea of the space occupied by an object. We can think about N-dimensional volume as being enclosed by N-dimensional vectors. Read more...
The matrix exponential can be defined as a power series and used to solve systems of linear differential equations. Read more...
Jordan form provides a guaranteed backup plan for exponentiating matrices that are non-diagonalizable. Read more...
Matrix diagonalization can be applied to construct closed-form expressions for recursive sequences. Read more...
The eigenvectors of a matrix are those vectors that the matrix simply rescales, and the factor by which an eigenvector is rescaled is called its eigenvalue. These concepts can be used to quickly calculate large powers of matrices. Read more...
Implementing the Cartesian product provides good practice working with arrays. Read more...
Sequences where each term is a function of the previous terms. Read more...
There are other number systems that use more or fewer than ten characters. Read more...
It’s assumed that you’ve had some basic exposure to programming. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Two subtypes of coders that I watched students grow into. Read more...
An aha moment with object-oriented programming. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
Using convolutional layers to create an even better checkers player. Read more...
Extending Fogel’s tic-tac-toe player to the game of checkers. Read more...
Reimplementing the paper that laid the groundwork for Blondie24. Read more...
A method for training neural networks that works even when training feedback is sparse. Read more...
In order to justify using a more complex model, the increase in performance has to be worth the cost of integrating and maintaining the complexity. Read more...
Two subtypes of coders that I watched students grow into. Read more...
Stuff you don’t find in math textbooks. Read more...
… are summarized in the following table. Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Bridging the communication gap between academia and industry in the field of TDA. Read more...
Demonstrating an open-source implementation of persistent homology techniques in the TDA package for R. Read more...
Persistent homology provides a way to quantify the topological features that persist over a data set’s full range of scale. Read more...
An intuitive derivation. Read more...
A simple mnemonic trick for quickly differentiating complicated functions. Read more...
Hidden inside of every quadratic, there is a perfect square. Read more...
Every inscribed triangle whose hypotenuse is a diameter is a right triangle. Read more...
During its operation from 2020 to 2023, Eurisko was the most advanced high school math/CS sequence in the USA. It culminated in high school students doing masters/PhD-level coursework (reproducing academic research papers in artificial intelligence, building everything from scratch in Python). Read more...
Two subtypes of coders that I watched students grow into. Read more...
In 9 months, these students went from initially not knowing how to write helper functions to building a machine learning library from scratch. Read more...
We can algorithmically build classifiers that use a sequence of nested “if-then” decision rules. Read more...
A simple classification algorithm grounded in Bayesian probability. Read more...
One of the simplest classifiers. Read more...
There’s only so much you can hone your math skills by working on a problem that someone else has intentionally set up to be well-posed and elegantly solvable if you think about it the right way. Read more...
Good problem = intersection between your own interests/talents, the realm of what’s feasible, and the desires of the external world. Read more...
Stuff you don’t find in math textbooks. Read more...
My training has been scattered and fuzzy until recently. Here’s the whole story. Read more...
Implementation notes for STDP learning in a network of Hodgkin-Huxley simulated neurons. Read more...
Many existing proofs are not accessible to young mathematicians or those without experience in the realm of dynamic systems. Read more...
Category theory provides a language for explicitly describing indirect relationships in graphs. Read more...
Framing complex systems in the language of category theory. Read more...
A function is a scribble that crosses each vertical line only once. Read more...
A series is the sum of a sequence. Read more...
A sequence is a list of numbers that has some pattern. Read more...
Type I pairs with the variable that runs vertically in the usual representation of the coordinate system. The remaining types are paired with the rest of the variables in ascending order. Read more...
The behavior of a multivariable function can be highly specific to the path taken. Read more...
Merge sort and quicksort are generally faster than selection, bubble, and insertion sort. And unlike counting sort, they are not susceptible to blowup in the amount of memory required. Read more...
Some of the simplest methods for sorting items in arrays. Read more...
Repeatedly choosing the action with the best worst-case scenario. Read more...
Building data structures that represent all the possible outcomes of a game. Read more...
Minor changes to increase workout intensity and caloric surplus. Read more...
Daily 20-30 minute bedroom workout with gymnastic rings hanging from pull-up bar – just as much challenge as weights, but inexpensive and easily portable. Read more...
Won first place in a state-level competition by finding and exploiting a loophole in the points scoring logic. Read more...
The most important things I learned from competing in science fairs had nothing to do with physics or even academics. My main takeaways were actually related to business – in particular, sales and marketing. Read more...
Everyone has some level of abstraction beyond which they are incapable of engaging in first-principles reasoning. That level is different for everyone, and it’s not a hard threshold, but beyond it the time and mental effort required to perform first-principles reasoning skyrockets until first-principles reasoning becomes completely infeasible. Read more...
Is there a standard “order of operations” for parallel vs nested absolute value expressions, in the absence of clarifying notation? Read more...
Hard-coding explanations feels tedious, takes a lot of work, and isn’t “sexy” like an AI that generates responses from scratch – but at least it’s not a pipe dream. It’s a practical solution that lets us move on to other components of the AI that are just as important. Read more...
For many (but not all) students, the answer is yes. And for many of those students, automation can unlock life-changing educational outcomes. Read more...
The network becomes book-smart in a particular area but not street-smart in general. The training procedure is like a series of exams on material within a tiny subject area (your data subspace). The network refines its knowledge in the subject area to maximize its performance on those exams, but it doesn’t refine its knowledge outside that subject area. And that leaves it gullible to adversarial examples using inputs outside the subject area. Read more...
Initial parameter range, data sampling range, severity of regularization. Read more...
Montaigne’s education, strictly dictated by his parents and university studies, resulted in an isolative work with scholarly impact but limited public reach. Conversely, Benjamin Franklin’s goal-oriented self-teaching led to influential creations and roles benefiting his community and nation. Read more...
The main ideas behind computers can be understood by anyone. Read more...
Framing complex systems in the language of category theory. Read more...
In a simplified problem framing, we investigate the (game-theoretical) usefulness of limiting the number of social connections per person. Read more...
Persistent homology provides a way to quantify the topological features that persist over a data set’s full range of scale. Read more...
The derivative tells the steepness of a function at a given point, kind of like a carpenter’s level. Read more...
How to avoid some of the most common pitfalls leading to ugly LaTeX. Read more...
A technique for maximizing linear expressions subject to linear constraints. Read more...
… are summarized in the following table. Read more...
In general, you can manipulate total derivatives like fractions, but you can’t do the same with partial derivatives. Read more...
Q: Draw a 10 x 10 square grid. How many squares are there in total? Not just 1 x 1 squares, but also 2 x 2 squares, 3 x 3 squares, and so on. A: The total number of square shapes is the total sum of square numbers 1 + 4 + 9 + 16 + … + 100. Read more...
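As a quick check on that sum, here is a minimal Python sketch. The function names `count_squares` and `count_squares_closed_form` are illustrative, not taken from the post. It assumes the grid is made of unit cells and counts only axis-aligned squares, so a k x k square has n - k + 1 possible positions along each axis.

```python
def count_squares(n: int) -> int:
    """Count all axis-aligned squares in an n x n grid of unit cells."""
    # A k x k square fits in (n - k + 1) positions horizontally and
    # (n - k + 1) positions vertically, giving (n - k + 1)^2 placements.
    return sum((n - k + 1) ** 2 for k in range(1, n + 1))


def count_squares_closed_form(n: int) -> int:
    """Same count via the sum-of-squares formula n(n + 1)(2n + 1) / 6."""
    return n * (n + 1) * (2 * n + 1) // 6


if __name__ == "__main__":
    print(count_squares(10))              # 385
    print(count_squares_closed_form(10))  # 385
```

For the 10 x 10 grid, both versions give 1 + 4 + 9 + … + 100 = 385.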
Active learning leads to more neural activation than passive learning. Automaticity involves developing strategic neural connections that reduce the amount of effort that the brain has to expend to activate patterns of neurons. Read more...
Deliberate practice is the most effective form of active learning. It consists of individualized training activities specially chosen to improve specific aspects of a student’s performance through repetition and successive refinement. It is the opposite of mindless repetition. The amount of deliberate practice has been shown to be one of the most prominent underlying factors responsible for individual differences in performance across numerous fields, even among highly talented elite performers. Deliberate practice demands effort and intensity, and may be uncomfortable, but a long-term commitment to it compounds incremental improvements, leading to expertise. Read more...
Mastery learning is a strategy in which students demonstrate proficiency on prerequisites before advancing. While even loose approximations of mastery learning have been shown to produce massive gains in student learning, mastery learning faces limited adoption due to clashing with traditional teaching methods and placing increased demands on educators. True mastery learning at a fully granular level requires fully individualized instruction and is only attainable through one-on-one tutoring. Read more...
Loosely inspired by the German tank problem: several witnesses reported seeing a UFO during the given time intervals, and you want to quantify your certainty regarding when the UFO arrived and when it left. Read more...
Sure, accelerating via self-study is not as optimal as accelerating within teacher-managed courses, but it’s way better than not accelerating at all. Read more...
When you’re developing skills at peak efficiency, you are maximizing the difficulty of your training tasks subject to the constraint that you end up successfully overcoming those difficulties in a timely manner. Read more...
There are many, many studies that measure variation in WMC vs variation in other metrics. Read more...
First, you need extensive and solid content knowledge. Then, you need to work through tons of practice exams for the specific exam you’re taking. This might sound simple, but every year, countless people manage to screw it up. Read more...