The Situation with AI in STEM Education

by Justin Skycak on

What are LLMs good for in STEM education? Where do LLMs fall short, and why? What does an educational AI need to do for its students to succeed?

At a glance…

  • What are LLMs good for in STEM education?
  • Where do LLMs fall short?
  • The root problem: LLMs only respond to student questions.
  • A teacher does more than explain. A teacher also navigates and scaffolds.
  • But a teacher also does more than navigate and scaffold. A teacher also manages the entire learning process.
  • Even if an LLM explains as well as Feynman, 99% of students won't be able to learn from it.
  • How have past AI systems fallen short in education?
  • What does an educational AI need to do for its students to succeed?

What are LLMs good for in STEM education?

Quite a few people have experienced a sort of “intellectual awakening” thanks to LLMs. In school, they weren’t studious, motivated, or even interested in the material.

But once ChatGPT came out, they started talking to it and eventually ended up asking a bunch of “why/how” questions, the way a young child might – e.g., Is time travel possible? How does the internet work? What is a neural network?

By chatting with ChatGPT about these topics, they developed an interest in various STEM subjects, along with a baseline of surface-level knowledge about them.

Where do LLMs fall short?

These learners used an LLM to spur interest in STEM subjects and acquire some baseline knowledge. That’s great. But what they’re unable to learn from the LLM are the hard technical skills and the concepts associated with them.

(There do exist autodidacts who can teach themselves hard, technical skills just given reference material – but they’re not really who we’re talking about here. We’re focusing on the non-autodidacts who have learned some stuff from LLMs that they would not have learned without LLMs. Autodidacts already have all the information they need in libraries and on the internet; LLMs are not a game-changing technology for them.)

Let’s consider a particular one of these learners who used ChatGPT to learn about, well, neural networks and LLMs themselves.

Sure, they can talk about how cool LLMs are, and they might know that LLMs are based on the transformer neural network architecture. They might be familiar with some other architectures like the convolutional neural network, and they might know that convolutional neural networks are often used for image processing.

But can they explain how those different neural network architectures actually differ in the way their layers are connected up? Probably not.

Can they talk about the tradeoffs among the choices for the model’s various components, including activation functions, loss functions, learning rates, regularization methods, etc.? No.

Can they code up a neural network from scratch, including implementing the backpropagation algorithm (which requires applying the chain rule from multivariable calculus)? Heck no.

And if they were given a neural network model with some bug in it, could they figure out what’s going wrong and then fix it? No way in hell.

(Just to give a sense of what would be needed to pull this off: not only would the learner need to be able to verify the backpropagation computations, but they would also have to conceptually understand how different features of the model’s output are indicative of various choices – and issues – in the mathematical machinery under the hood. They would likely need to track statistical distributions throughout the model, and who knows, the bug might not even be in the model itself – the bug might stem from an undesirable statistical property of the data on which the model was trained.)
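To make concrete the kind of from-scratch work being described here, below is a minimal sketch of a one-hidden-layer network trained by backpropagation on a toy XOR dataset. (The dataset, architecture, variable names, and hyperparameters are all illustrative choices, nothing canonical.)

```python
import numpy as np

# Toy data: learn XOR with a one-hidden-layer network. (Purely illustrative --
# the dataset, architecture, and hyperparameters here are arbitrary choices.)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for step in range(10000):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)           # hidden activations
    p = sigmoid(h @ W2 + b2)           # predictions
    loss = np.mean((p - y) ** 2)       # mean squared error

    # Backward pass: the chain rule, applied layer by layer.
    dp  = 2 * (p - y) / len(X)         # dLoss/dp
    dz2 = dp * p * (1 - p)             # sigmoid derivative at the output
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dh  = dz2 @ W2.T                   # gradient flowing back into the hidden layer
    dz1 = dh * h * (1 - h)             # sigmoid derivative at the hidden layer
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Gradient descent update.
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2

print("final loss:", round(float(loss), 4))
print(np.round(p, 2))  # should be close to [[0], [1], [1], [0]] if training succeeded
```

Nothing in that sketch is exotic – but writing it, and especially debugging it when the loss refuses to go down, requires exactly the chain-rule fluency and under-the-hood intuition described above.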

The root problem: LLMs only respond to student questions.

Can’t our learner just ask the LLM to teach them all the stuff above, including exercises on which to practice?

Not really. The problem is that our learner doesn’t know what to ask for. They don’t know where to start their learning journey, or how to build up an understanding from there.

This is where the analogy between an LLM and a teacher really starts to break down. Sure, in some sense an LLM is like a teacher because it can respond intelligently to a student’s questions. But on the flipside, all of its responses are contingent upon a query from the student.

Think about what an effective teacher does. Do they just stand up at the front of the class and field questions from students? No. They deliver material in a structured, scaffolded manner so that it actually makes sense to students.

Many times, students’ own questions are not even well-posed – and even when they are, they may not be productive to explore fully given the student’s current knowledge.

For instance, it’s common for an expert teacher to respond to a student’s question like this:

  • "By A, I think what you are really getting at is B, and that's a good question that we're going to explore later down the road once you've developed a better understanding of C and D, and in order to get to C and D, we're going to need to get through E, F, and G first. But to satiate your curiosity as much as I can given what you've learned so far, I'll tell you that H."

A teacher does more than explain. A teacher also navigates and scaffolds.

Think of a gymnastics coach – if a novice signs up for gymnastics lessons and asks the coach to teach them how to do a backflip, does the coach demonstrate the components of a backflip and ask the student to mirror their movements? Heck no!

Chances are, the student can’t even jump high enough off the ground yet. There are numerous component skills, one of which is explosive jumping strength, that need to be built up before the student has any chance of successfully landing a backflip.

The coach knows this, and breaks down the learning process into a scaffolded journey up the hierarchy of these component skills. The coach also determines what constitutes a sufficient level of mastery to advance beyond each skill, which is another thing that students typically struggle with.

But a teacher also does more than navigate and scaffold. A teacher also manages the entire learning process.

An LLM is kind of like a human expert. (Not a world-class expert with years of hands-on experience, but more like a book-smart person who is well-read in every subject.)

But if subject expertise were all that it took to be an effective teacher, then we would expect the most renowned mathematicians to be the best math teachers. Is that the case? Heck no!

Every STEM major in college can count off numerous professors who are true experts in their field but whose students do a poor job of learning the material.

Sometimes, this is due to shortcomings in navigation and scaffolding. For instance, I recently tutored a student who took a Real Analysis course from a professor who did not teach from a textbook and provided no class notes – just problem sets in which abstract problems were rarely preceded by simpler cases. I’m told that most of the class was completely lost, but the professor took the class’s silence as an indication that they fully grasped everything that was said (when in actuality, they were so lost that nobody could even pinpoint a specific thing to ask about).

But other times, even if an instructor excels at explanation, poor learning outcomes can still result from neglecting to manage the entire learning process. This is why you can’t actually learn a subject in proper depth just by watching 3Blue1Brown videos.

It may come as a surprise to many that Richard Feynman – widely known as “the great explainer,” one of the greatest lecturers of all time – also belongs in this category. And that’s not just my opinion. That’s coming from Feynman himself.

Even if an LLM explains as well as Feynman, 99% of students won't be able to learn from it.

According to Feynman himself, his classes were a failure for 90% of his students. In his lectures, Feynman did a phenomenal job appealing to intuition and conceptual thinking, making complex physics feel simple and accessible without getting too deep into the math. On the flipside, however, when it came time to solve actual problems on exams, Feynman’s students failed.

Take it from Feynman himself in the preface to his quantum mechanics lectures:

  • "I don't think I did very well by the students. When I look at the way the majority of the students handled the problems on the examinations, I think that the system is a failure."

Additionally, while some may view Feynman-style pedagogy as supporting inclusive learning for all students across varying levels of ability, Feynman himself acknowledged that his methods only worked for the top 10% of his students – and he even went so far as to admit that those were the only students he was actually trying to engage with his teaching.

  • "Of course, my friends point out to me that there were one or two dozen students [10% out of a 180-student cohort] who—very surprisingly—understood almost everything in all of the lectures, and who were quite active in working with the material and worrying about the many points in an excited and interested way. These people have now, I believe, a first-rate background in physics—and they are, after all, the ones I was trying to get at."

It’s worth noting that, because Feynman taught at Caltech (one of the most selective universities in the world, and possibly the most STEM-focused), the top 10% of Feynman’s students were well above the top 1% of students in general (and that’s a conservative estimate).

How have past AI systems fallen short in education?

Many people who have (unsuccessfully) attempted to apply AI to education have focused too much on the “explanation” part and not enough on the “scaffolding” and “management” parts. Yes, for an AI system to be successful in education, it has to be able to explain things clearly – but as we’ve discussed above, that’s only one piece of the puzzle.

Pitfall #1: Over-engineering the “explanation” component.

It’s easy to go on a wild goose chase building an “explanation AI.” There are endless fascinating distractions.

For example, it’s easy to fall in love with – and get lost in – the idea of the AI having conversational dialogue with the student. But conversational dialogue opens up a can of worms of complexity, and it turns out to not even be necessary.

You can create extremely clear explanations by having humans hard-code them – the trick is that you just need to break them up into bite-sized pieces and serve each one to the student at just the right time. And you can close the feedback loop by having the student solve problems (so their response is essentially whether they got each problem correct or not) – which is something that they need to be doing anyway.
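As a rough sketch of what this can look like in code – with hypothetical class, field, and function names, not a description of any particular system – a lesson is just an ordered sequence of hard-coded, bite-sized explanation/problem pairs, and the only signal coming back from the student is whether they solved each problem correctly:

```python
from dataclasses import dataclass

@dataclass
class Slide:
    explanation: str   # a hard-coded, bite-sized explanation written by a human
    problem: str       # a problem the student must solve before moving on
    answer: str

# A hypothetical two-slide micro-lesson. Each slide is served only after the
# previous problem is answered correctly, so the feedback loop is just right/wrong.
lesson = [
    Slide("The slope of a line measures rise over run.",
          "What is the slope of the line through (0, 0) and (2, 6)?", "3"),
    Slide("The y-intercept is where the line crosses the y-axis (where x = 0).",
          "What is the y-intercept of y = 3x + 4?", "4"),
]

def run_lesson(lesson, get_student_answer):
    for slide in lesson:
        print(slide.explanation)
        while get_student_answer(slide.problem) != slide.answer:
            # Wrong answer: re-serve the same bite-sized piece rather than
            # opening up a free-form conversational dialogue.
            print("Not quite -- try again.")
    print("Lesson complete.")

# Example run with a scripted "student" whose first answer is wrong on purpose:
scripted_answers = iter(["2", "3", "4"])
run_lesson(lesson, get_student_answer=lambda problem: next(scripted_answers))
```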

Sure, hard-coding bite-sized explanations can feel tedious for a human, and it requires the effort of a full team of humans over the course of years, and it’s not as “sexy” as an AI program that comes up with its own responses from scratch – but unlike the conversational dialogue approach, it’s actually tractable. It’s not just a pipe dream. It’s a practical solution.

If you’re willing to put in the time and effort, then you can solve the problem using this practical approach and move on to building the other components of the AI system, which are just as important.

Pitfall #2: Cutting corners on the other components.

It’s also tempting to cut corners on the less sexy (but still crucial) parts, scaffolding and managing the learning process.

Pitfall #2.1: Cutting corners on scaffolding.

It’s very expensive to create a course textbook from scratch. But guess what? Textbooks typically aren’t scaffolded enough for students to learn on their own.

If you’re developing an education AI, then you have to increase the granularity of the curriculum by an order of magnitude so that it can be consumed by students in bite-sized pieces.

And guess what happens when you increase the granularity of the curriculum? You also increase the cost to develop it.

So, unless you manage to secure a lot of funding for your education AI system, you’re probably going to have an under-scaffolded curriculum, which means students are probably going to get stuck at various places within it.

Pitfall #2.2: Cutting corners on managing the learning process.

There are a lot of components within managing the learning process: forcing students to solve problems, responding to a student’s struggles, reviewing previously-learned material so that the student doesn’t forget it, and transitioning from problem-solving in easier contexts (e.g. with a worked example to look back at) to harder contexts (e.g. on a timed quiz with no reference material available), just to name a few.

To provide a single example of an area where it’s tempting to cut corners, let’s focus on the need to respond to a student’s struggles. An AI education system will need some sort of remediation protocol for when a student struggles with a task that they are asked to accomplish.

In such cases, it’s tempting to take the easy way out and lower the bar for success, whereas what the AI system should really do is provide remedial practice to shore up the student’s weaknesses (so that the student develops the ability to clear the bar where it’s at).

A particular example of this that has plagued prior AI systems is allowing students to request so many hints that it renders the problem trivial. In that case, of course students will request maximum hints by default, solve the now-trivial problem, and learn little or nothing from it!
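To illustrate the alternative with a toy sketch – the function, thresholds, and skill names below are made up for illustration, not taken from any existing system – the idea is that once a small hint budget is exhausted, the system assigns remedial practice on the underlying weak skills and then has the student re-attempt the original problem unaided:

```python
def respond_to_struggle(failed_attempts, weak_skills, hint_budget=1):
    """Toy remediation policy: hints are capped, so they can never stack up
    until the problem becomes trivial."""
    if failed_attempts <= hint_budget:
        return {"action": "give_one_small_hint"}
    # Past the hint budget, keep the bar where it is: assign remedial practice
    # on the specific weak skills, then have the student re-attempt unaided.
    return {
        "action": "assign_remedial_practice",
        "practice_on": weak_skills,
        "then": "re-attempt the original problem without hints",
    }

print(respond_to_struggle(failed_attempts=1, weak_skills=["chain rule"]))
print(respond_to_struggle(failed_attempts=3, weak_skills=["chain rule", "matrix shapes"]))
```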

What does an educational AI need to do for its students to succeed?

Simply put, it has to do a good job of explaining the content AND scaffolding and managing the entire learning process.

It needs to start out with a minimal dose of explanation (in which intuition and conceptual thinking do have a place) – but then immediately switch over to active problem-solving.

During active problem-solving, students should begin with simple cases and then climb up the ladder of difficulty, covering all cases that they could reasonably be expected to demonstrate knowledge of on an assessment.

Assessments should be frequent and broad in coverage, and students should be assigned personalized remedial reviews based on what they answered incorrectly.

Students should progress through the curriculum in a personalized and mastery-based manner, only being presented with new topics when they have (as individuals, not just as a group) demonstrated mastery of the prerequisite material.

And even after a student has learned a topic, they should periodically review it using spaced repetition, a systematic way of reviewing previously-learned material to retain it indefinitely into the future.
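To make those last two ideas concrete, here is a minimal sketch – using an illustrative prerequisite graph and made-up review intervals, not the parameters of any real system – of mastery-gated topic unlocking plus a simple expanding review schedule:

```python
from datetime import date, timedelta

# Hypothetical prerequisite graph: each topic lists what must be mastered first.
PREREQS = {
    "derivatives": ["limits"],
    "chain rule": ["derivatives"],
    "backpropagation": ["chain rule", "matrix multiplication"],
}

def unlocked_topics(mastered):
    """Topics this individual student may start now (mastery-based progression)."""
    return [topic for topic, reqs in PREREQS.items()
            if topic not in mastered and all(r in mastered for r in reqs)]

def next_review(last_review, interval_days, passed):
    """Toy spaced-repetition rule: expand the interval after a successful
    review, shrink it back after a failed one."""
    new_interval = interval_days * 2 if passed else max(1, interval_days // 2)
    return last_review + timedelta(days=new_interval), new_interval

mastered = {"limits", "derivatives", "matrix multiplication"}
print(unlocked_topics(mastered))                      # -> ['chain rule']
print(next_review(date(2024, 1, 1), 4, passed=True))  # -> (datetime.date(2024, 1, 9), 8)
```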

If a student ever struggles, the system should not lower the bar for success on the learning task (e.g., by giving away hints). Rather, it should take actions that are most likely to strengthen a student’s area of weakness and allow them to clear the bar fully and independently on their next attempt.