LLMs Are the Ultimate Demoware

“Software” is a program that runs on your computer that allows you to accomplish a task. Software is “good” if it frequently allows you to accomplish sa...

Charlie Meyer's Blog

It's wild to me that, of all the things to call LLMs out for, this piece has chosen to include math tutoring. I've been doing Math Academy for a bit over 6 months now, going from (essentially) Algebra II through Calc II (integration by parts, arc lengths, Taylor expansions) and LLMs have been a huge part of what has made that effective:

* Clear explanations of concepts that respond to questions and reformulate when things bounce

* Step-by-step verification of solutions, spotting exactly where calculations have gone wrong

* Instantaneously generating new problem sets to reinforce concepts

LLMs are probably not going to live up to all sorts of claims their proponents make. But I don't think you can ever have tried to use an LLM in a math course and reach the conclusion that it's "demoware" for that application. At what point, over 6 months of continuous work, does it stop being a "demo"?

It seems very hard to maintain the belief that LLMs are useless in the face of the fact that millions of people are using them. It's very much a case of "nobody goes there anymore, it's too crowded."

I think you'd be crazy to say LLMs are blockchain-style hype when it comes to software development, but I don't begrudge anybody who believes they're not currently workable for the kinds of problems they work on; I think reasonable people can disagree about how ready for prime time they are for production software development.

But for math tutoring? If you claim LLM math tutoring is demoware, you're very clearly telling on yourself.

This https://www.mathacademy.com/ ? Interesting, hadn't seen that before. I've been thinking I'd like to brush up on a bunch of those topics.
Math Academy

Wholeheartedly recommend it; just remember we're not the core market for it (that's high school students, though the curriculum goes all the way through the normal college math sequence).

Minutes later

In case I've spooked anyone, they have an adult course series (Foundations I, II, and III) that's accelerated by trimming out all the material their authors believe is important only for things like school placement exams; the modal adult Math Academy person is doing I, II, and III as a leadup to their Math for Machine Learning course, which is linear algebra and multivariable calc.

I think it's one of the three most mind-blowing learning resources I've ever used. One of the other two: Lingua Latina: Familia Romana. In both cases, I have the uncanny certainty that I am operating at the limit of my ability to acquire and retain new information, which is a fun place to be.

It's nice that you think it's clear and responsive, but I think it [1] needs to be validated by an expert in both the material and education. Or we need some way to show that people have actually learned the topic. People sometimes prefer explanations that are intuitive and familiar but not accurate.

Meanwhile, there are math education resources like IXL that cost a little money but whose lessons and practice problems are fully curated by human experts (AFAICT). I'm not saying these resources are perfect either, but as a mathematician who has experimented a lot with LLMs, including in supposed tutoring modes, I find they make a lot of mistakes and take a lot of shortcuts that should materially decrease their effectiveness as tutors.

[1] LLM-based tutoring (edit: footnote added to clarify)

What makes you think https://www.mathacademy.com/faq hadn't been evaluated by experts?

That appears to be their whole thing, and they've been in business for longer than LLMs have been around.

I think before that question is useful to ask, we have to know if that FAQ even says anything about LLM-based tutoring. After a few minutes of research, I can't find any evidence that Math Academy offers LLM-based tutoring.

This was linked from the homepage: https://www.mathacademy.com/how-our-ai-works

But more importantly, if tptacek says they use LLMs and is a user of the platform, that's good enough for me.

I'm using LLMs alongside Math Academy. Math Academy uses machine learning generally (and so they now plug their "AI" technology), but it's not transformer-style generative AI; as I understand it, the ML is just driving their underlying spaced repetition system (which is interleaved through lots of different units).
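For readers unfamiliar with the term, a spaced repetition system in its simplest form schedules each item's next review based on how well it was recalled. Math Academy's actual scheduler is proprietary and more sophisticated; this is only a generic SM-2-style sketch, with illustrative names and constants that have nothing to do with their system:

```python
# Generic SM-2-style spaced-repetition update (illustrative only; not
# Math Academy's algorithm). Each review is graded 0-5; failed items
# restart their interval, successful ones get spaced further out.
def next_review(interval_days: float, ease: float, grade: int):
    """Return (new_interval_days, new_ease) after a review graded 0-5."""
    if grade < 3:
        # Failed recall: see the item again tomorrow, and make it "harder"
        # (lower ease) so future intervals grow more slowly.
        return 1.0, max(1.3, ease - 0.2)
    # Successful recall: nudge ease up or down based on how clean it was.
    ease = max(1.3, ease + 0.1 - (5 - grade) * (0.08 + (5 - grade) * 0.02))
    if interval_days < 1.5:
        # First successful review jumps to a fixed short interval.
        return 6.0, ease
    # Otherwise, multiply the interval by the ease factor.
    return interval_days * ease, ease
```

For example, an item at a 6-day interval with ease 2.5 that gets a perfect recall (grade 5) comes back in 15.6 days; a failed recall (grade 2) resets it to 1 day.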

In the scenario I'm discussing, Math Academy's content is a non-generative source of truth, against which I've benchmarked GPT-5 and o4-mini.

You're confused. Math Academy isn't LLM-based. I use an LLM alongside it.

Not sure the condescending tone is really necessary. I’d agree with you if the parent comment was saying they asked an LLM to create a math curriculum and problems for them. But they’re using an established app created by a math major and then using LLMs to ask questions. It’s easier to validate the responses you get back in those cases.

I think students are not a reliable source of information about the effectiveness of LLM tutoring. There is no 100% nice way to say this, but I did my best. You're free to disagree, but I think the tone criticism is off-base.

We found our way to "No True Math Student". I love it!

That's exactly what Math Academy is: I'm operating with a grounded set of correct, validated content, and using LLMs to (1) fill in more conceptual explanation and (2) check where I went off the rails when I get things wrong. You can't play the "hallucination" card here. An LLM can reliably do partial fraction decomposition, spot and solve an ODE that admits direct integration, calculate an arc length, invert a matrix, or resolve a gnarly web of trig identities. If you say a current frontier model can't do this --- and do it from OCR'd screencaps! --- I'll respond that you haven't tried.

I can't think of a single instance where o4-mini or GPT-5 got one of these problems wrong. They see maybe 6-12 of them per day from me. I've been doing this since February.
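If you're curious what the non-LLM side of a check like this looks like, the operations named above can themselves be verified against a computer algebra system. A minimal sketch (illustrative only, not part of anyone's actual workflow here) using SymPy:

```python
# Illustrative sketch: using SymPy as a non-generative source of truth
# to check the kinds of answers an LLM tutor returns.
import sympy as sp

x = sp.symbols('x')

# Partial fraction decomposition of (3x + 5) / ((x + 1)(x + 2)),
# which equals 2/(x + 1) + 1/(x + 2).
expr = (3 * x + 5) / ((x + 1) * (x + 2))
decomposed = sp.apart(expr, x)
assert sp.simplify(decomposed - expr) == 0  # decomposition is exact

# Arc length of y = x^(3/2) on [0, 1]: integral of sqrt(1 + (dy/dx)^2).
y = x ** sp.Rational(3, 2)
arc = sp.integrate(sp.sqrt(1 + sp.diff(y, x) ** 2), (x, 0, 1))

# Matrix inversion check: M * M^(-1) must be the identity.
M = sp.Matrix([[2, 1], [1, 1]])
assert M * M.inv() == sp.eye(2)
```

The point of a tool like this isn't tutoring (it explains nothing), but it shows that "did the calculation come out right" is independently checkable, which is what makes the grounded setup described above possible.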

That's very interesting. Maybe you are doing this the right way, and my concern as a math educator is for the people who may struggle to stay on the straight and narrow, or know what the straight and narrow is in this brave new world.

Where I see deficiencies is not so much in the calculations. When a problem class has a solution algorithm and 10,000 worked examples online, I'm not too surprised that the LLM generalizes pretty reliably to that problem class.

The problem I find is more when it's tricky, out-of-distribution, not entirely on the "happy path" of what the 10,000 examples are about. In that case, LLM responses quickly become irrelevant, illogical, and Pavlovian. It's the math version of messing up the surgeon riddle when presented with a minor variation that is logically very easy, but isn't the popular version everyone talks about [1].

[1] https://www.thealgorithmicbridge.com/p/openai-researchers-ha...

OpenAI Researchers Have Discovered Why Language Models Hallucinate (The Algorithmic Bridge)

No, we're not going to move the goalposts here. You can make any thread go nowhere, and prevent anybody from updating their mental models, by positing a sufficiently misguided user of a piece of technology. I'm saying: LLMs are quite good at math tutoring, in many ways probably significantly better than human tutors (they're tireless, can explain any concept 50 different ways, and can rattle off individualized problem sets in seconds). I made that claim, and you pushed back saying that anything I saw "needed to be validated by an expert". You even said I was an unreliable narrator because I'm the one studying the math. No, to all of this.

Is it always correct?

In my experience, it's 100%. Not 95%, not 99%. Unless GPT-5 (and o4-mini) were colluding with Math Academy behind the scenes specifically to be wrong about something, it just doesn't get any of this content wrong.

And keep in mind, what it's getting right is trickier than just answering Calc I questions: it's taking an answer I give it, calculating the correct answer itself, selecting its answer over mine, and then spotting where I e.g. forgot to check the domain of a variable inside a log.
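As a concrete example of the kind of slip being described (made up for illustration, not taken from the thread): solving a log equation without checking domains picks up an extraneous root.

```latex
% Illustrative forgotten-domain-check example (not from the thread).
\[
\log x + \log(x - 3) = 1
\;\Longrightarrow\; x(x - 3) = 10
\;\Longrightarrow\; x^2 - 3x - 10 = 0
\;\Longrightarrow\; x = 5 \ \text{or}\ x = -2 .
\]
% But x = -2 makes \log x (and \log(x - 3)) undefined, so x = 5 is the
% only solution; reporting both roots means the domain check was skipped.
```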

Isn't this moving the goalposts? It's great that you're learning, but Math Academy appears to be a whole product that may incorporate an LLM but is much more, and it's a paid product none of us can evaluate. It's not possible to tell from looking at their site, or from your comment, what content is generated, or how it is verified before being used as teaching material.

There are probably smart ways to incorporate LLM output into an application like the one you're lauding, but your comment is a little like responding "but my cake tastes good" to someone who says you shouldn't eat raw flour.

You're confused. Math Academy isn't LLM-based.
Offtopic, but do you have any comparison of Math Academy to something like Khan Academy or other platforms? MA seems a bit expensive for someone just wanting to improve a general skill, but perhaps it's well worth it? I thought Khan was also investing in similar AI offerings, so I'm curious how they intersect.
Khan never clicked for me, and while the cost of Math Academy is below my noise floor (when you back it out to $/hr of engagement) as an adult professional in his prime earning years, I should add that the cost is also a motivator: I've never been tempted to take a break, in part because I'm on the meter.

While I agree, on an unrelated note: I knew I recognized your nick from somewhere...

And then I realized[0].

[0] https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-arti...

Contra Ptacek's Terrible Article On AI — Ludicity

I had a conversation with that person a couple weeks ago. They're nice. I think we both would tweak (if just a little bit) how we presented our articles with the benefit of hindsight.

For the record, I'm a systems programmer and a security person and I don't work for an AI company (you can Six Degrees of Sam Altman any startup to AI now if you want to make the claim, but if you try I'm just going to say "Sir, This Is A Wendy's".)

So... a person who doesn't know X is using LLMs to learn X, yet is able to judge that LLMs are doing a good job at teaching X, even though the person doesn't know X?

You're confused. Math Academy is not an LLM.