LLMs are the ultimate demoware
https://blog.charliemeyer.co/llms-are-the-ultimate-demoware/
It's wild to me that, of all the things to call LLMs out for, this piece has chosen to include math tutoring. I've been doing Math Academy for a bit over 6 months now, going from (essentially) Algebra II through Calc II (integration by parts, arc lengths, Taylor expansions) and LLMs have been a huge part of what has made that effective:
* Clear explanations of concepts that respond to questions and get reformulated when something doesn't land
* Step-by-step verification of solutions, spotting exactly where calculations have gone wrong
* Instantaneously generating new problem sets to reinforce concepts
LLMs are probably not going to live up to all sorts of claims their proponents make. But I don't think anyone who has actually tried using an LLM in a math course could reach the conclusion that it's "demoware" for that application. At what point, over 6 months of continuous work, does it stop being a "demo"?
I think you'd be crazy to say LLMs are blockchain-style hype when it comes to software development, but I don't begrudge anybody who believes they're not currently workable for the kinds of problems they work on; I think reasonable people can disagree about how ready for prime time they are for production software development.
But for math tutoring? If you claim LLM math tutoring is demoware, you're very clearly telling on yourself.
Wholeheartedly recommend it; just remember we're not the core market for it (that's high school students, though the curriculum goes all the way through the normal college math sequence).
Minutes later
In case I've spooked anyone, they have an adult course series (Foundations I, II, and III) that's accelerated by trimming out all the material their authors believe is important only for things like school placement exams; the modal adult Math Academy person is doing I, II, and III as a lead-up to their Math for Machine Learning course, which is linear algebra and multivariable calc.
I think it's one of the three most mind-blowing learning resources I've ever used. Another is Lingua Latina: Familia Romana. In both cases, I have the uncanny certainty that I am operating at the limit of my ability to acquire and retain new information, which is a fun place to be.
It's nice that you think it's clear and responsive, but I think it [1] needs to be validated by an expert in both the material and education. Or we need some way to show that people have actually learned the topic. People sometimes prefer explanations that are intuitive and familiar but not accurate.
Meanwhile, there are math education resources like IXL that may cost a little money, but whose lessons and practice problems are fully curated by human experts (AFAICT). I'm not saying those resources are perfect either, but as a mathematician who has experimented a lot with LLMs, including in supposed tutoring modes, I find they make a lot of mistakes and take a lot of shortcuts that should materially decrease their effectiveness as tutors.
[1] LLM-based tutoring (edit: footnote added to clarify)
What makes you think https://www.mathacademy.com/faq hasn't been evaluated by experts?
That appears to be their whole thing, and they've been in business for longer than LLMs have been around.
This was linked from the homepage: https://www.mathacademy.com/how-our-ai-works
But more importantly, if tptacek says they use LLMs and is a user of the platform, that's good enough for me.
I'm using LLMs alongside Math Academy. Math Academy uses machine learning generally (and so now they plug their "AI" technology), but it's not transformer-style generative AI; as I understand it, it just drives their underlying spaced repetition system (which is interleaved across lots of different units).
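For readers unfamiliar with the idea, here's a minimal sketch of what an interleaved spaced-repetition scheduler can look like. This is my own illustration of the general technique, not Math Academy's actual algorithm; all names and the interval-doubling rule are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Topic:
    name: str
    interval_days: float = 1.0  # days to wait before the next review
    due_day: float = 0.0        # simulated day this topic is next due


def review(topic: Topic, today: float, correct: bool) -> None:
    """Update a topic's schedule after one review.

    Correct answers stretch the interval (mastered topics recede);
    mistakes reset it (shaky topics resurface soon). This mimics the
    broad shape of spaced repetition, not any specific product.
    """
    topic.interval_days = topic.interval_days * 2.0 if correct else 1.0
    topic.due_day = today + topic.interval_days


def next_batch(topics: list[Topic], today: float, size: int = 3) -> list[Topic]:
    """Interleave by picking the most overdue topics across different units."""
    due = [t for t in topics if t.due_day <= today]
    return sorted(due, key=lambda t: t.due_day)[:size]
```

The point of the interleaving step is that each study session mixes whichever topics are most overdue, rather than drilling one unit to completion.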
In the scenario I'm discussing, Math Academy's content is a non-generative source of truth, against which I've benchmarked GPT5 and O4-mini.
That's exactly what Math Academy is: I'm operating with a grounded set of correct, validated content, and using LLMs to (1) fill in more conceptual explanation and (2) check where I went off the rails when I get things wrong. You can't play the "hallucination" card here. An LLM can reliably do partial fraction decomposition, spot and solve an ODE that admits direct integration, calculate an arc length, invert a matrix, or resolve a gnarly web of trig identities. If you say a current frontier model can't do this --- and do it from OCR'd screencaps! --- I'll respond that you haven't tried.
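And for something like partial fraction decomposition, a model's answer is cheap to spot-check mechanically. Here's a sketch of the kind of numeric verification I mean; the rational function and its decomposition are my own toy example, not course content:

```python
import random


def f(x: float) -> float:
    # Original rational function: (3x + 5) / ((x + 1)(x + 2))
    return (3 * x + 5) / ((x + 1) * (x + 2))


def decomposition(x: float) -> float:
    # Claimed partial fraction decomposition: 2/(x + 1) + 1/(x + 2)
    return 2 / (x + 1) + 1 / (x + 2)


def same_function(g, h, trials: int = 100) -> bool:
    """Numerically check that g and h agree at random points away from the poles."""
    rng = random.Random(0)  # seeded so the check is reproducible
    for _ in range(trials):
        x = rng.uniform(1, 50)  # stay clear of the poles at x = -1 and x = -2
        if abs(g(x) - h(x)) > 1e-9:
            return False
    return True
```

Solving A(x + 2) + B(x + 1) = 3x + 5 gives A = 2, B = 1, so `same_function(f, decomposition)` holds; a wrong coefficient fails the check immediately.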
I can't think of a single instance where O4 or GPT5 got one of these problems wrong. It sees maybe 6-12 of them per day from me. I've been doing this since February.
That's very interesting. Maybe you are doing this the right way, and my concern as a math educator is for the people who may struggle to stay on the straight and narrow, or know what the straight and narrow is in this brave new world.
Where I see deficiencies is not so much in the calculations. When a problem class has a solution algorithm and 10,000 worked examples online, I'm not too surprised that the LLM generalizes pretty reliably to that problem class.
The problem I find is more when it's tricky, out-of-distribution, not entirely on the "happy path" of what the 10,000 examples are about. In that case, LLM responses quickly become irrelevant, illogical, and Pavlovian. It's the math version of messing up the surgeon riddle when presented with a minor variation that is logically very easy, but isn't the popular version everyone talks about [1].
[1] https://www.thealgorithmicbridge.com/p/openai-researchers-ha...
In my experience, it's 100%. Not 95%, not 99%. Unless GPT5 (and O4-mini) were colluding with Math Academy behind the scenes specifically to be wrong about something, it just doesn't get any of this content wrong.
And keep in mind, what it's getting right is trickier than just answering Calc I questions: it's taking an answer I give it, calculating the correct answer itself, selecting its answer over mine, and then spotting where I e.g. forgot to check the domain of a variable inside a log.
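The log-domain slip is worth spelling out with a toy example (mine, not one of the course's problems): solving ln(x) + ln(x - 3) = ln(10) algebraically gives x(x - 3) = 10, a quadratic with roots 5 and -2, but only x = 5 survives the domain requirement x > 3. A sketch of checking that mechanically:

```python
import math


def candidate_roots() -> list[float]:
    # Algebra turns ln(x) + ln(x - 3) = ln(10) into x^2 - 3x - 10 = 0;
    # the quadratic formula gives both candidate roots.
    disc = math.sqrt(3**2 + 4 * 10)  # sqrt(9 + 40) = 7
    return [(3 - disc) / 2, (3 + disc) / 2]


def valid_solutions() -> list[float]:
    # Keep only candidates in the domain of the original equation (x > 3),
    # then confirm each actually satisfies ln(x) + ln(x - 3) = ln(10).
    return [
        x
        for x in candidate_roots()
        if x > 3 and math.isclose(math.log(x) + math.log(x - 3), math.log(10))
    ]
```

Substituting back into the original equation, rather than trusting the algebra, is exactly the step that catches the extraneous root.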
Isn't this moving the goalposts? It's great that you're learning but MathAcademy appears to be a whole product that may incorporate an LLM but is much more, and it's a paid product none of us can evaluate. It's not possible to tell from looking at their site, or from your comment, what content is generated, or how it is verified before being used as teaching material.
There are probably smart ways to incorporate LLM output into an application like the one you're lauding but your comment is a little like responding "but my cake tastes good" to someone who says you shouldn't eat raw flour.
While I agree, on an unrelated note - I knew I knew your nick from somewhere...
And then I realized[0].
[0] https://ludic.mataroa.blog/blog/contra-ptaceks-terrible-arti...
I had a conversation with that person a couple weeks ago. They're nice. I think we both would tweak (if just a little bit) how we presented our articles with the benefit of hindsight.
For the record, I'm a systems programmer and a security person, and I don't work for an AI company (you can Six Degrees of Sam Altman any startup to AI now if you want to make the claim, but if you try I'm just going to say "Sir, This Is A Wendy's.")