Paul Erdős, one of the most prolific mathematicians of the 20th century,
left behind hundreds of puzzles when he died.
To help keep track of which ones have been solved, Thomas Bloom,
a mathematician at the University of Manchester, UK, set up erdosproblems.com,
which lists more than 1,100 problems and notes that around 430 of them come with solutions.
When Sebastian Bubeck celebrated GPT-5’s supposed breakthrough, Bloom was quick to call him out.
“This is a dramatic misrepresentation,”
he wrote on X.
Bloom explained that a problem isn’t necessarily unsolved just because his website does not list a solution.
That simply means Bloom wasn’t aware of one. There are millions of mathematics papers out there, and nobody has read all of them.
But GPT-5 probably has.
It turned out that instead of coming up with new solutions to 10 unsolved problems,
GPT-5 had scoured the internet for 10 existing solutions that Bloom hadn’t seen before. Oops!
There are two takeaways here.
One is that breathless claims about big breakthroughs shouldn’t be made via social media:
less knee-jerk, more gut check.
The second is that GPT-5’s ability to find references to previous work that Bloom wasn’t aware of is also amazing.
The hype overshadowed something that should have been pretty cool in itself.
Mathematicians are very interested in using LLMs to trawl through vast numbers of existing results,
François Charton, a research scientist who studies the application of LLMs to mathematics at the AI startup Axiom Math, told me when I talked to him about this Erdős gotcha.
But literature search is dull compared with genuine discovery,
especially to AI’s fervent boosters on social media. Bubeck’s blunder isn’t the only example.
In August, a pair of mathematicians showed that no LLM at the time was able to solve a math puzzle known as Yu Tsumura’s 554th Problem.
Two months later, social media erupted with evidence that GPT-5 now could.
“Lee Sedol moment is coming for many,” one observer commented,
referring to the Go master who lost to DeepMind’s AI AlphaGo in 2016.
But Charton pointed out that solving Yu Tsumura’s 554th Problem isn’t a big deal to mathematicians.
“It’s a question you would give an undergrad,” he said. “There is this tendency to overdo everything.”
Meanwhile, more sober assessments of what LLMs may or may not be good at are coming in.
At the same time that mathematicians were fighting on the internet about GPT-5,
two new studies came out that looked in depth at the use of LLMs in medicine and law (two fields that model makers have claimed their tech excels at).
Researchers found that LLMs could make certain medical diagnoses,
but their treatment recommendations were flawed.
When it comes to law, researchers found that LLMs often give inconsistent and incorrect advice.
“Evidence thus far spectacularly fails to meet the burden of proof,” the authors concluded.
But that’s not the kind of message that goes down well on X.
“You’ve got that excitement because everybody is communicating like crazy
—nobody wants to be left behind,” Charton said.
X is where a lot of AI news drops first, it’s where new results are trumpeted,
and it’s where key players like Sam Altman, Yann LeCun, and Gary Marcus slug it out in public.
It’s hard to keep up—and harder to look away.
Bubeck’s post was only embarrassing because his mistake was caught.
Not all errors are.
Unless something changes, researchers, investors, and non-specific boosters will keep teeing each other up.
“Some of them are scientists, many are not, but they are all nerds,” Charton told me.
“Huge claims work very well on these networks.”
https://www.technologyreview.com/2025/12/23/1130393/how-social-media-encourages-the-worst-of-ai-boosterism/
https://www.erdosproblems.com/