Further human + AI + proof assistant work on Knuth's "Claude's Cycles" problem

Knuth Claude's Cycles note update: problem now fully solved, by LLMs - https://news.ycombinator.com/item?id=47306926 - March 2026 (2 comments)

https://chatgpt.com/share/69aaab4b-888c-8003-9a02-d1df80f9c7...

Claude's Cycles [pdf] - https://news.ycombinator.com/item?id=47230710 - March 2026 (362 comments)

https://twitter.com/BoWang87/status/2037648937453232504


I've always said this, but AI will win a Fields Medal before being able to manage a McDonald's.

Math seems difficult to us because it's like using a hammer (the brain) to twist in a screw (math).

LLMs are discovering a lot of new math because they are great at low depth high breadth situations.

I predict that in the future people will ditch LLMs in favor of AlphaGo-style RL done on Lean syntax trees. These should be able to think on much larger timescales.
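To make "RL on Lean syntax trees" concrete, here is a minimal Lean 4 sketch (illustrative only; no particular RL setup is assumed): every proof, whether written as a term or via tactics, elaborates to an explicit term, i.e. a tree that a search procedure could in principle build node by node.

    -- Term-mode: the proof tree is written out directly.
    example : ∀ n : Nat, n + 0 = n := fun n => rfl

    -- Tactic-mode: the `by` block elaborates to such a term behind the scenes.
    example : ∀ n : Nat, 0 + n = n := by
      intro n
      induction n with
      | zero => rfl
      | succ k ih => exact congrArg Nat.succ ih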

Any professional mathematician will tell you that their arsenal is ~ 10 tricks. If we can codify those tricks as latent vectors it's GG

Tricks are nothing but patterns in the logical formulae we reduce.

Ergo these are latent vectors in our brain. We use analogies like geometry in order to use Algebraic Geometry to solve problems in Number Theory.

An AI trained on Lean syntax trees might develop its own weird versions of intuition that might actually properly contain ours.

If this sounds far-fetched, look at Chess. I wonder if anyone has dug into Stockfish using mechanistic interpretability.

Stockfish's power comes mostly from search, and the ML techniques it uses are mainly about better search, i.e. pruning branches more efficiently.
The weights must still encode some understanding of the chess board, though there is always the chance that it makes no sense to us.
Even that is probably too much. It has no understanding of what "chess" is, or what a chess board is, or even what a game is. And yet it crushes every human with ease. It's pretty nuts haha.
Actually, the neural net itself is fairly imprecise. Search is required for it to achieve good play. Here's an example of me beating Stockfish 18 at depth 1: https://lichess.org/XmITiqmi
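For anyone who wants to reproduce this kind of handicapped match locally, here is a minimal Python sketch using the python-chess library (assuming a Stockfish binary named "stockfish" on your PATH; there is no input validation here):

    # Minimal sketch: play against Stockfish capped at depth 1.
    # Assumes python-chess is installed and a Stockfish binary is on the PATH.
    import chess
    import chess.engine

    board = chess.Board()
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")  # path is an assumption

    while not board.is_game_over():
        if board.turn == chess.WHITE:
            # Human plays White; moves entered in UCI form, e.g. "e2e4".
            move = chess.Move.from_uci(input("Your move: "))
        else:
            # Cap the engine's search at depth 1, as in the game above.
            move = engine.play(board, chess.engine.Limit(depth=1)).move
        board.push(move)

    print(board.result())
    engine.quit()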
Chess is just a simple mathematical construct, so that's not surprising.
Does Stockfish have weights or use a neural net? I know older versions did not.
Why must it involve understanding? I feel like you’re operating under the assumption that functionalism is the “correct” philosophical framework without considering alternative views.
The ML techniques it uses are only about evaluation, but you were close.
Some DeepMind researchers used mechanistic interpretability techniques to find concepts in AlphaZero and teach them to human chess Grandmasters: https://www.pnas.org/doi/10.1073/pnas.2406675122

This argument, that LLMs can develop new crazy strategies using RLVR on math problems (like what happened with Chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore better, probably with human feedback.

https://arxiv.org/abs/2504.13837

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and programming tasks. Similar to how traditional RL helps agents explore and learn new strategies, RLVR is believed to enable LLMs to continuously self-improve, thus acquiring novel reasoning abilities beyond those of the corresponding base models. In this study we critically examine the current state of RLVR by systematically probing the reasoning capability boundaries of RLVR-trained LLMs across various model families, RL algorithms, and math, coding, and visual reasoning benchmarks, using pass@k at large k values as the evaluation metric. Surprisingly, we find that the current training setup does not elicit fundamentally new reasoning patterns. While RLVR-trained models outperform their base models at small k (e.g., k = 1), the base models achieve a higher pass@k score when k is large. Coverage and perplexity analyses show that the observed reasoning abilities originate from and are bounded by the base model. Treating the base model as an upper bound, our quantitative analysis shows that six popular RLVR algorithms perform similarly and remain far from optimal in leveraging the potential of the base model. By contrast, we find that distillation can introduce new reasoning patterns from the teacher and genuinely expand the model's reasoning capabilities. Overall, our findings suggest that current RLVR methods have not yet realized the potential of RL to elicit truly novel reasoning abilities in LLMs. This highlights the need for improved RL paradigms, such as continual scaling and multi-turn agent-environment interaction, to unlock this potential.
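For reference, pass@k here is presumably the standard unbiased estimator from the Codex paper (Chen et al., 2021): draw n samples per problem, observe c correct, and estimate the probability that at least one of k samples passes. A minimal sketch with toy numbers (the figures below are made up, chosen only to mirror the paper's headline pattern):

    # Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    # i.e. one minus the probability that all k sampled completions fail.
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        if n - c < k:
            return 1.0  # every size-k subset contains a correct sample
        return 1.0 - comb(n - c, k) / comb(n, k)

    # Toy illustration of the paper's pattern: the RLVR model wins at k=1,
    # but the base model's coverage catches up at large k.
    print(pass_at_k(100, 5, 1))    # base model at k=1   -> 0.05
    print(pass_at_k(100, 12, 1))   # RLVR model at k=1   -> 0.12
    print(pass_at_k(100, 5, 64))   # base model at k=64  -> ~0.995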

The search space for the game of Go was also thought to be too large for computers to manage.

> I've always said this but AI will win a fields medal before being able to manage a McDonald's.

I love this and have a corollary saying: the last job to be automated will be QA.

This wave of technology has triggered more discussion about the types of knowledge work that exist than any other, and I think we will be sharper for it.

The ownership class will be sharper. They will know how to exploit capital and turn it into more capital with vastly increased efficiency. Everybody else will be hosed.
Are they actually producing new math? In the most recent ACM issue there was an article about testing AI against a math benchmark privately built by mathematicians, and what they found is that even though AI can solve some problems, it has never truly come up with something novel in mathematics; it is just good at drawing connections between existing research and putting a spin on it.
It's finding constructions and counterexamples. That's different from finding new proof techniques, but still extremely useful, and still gives way to novel findings.
As a professional mathematician, I would say that a good proof requires a very good representation of the problem, and then pulling out the tricks. The latter part is easy to get working with LLMs; they can do it already. It's the former part that still needs humans, and I'm perfectly fine with that.

But are you ok with the trendline of AI improvement? The speed of improvement indicates humans will only get further and further removed from the loop.

I see posts like yours all the time, comforting themselves that humans still matter, and every time, people like you are describing a human owning an ever-shrinking section of the problem space.

Humans, asking new questions out of curiosity, push the boundaries further, find new directions, ways, or motivations to explore, and maybe invent new spaces to explore. LLMs are just tools that people use. When people are no longer needed, AI serves no purpose at all.

As of now, no models have solved a Millennium Prize Problem[1].

1. https://mppbench.com/


Interesting, but not surprising to me. Once a field expert guides the models, they will most likely reach a solution. The models are good at doing the legwork for experts. For hard or complicated questions, the models often have blind spots.
When I was younger, I remember a point of demarcation for me was learning the 4chan adage "trolls trolling trolls" and approaching all internet interactions with skepticism. I've been sure for a while now that Reddit has succumbed to the "dead internet". This thread is another such moment for me: I can no longer recognize who is a bot and who has honest intentions.
Bo Wang (@BoWang87)

Three weeks ago I shared that Claude had shocked Prof. Donald Knuth by finding an odd-m construction for his open Hamiltonian decomposition problem in about an hour of guided exploration. Prof. Knuth titled the paper Claude's Cycles. The story didn't end there. The updated paper shows the story got much bigger.

For the base case m=3, there are exactly 11,502 Hamiltonian cycles. Of those, 996 generalize to all odd m, and Prof. Knuth shows there are exactly 760 valid "Claude-like" decompositions in that family.

The even case, which Claude couldn't finish, was then cracked by Dr. Ho Boon Suan using GPT-5.4 Pro to produce a 14-page proof for all even m≥8, with computational checks up to m=2000. Soon after, Dr. Keston Aquino-Michaels used GPT + Claude together, in a multi-agent workflow, to find simpler constructions for both odd and even m. Dr. Kim Morrison also formalized Knuth's proof of Claude's odd-case construction in Lean.

So yes: the problem now appears fully resolved in the updated paper's ecosystem of human + AI + proof assistant work! We went from one AI solving one problem to a full mathematical ecosystem (multiple AI systems, multiple humans, formal verification) running in parallel on a problem that stumped experts for weeks. We are living in very interesting times indeed.

Paper (updated): https://www-cs-faculty.stanford.edu/~knuth/papers/claude-cycles.pdf
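Counts like the 11,502 Hamiltonian cycles for the base case are the kind of thing a brute-force search can check on small instances. Here is a generic backtracking counter for undirected graphs, purely illustrative (this is not Knuth's graph or his counting method; K4 is just a toy example):

    # Count Hamiltonian cycles in a small undirected graph by backtracking.
    # Illustrative only -- not Knuth's graph or his counting method.
    def count_hamiltonian_cycles(adj: dict[int, set[int]]) -> int:
        n = len(adj)
        start = next(iter(adj))  # fixing the start vertex removes rotations
        count = 0

        def extend(path: list[int], visited: set[int]) -> None:
            nonlocal count
            last = path[-1]
            if len(path) == n:
                if start in adj[last]:  # closing edge completes a cycle
                    count += 1
                return
            for nxt in adj[last]:
                if nxt not in visited:
                    visited.add(nxt)
                    path.append(nxt)
                    extend(path, visited)
                    path.pop()
                    visited.remove(nxt)

        extend([start], {start})
        return count // 2  # each undirected cycle was counted once per direction

    # K4 (complete graph on 4 vertices) has (4-1)!/2 = 3 Hamiltonian cycles.
    k4 = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}
    print(count_hamiltonian_cycles(k4))  # -> 3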

So many of the replies are clearly AI. “That’s not X — it’s Y.”

Like so many things -- the evolution of AI math will, I think, follow trajectories hinted at in the 90s by the all-time great sci-fi author Greg Egan. The nature of math won't change -- but the why of it definitely will. In Diaspora, Egan imagined a future AI civilization where "math discovery" -- by then perhaps accurately described as "mechanistic math discovery" -- is modeled by society as a kind of salt-mine environment in which you can dig for arbitrarily long amounts of time and find new nuggets. The nuggets themselves have a kind of "pure value" as mathematical objects even if they might not have any knowable value outside the mines. Some personalities were interested in and valued the nuggets for their own sake, while others didn't but recognized that nuggets with broader appeal were occasionally found in the mine.

Research institutes like those founded by Terence Tao in our current present feel like they will align with this future almost perfectly on a long enough timeline -- though on a shorter timeline this area of research is almost certain to provide a ton of useful ways to advance our current AI systems. Our current systems are still in a state where literally anything that can generate new information that is "accurate" in some way -- like our current theorem-prover engines -- is an enormously valuable part of our still manually curated training loops.