

Thanks for sharing this. I think the workflow with Claude Code can be very different and much less hands-on than the approach you take in the video.
I recently took an old paper of mine and tried to let Claude Code formalize it with only minimal intervention.
After two weeks, one not entirely trivial Gröbner basis proof (Theorem 1.1) is done. Currently it is working on a primary decomposition statement:
https://github.com/tom111/BEI-lean
Will share more details in a forthcoming blog post.
I have limited knowledge of Lean, so I let Claude read the original .tex source and make its own long-term and short-term plans, writing them into todo files, its own memory files, and so on.
The only things I have interfered with are some design choices at the beginning about how to set up the indeterminates in the polynomial ring and their ordering (which determines the term order).
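To illustrate the kind of design choice I mean, here is a minimal Lean/Mathlib sketch (the names and the number of variables are purely illustrative, not taken from the repository): indexing the variables by `Fin n` fixes an order on them, which a term order such as lex then refines.

```lean
import Mathlib

open MvPolynomial

-- The polynomial ring ℚ[x₀, …, x₄]; the index type Fin 5 fixes
-- the variable ordering that a lex-style term order builds on.
abbrev R := MvPolynomial (Fin 5) ℚ

-- A sample element: x₀·x₁ + x₂
example : R := X 0 * X 1 + X 2
```

Choosing the index type and order up front matters because changing it later would ripple through every Gröbner-basis computation in the formalization.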
In total I am using a lot more tokens than you, and I use a dedicated machine with no personal data, so I can let Claude modify files on that machine as it likes.
I think the formalization of the paper is half-way done now and I have spent less than 10 hours of my time with this.
I think the De Bruijn factor will be very reasonable if and when this finishes.
I think that if humans ever want to read these formalizations, one would need to go through the proofs one by one and do the kind of manual optimization and streamlining that you show in the middle of the video.
At the moment the "interface to human math" in my project is only the statements. I cannot assess how unnecessarily complicated the proofs are; I only ever look at the definitions and statements.
@tomkalei You just have to hope that the machine doesn't find a proof of False and is using that as a lemma. @tao
Proof assistants are real pieces of software, implemented in practical languages; from time to time they have bugs that affect their core purpose, which is certifying a result as correctly proved. Getting a proof of `False` is *highly* undesirable, but it is rarely seen in standard use. Now that people are vibe-coding proofs, LLM-powered agents can sometimes find and abuse extreme corner cases to get the kernel checker's tick of correctness. So it behoves people to really dig around in a non-critical setting to uncover these issues so they can be fixed by the developers of the proof assistant. https://tristan.st/blog/in_search_of_falsehood
Thanks for this link. Very interesting research.
But of course I'm working under the assumption that one cannot prove False in Lean, in the same way I work under the assumption that ZFC is consistent.
At some point you have to rely on an external system, but it's good to keep this in mind.
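One cheap sanity check along these lines, which I believe applies here, is Lean's `#print axioms` command: it lists every axiom a given theorem ultimately depends on, so a proof smuggling in `sorry` or an extra unsound axiom would be visible. A sketch (the theorem name is just a stand-in for any result in the project):

```lean
-- Lists the axioms the proof of Nat.add_comm depends on.
-- For typical classical Mathlib proofs one expects only the standard
-- trio: propext, Classical.choice, Quot.sound; anything else
-- (e.g. sorryAx) would be a red flag.
#print axioms Nat.add_comm
```

This does not guard against kernel bugs like those in the linked post, but it does catch the mundane ways a "proof" can be hollow.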
Ah, and I also ask the machine to follow the actual proofs in the paper, so it is not coming up with new proofs. What I'm doing is auto-formalization of human created proofs.
Admittedly, though, I have not checked in detail whether it has actually followed these instructions.