DeepMind’s New AI Found A Strange New Way To Think
https://www.youtube.com/watch?v=Dkqzqw8rxXI
The discussed paper:
https://arxiv.org/html/2605.22763v1
https://github.com/google-deepmind/alphaproof-nexus-results
#AI #LLM #DeepMind #AlphaProof #Math #FormalProof #LeanProver #ErdosProblems #Erdős

@xenaproject Are you no longer posting here?
Your LinkedIn post https://www.linkedin.com/posts/kevin-buzzard-b6617b374_so-heres-a-bit-of-a-mad-story-on-wednesday-share-7468682184111984642-DKZD/ about the formalization of the Golod-Shafarevich => sum-product result by the Formal Frontier team using Claude and ChatGPT is super interesting. Please repost here. Thank you.

So here's a bit of a mad story. On Wednesday last week Bloom, Sawin, Schildkraut, and Zhelezov released a preprint https://lnkd.in/eJhwHW8i disproving the Sum-product conjecture https://lnkd.in/ewwMM-X9 . They used the Golod-Shafarevich theorem (a result in number theory) in an innovative way, inspired by the disproof from two weeks ago by ChatGPT of Erdos' unit distance conjecture. The Golod-Shafarevich theorem (from the 60s) is hard: one needs much of global class field theory to prove it. On Tuesday this week the Formal Frontier team at the Mathlib Initiative released a full formalization of the Golod-Shafarevich => Sum-product result https://lnkd.in/ensirMFh , which they had achieved over the weekend. The majority of the formalization was done using just off the shelf Claude and ChatGPT accounts (one of each) via a completely autonomous process. However there were some intermediate results which could not be done this way, and these were done using Formal Frontier's new multi-agent orchestrator -- in fact this was the first real test for the orchestrator, and it passed, proving e.g. meromorphic continuation and functional equation of the Dedekind zeta function of a number field. This means that the paper itself has been completely formally verified by AI tools, under a week from its release, and mostly (but not completely) entirely autonomously. It might be natural to have questions about the details of the formalization -- exactly what tools were used, how much it cost and so on. The team have a (human-readable) file `formalization.yaml` https://lnkd.in/eMP8jhUr which they would like to encourage everyone autoformalizing mathematics to use. The file has explicit answers to all of these questions and more. | 10 comments on LinkedIn

We describe a library of mathematical finance built in the Lean 4 proof assistant, on top of Mathlib and the BrownianMotion package. It is broad: more than two hundred sorry-free theorems across eleven areas, from the measure-theoretic foundations of continuous-time stochastic calculus through derivative pricing to applied risk, portfolio, and fixed-income theory, and, to our knowledge, the most comprehensive machine-checked development of mathematical finance to date. Breadth is the setting, not the point. Two things make it more than a catalogue. It reaches into the continuous theory far enough to construct the L2 Itô integral as a bounded linear isometry and to derive, rather than assume, the risk-neutral pricing measure. And it audits its own faithfulness: every result is classified by how its Lean statement relates to the mathematics it claims, and a build-enforced gate pins the axioms each proof actually uses, so a reader can see precisely what has been proved and what has only been proved under added hypotheses. We close with a candid finding: a formal base over classical financial mathematics yields certified unification of known results rather than new financial theory. The contribution is therefore methodological and infrastructural, reusable verified foundations for mathematical finance, together with the faithfulness audit.


StarkWare's S-two prover provides an efficient means for establishing, on blockchain, that a program written in the Cairo virtual machine language runs to completion. The latter claim is encoded by an algebraic intermediate representation (AIR) that captures the semantics of the Cairo language. The AIR asserts the existence of tables of values from a finite field satisfying certain algebraic constraints. A cryptographic interactive proof system, circle STARK, provides an efficiently-checked certificate that the AIR is satisfied. We describe our verification, using the Lean 4 proof assistant, that the AIR encoding is sound, which is to say, the satisfiability of the AIR implies the computational claim.

While Large Language Models (LLMs) have shown strong performance in generating formal proofs, their outputs often remain less readable, modular, maintainable, and reusable than proofs in mature formal mathematics libraries. We argue that this gap stems in part from the compile-first objective implicit in most proof-generation pipelines, which encourages monolithic or ad hoc proof scripts rather than library-quality artifacts. Existing approaches to proof-quality improvement often rely on explicit, computable optimization objectives. In practice, however, the most tractable and experimentally validated objectives are largely length-based, while higher-level qualities such as readability, modularity, maintainability, and reusability are difficult to reduce to reliable automatic metrics. Instead of optimizing proof improvement against a single proxy metric, we take a process-guided approach inspired by human proof-refactoring workflows. We propose an agentic framework $\textbf{Proof-Refactor}$ that decomposes proof refactoring into four phases: extracting candidate proof fragments, designing helper declarations, formally proving the extracted and designed components, and repairing the original proof using the verified components. On generated Lean proofs from PutnamBench and Putnam2025, Proof-Refactor improves rubric-based refactoring scores over a strong Claude Code refactoring baseline, with the largest gains in signature quality and human readability. These results suggest that process-guided refactoring can improve proof structure without treating proof length as the primary objective.