Readings shared June 5, 2026

The readings shared in Bluesky on 5 June 2026 are: A formal proof of the Ramanujan-Nagell theorem in Lean 4. ~ Barinder S. Banwait. #LeanProver #ITP #AI4Math Certificate-based verification of derivat

Vestigium
Lean-GAP: A dataset of formalized graduate algebra problems. ~ Seewoo Lee et als. https://arxiv.org/abs/2606.02588 #AI4Math #Autoformalization #LeanProver #ITP
Lean-GAP: A Dataset of Formalized Graduate Algebra Problems

We present Lean-GAP (Lean-Graduate Agebra Problems), 430 formalized graduate-level algebra problems from the textbook Abstract Algebra by Dummit and Foote. We develop a scalable pipeline consisting of PDF-to-LaTeX preprocessing, autoformalization into Lean 4, and verification of informal-formal correspondence. While the preprocessing and autoformalization stages can be largely automated, we find that verification remains the most subtle and labor-intensive component, requiring careful human oversight. Our contributions include (i) the construction of a structured dataset of formalized exercises, (ii) a systematic methodology for formalizing textbook mathematics, and (iii) an analysis of recurring challenges in the formalization process. We also compare the performance of different autoformalization models and highlight key bottlenecks in translating informal statements into formal language.

arXiv.org

DeepMind's new formal-proof-search agent resolved ~9 of 353 open Erdős problems in Lean, ~$100s each. I cloned the repo. The thing I can't stop thinking about: the agent's entire contribution to the statement of one 80-year-old-style question was a single token — False — followed by a sorry-free proof.

New post on what that machine does and doesn't solve: https://korbonits.com/blog/2026-05-28-who-verifies-the-verifier/

#math #Lean #autoformalization

Who Verifies the Verifier

An AI built the machine I said mathematics needed — a compiler that verifies proofs for cents instead of expert weekends. The catch is what it still can't read.

Readings shared April 24, 2026

The readings shared in Bluesky on 24 April 2026 are: Deep Vision: A formal proof of Wolstenholmes theorem in Lean 4. ~ Alexandre Linhares. #LeanProver #ITP #AI4Math Discover and prove: An open-source

Vestigium
Do LLMs game formalization? Evaluating faithfulness in logical reasoning. ~ Kyuhee Kim, Auguste Poiroux, Antoine Bosselut. https://arxiv.org/abs/2604.19459v1 #AI4Math #LeanProver #ITP #Autoformalization
Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

Formal verification guarantees proof validity but not formalization faithfulness. For natural-language logical reasoning, where models construct axiom systems from scratch without library constraints, this gap between valid proofs and faithful translations is especially acute. We investigate whether frontier models exploit this gap when generating Lean 4 proofs, a behavior we term formalization gaming. We evaluate GPT-5 and DeepSeek-R1 on 303 first-order logic problems (203 from FOLIO, 100 from Multi-LogiEval), comparing unified generation against a two-stage pipeline that separates formalization from proving. Despite compilation rates of 87-99%, we find no evidence of systematic gaming in unified generation: models prefer reporting failure over forcing proofs, even under prompting designed to encourage it. However, unfaithfulness that evades our detection signals may still occur. The two-stage pipeline reveals two distinct modes of unfaithfulness: GPT-5 fabricates axioms during proof generation, a reactive fallback detectable via cross-stage comparison, while DeepSeek-R1 mistranslates premises during formalization, producing internally consistent outputs that evade detection entirely. These findings show that high compilation rates or accuracies should not be equated with faithful reasoning. Code and data are available at https://github.com/koreankiwi99/formalization-gaming.

arXiv.org
Readings shared April 14, 2026

The readings shared in Bluesky on 14 April 2026 are: A formal proof of the Ramanujan-Nagell theorem in Lean 4. ~ Barinder S. Banwait. #LeanProver #ITP #AI4Math Formalization of De Giorgi-Nash-Moser t

Vestigium
Munkres' general topology autoformalized in Isabelle/HOL. ~ Dustin Bryant, Jonathan Julián Huerta y Munive, Cezary Kaliszyk, Josef Urban. https://arxiv.org/abs/2604.07455v1 #IsabelleHOL #ITP #AI4Math #Autoformalization
Munkres' General Topology Autoformalized in Isabelle/HOL

We describe an experiment in LLM-assisted autoformalization that produced over 85,000 lines of Isabelle/HOL code covering all 39 sections of Munkres' Topology (general topology, Chapters 2--8), from topological spaces through dimension theory. The LLM-based coding agents (initially ChatGPT 5.2 and then Claude Opus 4.6) used 24 active days for that. The formalization is complete: all 806 formal results are fully proved with zero sorry's. Proved results include the Tychonoff theorem, the Baire category theorem, the Nagata--Smirnov and Smirnov metrization theorems, the Stone--Čech compactification, Ascoli's theorem, the space-filling curve, and others. The methodology is based on a "sorry-first" declarative proof workflow combined with bulk use of sledgehammer - two of Isabelle major strengths. This leads to relatively fast autoformalization progress. We analyze the resulting formalization in detail, analyze the human--LLM interaction patterns from the session log, and briefly compare with related autoformalization efforts in Megalodon, HOL Light, and Naproche. The results indicate that LLM-assisted formalization of standard mathematical textbooks in Isabelle/HOL is quite feasible, cheap and fast, even if some human supervision is useful.

arXiv.org
Readings shared April 4, 2026

The readings shared in Bluesky on 4 April 2026 are: Why Lean?. ~ Leonardo de Moura. #LeanProver #ITP A formalization of the Gelfond-Schneider theorem. ~ Michail Karatarakis, Freek Wiedijk. #LeanProve

Vestigium
Readings shared March 24, 2026

The readings shared in Bluesky on 24 March 2026 are: Synthetic differential geometry in Lean. ~ Riccardo Brasca, Gabriella Clemente. #LeanProver #ITP #Math The spectral comb and the Riemann hypothesi

Vestigium