Mastodawn

Readings shared April 24, 2026. https://jaalonso.github.io/vestigium/posts/2026/04/25-readings_shared_04-24-26 #AI4Math #Autoformalization #CompSci #CoqProver #FunctionalProgramming #HOL #Haskell #ITP #IsabelleHOL #LeanProver #Logic #Math #Nqthm #RocqProver

Readings shared April 24, 2026

The readings shared in Bluesky on 24 April 2026 are: Deep Vision: A formal proof of Wolstenholmes theorem in Lean 4. ~ Alexandre Linhares. #LeanProver #ITP #AI4Math Discover and prove: An open-source

Vestigium

José A. Alonso Apr 22

Do LLMs game formalization? Evaluating faithfulness in logical reasoning. ~ Kyuhee Kim, Auguste Poiroux, Antoine Bosselut. https://arxiv.org/abs/2604.19459v1 #AI4Math #LeanProver #ITP #Autoformalization

Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

Formal verification guarantees proof validity but not formalization faithfulness. For natural-language logical reasoning, where models construct axiom systems from scratch without library constraints, this gap between valid proofs and faithful translations is especially acute. We investigate whether frontier models exploit this gap when generating Lean 4 proofs, a behavior we term formalization gaming. We evaluate GPT-5 and DeepSeek-R1 on 303 first-order logic problems (203 from FOLIO, 100 from Multi-LogiEval), comparing unified generation against a two-stage pipeline that separates formalization from proving. Despite compilation rates of 87-99%, we find no evidence of systematic gaming in unified generation: models prefer reporting failure over forcing proofs, even under prompting designed to encourage it. However, unfaithfulness that evades our detection signals may still occur. The two-stage pipeline reveals two distinct modes of unfaithfulness: GPT-5 fabricates axioms during proof generation, a reactive fallback detectable via cross-stage comparison, while DeepSeek-R1 mistranslates premises during formalization, producing internally consistent outputs that evade detection entirely. These findings show that high compilation rates or accuracies should not be equated with faithful reasoning. Code and data are available at https://github.com/koreankiwi99/formalization-gaming.

arXiv.org

José A. Alonso Apr 15

Readings shared April 14, 2026. https://jaalonso.github.io/vestigium/posts/2026/04/15-readings_shared_04-14-26 #AI4Math #Autoformalization #FunctionalProgramming #Haskell #ITP #IsabelleHOL #LeanProver #Logic #LogicProgramming #Math #Prolog #RocqProver

Readings shared April 14, 2026

The readings shared in Bluesky on 14 April 2026 are: A formal proof of the Ramanujan-Nagell theorem in Lean 4. ~ Barinder S. Banwait. #LeanProver #ITP #AI4Math Formalization of De Giorgi-Nash-Moser t

Vestigium

José A. Alonso Apr 10

Munkres' general topology autoformalized in Isabelle/HOL. ~ Dustin Bryant, Jonathan Julián Huerta y Munive, Cezary Kaliszyk, Josef Urban. https://arxiv.org/abs/2604.07455v1 #IsabelleHOL #ITP #AI4Math #Autoformalization

Munkres' General Topology Autoformalized in Isabelle/HOL

We describe an experiment in LLM-assisted autoformalization that produced over 85,000 lines of Isabelle/HOL code covering all 39 sections of Munkres' Topology (general topology, Chapters 2--8), from topological spaces through dimension theory. The LLM-based coding agents (initially ChatGPT 5.2 and then Claude Opus 4.6) used 24 active days for that. The formalization is complete: all 806 formal results are fully proved with zero sorry's. Proved results include the Tychonoff theorem, the Baire category theorem, the Nagata--Smirnov and Smirnov metrization theorems, the Stone--Čech compactification, Ascoli's theorem, the space-filling curve, and others. The methodology is based on a "sorry-first" declarative proof workflow combined with bulk use of sledgehammer - two of Isabelle major strengths. This leads to relatively fast autoformalization progress. We analyze the resulting formalization in detail, analyze the human--LLM interaction patterns from the session log, and briefly compare with related autoformalization efforts in Megalodon, HOL Light, and Naproche. The results indicate that LLM-assisted formalization of standard mathematical textbooks in Isabelle/HOL is quite feasible, cheap and fast, even if some human supervision is useful.

arXiv.org

José A. Alonso Apr 5

Readings shared April 4, 2026. https://jaalonso.github.io/vestigium/posts/2026/04/04-readings_shared_04-04-26 #AI #AI4Math #ATP #Agda #AlphaProof #Autoformalization #CategoryTheory #CoqProver #FunctionalProgramming #ITP #IsabelleHOL #LLMs #LambdaCalculus #LeanProver #Lisp #Logic #LogicProgramming #LLMs #Math #Physics #Programming #Prolog #Racket #RocqProver #Vampire

Readings shared April 4, 2026

The readings shared in Bluesky on 4 April 2026 are: Why Lean?. ~ Leonardo de Moura. #LeanProver #ITP A formalization of the Gelfond-Schneider theorem. ~ Michail Karatarakis, Freek Wiedijk. #LeanProve

Vestigium

José A. Alonso Mar 25

Readings shared March 24, 2026. https://jaalonso.github.io/vestigium/posts/2026/03/25-readings_shared_02-24-26 #AI4Math #Autoformalization #FunctionalProgramming #Haskell #ITP #IsabelleHOL #LeanProver #Math

Readings shared March 24, 2026

The readings shared in Bluesky on 24 March 2026 are: Synthetic differential geometry in Lean. ~ Riccardo Brasca, Gabriella Clemente. #LeanProver #ITP #Math The spectral comb and the Riemann hypothesi

Vestigium

José A. Alonso Mar 24

OpenGauss: an open source, state of the art autoformalization harness. https://www.math.inc/opengauss #AI4Math #LeanProver #ITP #Autoformalization

OpenGauss: an open source, state of the art autoformalization harness — Math, Inc.

OpenGauss is a state-of-the-art open-source autoformalization harness for Lean, built for practical proof engineering workflows.

Math, Inc.

#BruceSterling Mar 5

*In some better world, Ukrainian women aren't getting blasted from the sky for ten years, but are just having kids while doing lots of award-winning math about extradimensionality #autoformalization en.wikipedia.org/wiki/Maryna_...

Maryna Viazovska - Wikipedia

Maryna Viazovska - Wikipedia

Thomas Kahle Feb 26

Diary of #autoformalization

The methodology and approaches are not too far from how my brain works or what would be the output if I would record every of my thoughts on such a problem.

But it also becomes clear that claude code is a beginner who does lots of trial-and-error and copy-paste coding.

sayzard Jan 21

fly51fly (@fly51fly)

J. Urban의 논문은 단기간(2주) 동안 13만 줄 규모의 형식적 위상수학(formal topology)을 자동 형식화(autoformalization)로 생성한 작업을 보고합니다. 비용과 복잡도를 낮춘 간단한 방법을 제안해 누구나 자동 형식화에 접근할 수 있게 하는 접근법과 실험 결과를 제시하며 정리된 데이터셋과 파이프라인을 공개합니다 (arXiv:2601.03298).

https://x.com/fly51fly/status/2013735633425146080

#autoformalization #formalization #theoremproving #automatedreasoning

fly51fly (@fly51fly) on X

[LG] 130k Lines of Formal Topology in Two Weeks: Simple and Cheap Autoformalization for Everyone? J Urban [AI4REASON] (2026) https://t.co/pGBD5M2ThS

X (formerly Twitter)