Mastodawn

Tao's equational proof challenge accepted (technical report). ~ Lydia Kondylidou, Jasmin Blanchette, Marijn J. H. Heule. https://arxiv.org/abs/2605.21200v1 #ATP #AI4Math

Tao's Equational Proof Challenge Accepted (Technical Report)

In the context of the Equational Theories Project, Terence Tao posed the challenge of finding alternatives to a complicated 62-step proof found by the Vampire superposition prover. We introduce a proof minimization tool called Krympa. Using a combination of brute force and heuristics, and exploiting both Vampire and the Twee equational prover, the tool reduces the 62-step proof to 20 steps, each corresponding to a rewrite. In an empirical evaluation, it also performs well on 1431 equational problems originating from the same project, reducing in particular a 151-step proof to only 10 steps.

arXiv.org

José A. Alonso 13h ago

Review of "An OpenAI model has disproved a central conjecture in discrete geometry". https://jaalonso.github.io/vestigium/posts/2026/05/21-an-openai-model-has-disproved-a-central-conjecture-in-discrete-geometry/ #AI4Math

Review of "An OpenAI model has disproved a central conjecture in discr

The article "An OpenAI model has disproved a central conjecture in discrete geometry" presents an AI model's resolution of Erdős's conjecture on unit distances in the plan

Vestigium

José A. Alonso 13h ago

An OpenAI model has disproved a central conjecture in discrete geometry. https://openai.com/index/model-disproves-discrete-geometry-conjecture/ #AI4Math

An OpenAI model has disproved a central conjecture in discrete geometry

An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics.

OpenAI

José A. Alonso 1d ago

Using Aristotle API for AI-assisted theorem proving in Lean 4: A formalisation case study of the Grasshopper problem. ~ Gabriel Rongyang Lau. https://arxiv.org/abs/2605.20120v1 #AI4Math #LeanProver #ITP

Using Aristotle API for AI-Assisted Theorem Proving in Lean 4: A Formalisation Case Study of the Grasshopper Problem

AI-assisted theorem proving can now generate substantial Lean developments for olympiad-level mathematics, but the evidential status of such developments depends on which declarations are actually verified. This paper reports a Lean 4 formalization case study of an Aristotle API proof attempt for the Grasshopper problem, originally posed as IMO 2009 Problem 6. The generated artifact states a generalized Lean version of the theorem, contains four verified helper lemmas for local components of a maximality and adjacent-swap exchange strategy, and leaves the main theorem grasshopper closed directly by one unresolved sorry. The verified components establish that the final partial sum equals the total sum, that an adjacent transposition can affect only the relevant intermediate partial sum, that the changed partial sum has the expected form, and that maximality at a position admitting an adjacent successor swap forces a corresponding forbidden-set membership fact. The Aristotle output summary identifies the intended remaining mathematical step as the global counting step needed to show that these membership facts produce at least n distinct forbidden values, contradicting the cardinality assumption |M| < n; the Lean source itself does not reduce the main theorem to a separately encoded counting lemma. This case study gives an inspectable example of a central limitation in AI-assisted formalization, namely that local proof search can succeed while the global combinatorial bookkeeping required for a theorem remains unresolved. The paper contributes a reproducible Lean artifact and a precise analysis of its verified and unverified proof content.

arXiv.org

MDR 1d ago

Hey algorithm, please help me get exposed to people who are interested in type theory, theorem proving and Principia Mathematica. I'm formalizing Principia Mathematica in Rocq.

https://github.com/MudroadWhite/Neo-Principia/

If you want to tame the monster created a century ago by Bertrand Russell, here's your chance to pet the dragon. *pat pat* Comment if you are interested!

#RocqProver #TheoremProving #formalverification #AI4Math #typetheory #PrincipiaMathematica

GitHub - MudroadWhite/Neo-Principia: Continuation on formalizing Principia Mathematica

Continuation on formalizing Principia Mathematica. Contribute to MudroadWhite/Neo-Principia development by creating an account on GitHub.

GitHub

José A. Alonso 1d ago

Readings shared May 19, 2026. https://jaalonso.github.io/vestigium/posts/2026/05/20-readings_shared_05-19-26 #AI4Math #FunctionalProgramming #ITP #LeanProver #Math #RocqProver

Readings shared May 19, 2026

The readings shared in Bluesky on 19 May 2026 are: A shallow dive into formal verification. ~ Vitalik Buterin. #LeanProver #ITP Decidable (Logic in Lean). #LeanProver #ITP #FunctionalProgramming Lean

Vestigium

José A. Alonso 2d ago

Lean meets theoretical computer science: scalable synthesis of theorem proving challenges in formal-informal pairs. ~ Terry Jingchen Zhang et al. https://arxiv.org/abs/2508.15878v1 #AI4Math #LeanProver #ITP

Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

Formal theorem proving (FTP) has emerged as a critical foundation for evaluating the reasoning capabilities of large language models, enabling automated verification of mathematical proofs at scale. However, progress has been constrained by limited datasets due to the high cost of manual curation and the scarcity of challenging problems with verified formal-informal correspondences. We propose leveraging theoretical computer science (TCS) as a scalable source of rigorous proof problems, where algorithmic definitions enable automated generation of arbitrarily many challenging theorem-proof pairs. We demonstrate this approach on two TCS domains: Busy Beaver problems, which involve proving bounds on Turing machine halting behavior, and Mixed Boolean Arithmetic problems, which combine logical and arithmetic reasoning. Our framework automatically synthesizes problems with parallel formal (Lean4) and informal (Markdown) specifications, creating a scalable pipeline for generating verified proof challenges. Evaluation on frontier models reveals substantial gaps in automated theorem proving: while DeepSeekProver-V2-671B achieves 57.5% success on Busy Beaver problems, it manages only 12% on Mixed Boolean Arithmetic problems. These results highlight the difficulty of long-form proof generation even for problems that are computationally easy to verify, demonstrating the value of TCS domains for advancing automated reasoning research.

arXiv.org
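
To make the "Mixed Boolean Arithmetic" domain in the abstract above concrete, here is a minimal sketch of one classic identity of that kind, checked by brute force. The function names `mba_add` and `check_identity` are hypothetical, and the paper's own problem format is not shown in the feed, so this only illustrates the general flavor: an identity that is cheap to verify computationally even though a formal proof may be long.

```python
# Brute-force check of a classic mixed Boolean-arithmetic (MBA) identity:
#     x + y == (x ^ y) + 2 * (x & y)
# XOR adds the bits without carries; AND marks the positions that carry.


def mba_add(x: int, y: int) -> int:
    """Compute x + y using only XOR, AND, and a doubling."""
    return (x ^ y) + 2 * (x & y)


def check_identity(bits: int = 8) -> bool:
    """Compare mba_add with ordinary addition on every pair of bits-wide inputs."""
    limit = 1 << bits
    return all(
        mba_add(x, y) == x + y
        for x in range(limit)
        for y in range(limit)
    )


if __name__ == "__main__":
    print(check_identity(8))
```

An exhaustive check like this is evidence for a fixed bit width only; a proof assistant would have to establish the identity for all inputs.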

José A. Alonso 3d ago

From LLM-generated conjectures to Lean formalizations: automated polynomial inequality proving via sum-of-squares certificates. ~ Ruobing Zuo, Hanrui Zhao, Gaolei He, Zhengfeng Yang, Jianlin Wang. https://arxiv.org/abs/2605.15445v1 #AI4Math #LeanProver #ITP

From LLM-Generated Conjectures to Lean Formalizations: Automated Polynomial Inequality Proving via Sum-of-Squares Certificates

Automated proving of polynomial inequalities is a fundamental challenge in automated mathematical reasoning, where rich algebraic structure and a rapidly growing certificate search space hinder scalability. Purely symbolic approaches provide strong guarantees but often scale poorly as the number of variables or the degree increases, due to expensive algebraic manipulations and rapidly growing intermediate expressions. In parallel, LLM-guided methods have made notable progress, particularly on competition-style inequalities with a small number of variables. To address the remaining scalability challenges, we propose NSPI, a neuro-symbolic framework that combines the complementary strengths of LLMs and symbolic computation for polynomial-inequality proving. Concretely, an LLM proposes a conjecture in the form of an approximate polynomial Sum-Of-Squares (SOS) decomposition; we refine it via symbolic computation to obtain an exact polynomial SOS representation, which directly proves the target inequality, and we further certify the proof in Lean, yielding an end-to-end pipeline from heuristic discovery to machine-checked proof. Experiments on challenging benchmarks involving polynomials with up to 10 variables demonstrate the effectiveness and scalability of the proposed method.

arXiv.org
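
The sum-of-squares idea in the abstract above can be illustrated with a small exact check: if a polynomial equals a sum of squared polynomials, it is nonnegative everywhere. The sketch below is not the NSPI pipeline. It assumes a hypothetical dictionary representation of two-variable polynomials and a hypothetical helper `is_sos_certificate`, and it verifies the hand-picked identity 2x² − 2xy + y² = x² + (x − y)² with rational arithmetic.

```python
from collections import defaultdict
from fractions import Fraction

# A polynomial in x and y is a dict mapping exponent pairs (i, j) to the
# Fraction coefficient of x**i * y**j. This representation is an assumption
# made for the example.


def poly_add(p, q):
    """Return p + q, dropping zero coefficients."""
    out = defaultdict(Fraction)
    for monomial, coeff in list(p.items()) + list(q.items()):
        out[monomial] += coeff
    return {m: c for m, c in out.items() if c != 0}


def poly_mul(p, q):
    """Return p * q, dropping zero coefficients."""
    out = defaultdict(Fraction)
    for (a1, b1), c1 in p.items():
        for (a2, b2), c2 in q.items():
            out[(a1 + a2, b1 + b2)] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}


def is_sos_certificate(target, squares):
    """Check exactly that target equals the sum of q*q over q in squares."""
    total = {}
    for q in squares:
        total = poly_add(total, poly_mul(q, q))
    return total == target


# Target: 2x^2 - 2xy + y^2
target = {(2, 0): Fraction(2), (1, 1): Fraction(-2), (0, 2): Fraction(1)}
# Claimed certificate: x^2 + (x - y)^2
squares = [
    {(1, 0): Fraction(1)},
    {(1, 0): Fraction(1), (0, 1): Fraction(-1)},
]

if __name__ == "__main__":
    print(is_sos_certificate(target, squares))
```

Using `Fraction` keeps the comparison exact, which mirrors the abstract's distinction between an approximate decomposition and an exact one: a floating-point candidate would generally fail this equality test until its coefficients are refined to exact values.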

José A. Alonso 6d ago

Readings shared May 14, 2026. https://jaalonso.github.io/vestigium/posts/2026/05/15-readings_shared_05-14-26 #AI #AI4Math #CoqProver #Emacs #ITP #IsabelleHOL #LeanProver #Logic #LogicProgramming #Math #Prolog

Readings shared May 14, 2026

The readings shared in Bluesky on 14 May 2026 are: An Arrow-theoretic impossibility theorem for the ordinal MVP aggregation problem. ~ Arjun Trivedi. #LeanProver #ITP Formal conjectures: An open and

Vestigium

José A. Alonso May 14

Automated conjecturing in mathematics with TxGraffiti. ~ Randy Davila. https://arxiv.org/abs/2409.19379v1 #AI4Math

Automated conjecturing in mathematics with TxGraffiti

TxGraffiti is a data-driven, heuristic-based computer program developed to automate the process of generating conjectures across various mathematical domains. Since its creation in 2017, TxGraffiti has contributed to numerous mathematical publications, particularly in graph theory. In this paper, we present the design and core principles of TxGraffiti, including its roots in the original Graffiti program, which pioneered the automation of mathematical conjecturing. We describe the data collection process, the generation of plausible conjectures, and methods such as the Dalmatian heuristic for filtering out redundant or transitive conjectures. Additionally, we highlight its contributions to the mathematical literature and introduce a new web-based interface that allows users to explore conjectures interactively. While we focus on graph theory, the techniques demonstrated extend to other areas of mathematics.

arXiv.org
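
The filtering step mentioned in the abstract above can be sketched roughly as follows. This is a simplified reading of the Dalmatian heuristic as it is commonly described (keep a candidate bound only if it holds on every stored object and is strictly tighter than all previously kept bounds on at least one object); it is not TxGraffiti's code, it omits any later pruning of older conjectures, and the names `is_true`, `is_significant`, and `dalmatian_filter` are hypothetical. The data are four small graphs with their order `n`, minimum degree `delta`, and independence number `alpha`.

```python
# Each graph is a record of precomputed invariants.
GRAPHS = [
    {"name": "K3", "n": 3, "delta": 2, "alpha": 1},    # triangle
    {"name": "P3", "n": 3, "delta": 1, "alpha": 2},    # path on 3 vertices
    {"name": "C4", "n": 4, "delta": 2, "alpha": 2},    # 4-cycle
    {"name": "K1,3", "n": 4, "delta": 1, "alpha": 3},  # star with 3 leaves
]

# Candidate conjectures of the form alpha <= bound(graph).
CANDIDATES = [
    ("n", lambda g: g["n"]),
    ("n - delta", lambda g: g["n"] - g["delta"]),
    ("n - 1", lambda g: g["n"] - 1),
    ("n / 2", lambda g: g["n"] / 2),
]


def is_true(bound, target, data):
    """Truth test: the bound holds on every object in the data."""
    return all(g[target] <= bound(g) for g in data)


def is_significant(bound, accepted, data):
    """Significance test: strictly tighter than every accepted bound somewhere."""
    return any(
        all(bound(g) < other(g) for _, other in accepted)
        for g in data
    )


def dalmatian_filter(candidates, target, data):
    """Keep candidates that pass both the truth and the significance test."""
    accepted = []
    for name, bound in candidates:
        if is_true(bound, target, data) and is_significant(bound, accepted, data):
            accepted.append((name, bound))
    return [name for name, _ in accepted]


if __name__ == "__main__":
    print(dalmatian_filter(CANDIDATES, "alpha", GRAPHS))
```

On this data, `n - 1` is true but never beats `n - delta`, so it is treated as redundant, while `n / 2` is discarded because it fails on the path P3. The outcome depends on the order in which candidates are considered, which is one reason the sketch should be taken as illustrative only.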