A new paper with Bogdan Georgiev, Javier Gomez-Serrano, and Adam Zsolt Wagner: "Mathematical exploration and discovery at scale" https://arxiv.org/abs/2511.02864 , in which we record our experiments using the LLM-powered optimization tool #AlphaEvolve to attack 67 different math problems (both solved and unsolved), improving upon the state of the art in some cases and matching previous literature in others. The data for these experiments can be found at https://github.com/google-deepmind/alphaevolve_repository_of_problems and further discussion is at https://terrytao.wordpress.com/2025/11/05/mathematical-exploration-and-discovery-at-scale/
Mathematical exploration and discovery at scale

AlphaEvolve (Novikov et al., 2025) is a generic evolutionary coding agent that combines the generative capabilities of LLMs with automated evaluation in an iterative evolutionary framework that proposes, tests, and refines algorithmic solutions to challenging scientific and practical problems. In this paper we showcase AlphaEvolve as a tool for autonomously discovering novel mathematical constructions and advancing our understanding of long-standing open problems. To demonstrate its breadth, we considered a list of 67 problems spanning mathematical analysis, combinatorics, geometry, and number theory. The system rediscovered the best known solutions in most of the cases and discovered improved solutions in several. In some instances, AlphaEvolve is also able to generalize results for a finite number of input values into a formula valid for all input values. Furthermore, we are able to combine this methodology with Deep Think and AlphaProof in a broader framework where the additional proof-assistants and reasoning systems provide automated proof generation and further mathematical insights. These results demonstrate that large language model-guided evolutionary search can autonomously discover mathematical constructions that complement human intuition, at times matching or even improving the best known results, highlighting the potential for significant new ways of interaction between mathematicians and AI systems. We present AlphaEvolve as a powerful new tool for mathematical discovery, capable of exploring vast search spaces to solve complex optimization problems at scale, often with significantly reduced requirements on preparation and computation time.
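The propose-test-refine loop described in the abstract can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the actual AlphaEvolve system: random numerical mutation plays the role of LLM-generated code edits, and the toy objective (spreading 5 points in [0, 1] to maximize the minimum gap) plays the role of a mathematical construction scored by an automated evaluator.

```python
# Toy sketch of an evolutionary propose-test-refine loop.
# Random mutation stands in for LLM-proposed edits; this is an
# illustrative analogue, not AlphaEvolve itself.
import random

random.seed(0)

def evaluate(candidate):
    """Automated evaluator: score a candidate construction.
    Toy problem: maximize the minimum gap of 5 points in [0, 1]."""
    pts = sorted(candidate)
    return min(b - a for a, b in zip(pts, pts[1:]))

def mutate(candidate):
    """Propose a small refinement of the current best candidate."""
    new = list(candidate)
    i = random.randrange(len(new))
    new[i] = min(1.0, max(0.0, new[i] + random.gauss(0, 0.05)))
    return new

best = [random.random() for _ in range(5)]
for _ in range(5000):
    child = mutate(best)
    if evaluate(child) > evaluate(best):
        best = child  # keep improvements, discard regressions

print(round(evaluate(best), 3))  # approaches the optimal min-gap 1/4
```

The real system replaces the mutation step with LLM-generated program edits and maintains a population of candidates rather than a single incumbent, but the core loop — propose, score with an automated evaluator, keep what improves — is the same.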

@tao how good was the generalizer for the model? More specifically, were the patterns something that would merely have been tedious for humans to work out, or something nontrivial to find from the finite examples?
@allendist57 IMO 2025 problem 6 https://google-deepmind.github.io/alphaevolve_repository_of_problems/problems/65.html was an interesting case of this: see the discussion in section 43, where AlphaEvolve discovered the construction in Figure 34. Only a minority of the human participants at the 2025 IMO, and none of the AI tools applied to the problem, were able to obtain such a construction. (But one should make the caveat that locating the optimal construction is only one half of this problem; the other half is to prove that the construction is optimal, and AlphaEvolve had no capability on its own to accomplish this. On the other hand, one could imagine that a human attacking this problem who was given this example by AlphaEvolve could use it as inspiration to try to establish a rigorous proof of optimality.)

@tao I meant more: what if a human were given just the finite cases? How hard would it be for them to arrive at the model's generalization for arbitrary n?
@allendist57 In our experiments we have been incentivizing AlphaEvolve to come up with solutions that are as interpretable as possible, with as clear a dependency on the parameter n as possible. As a result, the general solutions that have been found are ones which a human could generalize after seeing the code that generates them for small n (although in the finite field Kakeya and Nikodym examples, if a human were just given the raw set of points rather than the code used to procedurally generate them, it would be significantly harder to discern the pattern).
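To illustrate the kind of procedurally generated, n-parameterized point set being discussed, here is a hedged sketch of the classical small Kakeya set in F_p^2 (the well-known parabola construction: a line of every slope t with intercept -t²/4, plus a vertical line). This is not AlphaEvolve's output, just a textbook analogue: the code makes the pattern transparent, while the raw point list it produces would be much harder to reverse-engineer.

```python
# Classical small Kakeya set in F_p^2, for p an odd prime:
# a point set containing a full line in every direction,
# yet markedly smaller than the whole plane.

def kakeya_set(p):
    """Parabola construction: lines y = t*x - t^2/4 plus a vertical line."""
    inv4 = pow(4, -1, p)  # inverse of 4 mod p (Python 3.8+)
    pts = {(x, (t * x - t * t * inv4) % p) for t in range(p) for x in range(p)}
    pts |= {(0, y) for y in range(p)}  # the vertical direction
    return pts

def is_kakeya(pts, p):
    """Verify pts contains a full line in every direction of F_p^2."""
    for t in range(p):  # some intercept c must put the whole line inside pts
        if not any(all((x, (t * x + c) % p) in pts for x in range(p))
                   for c in range(p)):
            return False
    return all((0, y) in pts for y in range(p))  # vertical line

p = 11
K = kakeya_set(p)
print(len(K), p * p)  # roughly half the plane for this construction
assert is_kakeya(K, p)
```

Given only the unordered list of pairs in K, the quadratic dependence on t would take some effort to spot; given the two-line generator above, the pattern (and its validity for every odd prime p) is immediate.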
@tao thank you for your response. Do you think the generalization aspect will improve soon?