Despite not yet being a benchmark, the First Proof project is by far the best measure of model usefulness for science and math research available today, and I very much hope that frontier labs continue to take future rounds seriously.

https://www.daniellitt.com/blog/2026/2/20/mathematics-in-the-library-of-babel

#firstProof #mathematics #AI #machineLearning #research

Mathematics in the Library of Babel — Daniel Litt

Mathematics isn't only about saying true things. It's about asking the right questions, being confused, stumbling about, getting distracted, being wrong, recognizing when you're wrong, being stuck. Mostly being stuck. It's about clinging to a giant edifice and feeling it out until you understand som

Daniel Litt

OpenAI publishes proposed solutions for the First Proof competition.

The test consists of unpublished mathematics problems, designed to test reasoning without prior knowledge from training data. According to OpenAI, several problems were solved. External validation of the formal proofs by the initiators is still pending. #OpenAI #FirstProof #JamesRLee
https://www.all-ai.de/news/beitrage2026/mathe-first-proof

OpenAI puts AI up against the world's hardest mathematics test

The First Proof test challenges artificial intelligence with entirely new research-level problems, far removed from known training data.

All-AI.de

Kimon Fountoulakis (@kfountou)

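The author questions whether the result was a genuine generalization, and if so, in what sense. He points out that when people say "first proof," they usually mean they could not find a complete end-to-end proof in the literature themselves, while the core steps likely already existed, and he stresses the importance of defining "first proof" and of verifying such claims.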

https://x.com/kfountou/status/2022670003191902263

#research #proofs #ai #firstproof

Kimon Fountoulakis (@kfountou) on X

@harshit_sikchi Well, was it really a generalization? And if so, in what sense? I think we are about to see that when humans say “first proof”, they usually mean they couldn’t find the complete end-to-end proof in the literature themselves, even though core steps might already exist.

X (formerly Twitter)

Jakub Pachocki (@merettm)

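The author expresses strong enthusiasm for the "First Proof" challenge and stresses that novel frontier research matters for evaluating the capabilities of next-generation AI models. He says his team ran its internal model, with limited human supervision, on the ten proposed problems, suggesting this is an important experiment in assessing AI's ability to prove mathematical results and its autonomy.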

https://x.com/merettm/status/2022517085193277874

#firstproof #ai #theoremproving #research #ml

Jakub Pachocki (@merettm) on X

Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models. We have run our internal model with limited human supervision on the ten proposed problems. The

This really is addictive... #firstProof

Please help promote this project called "First Proof" led by Mohammed Abouzaid (Stanford), Nikhil Srivastava (Cal), Rachel Ward (UT Austin), and Lauren Williams (Harvard). The goal is to understand the capabilities of AI systems on problems that come up in math research. We have a collection of research problems for which solutions have not yet been posted online, so it's a good testbed. The solutions will come out in just one week. Take a crack at it! #FirstProof #1stProof

https://arxiv.org/abs/2602.05192

First Proof

To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.

arXiv.org