Leanstral: Open-source agent for trustworthy coding and formal proof engineering
Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37
The real-world success they report reminds me of Simon Willison's Red/Green TDD pattern: https://simonwillison.net/guides/agentic-engineering-pattern...
> Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.
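For anyone unfamiliar with the failure mode being described: in Lean 4, `rw` matches goals syntactically, so a `def` that only unfolds definitionally can hide the structure a rewrite needs. A minimal sketch (my own illustrative example, not the actual code from the article):

```lean
-- A `def` unfolds definitionally, so `rfl` sees through it:
def double (n : Nat) : Nat := n + n

example (n : Nat) : double n = n + n := rfl

-- But `rw`/`omega`-style tactics work on the goal's syntax. With the
-- goal stated in terms of `double`, the `n + n` body is invisible
-- until we explicitly unfold the definition:
example (n : Nat) : double n = 2 * n := by
  unfold double   -- expose `n + n` so subsequent tactics can match it
  omega
```

This is presumably the kind of "rigid definition requiring explicit unfolding" the agent diagnosed: switching to `unfold` (or `simp [double]`) before rewriting is the standard fix.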
Given the recent issues with AWS's Kiro and with GitHub, we already have a few high-profile examples of what happens when AI is used at scale, especially when you let it generate its own tests, which is something you should absolutely not do.

Otherwise, in some cases, you get incidents like this one [0].
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...