Leanstral: Open-source agent for trustworthy coding and formal proof engineering

Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37

https://mistral.ai/news/leanstral

The real world success they report reminds me of Simon Willison’s Red Green TDD: https://simonwillison.net/guides/agentic-engineering-pattern...

> Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.

Red/green TDD - Agentic Engineering Patterns

Red/green TDD - Agentic Engineering Patterns

Simon Willison’s Weblog
If Agent is writing the tests itself, does it offer better correctness guarantees than letting it write code and tests?
It is definitely not foolproof but IMHO, to some extent, it is easier to describe what you expect to see than to implement it so I don't find it unreasonable to think it might provide some advantages in terms of correctness.
That definitely depends upon the situation. More often than not, properly testing a component takes me more time than writing it.
In my experience, this tends to be more related to instrumentation / architecture than a lack of ability to describe correct results. TDD is often suggested as a solution.