Leanstral: Open-source agent for trustworthy coding and formal proof engineering

Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37

https://mistral.ai/news/leanstral

It’s great to see this pattern of people realising that agents can specify the desired behavior then write code to conform to the specs.

TDD, verification, whatever your tool; verification suites of all sorts accrue over time into a very detailed repository of documentation of how things are supposed to work that, being executable, puts zero tokens in the context when the code is correct.

It’s more powerful than reams upon reams of markdown specs. That’s because it encodes details, not intent. Your intent is helpful at the leading edge of the process, but the codified result needs shoring up to prevent regression. That’s the area software engineering has always ignored because we have gotten by on letting teams hold context in their heads and docs.

As software gets more complex we need better solutions than “go ask Jim about that, bloke’s been in the code for years”.

I feel like the difference is minimal, if not entirely dismissable. Code in this sense is just a representation of the same information as someone would write in an .md file. The resolution changes, and that's where both detail and context are lost.

I'm not against TDD or verification-first development, but I don't think writing that as code is the end-goal. I'll concede that there's millions of lines of tests that already exist, so we should be using those as a foundation while everything else catches up.

Tests (and type-checkers, linters, formal specs, etc.) ground the model in reality: they show it that it got something wrong (without needing a human in the loop). It's empiricism, "nullius in verba"; the scientific approach, which lead to remarkable advances in a few hundred years; that over a thousand years of ungrounded philosophy couldn't achieve.
This assumes that tests are realistic, which for the most part they are not.