Leanstral: Open-source agent for trustworthy coding and formal proof engineering
Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37
It’s great to see this pattern of people realising that agents can specify the desired behavior and then write code that conforms to the spec.
TDD, formal verification, whatever your tool: verification suites of all sorts accrue over time into a very detailed record of how things are supposed to work. Because that record is executable, it puts zero tokens in the context when the code is correct.
It’s more powerful than reams upon reams of markdown specs. That’s because it encodes details, not intent. Your intent is helpful at the leading edge of the process, but the codified result needs shoring up to prevent regression. That’s the area software engineering has always ignored because we have gotten by on letting teams hold context in their heads and docs.
As software gets more complex we need better solutions than “go ask Jim about that, bloke’s been in the code for years”.
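To make the "executable documentation" point concrete, here is a minimal sketch in Python. The function `dedupe` and the checker `check_spec` are made-up names for illustration, not taken from Leanstral or any real project. The spec is a handful of properties, and the checker returns nothing at all when they hold, which is the "zero tokens when correct" behavior described above.

```python
import random


def dedupe(items):
    """Implementation under test: drop repeats, keep first-occurrence order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_spec(fn, trials=200, seed=0):
    """Return a list of violation messages; an empty list means the spec holds."""
    rng = random.Random(seed)  # fixed seed so every run checks the same cases
    cases = [[], [3, 1, 3, 2, 1]]  # hand-picked edge cases
    cases += [
        [rng.randint(0, 5) for _ in range(rng.randint(0, 10))]
        for _ in range(trials)
    ]
    failures = []
    for case in cases:
        result = fn(list(case))
        if len(result) != len(set(result)):
            failures.append(f"duplicates remain for {case}")
        if set(result) != set(case):
            failures.append(f"elements changed for {case}")
        # Each kept element must appear in the order it was first seen.
        first_seen = [case.index(x) for x in result if x in case]
        if first_seen != sorted(first_seen):
            failures.append(f"order not preserved for {case}")
        if fn(list(result)) != result:
            failures.append(f"not idempotent for {case}")
    return failures


# Silent when the code is correct; only violations would be fed back to an agent.
violations = check_spec(dedupe)
```

A plausible-looking but wrong rewrite such as `lambda xs: sorted(set(xs))` removes duplicates fine, yet it trips the ordering property on the `[3, 1, 3, 2, 1]` case. That is the kind of detail a prose spec tends to leave implicit.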
That matches what I’ve seen as well: generation is the easy part, validation is the bottleneck.
I’ve been experimenting with a small sparse-regression system that infers governing equations from raw data, and it can produce a lot of plausible candidates quickly. The hard part is filtering out the ones that look right but violate underlying constraints.
For example, it recovered the Sun’s rotation period (~25.1 days vs 27 actual) from solar wind data, but most candidate equations were subtly wrong until you enforced consistency checks.
Feels like systems that treat verification as the source of truth (not just an afterthought) are the ones that will actually scale.
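A rough sketch of that generate-then-filter loop in Python, for anyone who wants to see its shape. This is not the system described above. It assumes a toy decay law dx/dt = -2x with a synthetic derivative, a three-term candidate library, a hand-rolled sequentially thresholded least squares fit, and a made-up constraint (`is_consistent`) that x = 0 must be an equilibrium. All names and thresholds are illustrative.

```python
import math


def solve(A, b):
    """Solve a small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit(features, target, threshold, max_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    n_terms = len(features[0])
    active = list(range(n_terms))
    coeffs = [0.0] * n_terms
    for _ in range(max_iter):
        # Normal equations restricted to the currently active library terms.
        gram = [[sum(f[i] * f[j] for f in features) for j in active] for i in active]
        rhs = [sum(f[i] * y for f, y in zip(features, target)) for i in active]
        coeffs = [0.0] * n_terms
        for i, c in zip(active, solve(gram, rhs)):
            coeffs[i] = c
        kept = [i for i in active if abs(coeffs[i]) >= threshold]
        if kept == active:
            break
        active = kept
        if not active:
            break
    return [coeffs[i] if i in active else 0.0 for i in range(n_terms)]


def is_consistent(coeffs, tol=1e-9):
    """Hypothetical constraint: x = 0 must be an equilibrium, so no constant term."""
    return abs(coeffs[0]) <= tol


# Toy data: x(t) = exp(-2t), so dx/dt = -2x; a small deterministic
# perturbation stands in for measurement noise.
ts = [0.01 * k for k in range(200)]
xs = [math.exp(-2.0 * t) for t in ts]
dxdt = [-2.0 * x + 1e-3 * math.sin(37.0 * k) for k, x in enumerate(xs)]
features = [[1.0, x, x * x] for x in xs]  # candidate library: 1, x, x^2

# Generation is cheap: one candidate per threshold. Validation decides what survives.
candidates = {thr: fit(features, dxdt, thr) for thr in (0.0, 0.1)}
accepted = {thr: c for thr, c in candidates.items() if is_consistent(c)}
```

In this toy setup the unthresholded fit will generally carry a small spurious constant term. It fits the data about as well as the sparse model, but it breaks the equilibrium constraint, so the check rejects it. The thresholded candidate reduces to a single x term with a coefficient close to -2 and passes.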