Leanstral: Open-source agent for trustworthy coding and formal proof engineering
Lean 4 paper (2021): https://dl.acm.org/doi/10.1007/978-3-030-79876-5_37
It’s great to see people realising this pattern: agents can specify the desired behavior and then write code to conform to the specs.
TDD, formal verification, whatever your tool: verification suites of all sorts accrue over time into a very detailed repository of documentation of how things are supposed to work, and because they are executable, they put zero tokens in the context when the code is correct.
It’s more powerful than reams upon reams of markdown specs, because it encodes details, not intent. Your intent is helpful at the leading edge of the process, but the codified result needs shoring up to prevent regression. That’s the area software engineering has always neglected, because we have gotten by on letting teams hold context in their heads and in docs.
As software gets more complex we need better solutions than “go ask Jim about that, bloke’s been in the code for years”.
> That’s because it encodes details, not intent.
Be careful here - make sure you encode the right details. I've seen many cases where the tests encode the details of how something was implemented rather than what it is intended to do. This means you can't refactor anything, because your tests are enforcing a design. (Refactoring is changing code without deleting tests; the trick is making design changes without deleting tests, which means you have to test as much as possible at points where changing that part of the design isn't feasible anyway.)
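A toy illustration of the distinction (the `Cart` class and its tests are hypothetical, not from any project discussed here):

```python
# A shopping cart whose internal storage is an implementation detail.
class Cart:
    def __init__(self):
        self._items = []  # detail: a list of (name, price) pairs

    def add(self, name, price):
        self._items.append((name, price))

    def total(self):
        return sum(price for _, price in self._items)

# Implementation-coupled test: it asserts on the private storage, so it
# breaks if _items becomes a dict, even though observable behavior is
# unchanged. This is the kind of test that "enforces a design".
def test_coupled():
    cart = Cart()
    cart.add("book", 10)
    assert cart._items == [("book", 10)]

# Behavior test: it asserts only on the public contract, so it survives
# any refactor that keeps totals correct.
def test_behavior():
    cart = Cart()
    cart.add("book", 10)
    cart.add("pen", 2)
    assert cart.total() == 12

test_coupled()
test_behavior()
```

Both tests pass today; only the second one still passes after a representation change.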
While you are right that you need to be encoding the right details, I disagree on the tests enforcing a design point.
As part of the proper testing strategy, you will have tests that cover individual behavior of a small block/function (real "unit" tests), tests that cover integration points only up to the integration itself, and a small number of end-to-end or multi-component integration tests.
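The three categories can be sketched with a toy pricing example (names here are illustrative only):

```python
# 1. A small block with a real "unit" test target.
def apply_discount(price, percent):
    return round(price * (1 - percent / 100), 2)

# 2. An integration point between components: the service takes its
#    collaborator as a parameter, so the seam can be tested in isolation.
class PricingService:
    def __init__(self, discount_fn):
        self.discount_fn = discount_fn

    def quote(self, price, percent):
        return self.discount_fn(price, percent)

# Unit test: one function, no collaborators.
assert apply_discount(100, 25) == 75.0

# Integration test: only the seam itself, with a stub collaborator.
stub_service = PricingService(lambda price, amount: price - amount)
assert stub_service.quote(100, 25) == 75

# End-to-end-style test: real wiring; keep these few in number.
assert PricingService(apply_discount).quote(100, 25) == 75.0
```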
Only the last category should stay mostly unchanged under refactoring, depending on the type of refactor you are doing.
Integration tests will obviously be affected when you are refactoring the interfaces between components, and unit tests will be affected when you are refactoring the components themselves. Yes, you should apply an incremental, reverse-TDD strategy: do the refactor but keep the old interface, potentially by calling into the new API from the old; then, in a second step, replace uses of the old API as well, including in tests.
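A minimal sketch of that two-step refactor (the function names are hypothetical):

```python
# Step 1: introduce the new API, and keep the old interface as a thin
# shim that delegates to it, so existing tests keep passing unchanged.
def fetch_user_v2(user_id, *, include_profile=False):
    # New implementation with the extended interface.
    user = {"id": user_id}
    if include_profile:
        user["profile"] = {}
    return user

def fetch_user(user_id):
    # Old interface, now just calling into the new API.
    return fetch_user_v2(user_id)

# Old tests still pass against the shim:
assert fetch_user(42)["id"] == 42

# Step 2 (a later change): migrate callers and tests to fetch_user_v2,
# then delete the fetch_user shim.
assert fetch_user_v2(42, include_profile=True)["profile"] == {}
```

The point is that no test is deleted in step 1; tests migrate alongside callers in step 2.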
In a TDD approach, tests generally define both behavior and implementation: it would be weird if none of them needed changing when you change the implementation.
Taken to its logical conclusion, what you are saying is: do not write (or commit? though in practice, why write them if not to run in CI) any tests except end-to-end tests covering actual use cases. In theory, even make them generic enough that they are not affected by the implementation. Perhaps even employ LLMs there ("check that a customer can provide their address for their order by using a headless browser").
It is a strong disagree from me: end-to-end tests have always been fragile and slow, and feedback-loop time is the constraint any coder (agentic or human) has to operate within. If your agents need to wait 2h to see whether each change is valid, you'll be beaten by humans doing properly structured "just enough" testing.
That isn't the logical conclusion, though. I specifically said to find places where change would be too complex to attempt anyway, which breaks your conclusion. This lets you find plenty of places to jump in and write a test. (You will still be wrong sometimes, but less often, and generally you know a change will be hard before you start making it.)
Though I find that, in practice, end-to-end tests are not that fragile. It took us a decade of effort to find and mitigate all the little issues that made them fragile, so perhaps you don't want to go that far. I wish I could make the end-to-end tests faster, but fragile they are not.
I am a bit perplexed: you counter my claim that end-to-end tests are fragile in general by describing how you spent a decade making them not fragile in one particular case?
I am not disagreeing that most projects evolve test suites in which duplicated, useless tests are the majority. But it can be done better.