Hypothesis, Antithesis, synthesis
> property-based testing is going to be a huge part of how we make AI-agent-based software development not go terribly.
There's no doubt, I think, that testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included. But the problem remains verifying that the tests actually test what they're supposed to. Mutation testing can let agents get good coverage with little human intervention, and PBT can make tests better and more readable. But still, people have to read them and understand them, and I suspect that many people who claim to generate thousands of LOC per day don't.
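To make the mutation-testing/PBT point concrete, here's a rough sketch. Everything in it is made up for illustration: `dedupe`, the hand-written "mutant", and a hand-rolled random-input loop standing in for a real PBT library (it is not the Hypothesis API). The idea is just that a weak example test can't tell the correct function from the mutant, while a property check over random inputs can.

```python
import random


def dedupe(items):
    """Remove duplicates, keeping the order of first occurrence."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def dedupe_mutant(items):
    """Hypothetical mutant: removes duplicates but loses the original order."""
    return sorted(set(items))


def example_test(fn):
    """Weak example-based test: the input happens to be sorted already."""
    return fn([1, 1, 2, 3, 3]) == [1, 2, 3]


def property_test(fn, trials=500, seed=0):
    """Check three properties of fn on many small random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 12))]
        result = fn(items)
        # Property 1: the result contains no duplicates.
        if len(result) != len(set(result)):
            return False
        # Property 2: no element is lost or invented.
        if set(result) != set(items):
            return False
        # Property 3: elements appear in order of first occurrence.
        if result != sorted(set(items), key=items.index):
            return False
    return True
```

Both functions pass `example_test`, so that test "kills" nothing; only `property_test` rejects the mutant. A human still has to read the three properties and decide they're the right ones, which is the part that doesn't go away.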
And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. Not only were the tests good, it took humans years to write the tests by hand, and the agents still failed to converge.
I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.
> There's no doubt, I think, testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included.
Given the Curry-Howard isomorphism, couldn't we ask AI to directly prove the property of the binary executable under the assumption of the HW model, instead of running PBTs?
By no means do I want to dismiss PBTs, but it seems that this could be both faster and more reliable.
Proofs are a form of static analysis. Static analysis can find interesting bugs, but how a system behaves isn't purely a property of source code. It won't tell you whether the code will run acceptably in a given environment.
For example, if memory use isn't modelled, it won't tell you how big the input can be before the system runs out of memory. Similarly, if your database isn't modelled then you need to test with a real database. Web apps need to test with a real web browser sometimes, rather than a simplified model of one. Databases and web browsers are too complicated to build a full-fidelity mathematical model for.
When testing with real systems there's often the issue that the user's system is different from the one you use to test. You can test with recent versions of Chrome and Firefox, etc, which helps a lot, but what about extensions?
Nothing covers everything, but property tests and fuzzers actually run the code in some test environment. That's going to find different issues than proofs will.
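A toy illustration of the unmodelled-resources point, with made-up names (`total`, `largest_ok_size`) and assuming CPython with a recursion limit near its default: the recursive sum below is easy to argue correct by induction on the list, but that argument says nothing about stack depth. You only find out where it stops working by running it.

```python
def total(xs, i=0):
    """Sum xs[i:] recursively; correct by induction on len(xs) - i."""
    if i == len(xs):
        return 0
    return xs[i] + total(xs, i + 1)


def largest_ok_size(sizes):
    """Return the largest size (tried in the given order) that total() handles.

    Stops at the first size where the runtime raises RecursionError,
    a limit the induction argument never mentions.
    """
    ok = None
    for n in sizes:
        try:
            assert total([1] * n) == n
            ok = n
        except RecursionError:
            break
    return ok
```

Calling `largest_ok_size([10, 100, 10**6])` probes a few input sizes; where it gives up depends on the interpreter's configured recursion limit, not on anything in the source code.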