Hypothesis, Antithesis, synthesis
> property-based testing is going to be a huge part of how we make AI-agent-based software development not go terribly.
There's no doubt, I think, that testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included. But the problem remains verifying that the tests actually test what they're supposed to. Mutation tests can allow agents to get good coverage with little human intervention, and PBT can make tests better and more readable. But still, people have to read them and understand them, and I suspect that many people who claim to generate thousands of LOC per day don't.
And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. The tests were good, humans had spent years writing them by hand, and the agents still failed to converge.
I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.
I actually think there's another angle here where PBT helps, which wasn't explored in the blog post.
That angle is legibility. How do you know your AI-written slop software is doing the right thing? One would normally read all the code. Bad news: that's not much less labor-intensive than not using AI at all.
But if one has comprehensive property-based tests, one can instead read only those tests to convince oneself the software is doing the right thing.
By analogy: one doesn't need to see the machine-checked proof to know the claim is correct. One only needs to check the theorem statement is saying the right thing.
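To make the legibility point concrete, here is a rough sketch of what "reading only the properties" could look like. Everything in it is an assumption for illustration: `dedupe_sorted` is a hypothetical stand-in for the AI-written code, and the random loop is a hand-rolled substitute for what a library like Hypothesis would do with generated inputs and shrinking. The reviewer only has to agree that the three asserted properties say the right thing.

```python
import random


def dedupe_sorted(xs):
    """Hypothetical function under review: return the unique values, sorted."""
    return sorted(set(xs))


def check_properties(trials=500, seed=0):
    """Check the properties on randomly generated integer lists.

    A hand-rolled stand-in for a property-based testing library;
    it does no shrinking and uses a fixed seed for reproducibility.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = dedupe_sorted(xs)
        # Property 1: the output is strictly increasing (sorted, no duplicates).
        assert all(a < b for a, b in zip(out, out[1:]))
        # Property 2: the output contains exactly the values of the input.
        assert set(out) == set(xs)
        # Property 3: running the function on its own output changes nothing.
        assert dedupe_sorted(out) == out
    return trials


if __name__ == "__main__":
    print(f"{check_properties()} random cases passed")
```

The three asserts are the "theorem statement" here: short enough to review by hand, regardless of how long or ugly the implementation behind `dedupe_sorted` is. Of course, that only helps if the properties really are comprehensive.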