Hypothesis, Antithesis, synthesis
Hypothesis, Antithesis, synthesis
> property-based testing is going to be a huge part of how we make AI-agent-based software development not go terribly.
There's no doubt, I think, testing will remain important and possibly become more important with more AI use, and so better testing is helpful, PBT included. But the problem remains verifying that the tests actually test what they're supposed to. Mutation tests can allow agents to get good coverage with little human intervention, and PBT can make tests better and more readable. But still, people have to read them and understand them, and I suspect that many people who claim to generate thousands of LOC per day don't.
And even if the tests were great and people carefully reviewed them, that's not enough to make sure things don't go terribly wrong. Anthropic's C compiler experiment didn't fail because of bad testing. Not only were the tests good, it took humans years to write the tests by hand, and the agents still failed to converge.
I think good tests are a necessary condition for AI not generating terrible software, but we're clearly not yet at a point where they're a sufficient one. So "a huge part" - possibly, but there are other huge parts still missing.
> t took humans years to write the tests by hand, and the agents still failed to converge.
I think there is some hazard in assuming that what agents fail at today they will continue to fail on in the future.
What I mean is, if we take the optimistic view of agents continuing to improve on the trajectory they have started at for one or two years, then it is worth while considering what tools and infrastructure we will need for them. Companies that start to build that now for the future they assume is coming are going to be better positioned than people who wake up to a new reality in two years.