This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
Ouch. Classic Claude. It does tend to cheat when it gets stuck, and I've had some success with stricter harnesses, reflection prompts and getting it to redo work when it notices it's cheated, but it's definitely not a solved problem.
My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out.
TBF PBT has been the present in Python for a while now.
10 years ago might have been a little early (Hypothesis 1.0 came out 11 years ago this coming Thursday), but we had pretty wide adoption by year two and it's only been growing. It's just that the other languages have all lagged behind.
It's by no means universally adopted, but it's not a weird rare thing that nobody has heard of.
> But the problem remains verifying that the tests actually test what they're supposed to.
Definitely. It's a lot harder to fake this with PBT than with example-based testing, but you can still write bad property-based tests and agents are pretty good at doing so.
I have generally found that agents with property-based tests are much better at not lying to themselves about it than agents with just example-based testing, but I still spend a lot of time yelling at Claude.
> So "a huge part" - possibly, but there are other huge parts still missing.
No argument here. We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.
Conversation with Will (Antithesis CEO) a couple months ago, heavily paraphrased:
Will: "Apparently Hegel actually hated the whole Hegelian dialectic and it's falsely attributed to him."
Me: "Oh, hm. But the name is funny and I'm attached to it now. How much of a problem is that?"
Will: "Well someone will definitely complain about it on hacker news."
Me: "That's true. Is that a problem?"
Will: "No, probably not."
(Which is to say: You're entirely right. But we thought the name was funny so we kept it. Sorry for the philosophical inaccuracy)