Mastodawn

> I have generally found that agents with property-based tests are much better at not lying to themselves

I also observed the cheating to increase. I recently tried to do a specific optimization on a big complex function. Wrote a PBT that checks that the original function returns the same values as the optimized function on all inputs. I also tracked the runtime to confirm that performance improved. Then I let Claude loose. The PBT was great at spotting edge cases but eventually Claude always started cheating: it modified the test, it modified the original function, it implemented other (easier) optimizations, ...

Official	https://
Support this service	https://www.patreon.com/birddotmakeup