A cruel irony of coding agents is that everyone who blew off automated testing for the past 20 years is now telling the AI to do TDD all the time.

But because LLMs were trained on decades of their shitty tests, the agents are also terrible at testing.

@searls @rayckeith omfg this, so much

And they can't even see how shite the autogenerated tests are. Seriously. The code might even be passable at times, but the tests are so bad it isn't even funny. They meet the coverage metrics, though...

By doing things like testing if a constructor returns non-null, or by using reflection to check whether a method is implemented or to verify internal state... Or my favorite so far: 200 lines of threading code to test multithreading performance... of a mock set up earlier in the test.
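
To make that concrete, here is a minimal Python sketch of the pattern. `ShoppingCart` and its methods are hypothetical names invented for this illustration, not taken from any real project. The first test class executes lines and keeps a coverage report happy without asserting anything about behavior; the second checks results a caller would actually observe.

```python
import unittest


class ShoppingCart:
    """Hypothetical class, used only for this illustration."""

    def __init__(self):
        self._items = []

    def add(self, name, price, quantity=1):
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        self._items.append((name, price, quantity))

    def total(self):
        return sum(price * quantity for _, price, quantity in self._items)


class CoverageOnlyTests(unittest.TestCase):
    """Passes and touches code, but would still pass if total() were wrong."""

    def test_constructor_returns_object(self):
        self.assertIsNotNone(ShoppingCart())

    def test_total_is_implemented(self):
        # Reflection-style check: only proves the attribute exists.
        self.assertTrue(callable(getattr(ShoppingCart, "total", None)))


class BehaviorTests(unittest.TestCase):
    """Fails if the arithmetic or the validation is broken."""

    def test_total_sums_price_times_quantity(self):
        cart = ShoppingCart()
        cart.add("pen", 2, quantity=3)
        cart.add("notebook", 5)
        self.assertEqual(cart.total(), 11)

    def test_rejects_non_positive_quantity(self):
        with self.assertRaises(ValueError):
            ShoppingCart().add("pen", 2, quantity=0)
```

Break `total()` (say, drop the `* quantity`) and only `BehaviorTests` notices.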

@searls the amount of time I’ve burned trying to coax Sonnet into using FactoryBot properly 🤯

@searls And people suddenly start to write long documents to explain to the robots how a project works and how it should be developed, after avoiding writing the simplest README for the last 20 years.

@cdamian you mean that "Flingblat is a rewrite of Blat in Fling, using the snarbleglue framework" is as incomprehensible to a machine as it is to a human? I'm shocked and appalled.

@searls

How I, a non-developer, read the tutorial you, a developer, wrote for me, a beginner - annie's blog

“Hello! I am a developer. Here is my relevant experience: I code in Hoobijag and sometimes jabbernocks and of course ABCDE++++ (but never ABCDE+/^+ are you kidding? ha!) and I like working with Shoobababoo and occasionally kleptomitrons. I’ve gotten to work for Company1 doing Shoobaboo-ing code things and that’s what led me to the Snarfus. So, let’s dive in!

@jawnsy there is truly nothing worse than having your hoob-tunnel clogged with gramelions.

@cdamian @searls Oh this is too funny and I think possibly what is happening at my job.

@searls tests are failing. remove tests. tests now passing. nice.

@searls I recently overheard a client raving about how great LLMs are for getting rid of the most annoying part of their work: All those pesky tests they need to write after the code is done.

I did quietly think to myself, if I was going to use LLMs to generate code, I’d do the exact opposite: Hand write the tests, then generate the code that makes them pass.
@mfowler
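
A minimal sketch of that order of work in Python, assuming a hypothetical `slugify` function made up for this example. The test class is written by hand first and acts as the specification; the function body is only a stand-in for whatever a model would generate, and it is accepted only if the hand-written tests pass.

```python
import re
import unittest


def slugify(title):
    """Stand-in for generated code; replaceable as long as the spec passes."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifySpec(unittest.TestCase):
    """Written by hand before any implementation exists."""

    def test_lowercases_and_joins_words_with_hyphens(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_drops_punctuation(self):
        self.assertEqual(slugify("Tests: now passing!"), "tests-now-passing")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")
```

The point of the ordering is that the human decides what "correct" means, and the generated part is the bit that can be thrown away and regenerated.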

@philip @searls @mfowler it is incredibly tempting to get LLMs to write unit tests, post facto

Unit tests post facto are useful. They are also a lot of repetitive work. On the face of it a perfect candidate for LLMs.

I tried it. Worked well, on the face of it. I read the tests, mostly sensible.

I have deleted them since. Unit tests are a lot of work to maintain, too. And what is the point?