One of the lessons I learned from going back to school for CS was to be suspicious of code that worked as intended the first time.

Writing unit tests beforehand or concurrently was critical to discovering ways the code might fail, and in the process, to understanding how the program was operating.

The meta goal became to automatically distrust things that worked without anyone knowing why.

Why?

Because if you don’t know why it worked before, you have no idea whether it will continue working.

All of the above was, “Everyone knows” status.

And then LLMs came along and everyone seemed to say, “Actually, forget all that and throw your integrity away.”

The transformation was invasive and pernicious.

@CptSuperlative this captures a strong discomfort I’ve felt viscerally with these code generators. Exactly right.

It’s kinda similar to the “Chesterton’s fence” parable of not making a change when you don’t understand the reasons for the existing state, except it’s that times a thousand: throwing all understanding to the wind and changing everything regardless.

@CptSuperlative I think you're on to something significant here!

After almost 20 years of programming, I mostly write code that works as intended the first time, but that's because I write it in small increments.

Even without test-driven development, an essential part of software development is continuously exercising every line of code to ensure it works as intended. Programming and software design are "whitebox problems"¹: we know the design and can make sure to test all paths in it.
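A minimal sketch of what "exercising all paths" means in practice (the function and tests here are hypothetical, just to illustrate the whitebox idea):

```python
# Hypothetical function with three paths; whitebox testing means we know
# the design, so we deliberately write one test per path in it.
def clamp(value, low, high):
    if value < low:
        return low
    if value > high:
        return high
    return value

# One test per known path: below the range, above it, and inside it.
assert clamp(-5, 0, 10) == 0    # path 1: value < low
assert clamp(99, 0, 10) == 10   # path 2: value > high
assert clamp(7, 0, 10) == 7     # path 3: value in range
```

The point isn't the tests themselves but that the author knows exactly how many paths exist and can check that every one has been exercised.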

@CptSuperlative Conversely, using LLMs to generate code turns programming and software design into a blackbox problem, which means we no longer know and exercise all the paths, because we move too fast.

This is especially bad in agentic development, which is a blackbox approach at a high level of abstraction: there are many layers of abstraction underneath that we never exercise properly.

Notes:
¹ Whitebox testing https://en.wikipedia.org/wiki/White-box_testing


@CptSuperlative I think there's a strong argument somewhere here about why LLM generation of code will result in lower quality, with uncertainties in security, data security, etc… but it needs a bit more fleshing out on the usage side: how LLMs affect the psychology of developers so that they stop evaluating all the code they write.

I think it's implied by the fact that most developers don't even properly read all the code they generate…?

@CptSuperlative (Also sorry for formalizing this in your replies 😁 I know I'm mostly restating and generalizing your very good point!)

@nielsa

No worries, I enjoyed it

@CptSuperlative in my 30 years of coding I only had one case where the code immediately worked and even testing and source review found no issue.
Only happened ONCE in my life.

@CptSuperlative I recently read this article on BYD. Great background, as I wasn’t aware of their history.

This in particular jumped out at me. While focused on a hardware issue, it applies to software as well:

“When defective cells appeared, Wang asked: “Have you found the root cause?” If yes: “Can you reproduce it?” Then the demand: “Make one hundred cells with exactly the same defect. If you can reproduce the failure one hundred times, identically, then and only then have you understood the mechanism.””

https://www.inc.com/howard-yu/the-nail-test-why-this-54-billion-innovation-is-terrifying-western-auto-executives/91317777

@CptSuperlative
Yes, "throw your integrity away, because ..." always feels bad.
@CptSuperlative Reminds me of this article on 'Residuality Theory' and complexity science for software engineering. https://ericnormand.substack.com/p/residuality-theory
"The industry has come up with a set of best practices that can mitigate many stressors. … Architecture is not “use as many best practices as possible” just like software design is not “use as many design patterns as possible.” …
[W]e should mentally simulate the stressors ahead of time to understand how [the system can survive] in the complex real world."
@CptSuperlative The trend is "maximising the number of design patterns" rather than "simulating disasters in advance".
@CptSuperlative I really appreciate your post. I applied your white box testing idea using the AI itself... so I'm testing a multi-thousand-line script by asking Claude Code to perform white box testing on it. I will use this concept henceforth for all coding tasks. Many thanks for your post, it taught me a lot. The first bug it found was that if someone types ë in a machine name, the regex would fail. This sort of minute bug lying in wait is lovely to spot this way.
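The script itself isn't shown, but the failure mode described is a common one: an ASCII-only character class silently rejecting accented input. A hypothetical sketch of the bug and one possible fix (the patterns here are assumptions, not the poster's actual code):

```python
import re

# Hypothetical ASCII-only pattern for machine names: rejects accented letters.
ascii_name = re.compile(r"^[A-Za-z0-9-]+$")

# A Unicode-aware alternative: in Python 3, \w matches Unicode word
# characters (including ë) by default.
unicode_name = re.compile(r"^[\w-]+$")

assert ascii_name.match("server-01") is not None
assert ascii_name.match("sërver-01") is None      # the lurking bug: ë is rejected
assert unicode_name.match("sërver-01") is not None
```

Whether the Unicode-permissive version is the right fix depends on what consumes the machine name downstream; the point is that the restriction should be a deliberate choice, not an accident of the character class.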