Look, this isn’t at all a defense of slop code, but it has me thinking: how much does code quality matter, and why?

It’s maintenance, right? We care about readability because we know we’ll have to make changes, fix bugs, etc.

But so… imagine a codebase that’s magically bug-free and feature-complete. (I’m aware this is a strawman; that’s the point, it’s a thought experiment.) Does it matter if this codebase is well-written? I’m not sure it does! (1/5)

Code quality has always been ONE factor; it’s never always been the most important. E.g., we often accept complex internals as the price for a clean external API, and we all write sloppy code for one-offs, prototypes, etc. So part of me accepts the “code quality doesn’t matter” argument. I can see a vision of agentic engineering with systems that prove correctness; if an agent produces code that is provably correct, maybe the quality really doesn’t matter! (2/5)
I’m far from convinced that this is actually possible. It’s certainly not possible now, and I’m not talking about models: testing and verification tools are nowhere near where they’d need to be, regardless of model quality. Today, code quality DOES still matter; even the best-case version of agentic engineering can’t produce code that’ll never require maintenance. But I can see a possible future where code quality might not matter, or will matter a lot less, and that’s FASCINATING. (3/5)
Specifically, what I find fascinating is this: the tooling that would be required to make agentic engineering begin to live up to the hype (much better testing tools, formal business-logic specification languages, more powerful and easier-to-use formal verification tools, better static analysis tooling, etc.) would be massively useful to software engineering quite regardless of the existence, utility, or quality of LLMs. (4/5)
Will we actually build them? I sort of doubt it: the history of software development, and of course the current trajectory, suggests we’ll continue to yolo our way through it. I wouldn’t exactly say I’m optimistic, but hope springs eternal. (5/5)

@jacob Responding to the thread as a whole, I think code readability will matter just as much as cockpit black boxes matter.

The universe is against us even with bug-free hardware and software. Even the perfect self-driving car will kill some people. The perfect AI doctor will lose some patients. The perfect automated factory will assemble some lemons.

We will need to be able to learn why and how it happened. Step by step. So, logs and readable code. Otherwise trusting automation is impossible.

@ambv @jacob Definitely...

I was going to say something about accountability, but this is pretty much my thought too.

Especially as it's incredibly unlikely that anyone would one-shot a perfect implementation, even more so for something that is extremely critical.

If we can't understand "why" something went wrong, how do we account for it? Was it negligence? Can someone be held accountable? Is the prompting wrong or is the code wrong?

I can see why that's a dream scenario for a corporation, though.

@pythonbynight @ambv Ok yes I like this: “explainability” (I need a better word but fine) is one reason why code quality matters. Like, if a dam fails, we’d really like to have the blueprints (and for them to be readable!) so we can figure out where we screwed up. “The dam works because the lake holds water” isn’t sufficient.
@jacob @pythonbynight @ambv "Explainability" is the word generally used for this requirement when it comes to the use of machine learning systems in finance (loan approvals, that kind of thing), so if there's a better word, it hasn't been found yet.
@ancoghlan @jacob @pythonbynight @ambv there's also "legibility", in the Seeing Like a State sense, i.e., you have people (or LLMs) who report to you, and you need to be able to make sense of what they're doing in order to hold them accountable.
@jacob I think this is a subtle category error, in the sense that "readability" (and "code quality" more generally) is a transitive adjective. Readable to whom? High quality according to whose taste? We strive for an "objective" sense of code quality because the audience we are usually addressing is the pool of potential candidates who may become future maintainers of the code, and that's a nebulous group. But that group's nebulousness eventually must be removed as it becomes "the current team".
@jacob all of that is to say: the reason readability matters is that if you want to maintain some code, some specific group of humans must maintain an understanding of its structure, such that they can effectuate *and be held accountable for* required changes. The putative cost reduction of an "agentic" tool inherently assumes that you can shrink that group by replacing some of them with LLMs. Maybe, eventually, some will be able to. But ultimately *somebody* still needs to read all the code.
@jacob the world in which agentic tools could have the level of success that you're imagining with trash-level code quality would seem to me to be the same world where anyone annoyed with a subscription-model app would simply download a 30-year-old abandonware replacement from archive dot org, because they'd be comfortable with long-term stasis. which seems unrealistic to me.

@jacob I keep wanting a deterministic layer in there somewhere. Letting an AI take over the test suite is the piece that feels riskiest to me. That's always been the layer of trust and confidence.

A steadily evolving test suite that I understand is my best guarantee that things will continue to work at least as well as they have in the past. If I give over control of that layer, I don't know where my confidence that things will work comes from.

@jacob I have been doing some of this, mostly cranking up the tools that already exist. I’m pretty sure it helps. But I can also see the agent jumping through all the hoops I give it, and they are familiar hoops that would be incredibly frustrating for me when programming by hand.
@jacob @freakboy3742 Yes!! This is a big part of what makes me actually optimistic about the excitement around LLMs and the like: these systems afford us an opportunity to reflect on what we are actually trying to do, and then create structures conducive to that.

@jacob yeah, exactly. Lots of AI coding boosters are talking about the importance of tests, documentation, small modules… On one hand, yes! Great ideas! OTOH why did we have to wait for LLMs to prioritize this? (And as soon as we can avoid it, will we?)

But, I do like the move to smaller PRs, smaller files, more tests. My biggest challenge is that PR review has become a bigger bottleneck.

@jacob Never thought about the provably correct path. I have seen the "who cares what the code looks like when it's only the AI that sees it" though.

But yeah, that's intriguing.

@jacob A thought when reading this: even if it were possible to make something perfect (serving your straw man) by today's version of perfect, we know that definition will change over time (e.g., HTTPS everywhere today, but not before Firesheep).

I think there are some parallels here to “I'm not doing anything wrong, so privacy doesn't matter to me.” The definition of “not doing anything wrong" has changed.

Also difficult maintenance via LLM will get *expensive* when it's unsubsidized.

@sean Ahh yes! “Done” and “correct” are slippery; I was thinking that, but the idea of their definition changing over time totally matches my experience and is a pretty great way of bringing the strawman back to earth.

@jacob code quality does not matter *if the code never has to change*. it’s always a forward looking statement: we want easy to understand code because we will have to modify it later.

if later never comes, then i agree entirely.

but in my own very limited explorations, imho the LLMs also find good code easier to read and reason about…

@jacob if we stop seeing transformative improvements in model quality, i suspect we’ll get more of a bifurcation along business models? facebook can move fast and break stuff; orgs with high uptime demands can’t. pressure to yolo will be higher with one than the other
@jacob I mean, we don't care about the machine code quality (not even in performance terms in many cases), so I do think we won't care about code quality in the future. But what I do care about is determinism: the same input (prompt) should *always* produce the same output (generated code). I guess it'll come down to a new kind of programming-ish language that somehow provides that?

@jacob Jumping on one specific thing: Rice's Theorem means that any "interesting" program has limits on the extent of static analysis, so "provably correct" is not a thing you can get.

The things that would improve product quality the most are things that provably can't actually be automated in the general case. Those are the limits we have to reckon with.
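A toy illustration of the diagonal argument that the halting problem (and, building on it, Rice's Theorem) rests on, sketched in Python. It is an illustration under simplifying assumptions, not a proof: a "program" is modeled as a zero-argument function returning a bool, and an "analyzer" as any function that predicts what a program returns. The names `make_contrarian`, `analyzer_is_wrong`, and the sample analyzers are hypothetical. Whatever fixed analyzer you plug in, the contrarian program asks it about itself and does the opposite.

```python
from typing import Callable

# Toy model (an assumption for this sketch): a "program" is a zero-argument
# function returning a bool, and an "analyzer" predicts what a program returns.
Program = Callable[[], bool]
Analyzer = Callable[[Program], bool]


def make_contrarian(analyzer: Analyzer) -> Program:
    """Build a program that asks the analyzer about itself, then does the opposite."""
    def contrarian() -> bool:
        return not analyzer(contrarian)
    return contrarian


def analyzer_is_wrong(analyzer: Analyzer) -> bool:
    """True if the analyzer mispredicts the contrarian program built from it."""
    program = make_contrarian(analyzer)
    predicted = analyzer(program)
    actual = program()
    return predicted != actual


# Hypothetical analyzers: each answers without running the program.
def always_true(program: Program) -> bool:
    return True


def name_based(program: Program) -> bool:
    # A "clever" heuristic that looks at the program's name.
    return "contrarian" not in program.__name__


# Any deterministic, stateless analyzer that returns an answer is wrong
# about its own contrarian.
print(analyzer_is_wrong(always_true))  # True
print(analyzer_is_wrong(name_based))   # True

# An analyzer that just runs the program never finishes; Python gives up
# with RecursionError instead of looping forever.
try:
    analyzer_is_wrong(lambda program: program())
except RecursionError:
    print("simulating analyzer never returned an answer")
```

The limit is on a single analyzer that is right about *every* program; proving properties of one specific program is still possible, which is why "in the general case" carries so much weight above.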

@jacob (this is the subject of my #NBPy talk: https://pretalx.northbaypython.org/nbpy-2026/talk/GQLNDC/ and is a thing I am hinting at in this contemporaneous-to-yours thread: https://social.coop/@chrisjrn/116337892980613263)
@chrisjrn Yes of course I already knew about Rice’s Theorem before your toot why would you think otherwise
@jacob Et tu, Brute?
@janl I’m not sure what you mean?
@jacob Every week someone I look up to falls for the genai scam.

@janl *sigh*

If having anything positive to say or any curiosity at all about LLMs means I’ve fallen for a scam, then OK, I guess. I wish there was room for more nuance here.

@jacob yeah, I know you understand there are some topics where asking for nuance is a dogwhistle and I consider this technology one of them. To me the only sensible position is to stay the eff away from it.
@janl I don’t know where to start with that. I’m sorry you see me that way. I still think you’re great.
@janl To be clear: my position is that genAI is on balance probably a net negative, and if I could unring the bell I would. But I would also say that about the internet, and “there are good things about the internet” is hardly a controversial opinion (I think?). I get that The Discourse demands extreme opinions, but I don’t think it’s “falling for a scam” to find positive things about AI, and I think it’s dead wrong to approach ANYTHING without a sense of curiosity.
@jacob hence my original post. I’m so very sorry they got you.
@janl That’s certainly one narrative. It’s not mine.
@jacob I also want to state that I am a person and not A Discourse.

@janl @jacob can I ask when you last used genai for some code? Maybe not even your main/work codebase, just anything, even a toy script?

I’ve found that often the people who hold this opinion tried it for a couple hours a year or more ago and formed an opinion.

@frank you can ask and the answer is never.

I do not think evaluating these things based on their merits is ethical.

The tool you are using has been built by stealing* from me** and my friends, and it offers to rent their theft back to me, so excuse me if I am not interested.

*and that’s only one of the many issues
**I’d be entitled to the Anthropic lawsuit money if I were a US citizen

@frank @janl @jacob

On the other hand, we've just seen what LLMs did rewriting SQLite: a monstrous botch that resulted in a monstrous decrease in performance. And we've *just* seen the quality of Claude Code, which was, purportedly, vibe-coded. So.

@frank @janl @jacob I'm obviously not Jan, but maybe Jacob gets pushback like that because stuff like "imagine a codebase that’s magically bug-free and feature-complete." and going on from there to talk about LLMs is practically indistinguishable from the propaganda and grandstanding Altman and Amodei and Musk peddle. Just imagine what would be possible if. Fascinating.

Yeah it's fascinating but so is imagining what we could do if we could photosynthesize. And...

@adriano @janl @jacob I recently got some blood work done and I’m very Vitamin D deficient so I’m personally quite glad I can’t photosynthesize as my vampire ass would be dead 🤣
@adriano @frank @janl It’s frustrating because the thing I was trying to discuss was the question of why code quality matters, not yet another “LLMs: good or evil” argument. And for the most part replies have been focused on that, and have really teased out some super interesting points about what “quality” means and why it’s important. And in doing so, I think they make a much better case against agentic coding ever working the way proponents claim it might!

@jacob "If the roof stays up and the floor is dry, does the build quality of my house matter?"

Yes. Yes, it does.

@meejah Why? Say more!

@jacob The view from the top of the mountain is great -- but it feels better (more "accomplishment") if you get there on foot instead of a chair-lift or helicopter.

I at least put non-zero value on "the process" and the methods used. I don't necessarily have a great way to articulate this feeling right now.

In climbing, one may "climb a climb" on toprope, on lead or even free-soloing. These come with different intrinsic rewards, even if the end-point is the same ("alive, and on top").

@jacob I believe some of the more scholarly posts I've read about the value of "trying, and failing" in learning apply here too: that being told the right answer by an oracle machine doesn't lead to the same learning as "doing the work" yourself (i.e. making mistakes, etc).

Everything I've learned about mentoring seems to back this up: if you just tell people the right answer, they do not learn it as well (nor appear to feel the same) as if they fail, then find the answer.

@jacob (...and of course there's some balance: continuing to flail and fail is not going to produce learning either, unless there's a TON of motivation).

Coming back to a climbing analogy, if you try to climb something _way_ too hard for you, it's just going to be frustrating (or fatal) instead of producing accomplishment.

So two "equal" programs (that produce same/similar output for inputs) will still be judged differently (by humans) based on other factors (like "process"). IMO.

@meejah See, I don’t disagree with you, but you also gotta be careful because the metaphor cuts both ways. Is someone who can’t walk less deserving of seeing a summit? I spend a ton of time in the wilderness and viscerally feel the importance of “earning it”, but I also know how much privilege plays a role. Difficulty and accessibility are opposed. Would it be a bad thing if non-programmers could build programs? Hell no! (Are LLMs the right tool to solve that problem? Also hell no!)

@jacob Yes, there's no absolute scale here. Someone finishing something important to them is no less "deserving" of feeling accomplishment.

I'm not sure why I went to "physical" examples, but this applies to lots of areas. I should have kept it about 'craft' (in the widest possible sense), I guess.

I do believe that people will feel _less_ "accomplishment" if the thing wasn't "hard" (for them!) though. (That is, climbing the stairs can be just as much an accomplishment as any mountain)

@meejah Yeah totally. There’s also like a multitude of motivations, right? Sometimes I build a table by hand because woodworking is enjoyable and the craft is important. Sometimes I just need a place to put my drink and IKEA is fine. Sometimes I write code because coding is enjoyable and craftsmanship matters. Sometimes I just need a damn website so people know how to get to the wedding.
@jacob According to the classical problem solving flowchart, if it ain't broke, don't fix it. (But if you have to make updates, quality and readability start to matter a lot more.)
@jacob sustainable software development is how i think about it. beyond maintenance, we gotta be able to add new features too, right? it needs to be good enough to stand on its own and allow change
@jacob there are a lot of software engineers and developers who have read assembly instructions in the last ten years. There are a lot more who haven’t.

@issackelly @jacob The assembly was replaced by higher level code that still needed to be maintained and understood.

I’d argue that sufficiently specific and structured prompts will still need to be maintained, too, with the caveat that they will be run through a fundamentally non-deterministic interpreter.

@palendae @issackelly But also isn’t non-determinism sort of the point of LLMs? Like it’s not something you can just turn off, right? (This is a genuine question, not a bit - I really don’t know.)

@jacob @issackelly Yes, which I think is why they’re not suitable for what we’re trying to do with them.

The output of the code it writes might not matter, but its consistency _does_. I don’t think we can trust them to produce code that retains functionality without a ton more tooling, which you alluded to earlier.

@palendae @jacob Jacob I think your original point is approximately where I am with it.

* Humans own/maintain the specs, and
* have a lot of oversight/responsibility over a test suite.

I don't know that "prompt maintenance" is really where my mind is at with agents + LLMs.

If the spec represents what you want, and the tests represent the spec, then that's the API to the product: just as a function's signature is its API to other functions, what's on the inside can change or be ambiguous.

@issackelly @jacob Specs and the tests would be the prompts here; the instructions we give to a machine to do something. Whether it’s a REPL or a set of files given to agents isn’t really relevant, IMO.

I’m not especially optimistic that the industry will apply much rigor to the testing or the specs, to everyone’s detriment.

@palendae @jacob I dunno. I think specs (formal specifications) and tests (acceptance, regression, property-based, and unit, as appropriate) should _not_ be prompts, but rather barriers code should pass, human written or otherwise. The more complete and accurate they are the more likely you are to get value out of LLM code long term.

LLMs can help a skilled human write them, but humans must own the responsibility for them.

"Product specs" can be llm prompts or jira tickets for engineers.
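A minimal sketch of that "tests as barriers" idea, in Python with only the standard library. The `dedupe` behavior, `passes_spec`, and both candidate implementations are hypothetical, invented for illustration. The property checks are what the human writes and owns; any implementation, hand-written or LLM-generated, either clears them or doesn't.

```python
import random
from typing import Callable, List

# Hypothetical example: the human-owned spec for a `dedupe` function that removes
# duplicates while keeping the first occurrence of each item, in original order.
Impl = Callable[[List[int]], List[int]]


def passes_spec(impl: Impl, trials: int = 200, seed: int = 0) -> bool:
    """Property checks any implementation must clear, whoever (or whatever) wrote it."""
    rng = random.Random(seed)  # fixed seed so the barrier itself is deterministic
    for _ in range(trials):
        items = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        out = impl(list(items))  # pass a copy so impl can't mutate our input
        # Property 1: no duplicates in the output.
        if len(out) != len(set(out)):
            return False
        # Property 2: nothing lost and nothing invented.
        if set(out) != set(items):
            return False
        # Property 3: items appear in the order of their first occurrence.
        firsts = [items.index(x) for x in out]
        if firsts != sorted(firsts):
            return False
    return True


# Two hypothetical candidates, e.g. one hand-written and one generated.
def dedupe_keep_order(items: List[int]) -> List[int]:
    return list(dict.fromkeys(items))  # dicts preserve insertion order


def dedupe_sorted(items: List[int]) -> List[int]:
    return sorted(set(items))  # plausible-looking, but loses the original order


print(passes_spec(dedupe_keep_order))  # True
print(passes_spec(dedupe_sorted))      # False
```

Swapping in a different `dedupe`, from whatever source, changes nothing about what it has to clear. A property-based testing library such as Hypothesis would generate and shrink inputs far more thoroughly; plain `random` is used here only to keep the sketch dependency-free.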

@issackelly @jacob I’m curious - are the specs and enhanced tests what you’re focusing on?

Because at my current job, I’m hearing a lot of people saying this, but the emphasis (and measured outcomes) is around throwing agents at code generation and automated review without bringing the testing up at the same time.

@palendae @jacob For me, in the last ~9 months of engineering: "It depends". If it _is_ or _is a dependency of_ an API or security-critical code, yes, both for me and for the code I review. If it's feature work on top, especially experiments, then "wing it" has been useful.

At the moment I am not beholden to anybody but myself, so I feel like I'm sharing the bounds of what seems _right_ not what others are currently doing.

I do feel there is far too much exuberance compared to capabilities.