Look this isn’t at all a defense of slop code, but it has me thinking — how much does code quality matter, and why?

It’s maintenance, right? We care about readability because we know we’ll have to make changes, fix bugs, etc.

But so … imagine a codebase that’s magically bug-free and feature-complete. (I’m aware this is a strawman - that’s the point, it’s a thought experiment.) Does it matter if this codebase is well-written? I’m not sure it does! (1/5)

Code quality has always been ONE factor; it’s never always been the most important one. E.g., we often accept complex internals as the price for a clean external API, and we all write sloppy code for one-offs, prototypes, etc. So part of me accepts the “code quality doesn’t matter” argument. I can see a vision of agentic engineering with systems that prove correctness; if an agent produces code that is provably correct, maybe the quality really doesn’t matter! (2/5)
I’m far from convinced that this is actually possible. It’s certainly not now — and I’m not talking about models. Testing and verification tools are nowhere near where they’d need to be, regardless of model quality. Today, code quality DOES still matter; even the best-case version of agentic engineering can’t produce code that’ll never require maintenance. But I can see a possible future where code quality might not matter, or will matter a lot less, and that’s FASCINATING. (3/5)
Specifically what I find fascinating is: the tooling that would be required to make agentic engineering begin to live up to the hype — much better testing tools, formal business logic specification languages, more powerful and easier to use formal verification tools, better static analysis tooling, etc — would be massively useful to software engineering quite regardless of the existence/utility/quality of LLMs. (4/5)
Will we actually build them? I sort of doubt it: the history of software development, and of course the current trajectory, suggests we’ll continue to yolo our way through it. I wouldn’t exactly say I’m optimistic, but hope springs eternal. (5/5)

@jacob Responding to the thread as a whole, I think code readability will matter just as much as cockpit black boxes matter.

The universe is against us even with bug-free hardware and software. Even the perfect self-driving car will kill some people. The perfect AI doctor will lose some patients. The perfect automated factory will assemble some lemons.

We will need to be able to learn why and how it happened. Step by step. So, logs and readable code. Otherwise, trusting automation is impossible.

@ambv @jacob Definitely...

I was going to say something about accountability, but this is pretty much my thought too.

Especially as it's incredibly unlikely that anyone would one-shot a perfect implementation, even more so for something that is extremely critical.

If we can't understand "why" something went wrong, how do we account for it? Was it negligence? Can someone be held accountable? Is the prompting wrong or is the code wrong?

I can see why that's a dream scenario for a corporation, though.

@pythonbynight @ambv Ok yes I like this: “explainability” (I need a better word but fine) is one reason why code quality matters. Like, if a dam fails, we’d really like to have the blueprints — and for them to be readable! — so we can figure out where we screwed up. “The dam works because the lake holds water” isn’t sufficient.
@jacob @pythonbynight @ambv "Explainability" is the word generally used for this requirement when it comes to the use of machine learning systems in finance (loan approvals, that kind of thing), so if there's a better word, it hasn't been found yet.
@ancoghlan @jacob @pythonbynight @ambv there's also "legibility", in the Seeing Like a State sense, ie. you have people (or LLMs) who report to you, and you need to be able to make sense of what they're doing in order to hold them accountable.
@jacob I think this is a subtle category error, in the sense that "readability" (and "code quality" more generally) is a transitive adjective. Readable to whom? High quality according to whose taste? We strive for an "objective" sense of code quality because the audience we are usually addressing is the pool of potential candidates who may become future maintainers of the code, and that's a nebulous group. But, that group's nebulousness eventually must be removed as it becomes "the current team"
@jacob all of that is to say: the reason readability matters is that if you want to maintain some code, some specific group of humans must maintain an understanding of its structure, such that they can effectuate *and be held accountable for* required changes. The putative cost reduction of an "agentic" tool inherently assumes that you can shrink that group by replacing some of them with LLMs. Maybe, eventually, some will be able to. But ultimately *somebody* still needs to read all the code.
@jacob the world in which agentic tools could have the level of success that you're imagining with trash-level code quality would seem to me to be the same world where anyone annoyed with a subscription-model app would simply download a 30-year-old abandonware replacement from archive dot org, because they'd be comfortable with long-term stasis. which seems unrealistic to me.
@glyph Yeah, that’s one conclusion I’m coming to thanks to this discussion. I was aware of the slipperiness of “quality”, and was kinda doing that deliberately because exploring whether quality matters is (to me) more interesting than defining quality precisely. But what I hadn’t noticed was that “done” was just as slippery — and in order for quality (whatever definition) not to matter, there has to be no maintenance, and that requires “done” to actually be a thing.
@glyph (Tho - I think software engineering would be in a better place if “done” was more common. Like, is 80% of the TCO of a bridge in maintenance? I sort of doubt it. There are times when I think that calling it software “engineering” is a bit of a joke.)
@jacob @glyph I prefer to think of it as aspirational.
@jacob @glyph calling it software engineering is absolutely a joke, it has been a joke for a while, and the rush to adopt LLMs and mostly not read their output is probably the strongest evidence we have that it's been a joke.
Reverse ungineering — lvh

(Title with apologies to Glyph.) Recently, some friends of mine suggested that "software engineer" is not a good job title. While they are of course free to call their profession whatever they like, I …
@glyph @fancysandwiches @jacob (One of my university courses was called "Software Construction", and I unironically like it as an all-encompassing term. There _is_ engineering there, but not everyone needs to be/think/at like an engineer.)
@chrisjrn @glyph @jacob yeah that seems appropriate to me. I just refer to people as software developers. We're developing software, we're not engineering it (for the most part, there are some folks taking stuff incredibly seriously).

@jacob this specific question is really interesting and my hunch was that it would be a surprisingly high %, so, let's see:

https://en.as.com/latest_news/when-was-san-franciscos-golden-gate-bridge-built-and-how-much-did-it-cost-n/#:~:text=The%20Golden%20Gate%20Bridge%20in%20San%20Francisco%20was%20constructed%20from%20January%201933%20to%20May%201937.%20The%20project%20cost%20approximately%20$35%20million%2C%20which%20is%20equivalent%20to%20around%20$666%20million%20in%20today’s%20dollars. says the golden gate cost $35MM ($666MM inflation-adjusted for 2024) to build; https://www.sfchronicle.com/bayarea/article/s-f-golden-gate-bay-bridge-operate-costs-18221920.php claims $103MM to operate in 2024. If we were to just flatten that out, that would put the maintenance share of the TCO at 93% as of this writing :)
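That flatten-it-out arithmetic can be sketched in a few lines of Python. (The figures are the ones quoted above; applying today's annual operating cost flatly across the bridge's entire service life is, of course, a gross simplification.)

```python
# Back-of-the-envelope: maintenance share of the Golden Gate Bridge's TCO,
# using the thread's figures and assuming a flat annual operating cost.

CONSTRUCTION_COST = 666      # $MM: 1930s build cost, inflation-adjusted to 2024
ANNUAL_OPERATING_COST = 103  # $MM/year: reported 2024 operating cost
YEARS_IN_SERVICE = 2024 - 1937  # the bridge opened in May 1937

maintenance = ANNUAL_OPERATING_COST * YEARS_IN_SERVICE
total_cost_of_ownership = CONSTRUCTION_COST + maintenance

share = maintenance / total_cost_of_ownership
print(f"maintenance share of TCO: {share:.0%}")  # → 93%
```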

@jacob I suspect that this is not really a fair way to measure this sort of expense but it's still a counterintuitive back-of-the-envelope calculation!
@glyph For one thing - iron rusts, but software doesn’t. Or shouldn’t! I guess that’s the whole problem right there, eh?
@jacob @glyph There's a story about bridge maintenance where the gist was that by the time one pass of repainting the support structure was complete, it was time to start the next pass. Road maintenance is another brutally expensive (and unrelenting) activity. So "build for maintenance" isn't just a software thing - it's something that software inherited from physical engineering, where making the most likely to wear out pieces easy to replace makes a big difference in maintenance costs.
@jacob @glyph The equivalent of "rusting" in software is "the assumptions about the world encoded in a piece of software will inevitably become incorrect". Whether that matters or not depends on the software (even games are often described as having "dated" graphics or interface design, let alone software that actively supports or integrates with institutional processes or other software systems). YOLOing the original construction definitely makes software decay faster, though.
@jacob Yeah, that's the problem. Hence my retrocomputing example. If software *really* didn't corrode, why aren't we all just using Mac OS 8.6, the last really good operating system ;)
@glyph @jacob Yeah, why can't we just put 64GB of RAM in our Mac IIcx
@jacob @glyph I highly recommend the book The Care of Things: The Ethics and Politics of Maintenance (https://www.wiley.com/en-sg/The+Care+of+Things%3A+Ethics+and+Politics+of+Maintenance-p-9781509562398) which explores the different meanings "done" can have. What "done" means to different people can be quite subtle and complex all on its own.
@daedalus Huh, obscure enough that the local interlibrary loan network doesn't have it (and that network includes a bunch of colleges/small unis; it's usually pretty good). Although Harvard has it. Is this a "read once" or "refer to multiple times" kind of book?

@jacob But aren't we then just shifting all the complexity, skill requirement and quality concerns to the business logic specification language, which will maybe be less verbose than code, but certainly not easier to reason about and write perfect specifications in?

The maintenance burden of software isn't just high because there's a bug in that manually-written for loop, it's mostly because requirements weren't clear enough or change.

@jacob I keep coming back to the joke of that one physics professor that in this universe, not only energy, mass, and momentum are conserved, but also complexity. I don't think it was a joke.
@rami I mean probably - but also you could say a similar thing about shifting the maintenance burden from assembly to compiled languages, from compiled to interpreted, etc. Higher-level abstractions are A Thing; I don’t think it’s a huge stretch to imagine an even higher level where we’re specifying whole swaths of applications in a few lines. I guess if you want to be charitable you could say “that’s a prompt!” but the fuzziness and nondeterminism of LLMs make it real hard for me to be charitable.
@jacob Oh yes, absolutely. Could happen! Going from "instructions" to "specifications that can be proven to be implemented" will be a useful abstraction, but writing specifications that are not only sufficiently detailed to prove the code but also prove the *right* thing feels like it's going to be harder than coding and prompting combined, because it combines the hard skills of both. (Still, the end result could be better, not denying that!)

@jacob I keep wanting a deterministic layer in there somewhere. Letting an AI take over the test suite is the piece that feels riskiest to me. That's always been the layer of trust and confidence.

A steadily evolving test suite that I understand is my best guarantee that things will continue to work at least as well as they have in the past. If I give over control of that layer, I don't know where my confidence that things will work comes from.

@jacob I have been doing some of this, mostly cranking up the tools that already exist. I’m pretty sure it helps. But I can also see the agent jumping through all the hoops I give it; the hoops are familiar ones, and they would be incredibly frustrating for me if I were programming by hand.
@jacob @freakboy3742 Yes!! This is a big part of what makes me actually optimistic about the excitement around LLMs and the like: these systems afford us an opportunity to reflect on what we're actually trying to do, and then create structures conducive to that.

@jacob yeah, exactly. Lots of AI coding boosters are talking about the importance of tests, documentation, small modules… On one hand, yes! Great ideas! OTOH why did we have to wait for LLMs to prioritize this? (And as soon as we can avoid it, will we?)

But, I do like the move to smaller PRs, smaller files, more tests. My biggest challenge is that PR review has become a bigger bottleneck.

@jacob Instead of spending $TRILLIONS on ensloppenators, what if we spent a fraction of that money improving the deterministic tools we have?
@cratermoon I think one of the potential silver linings of the AI hype bubble is that we actually might. I know of a couple of people poking around this area. As with many things around AI, “optimistic” would be a wild exaggeration, but I hold a little hope.