People keep assuring me that LLMs writing code is a revolution, that as long as we maintain sound engineering practices and tight code review they're actually extruding code fit for purpose in a fraction of the time it would take a human.

And every damned time, every damned time any of that code surfaces, like Anthropic's flagship offering just did, somehow it's exactly the steaming pile of technical debt and fifteen-year-old Stack Overflow snippets we were assured your careful oversight made sure it wasn't.

Can someone please explain this to me? Is everyone but you simply prompting it wrong?

It's a good thing programmers aren't susceptible to hubris in any way, or this would have been so much worse.

You know, it isn't even that tools like this are useless. There are absolutely things they could be good at. I've personally seen Claude efficiently find stupid little bugs you'd otherwise spend an hour figuring out and hating yourself for afterwards. I tried the first iteration of Copilot, back when it was just an aggressive autocomplete, and while I had to stop using it because it was overconfidently trying to finish my programs for me without being asked, it was great for filling in boilerplate and maybe even a couple of lines of real code for the basic stuff. We have models nowadays that are actually trained to find bugs and security issues in code rather than having the entire internets thrown at them to produce something Altman & Amodei can sell to the gullible as AGI.

But there's the problem. The technology has been around for a while, we have a good idea of what it's good for and, more importantly, what it's not. "Our revolutionary expert system for finding bugs in your code" isn't nearly as marketable to the general public, and the CEO class especially, as "our revolutionary PhD level sentient AI that will solve all the world's problems if you only give us another couple trillion dollars, and also wants to be your girlfriend." And so we get Claude and ChatGPT and RAM shortages and AI psychosis and accelerated climate change instead of smaller, focused models that are actually good at their specialist subjects. Because those don't produce as much shareholder value.

@bodil I liked @mmasnick's take on how mayyyybe there's a silver lining in code-generating AI: that it can help re-democratize personal computing, where not just the personal computer but also the software can be customized and home-grown.

I like to think that sammy boi is out there, trying to buy up the world's complete silicon wafer production because he spends his sleepless nights dreading gen AI breaking loose of his ilk's corporate capture.

I'm sure many of us won't gleefully march into local-AI boosterism without addressing the (open-weight) elephant in the room; maybe that's one way truly open & fair models will leave the fairy realm of the Mozilla Foundation "Wouldn't It Be Cool..?!!" list.

Like, waiting for the "AI bubble to pop" is like hoping for an alien invasion: all it will bring is pain and destruction with no clear "ok, what now?" that follows. I like the _hopefulness_ of his perceived trajectory and I truly hope we get there before we split the planet in half. 😶

AI Might Be Our Best Shot At Taking Back The Open Web

I remember, pretty clearly, my excitement over the early World Wide Web. I had been on the internet for a year or two at that point, mostly using IRC, Usenet, and Gopher (along with email, naturally…

Techdirt
@flaki @bodil Note that for one of the notable examples in this article (Fray) the author (Derek) has debunked the analogy.
@janl @bodil ugh, haven't seen his comment before, but honestly not surprised about his reaction :(
@flaki
Software has always been homegrowable and customizable. Society chose to reject people actually customizing it by mass marketing computers that have increasingly complex requirements for being "useful". (Hell, even the good old C64 is packed with proprietary bits.)
LLMs democratize nothing, local or not. Good docs, relative simplicity and community do.
@bodil @mmasnick

@bodil ”it was great for filling in boilerplate”

There’s your problem right there. Computer science should work towards getting rid of the need for boilerplate, not invent ways to write more of it.

Every piece of boilerplate is a failing of the language or library that you’re using, and is technical debt. Editing generated code doubly so.

@ahltorp @bodil :-) I think a good slice of computer science does.

However, “the market” does not. It operates to extract profits. Not to simplify, reduce barriers, improve access, or clarify.

@benjohn @ahltorp @bodil The logical conclusion then is to regulate that away 🙃 Sadly, I would guess we are more than a few Therac-level incidents away from that happening.
@[email protected]

Boilerplate is a side effect of excessive abstraction.
Now think about it for a second. 😉

(btw, did you consider an April fool for Anthropic's leak? It would be great PR.. after.)

@[email protected] @[email protected]
@giacomo You would have to explain what you mean there, because it makes absolutely no sense. Boilerplate is used instead of abstraction.
@[email protected]

If you don't abstract, your code only needs to solve a pretty specific problem.

If you abstract so your code can handle a variety of tasks, you need new code to connect your generalized code with the actual problem to solve.

The enormous amount of boilerplate required by "modern" frameworks just makes the tradeoff evident. Unfortunately, marketing and hype hide this obvious fact from most developers.

@giacomo @ahltorp

it's because modern frameworks use bad abstractions like "component" or "model" or "capacitator enabler" that generalize a very narrow subset of the problem domain, rather than good old reasonable abstractions like a functor or a monad transformer

@ahltorp @bodil I've seen a pretty good argument that basically goes like this:

  • Copilot seems to be good in your org because your org is full of boilerplate

  • Your org is full of boilerplate because most software that solves real problems in the amount of time people are willing to spend money to solve them... is full of boilerplate.

We can generally only remove the boilerplate once we have the problem domain and solution shape firmly in view, and that usually happens after we get to a working prototype, at which point the money folk immediately cut the budget because the people in the not-in-a-computer world see the problem as solved now.

@ahltorp @bodil Or just have a template source code file that you copy as needed.

@drwho @bodil Yes, if it’s really necessary, and that has been the traditional solution pre-LLM. But I would still call that editing generated code, even though the generation is static.

It’s much better to have language and/or library support that makes the boilerplate go away. We don’t make our IDE generate an assembly subroutine call template and edit that when we want a function call, we just write the function name and parameters with parentheses or whatever the syntax is in our language.

@drwho @bodil When I wrote a 4x4 matrix multiplier in assembly in the beginning of the ’90s, I needed a lot of repetitive code. Did I generate that? No, I wrote a macro.

I did generate some other code, but then that generation became part of the build chain, I didn’t copy-paste the result into a source file.

I don’t expect most people to write advanced macros, compilers or other code generators, but I expect language and advanced library designers to.

@bodil unfortunately, it seems that AGI, defined as "human level intelligence", might actually be close due to a movement in the opposite direction: humans getting dumber really fast.
@aurisc4 @bodil Ahh I hadn't considered that strategy. It all makes sense now!
MAD Bugs: Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell (CVE-2026-4747)

To our knowledge, this is the first remote kernel exploit both discovered and exploited by an AI.

Calif
@bodil Excellent points. Which are the mentioned bug and security finding models you have in mind and where/how are they available?

@bodil I would be at least *partially* aboard if it were more like an autosuggest that you can turn off, based exclusively on things that you've written before in that project. In GDScript, for instance, I'm often writing "get_tree().get_root().get_node(GlobalVariables.<insert variable name here>)". An autocomplete like that would be a useful tool, because you can understand its scope and its sources, and it's very clear where the buck stops.

But what do we have instead? Utter dogwater. 🙄

@bodil the problems here are capitalism, not AI. In particular, too much capital seeking new areas to extract unreasonable returns from. Very similar to the dot-com and real estate bubbles, but made worse by the increased amount of capital sitting around since the Covid bubble. It's almost impossible for our economy to resist hype and just develop interesting new techniques for the betterment of humanity.
@bodil im convinced OP is either an april fools joke or a psyop by anthropic to make public perception of slopwranglers a little more centrist a la “it has its uses” (to which, lmfao no fuck off) and tbh im probably not gonna give you the benefit of the doubt here, so probably gonna be pathfinding to the block button now, as one does
@bodil we're just holding the LLM tesseract the wrong way, right?
@bodil “you still need a human in the loop”, they tell me, while consistently failing to be at all effective when they’re the human that should be in the loop.
@benjamineskola @bodil More likely, the organics in the loop got laid off last year.
@drwho @bodil Not even that. I've seen people respond this way when it's nothing but their own responsibility.
@bodil I imagine that the fact that no one has to dive into the spaghetti means they don't care about it. Treating it like bytecode or binaries, the optimization and maintenance of which are Somebody Else's Problem™. I've only just started reading about folks profiling the trash heaps these things spit out, and it doesn't look great.
@cargot_robbie @bodil Which is pretty much how JavaScript, Typescript, and Perl have usually been treated.
@bodil
I work in ops, not development, but those sound engineering practices and tight code reviews must be partly theater to guilt people into submitting better work in the first place, right? Too bad Claude code isn't a human with any sense of shame.
@oysteivi @bodil Nope. It didn't even before Claude.

@bodil

> Can someone please explain this to me?

Sure: code whose job is managing a natural language LLM isn't going to look like the procedural code you're used to.

If you have doubts whether coding assistants like https://antigravity.google are any use, download it, try it on your own code with your own choice of tasks and find out.

You can throw the changes away if you are worried about getting contaminated.

You can write about your experiment here. And, you will actually know.

Google Antigravity - Build the new way

@hopeless
Your explanation just restates the observation, but provides no reason why it's supposed to look different.

@bodil

@Landa @bodil

> Your explanation just restates the observation

OP has a point and a question... the point is Anthropic's leak not looking like they expected. It's because its job is not what they are used to.

The question is "are LLMs useful for writing code". To which I encourage them to stop being passive-aggressive about it and actually find out, and write about it, like a human with agency.

Your response is "just" denial. Please let us know your experience with antigravity...

@bodil Anthropic is not maintaining sound engineering practices. It's just impossible at the speed they're pushing. The way the Claude Code tech lead talks about it, it's clear that there's no tight code review. It's a company pushing the "coding is solved, SWE is dead" narrative. The last thing they want to admit is that even if the code is pretty good, you still need a human in the loop.

@bodil

Oh no, the probability engine is producing average output.

surprisedpikachu.jpeg

@bodil I don't get it either. It completely baffles me that anyone can look at the generated output and think "this is how it should be", or look at the Anthropic leak and say "this is great engineering".

And once someone has emotionally invested in LLMs being the future of their career, it is really hard to get an honest conversation going.

And when I test it and it doesn't deliver, it always seems to boil down to: you are doing it wrong ... you are stuck in your old ways ... pre-AI thinking ...

@themipper @bodil pure gaslighting
they try to convince you that you are the problem, and that you can't understand why this shit is the best..

@thinkb4coding @themipper @bodil

"And once someone has emotionally invested in LLMs being the future of their career it is really hard to get a honest conversation going."

Yeah that's exactly it! The endless goalpost-moving is so exhausting.

"Try it."

"I did, it wasn't great."

"Did you try the latest model?"

"Well it's the model that was latest when we had this conversation 6 months ago."

"Maybe you're using it wrong? If you're not doing agenetic coding of course it won't work!"

"When X used agenetic coding, it deleted their production database. When Y used it someone popped their github account by putting instructions into the agents.md file in one of their transitive dependencies."

"Well those things haven't happened to me." (Yet)

@themipper @bodil Hint: the people looking at it and saying this is how it should be are the same people who measure productivity in LOC.

@themipper @bodil this!

“once someone has emotionally invested in LLMs being the future of their career it is really hard to get a honest conversation going” 🎯

@bodil Indeed. It's a good thing programmers are not susceptible to hubris.

Otherwise, they'd probably react to a concept like AI by claiming that programming is a form of art that can never be managed by AI, that AI is crap and anything produced by it must also be crap - proven by the fact that AI uses the crap from crappy websites where crappy - eh - programmers have posted it, and they'd utterly fail to make any distinction as to where AI can be a useful tool and what should better be done manually.

First and foremost, they'd scream defiance about the code quality of AI itself, amusingly ignoring the fact that this code has very obviously been written by programmers. But they'd be AI programmers, and therefore obviously worthless bastards.

My, am I happy that programmers are so absolutely immune to hubris.

@papageier @bodil

I am not a computer programmer if I can help it, but I find the following line of argument very interesting :

The generative LLM is a machine for producing the "most probable message", which according to information theory is the message we should discard.

https://man-and-atom.tumblr.com/post/812029038295187456/now-this-might-be-considered-a-theological

Man and Atom

Now, this might be considered a theological objection — A large language model or similar “generative AI” produces the most probable message. That is, in fact, the only thing it can do¹. If you have...

Tumblr
@bodil it gets even better: as the whole system is non-deterministic, you can always claim the others are just prompting it wrong. And there is no way to falsify or verify it. How convenient. (This drives me crazy)
@bodil we have sound engineering practices at home
@dysfun my theory is that you're about as likely to produce quality code from LLMs through "sound engineering practices" and "careful oversight" as you are to write safe C code by being more careful.
@hugo basically the main difference i am sensing is the ability to fool yourself. those who can love LLMs and those who can't hate them.

@dysfun I would even slice it not just to those who can vs those who can't, but perhaps say those who know they can be fooled but don't want to be.

I know there are psychological hazards there that are extremely powerful and subtle. I don't want to subject myself to that.

@hugo i've been watching lots of aviation videos. they give a shit in that industry, it's weird.
To put it more directly:
I am not so arrogant as to think I will survive unscathed where literally every other human who has attempted it has not.

@bodil

> It's a good thing programmers aren't susceptible to hubris in any way, or this would have been so much worse.

I've been told hubris, as well as laziness and impatience, are the three great virtues of a programmer.

https://threevirtues.dev/

The essential virtue is integrity, and it's missing from that list. Without it, the original explanations would crumble.

I guess it's either one of those off-by-one errors - or it was a daft claim right from the start.

The Three Virtues of a GREAT Programmer

According to Larry Wall, the original author of the Perl programming language, there are three great virtues of a programmer; Laziness, Impatience and Hubris.

@bodil The "it's a tool and you just have to check its output" argument enfuriates me like little else does.

Nobody ever fucking does that. People don't even take reviews of real code seriously, and I call bullshit on every sloperator who claims to have read and understood the output of their oversized autocomplete engine.

And don't get me started on the asshats who also let the slopthing write their docs. Or their business emails. Or anything else, really.

@bodil
It does seem to work though? A lot of users think that their product is good, or even the best, and that it keeps improving.

It may be built from bubblegum and clothes hangers, but apparently that works just as well as good engineering.

At least in the short run. It may give them headaches in the long run. But a company like Anthropic will be ecstatically happy if they make it to the long run at all.

@bodil I use Claude for programming and it takes a couple of iterations to get good code.

This versus "lets just ship it" approach. It's the same copied-from-stackoverflow quality you previously got from some offshore teams. Passes functional requirements = done.

@bodil I never use these tools, but yesterday I was using libcbor for the first time and decided to give ChatGPT a try at generating a small snippet to decode a simple structure. It introduced two memory leaks (it didn't call cbor_decref() on extra references it created). I asked it whether it shouldn't be calling cbor_decref() on those, and it confidently said you must not, because those references were borrowed (wrong, and the documentation clearly states they aren't 🤷‍♂️).