People keep assuring me that LLMs writing code is a revolution, that as long as we maintain sound engineering practices and tight code review they're actually extruding code fit for purpose in a fraction of the time it would take a human.

And every damned time, every damned time any of that code surfaces, like Anthropic's flagship offering just did, somehow it's exactly the steaming pile of technical debt and fifteen-year-old Stack Overflow snippets we were assured your careful oversight made sure it isn't.

Can someone please explain this to me? Is everyone but you simply prompting it wrong?

It's a good thing programmers aren't susceptible to hubris in any way, or this would have been so much worse.

You know, it isn't even that tools like this are useless. There are absolutely things they could be good at. I've personally seen Claude find, with great efficiency, stupid little bugs you'd spend an hour figuring out and hating yourself for afterwards. I tried the first iteration of Copilot, back when it was just an aggressive autocomplete, and while I had to stop using it because it was overconfidently trying to finish my programs for me without being asked, it was great for filling in boilerplate and maybe even a couple lines of real code for the basic stuff. We have models nowadays that are actually trained to find bugs and security issues in code, rather than having the entire internets thrown at them to produce something Altman & Amodei can sell to the gullible as AGI.

But there's the problem. The technology has been around for a while, we have a good idea of what it's good for and, more importantly, what it's not. "Our revolutionary expert system for finding bugs in your code" isn't nearly as marketable to the general public, and the CEO class especially, as "our revolutionary PhD level sentient AI that will solve all the world's problems if you only give us another couple trillion dollars, and also wants to be your girlfriend." And so we get Claude and ChatGPT and RAM shortages and AI psychosis and accelerated climate change instead of smaller, focused models that are actually good at their specialist subjects. Because those don't produce as much shareholder value.

@bodil I liked @mmasnick's take on how mayyyybe there's a silver lining in code generation: it could help re-democratize personal computing, in which not just the personal computer but also the software can be customized and home-grown.

I like to think that sammy boi is out there, trying to buy up the world's complete silicon wafer production because he spends his sleepless nights dreading gen AI breaking loose of his ilk's corporate capture.

I'm sure many of us won't gleefully march into local-AI boosterism without addressing the (open-weight) elephant in the room; maybe that's one way truly open & fair models will leave the fairy realm of the Mozilla Foundation's "Wouldn't It Be Cool..?!" list.

Like, waiting for the "AI bubble to pop" is like hoping for an alien invasion: all it will bring is pain and destruction with no clear "ok, what now?" that follows. I like the _hopefulness_ of his perceived trajectory and I truly hope we get there before we split the planet in half. 😶

AI Might Be Our Best Shot At Taking Back The Open Web

I remember, pretty clearly, my excitement over the early World Wide Web. I had been on the internet for a year or two at that point, mostly using IRC, Usenet, and Gopher (along with email, naturall…

Techdirt
@flaki @bodil Note that for one of the notable examples in this article (Fray) the author (Derek) has debunked the analogy.
@janl @bodil ugh, haven't seen his comment before, but honestly not surprised about his reaction :(
@flaki
Software has always been homegrowable and customizable. Society chose to reject people actually customizing it by mass marketing computers that have increasingly complex requirements for being "useful". (Hell, even the good old C64 is packed with proprietary bits.)
LLMs democratize nothing, local or not. Good docs, relative simplicity and community do.
@bodil @mmasnick

@bodil ”it was great for filling in boilerplate”

There’s your problem right there. Computer science should work towards getting rid of the need for boilerplate, not invent ways to write more of it.

Every piece of boilerplate is a failing of the language or library that you’re using, and is technical debt. Editing generated code doubly so.

@ahltorp @bodil :-) I think a good slice of computer science does.

However, “the market” does not. It operates to extract profits. Not to simplify, reduce barriers, improve access, or clarify.

@benjohn @ahltorp @bodil The logical conclusion then is to regulate that away 🙃 I would guess we are more than a few Therac-level incidents away from that happening, sadly.
@[email protected]

Boilerplate is a side effect of excessive abstraction.
Now think about it for a second. 😉

(btw, did you consider an April fool for Anthropic's leak? It would be great PR.. after.)

@[email protected] @[email protected]
@giacomo You would have to explain what you mean there, because it makes absolutely no sense. Boilerplate is used instead of abstraction.
@[email protected]

If you don't abstract, your code only needs to solve a pretty specific problem.

If you abstract, your code can handle a variety of tasks, but you need new code to connect your generalized code with the actual problem to solve.

The enormous amount of boilerplate required by "modern" frameworks just makes the tradeoff evident. Unfortunately, marketing and hype hide this obvious fact from most developers.

@giacomo @ahltorp

it's because modern frameworks use bad abstractions like "component" or "model" or "capacitator enabler" that generalize a very narrow subset of the problem domain, rather than good old reasonable abstractions like a functor or a monad transformer

@ahltorp @bodil I've seen a pretty good argument that basically goes like this:

  • Copilot seems to be good in your org because your org is full of boilerplate

  • Your org is full of boilerplate because most software that solves real problems in the amount of time people are willing to spend money to solve them... is full of boilerplate.

We can generally only remove the boilerplate once we have the problem domain and solution shape firmly in view, and that usually happens after we get to a working prototype, at which point the money folk immediately cut the budget because the people in the not-in-a-computer world see the problem as solved now.

@bodil unfortunately, it seems that AGI, defined as "human level intelligence", might actually be close due to a movement in the opposite direction: humans getting dumber really fast.
MAD Bugs: Claude Wrote a Full FreeBSD Remote Kernel RCE with Root Shell (CVE-2026-4747)

To our knowledge, this is the first remote kernel exploit both discovered and exploited by an AI.

Calif
@bodil Excellent points. Which bug- and security-finding models do you have in mind, and where/how are they available?

@bodil I would be at least *partially* aboard if it were more like an autosuggest that you can turn off, based exclusively on things that you've written before in that project. In GDScript, for instance, I'm often writing "get_tree().get_root().get_node(GlobalVariables.<insert variable name here>)". An autocomplete like that'd be a useful tool, because you can understand its scope and its sources, and it's very clear where the buck stops.
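For what it's worth, that kind of project-scoped suggester fits in a few lines. Here's a minimal Python sketch (all names are invented for illustration; a real plugin would hook into the editor's completion API instead of printing):

```python
from collections import Counter

class ProjectSuggester:
    """Toy autosuggest whose only knowledge is code already written in the project."""

    def __init__(self) -> None:
        self.lines: Counter[str] = Counter()

    def learn(self, source_text: str) -> None:
        # Index every non-empty line the author has already written.
        for line in source_text.splitlines():
            line = line.strip()
            if line:
                self.lines[line] += 1

    def suggest(self, prefix: str, limit: int = 3) -> list[str]:
        # Only ever suggest lines that literally exist in the project,
        # most frequently used first: the scope and the sources are obvious,
        # and the buck stops at your own code.
        prefix = prefix.strip()
        matches = [(count, line) for line, count in self.lines.items()
                   if line.startswith(prefix) and line != prefix]
        matches.sort(key=lambda pair: (-pair[0], pair[1]))
        return [line for _, line in matches[:limit]]

# GDScript-flavoured usage example:
suggester = ProjectSuggester()
suggester.learn("""
player = get_tree().get_root().get_node(GlobalVariables.player_path)
hud = get_tree().get_root().get_node(GlobalVariables.hud_path)
score += 1
""")
print(suggester.suggest("player = get_tree()"))
```

It will never propose anything you haven't typed before, which is exactly the "you can see where the buck stops" property being asked for.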

But what do we have instead? Utter dogwater. 🙄

@bodil the problems here are capitalism, not AI. In particular, too much capital seeking new areas to extract unreasonable returns from. Very similar to the dot-com and real-estate bubbles, but made worse by the increased amount of capital sitting around since the Covid bubble. Almost impossible for our economy to resist hype and just develop interesting new techniques for the betterment of humanity.
@bodil im convinced OP is either an april fools joke or a psyop by anthropic to make public perception of slopwranglers a little more centrist a la “it has its uses” (to which, lmfao no fuck off) and tbh im probably not gonna give you the benefit of the doubt here, so probably gonna be pathfinding to the block button now, as one does
@bodil we're just holding the LLM tesseract the wrong way, right?
@bodil “you still need a human in the loop”, they tell me, while consistently failing to be at all effective when they’re the human that should be in the loop.
@bodil I imagine that the fact that no one has to dive into the spaghetti means they don't care about it. Treating it like bytecode or binaries, the optimization and maintenance of which are Somebody Else's Problem™. I've only just started reading about folks profiling the trash heaps these things spit out, and it doesn't look great.
@bodil
I work in ops, not development, but those sound engineering practices and tight code reviews must be partly theater to guilt people into submitting better work in the first place, right? Too bad Claude Code isn't a human with any sense of shame.

@bodil

> Can someone please explain this to me?

Sure: code whose job is managing a natural-language LLM isn't going to look like the procedural code you're used to.

If you have doubts whether coding assistants like https://antigravity.google are any use, download it, try it on your own code with your own choice of tasks and find out.

You can throw the changes away if you are worried about getting contaminated.

You can write about your experiment here. And, you will actually know.

Google Antigravity

Google Antigravity - Build the new way


@hopeless
Your explanation just restates the observation; it provides no reason why it's supposed to look different.

@bodil

@Landa @bodil

> Your explanation just restates the observation

OP has a point and a question... the point is Anthropic's leak not looking like they expected. It's because its job is not what they are used to.

The question is "are LLMs useful for writing code". To which I encourage them to stop being passive-aggressive about it and actually find out, and write about it, like a human with agency.

Your response is "just" denial. Please let us know your experience with antigravity...

@bodil Anthropic is not maintaining sound engineering practices. It's just impossible at the speed they're pushing. The way the Claude Code tech lead talks about it, it's clear that there's no tight code review. It's a company pushing the "coding is solved, SWE is dead" narrative. The last thing they want to admit is that even if the code is pretty good, you still need a human in the loop.

@bodil

Oh no, the probability engine is producing average output.

surprisedpikachu.jpeg

@bodil I don't get it either. It completely baffles me that anyone can look at the generated output and think "this is how it should be" or look at the anthropic leak and say "this is great engineering"

And once someone has emotionally invested in LLMs being the future of their career it is really hard to get an honest conversation going.

And when I test it and it doesn't deliver, it always seems to boil down to: you are doing it wrong ... you are stuck in your old ways ... pre-AI thinking ...

@themipper @bodil pure gaslighting
they try to convince you that you are the problem, and that you can't understand why this shit is the best..

@thinkb4coding @themipper @bodil

"And once someone has emotionally invested in LLMs being the future of their career it is really hard to get an honest conversation going."

Yeah that's exactly it! The endless goalpost-moving is so exhausting.

"Try it."

"I did, it wasn't great."

"Did you try the latest model?"

"Well it's the model that was latest when we had this conversation 6 months ago."

"Maybe you're using it wrong? If you're not doing agentic coding of course it won't work!"

"When X used agentic coding, it deleted their production database. When Y used it, someone popped their GitHub account by putting instructions into the agents.md file in one of their transitive dependencies."

"Well those things haven't happened to me." (Yet)

@themipper @bodil Hint: the people looking at it and saying this is how it should be are the same people who measure productivity in LOC.

@bodil Indeed. It's good programmers are not susceptible to hubris.

Otherwise, they'd probably react to a concept like AI by claiming that programming is a form of art that can never be managed by AI, that AI is crap and anything produced by it must also be crap - proven by the fact that AI uses the crap from crappy websites where crappy - eh - programmers have posted it, and they'd utterly fail to make any distinction as to where AI can be a useful tool and what should better be done manually.

First and foremost, they'd scream defiance about the code quality of AI itself, amusingly ignoring the fact that this code has very obviously been written by programmers. But they'd be AI programmers, and therefore obviously worthless bastards.

My, am I happy that programmers are so absolutely immune to hubris.

@papageier @bodil

I am not a computer programmer if I can help it, but I find the following line of argument very interesting :

The generative LLM is a machine for producing the "most probable message", which according to information theory is the message we should discard.

https://man-and-atom.tumblr.com/post/812029038295187456/now-this-might-be-considered-a-theological

Man and Atom

Now, this might be considered a theological objection — A large language model or similar “generative AI” produces the most probable message. That is, in fact, the only thing it can do¹. If you have...

Tumblr
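For what it's worth, the information-theory claim in the linked post boils down to Shannon self-information (this sketch is mine, not the post's):

```latex
% Self-information (surprisal) of a message m drawn with probability p(m):
I(m) = -\log_2 p(m)
% p(m) maximal  =>  I(m) minimal: the most probable message carries
% the least information, in Shannon's narrow technical sense.
```

That's the whole argument: a generator optimized to emit the highest-probability continuation is, by this measure, optimized to emit the least informative one.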
@bodil it gets even better: as the whole system is non-deterministic you can always claim the others are just prompting it wrong. And there is no way to falsify or verify it. How convenient. (This drives me crazy.)
@bodil we have sound engineering practices at home
@dysfun my theory is that you're about as likely to produce quality code from LLMs through "sound engineering practices" and "careful oversight" as you are to write safe C code by being more careful.
@hugo basically the main difference i am sensing is the ability to fool yourself. those who can love LLMs and those who can't hate them.

@dysfun I would even slice it not just to those who can vs those who can't, but perhaps say those who know they can be fooled but don't want to be.

I know there are psychological hazards there that are extremely powerful and subtle. I don't want to subject myself to that.

@hugo i've been watching lots of aviation videos. they give a shit in that industry, it's weird.

@bodil

> It's a good thing programmers aren't susceptible to hubris in any way, or this would have been so much worse.

I've been told hubris, as well as laziness and impatience, are the three great virtues of a programmer.

https://threevirtues.dev/

The essential virtue is integrity, and it's missing from that list. Without it, the original explanations would crumble.

I guess it's either one of those off-by-one errors, or it was a daft claim right from the start.

The Three Virtues of a GREAT Programmer

According to Larry Wall, the original author of the Perl programming language, there are three great virtues of a programmer; Laziness, Impatience and Hubris.

@bodil The "it's a tool and you just have to check its output" argument infuriates me like little else does.

Nobody ever fucking does that. People don't even take reviews of real code seriously, and I call bullshit on every sloperator who claims to have read and understood the output of their oversized autocomplete engine.

And don't get me started on the asshats who also let the slopthing write their docs. Or their business emails. Or anything else, really.

@bodil
It does seem to work though? A lot of users think that their product is good, or even the best, and that it keeps improving.

It may be built from bubblegum and clothes hangers, but apparently that works just as well as good engineering.

At least in the short run. It may give them headaches in the long run. But a company like Anthropic will be ecstatically happy if they make it to the long run at all.

@bodil I use Claude for programming, and it takes a couple of iterations to get good code.

This versus the "let's just ship it" approach. It's the same copied-from-Stack-Overflow quality you previously got from some offshore teams. Passes functional requirements = done.

@bodil I never use these tools, but yesterday I was using libcbor for the first time and decided to give ChatGPT a try at generating a small snippet to decode a simple structure. It introduced two memory leaks (it didn't call cbor_decref() on extra references it created). I asked it whether it shouldn't be calling cbor_decref() on those, and it confidently said you must not, because those references were borrowed (wrong, and the documentation clearly states they aren't 🤷‍♂️).
@bodil It introducing memory leaks is especially bad, because the code "will work", but the leaks will cause problems in the future, when you are not debugging that code anymore. Yeah, slop is slop after all.

@bodil

It's not like we've all been cobbling stuff up in a hurry from StackExchange C&Ps for the last decade anyway.

Because there's a point where no matter what yr fkn agile velocity, the far end of the backlog is red shifting towards some management event horizon, and it now hurts too much to think properly.

So you do the thing that makes the pain go away because you are a thinking feeling human.

Jira does not care and adds another ticket with a t-shirt size attached.

I don't fault people (much) for this, but it does make me wonder at the tottering edifices of fail we've propped up with sticks and yaml to make... Something 'better'.

Like k8s needs the entirety of the CNCF et al to make the experience something less awful than a power noise all-dayer in the function room of a flat-roofed pub. Why isn't that a warning that the entire premise of the thing is flawed?

They appear to have automated the soul crushing machine, but for money.

@bodil First you must realize Knuth could have subtitled "the art of programming" as the lazy art ;). So lazy, in fact, we create programs just to write programs for us.

Now, artificial ignorance is spewing back from its training, which is largely a similar pile of...

On top of that pile, as information entropy suggests, there is even some degeneracy from that, which is why you can't have AI coprophagia. Rather than being a Yogi a little smarter than the average programmer, it is condemned to being a little dumber than the average of the training input. Simple physics.

@bodil And considering "we" couldn't actually figure out to prioritise test before LLMs does not fill me with confidence that the test phase will be prioritised now.

Because LLM introduction for most is about production velocity and cost cutting.

@bodil I don't fundamentally disagree, but that raises another interesting point: just how few significant failures those qualities have actually caused Claude Code.

Which is the other thing people say - if these tools can do anything of use where are the successful apps? Well kind of here, right? Claude Code is one of the most successful product launches of its kind, for quite some time. It works well enough to be considered a market leader. Apparently it's made of pure jank, but apparently, perhaps surprisingly, that doesn't matter so much? I mean, yikes.

I don't like it all that much as a tooling approach, but it's coherent, performs well, and works reliably when I have tried it (I've put in a few dozen hours, maybe). Me being one of nature's outliers forever, I think Claude Desktop is a better product, whatever that's worth 🤷, but I don't use that as a coding agent much either.

@cms

It turns out when the user base has been conditioned to accept that the system randomly provides incorrect output ('hallucinates') when operating 'correctly', HOW it fails is not so important. Pull the lever and try again.

@bodil

@Orb2069 no, it doesn't "turn out" like that at all, I don't think that is a very accurate description of anything happening here.

c.f. https://jsomers.net/blog/it-turns-out
“It turns out” « the jsomers.net blog

@cms

Thanks for wasting my time with your language peeve.

@cms I've been told for my entire career that code quality isn't important as long as the code does the job with acceptable frequency. And, well, if you write CRUD apps for companies providing non-essential services, I suppose that's generally correct. And if I find out the software for my car's lane keeping feature has so much as a single line written by Claude in it, I'll be fucking selling that car immediately with zero regard for the evident success of Claude's web UI.
@bodil yes to all of that.

I suppose with regard to your car example, there's a strong case to be built there for certifiable safety standards and processes, like other forms of critical automated manufacturing.
@bodil it seems unlikely to me that any brute-forced language-generating technique can be provably constrained in that way, cheaply and effectively enough.