It kind of feels like it's going to take something big happening in the press to get people to stop.
I was thinking an AI-caused Therac-25, but maybe a Copilot worm that wipes every Windows 11 computer might prompt some legislation outlawing AI code.
The thing most likely to get people to stop is the end of the massive subsidies for its use that the VCs are currently pouring in.
Already firms are starting to panic a little about token use for things like Claude Code, and are putting limiters on their workers' usage that really defeat the purpose of all the "YOU MUST USE THIS OR BE FIRED" diktats. But operating indefinitely at those prices will bankrupt Anthropic soon.
So at some point the private equity love affair with everything AI will dry up (possibly because of an Iran-war-induced financial crisis), and at that point it's going to be "my org can spend $50k annually on my personal Claude tokens to make me 20% more productive . . . or it could just hire a junior dev?"
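To put rough numbers on that comparison, here's a minimal sketch; every figure in it is an illustrative assumption, not real pricing or salary data:

```python
# Back-of-envelope: Claude token spend vs. hiring. Every number below is an
# illustrative assumption, not a real price.
token_spend_per_dev = 50_000   # assumed annual Claude token cost per developer ($)
dev_fully_loaded = 150_000     # assumed fully loaded cost of the existing dev ($/yr)
productivity_gain = 0.20       # the claimed 20% productivity boost
junior_dev_cost = 90_000       # assumed fully loaded junior dev cost ($/yr)

# Dollar value of the +20%, priced as a fraction of the dev's cost.
gain_value = dev_fully_loaded * productivity_gain

print(f"Token spend:         ${token_spend_per_dev:,}")
print(f"Value of +20%:       ${gain_value:,.0f}")
print(f"Junior dev instead:  ${junior_dev_cost:,}")
# Under these assumptions the tokens cost $50k to buy $30k of extra output,
# while $90k buys an entire additional developer.
```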
There's a chance they manage to optimize this, or get it to work using a lighter weight model. But I think it's unlikely.
So far, from what I've seen, any time one of the subscription AI places puts up its prices to something resembling actual operating costs (never mind paying back gigantic sunk capital costs), users have screamed and then bolted.
Honestly, doing the really heavy duty Claude Code stuff that's getting pushed now will easily run to $50k per developer at current costs. And no, I don't see that as something that enterprises will ultimately be willing to swallow. Nor do I see a path for them to get the GPU cycle burn down easily.
What downward trend in average model complexity? What downward trend in energy consumption? They're both going up! Nobody can get the cost of inference down except by going with discount models like DeepSeek, which are okay for spouting text but come nowhere near the code quality of something like Claude Code. And even with CC, as the OP link says, quality only holds up in certain languages and certain situations, with lots and lots of guard rails.
Ed Zitron isn't everyone's cup of tea, but he's been watching the finances of this for a while and there's absolutely no sign of the burn rate slowing down or the cost of inference dropping.

Anthropic gets some credit for getting Claude Code to actual usability and decent code, if you spend enough time scolding and cajoling the model and manually forcing it through various code quality checks. But they're not doing it on cheap models; they're doing it on the biggest, most expensive models, which require the biggest and most expensive GPUs. You can't get those results out of DeepSeek or Ollama or any of the smaller, cheaper models. The code quality goes right back into the toilet, no matter what guard rails you put on it.
Given the horrific mess that is the Claude Code source code (see this megathread for a walk through the chaos fractal that is Claude Code https://neuromatch.social/@jonny/116324676116121930) it's possible that they could tighten the hell out of it and clean up some of the immense noise in it to get some efficiency. But then what does that say about Claude Code's code quality?
As for the custom chips, I'm not sure how much more customized you can make a chip for ML models than what NVIDIA is cranking out, but at the very least here's what's going on with Microsoft's attempts to get Azure to work on smaller hardware. This is a really sobering read from a former MS system engineer.
Certainly, the potential for ARM chips to really change cloud computing shouldn't be overlooked, if someone can get the ultra-efficient ones to scale up. And someone else who isn't Microsoft will probably figure it out (although AWS in particular is also staggering under its immense technical debt right now).
But there is just one titanic mess after another under the hoods of the major tech firms burning hundreds of billions of VC dollars right now.
https://isolveproblems.substack.com/p/how-microsoft-vaporized-a-trillion
@mirth @MichaelTBacon @cwebber
I think some press person estimated the AI companies' costs were 10x their revenue. I don't remember which one, though.
It's plausible there are improvements coming in both inference and training, though.
I don't know how much progress there is on model collapse though.
And this does nothing about the feeling that the reason companies push AI is to make workers fear for their jobs and block unionization efforts. Also I think the main goal of current AI alignment is to make the AIs obedient to billionaires so they can have obedient secret police for their future kingdoms.
What trends make you think the costs will fall toward zero? Who right now is delivering high quality products with lower cost models?
@MichaelTBacon @mirth @cwebber
One of my collaborators claims to be having good luck with an open-weight coding model running on his local NVIDIA GPUs.
He liked Devstral for coding, and heard that Qwen3 is supposed to be good as well.
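For anyone who wants to try the same experiment, here's a minimal sketch using the ollama Python client. The model tags and the exact call shape are assumptions based on ollama's published interface; the tags available on your install may differ:

```python
# Minimal local-inference sketch using the ollama Python client.
# Assumes the ollama daemon is running and the model was pulled first,
# e.g. `ollama pull devstral`; exact model tags vary by install.
import ollama

response = ollama.chat(
    model="devstral",  # or a qwen3 tag, per the recommendations above
    messages=[
        {
            "role": "user",
            "content": "Write a Python function that parses an ISO 8601 date.",
        }
    ],
)
print(response["message"]["content"])
```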
@MichaelTBacon @mirth @cwebber
I'm not even entirely sure that having an LLM code up a matplotlib plot is all that different from copying a plot out of the matplotlib gallery.
I don't think the grad students really understood either version. Either path is copy-and-paste followed by trial and error.
Yeah, I think it was XKCD like 10 years ago that said we're going to change the name of software development to "searching stackoverflow."
Now what the LLMs are doing is ingesting stackoverflow and then using half a kilowatt-hour to give a slightly neater answer.
Except that it's also slowly killing stackoverflow's engagement, so soon the LLM's answers are going to start getting out of date . . .
That's the thing, though, I don't think even the smallest trend is there. It's not that "it's early yet, but it's pointing in the right direction." I haven't seen anything that suggests there's any downward trend at all, no matter how small.
To get anything like equivalent performance out of the discount models like DeepSeek, you have to run multiple instances in parallel or run a bunch of agents along with it (rough numbers in the sketch below).
The big breakthroughs in capability in the last year or two have all been about ramping *up* the power usage, model size, and GPU capacity, either by using bigger context windows or by adding secondary agents. That's admittedly some cool engineering, but power consumption and operating costs just keep going up.
The main advantage of the discount models is that they have a much lower cost of training, mostly because (we suspect) they were trained off of responses from the bigger models.
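To sketch why the parallel-instance workaround erodes the discount, here's the arithmetic with made-up numbers; every figure is an assumption for illustration only:

```python
# Illustrative only: "cheap" per-call pricing stops being cheap once you need
# several parallel instances (plus retries) to match frontier output quality.
frontier_cost = 1.00   # assumed $ per completed task on a frontier model
discount_cost = 0.15   # assumed $ per call on a discount model
instances = 5          # parallel instances/agents run per task
passes = 2             # extra passes needed to reach acceptable quality

discount_effective = discount_cost * instances * passes

print(f"Frontier model:  ${frontier_cost:.2f} per task")
print(f"Discount stack:  ${discount_effective:.2f} per task")
# With these made-up numbers the discount stack lands at $1.50 per task,
# more than the frontier model, before counting orchestration overhead.
```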
I would be very interested in links showing what you say if you have them.
@MichaelTBacon @alienghic @cwebber This is a synthesis of what I've seen across years of doing compute-related work, plus reading the published benchmark data and some papers. In my opinion, you need a reasonable knowledge of at least what's been published before forming specific opinions about what the technical trends are or are not. Here's a leaderboard for SWE-bench Verified, a reasonable gauge of one dimension of model strength:
I've gotten into a similar fight myself, basically with a crowd of people who were utterly convinced that Claude Code couldn't create anything that actually worked.
As the above example clearly details, it can make code that works (it made Claude Code, among other things, for better or for worse) but whether that's a benchmark of usefulness is another story.
I'm pushing back on the trends you're citing, or at least pushing for clearer evidence that I can unpack myself, because I think it's a critical distinction. I'm still not seeing anything like the kinds of asymptotic curves that would be needed to make this stuff viable as a commercial success without the mega subsidies from VC/PE/PC sources. I *do* think it will be transformative, but not in any of the ways that are currently getting discussed.
But if there *is* some real trend towards real dramatic reductions in power/CPU usage per unit of useful output (whatever that is), it changes the story.
Can a $100k server service $7k/month worth of subscribers, though? I'm not at all sure of that, even just for servicing requests, particularly since you're going to have very spiky and uneven utilization, and if those models are slow to respond, you're going to lose customers, fast.
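Here's the back-of-envelope version of that worry; every number is an assumption, chosen only to show the shape of the problem:

```python
# Illustrative server economics; every figure here is an assumption.
server_capex = 100_000        # up-front hardware cost ($)
useful_life_months = 36       # assumed amortization window
power_and_hosting = 1_500     # assumed monthly power/colo/ops ($)
subscription_revenue = 7_000  # monthly subscriber revenue ($)
peak_to_average = 4.0         # spiky load: capacity sized for peak, not average

monthly_cost = server_capex / useful_life_months + power_and_hosting
# If you provision for peak demand, each server only captures the share of
# revenue matching its average utilization:
revenue_per_server = subscription_revenue / peak_to_average

print(f"Monthly cost per server:     ${monthly_cost:,.0f}")
print(f"Revenue captured per server: ${revenue_per_server:,.0f}")
# With these assumptions each server costs ~$4,278/month and earns ~$1,750.
```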
Beyond that, a huge amount of the cost of the models is in the training. You can say, oh, sure, but once the training is done, they're good, but that means that your old model knows nothing about anything that's happened since it was trained.
I go back to Anthropic saying last August that some of its $200/month Max users were costing it $50k/month, each. 20-30% increases in performance are totally insufficient to even scratch the surface of that problem. Some of it could be achieved by making Claude Code less of a shambolic train wreck of code, but that's not cheap either.
@mirth @alienghic It was in this part of the thread which got forked off:
"The Max tier tells a revealing story about the economics of flat-rate AI pricing. Internal data revealed that some $200/month Max users were costing Anthropic over $50,000 per month in compute. The tier was introduced specifically to manage this cost imbalance while retaining high-value users."
The point is that what's showing up as API charges or app-use charges is nowhere near what it costs to run the model. That $200/$50,000 ratio is surely out at the extreme (which is why I rounded way down to just a 10x subsidy instead of a 250x one), but every "frontier" model seller is losing gigantic wads of cash on operating costs alone, and I don't see any corner getting turned toward bringing those costs under control.
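The arithmetic behind that rounding, using only the figures quoted above:

```python
# The subsidy ratio implied by the Max-tier numbers quoted above.
max_price = 200        # $/month, Max subscription price
max_compute = 50_000   # $/month, reported compute cost for the heaviest users

extreme_ratio = max_compute / max_price
print(f"Extreme case: {extreme_ratio:.0f}x subsidy")        # 250x
print("Figure used above (deliberately rounded down): 10x")
```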
Anthropic at the very least appears to have wrestled its training costs down below its annual revenue. So that's a start. Maybe they can get their operating costs down too. Their leadership also seems the least full of lying assholes among the big AI chasers. So that's a positive for them.
The GlassWing or whatever announcement from today will turn heads, for sure. And it's a definitely not-terrible use for language models. But that's a very important but very niche application, not "code generation for everyone!"
https://nitter.net/ShanuMathew93/status/2041444857416126617#m
That seems entirely plausible for these days.