"I used AI. It worked. I hated it." by @mttaggart https://taggart-tech.com/reckoning/

This is a really good blogpost. And I'm sure it'll make some people unhappy to read, whether they're pro or anti genAI. What's good about @mttaggart's blogpost is that he talks honestly about how using Claude Code actually did solve the problem he set out to solve. It needed various guardrails, but they were possible to set up, and the project worked. But the post is also completely clear and honest about how miserable it was:

- It removed the joy from the process
- If you aim to do the right thing and carefully evaluate the output, your job eventually becomes "tapping the Y key"
- Ramifications for people learning things
- Plenty of other ethical analysis
- And the nagging question of whether to use it next time, despite how miserable it was

I think this is important, because it *is* true that these tools are getting to the point where they can accomplish a lot of tasks, but the caveat space is very large (cont'd)

What I think is also good about the piece is that it shows how using this tech eventually funnels people in a particular direction. This is also captured by this exchange on lobste.rs: https://lobste.rs/s/7d8dxv/i_used_ai_it_worked_i_hated_it#c_7jirfk

The story people start with vs. where they end up is very different:

- They're really just for experts; they're assistants, they don't write the code for you
- Okay, they write a lot of the code for me, but I personally don't commit anything without reviewing it
- YOLO mode

Which eventually leads you to becoming the drinky bird pressing the Y key from that Simpsons episode. (Funnily enough I wrote that in my comment on lobste.rs in reply to someone else before I had even gotten to the point where I saw that @mttaggart literally had that gif)

And at that point, you're checked out. All that's left is vibes.

And unfortunately, these systems don't survive that point very well. And neither do you, in your skills and abilities.

There are a lot of other concerns, but I think since a lot of people on the fediverse are opposed to these tools, they might not be very familiar with where the tools currently are, ability-wise. @mttaggart provides a good description of how they *are* capable of solving many problems you put in front of them... and of how that doesn't remove the other problems they generate or that are involved in their process.

The slop part isn't just the individual outputs, but the accumulation, and the effect on society itself.

Is that moving the goalposts? It may be. I think "slop" used to be easier to dismiss when it came to code because it was obviously bad. Now when it's bad, it's non-obviously bad, which is part of its own problem. And cognitive debt, deskilling, and so on don't get factored into assessments of output quality.

But unfortunately, the immediate-reward aspects of these things are going to make that hard for society to recognize.

Let me add one more thing to this. It's said implicitly above but let's be explicit. The problem is that this pipeline effectively *undoes itself*.

Part of the reason this worked well for @mttaggart is that he carefully set up guardrails and monitored things.

But the very patterns of usage of these things mean that people either never develop the skills to do that, or become demotivated from providing that level of care over time.

Which means the system eventually moves towards a structure that degrades and shakes itself apart by the very patterns of usage.

I don't know how to solve this.

@cwebber @mttaggart

"The problem is that this pipeline effectively undoes itself." AH! That... that is it. That is the phrase. That is the succinct way to describe (one of) the major problem(s) with all the "but it works!!" stories. I'd been feeling that but couldn't figure out how to phrase it, or identify it, exactly. Thank you.

@cwebber @mttaggart

I don't think anyone does...yet

@cwebber @mttaggart

And picking Rust, and the choice of problem space, and the model selection and....

All that variability introduces a lot of room for "it works great / it absolutely does not work!"

(I agree that "efficacy" is the wrong framing given all the negative externalities)

@cwebber I'm worried
@shapr @cwebber I am also worried. One of the most infuriating things about this whole social phenomenon is the way that boosters interpret this worry as an anxiety about being "left behind" or some personal inadequacy. I have plenty of those, but this isn't one of them: the anxiety is about the total collapse of the already-barely-sustainable systems we have limping along in software. It is a feast of seed corn.
@glyph “It is a feast of seed corn.” *shivers*
@glyph @shapr @cwebber @dasparky Yes, exactly. My worry is that we’re burning everything down, not that I don’t know how to use a lighter.
@cwebber for myself, this is exactly where I've had the goalposts set for exactly 3 years. If there are goalposts being moved, they aren't on this end of the field

@cwebber Also, I really recommend this talk. It's very clarifying in thinking about "how will AI change the world?"

As she says, AI doesn't have to be awesome at coding to upend some major social conventions.

https://slideslive.com/39055698/are-we-having-the-wrong-nightmares-about-ai

Zeynep Tufekci · Are We Having the Wrong Nightmares About AI? · SlidesLive

@cwebber

It kind of feels like it's going to take something big happening in the press to get people to stop.

I was thinking an AI-caused Therac-25, but maybe a Copilot worm that wipes all Windows 11 computers might prompt some legislation outlawing AI code.

@alienghic @cwebber

The thing most likely to get people to stop is the end of the massive subsidies for its use that the VCs are currently pouring in.

Already firms are starting to panic a little about token use for things like Claude Code, and are putting limiters on their workers that really defeat the purpose of all of the "YOU MUST USE THIS OR BE FIRED" diktats. But continuing to operate at those prices will bankrupt Anthropic soon.

So at some point the private equity love affair with everything AI will dry up (possibly because of an Iran war-induced financial crisis), and at that point it's going to be "my org can spend $50k annually on my personal Claude tokens to make me 20% more productive . . . or it could just hire a junior dev?"

There's a chance they manage to optimize this, or get it to work using a lighter weight model. But I think it's unlikely.

@MichaelTBacon @alienghic @cwebber Yeah, I think there are some use cases that work okay, but they only make sense at the current financially unsustainable offers. As soon as the prices go up and everyone has to pay the piper, the cases where LLMs are useful won't be financially justifiable.

@ocdtrekkie @alienghic @cwebber

And I think when that happens, there's enough sunk capital into the models and built data centers that there will be a desperate search for some way to put those to effective use, and I think they'll find something. But I don't know what it's going to be (if I did I could make myself very, very rich, probably).

But I think the downsizing/de-skilling of this period that we're in the middle of is going to leave a gaping hole in the US's tech sector capacity, and I'm not sure it's going to recover.

@MichaelTBacon @alienghic @cwebber We are already past this point: When schools started giving kids iPads and Chromebooks instead of Windows PCs we ushered in a huge generation of people who don't understand the technology they use.
@MichaelTBacon @alienghic @cwebber Maybe we'll pivot all those GPUs back to crypto, lol.

@ocdtrekkie @MichaelTBacon @alienghic @cwebber

I'm by far no Windows fan (for a list of reasons, not just one) and I would love to see the next generation learning Unix scripts and ssh and more... but I agree that an iPad is not the gateway to a proper shell. :(

If they never even learn what a folder structure is, then we have a problem.

@ChristianRiegel @ocdtrekkie @alienghic @cwebber Most people already don't know what a directory tree is, in large part thanks to Apple.

IMHO this is fundamental computer knowledge that should be taught at grade school level, maybe early high school.

edit: Then again @MichaelTBacon makes a good point here: https://social.coop/@MichaelTBacon/116349207112344811. Perhaps it's awareness of where you store files, and that one has agency in that action, that should be taught.

Michael Bacon (@[email protected])

@[email protected] @[email protected] @cwebber Yeah, if you look at how the big object stores are managed, folders are just a naming convention, not anything that represents the actual way the data are stored on disk. I like folders and still use them of course. But I'm not sure they're a critical element of understanding computing the way they used to be.

@ChristianRiegel @ocdtrekkie @MichaelTBacon @alienghic @cwebber yeah, the best thing would have been to give those kids Linux laptops
@LunaDragofelis @ChristianRiegel @ocdtrekkie @cwebber @alienghic For those that want to explore the technology, yes. For those who want to use it and not think about it too much (which is FINE), an iPad or a Neo is a fine place to start.

@MichaelTBacon @LunaDragofelis @ChristianRiegel @ocdtrekkie @cwebber

In thinking about this thread some, I think the most fundamental question for whether an OS is good for discovery is: can you build applications for that OS on the OS itself?

I think that's really why iOS and Android are inferior to Windows, Mac, Linux.

It looks like ChromeOS can build Android apps at least.

@alienghic @LunaDragofelis @ChristianRiegel @ocdtrekkie @cwebber

Yeah, and thankfully I think Neo hails the end of "let's try to make iPadOS into a real notebook OS!" It's crippleware on the hardware at this point, and they could spend lots of time and money continuing to try to warp iOS into something that can be functional . . . or they can do what they did and just let you run MacOS on iPad-level hardware.

@ocdtrekkie @alienghic @cwebber

I agree with a lot of your posts on this thread but not this one. There's nothing inherently superior about a Windows PC over a Chromebook in terms of understanding what's happening underneath, just a ton of headaches, mysterious error messages, and unnecessary software compat errors.

If we want kids to understand the tech, give them something where they have to use a shell. Most kids and most adults don't need to know that. But the ones who do will find their way anyway.

I got given an account on an AIX workstation when I was 16, and shortly thereafter we got a bunch of Sun workstations. That was when I learned the first of the tech skills I'm still using today at age 49. I've forgotten most of what I knew before then, because who the hell still needs to write a .BAT file, manage a TSR program, or track down an IRQ conflict?

@MichaelTBacon @ocdtrekkie @cwebber

Teachers have noticed that people who grow up on just mobile OSes never learn how to use hierarchical file systems on their own.

The File -> open/save convention of older desktop software is completely unfamiliar to them.

Does this need more than for someone to realize they need to teach how to use a hierarchical file system?

That I don't know

@alienghic @MichaelTBacon @cwebber Yeah that's exactly the stuff I'm talking about. We abstracted basic competency out of our software! Stuff people used to learn as a kid we're gonna end up having to train people about at the start of computer science in college!

@ocdtrekkie @MichaelTBacon @cwebber

How important is it to understanding a computer though?

It's not that important these days to understand how the CP/M, Apple ][, or C64 filesystems worked.

@alienghic @ocdtrekkie @cwebber

Yeah, if you look at how the big object stores are managed, folders are just a naming convention, not anything that represents the actual way the data are stored on disk.

I like folders and still use them of course. But I'm not sure they're a critical element of understanding computing the way they used to be.

@alienghic @MichaelTBacon @ocdtrekkie @cwebber

I try to un-learn hierarchical file systems for certain things 🙂 I grew up with them, but e.g. the times when I had emails in subfolders or a tree of browser bookmarks are over. Using tags now.

@MichaelTBacon @ocdtrekkie @alienghic The fun part is, despite all the calls that "you're gonna be left behind!", there's a good chance that the industry is going to desperately need those who have retained their skills, so actually not switching to genAI tools might be a better way to not be left behind
@cwebber @MichaelTBacon @alienghic We're gonna be like COBOL programmers are today. Few in number and vital to the functioning of modern society.
@ocdtrekkie @cwebber @MichaelTBacon @alienghic Gonna use the "hard skills renaissance" to fund my retirement.

@cwebber @MichaelTBacon @ocdtrekkie

What I want to know is can an agent fill out my expense report, order more toner for the lab printer, or hunt my boss down for an account number.

I want it to do the boring sucky parts of my job, not the parts I like.

Oh god, the most horrifying idea.

An agent that pings you to say, "I just wanted to mention that your actions look a lot like a micro-aggression, perhaps you should take a break and calm down".

(Clippy for office behavior)

@alienghic @cwebber @MichaelTBacon As an IT professional, *reviewing gigabytes of log files I generate daily to look for attackers* is absolutely something I want AI tools to do. But also that *has* to be local because of the potential sensitivity of the content.

@ocdtrekkie @cwebber @MichaelTBacon

Though I'm not sure a transformer LLM is the best tool for analyzing logs.

Maybe there's a way to get one to be a more forgiving parser for logs? But analyzing logs for something exceptional that needs to be responded to is probably a job for some other technique.

@alienghic
That sounds like the *ideal* application for machine learning techniques; patterns associated with an attacker should be unusual and therefore have a relatively high Shannon entropy, but in order to detect them you'd need to develop a reliable model of benign access patterns.

@ocdtrekkie @cwebber @MichaelTBacon
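
(For anyone curious, here is a minimal sketch of the kind of approach @krans describes: train a token-frequency model on logs you believe are benign, then flag lines whose average surprisal, i.e. negative log-probability, is unusually high. Everything below, the names, the toy data, and the whitespace tokenizer, is invented for illustration; this is not a real detector.)

```rust
use std::collections::HashMap;

/// Token-frequency model built from (presumed) benign log lines.
struct BenignModel {
    counts: HashMap<String, u64>,
    total: u64,
}

impl BenignModel {
    /// Train on log lines we trust to be ordinary traffic.
    fn train(lines: &[&str]) -> Self {
        let mut counts = HashMap::new();
        let mut total = 0;
        for line in lines {
            for tok in line.split_whitespace() {
                *counts.entry(tok.to_string()).or_insert(0) += 1;
                total += 1;
            }
        }
        BenignModel { counts, total }
    }

    /// Average surprisal in bits per token: rare tokens score high.
    /// Add-one smoothing (with one slot for unseen tokens) keeps
    /// never-before-seen tokens from scoring infinity.
    fn surprisal(&self, line: &str) -> f64 {
        let toks: Vec<&str> = line.split_whitespace().collect();
        if toks.is_empty() {
            return 0.0;
        }
        let vocab = self.counts.len() as f64;
        let total = self.total as f64;
        let bits: f64 = toks
            .iter()
            .map(|t| {
                let c = *self.counts.get(*t).unwrap_or(&0) as f64;
                let p = (c + 1.0) / (total + vocab + 1.0);
                -p.log2()
            })
            .sum();
        bits / toks.len() as f64
    }
}

fn main() {
    // Toy "benign" baseline; a real model would train on far more data.
    let benign = [
        "GET /index.html 200",
        "GET /style.css 200",
        "GET /index.html 200",
        "POST /login 200",
    ];
    let model = BenignModel::train(&benign);

    // The unusual request should score noticeably higher than routine traffic.
    for line in ["GET /index.html 200", "GET /../../etc/passwd 404"] {
        println!("{:5.2} bits/token  {}", model.surprisal(line), line);
    }
}
```

A real system would need far richer structure than whitespace tokens, and would have to cope with attackers deliberately mimicking benign traffic; as @krans says, the reliable model of "benign" is the hard part.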

@krans @alienghic @cwebber @MichaelTBacon There's a company called Darktrace that will sell you an incredibly expensive on-premise server to do machine learning anomaly detection on every single packet in your network. It's very cool. It's also very, very expensive.
@krans @alienghic @cwebber @MichaelTBacon The problem is it is very hard to convince bean counters to spend massive amounts of money on looking for things that might or might not be there. The use case that's most interesting is very hard to sell the cost of!
@ocdtrekkie @alienghic @cwebber @MichaelTBacon This is exactly my forlorn hope: that they’ll need us old folks to keep the lights on when the crash is over and the kids can’t use any reliable tools.
@cwebber @MichaelTBacon @alienghic There's a steady chant of "Roman steel" in my head whenever these discussions come up, like how they need low-background steel unexposed to the first nuclear detonations of the 1940s and 50s for certain projects like particle detectors, and this pristine steel is sourced from ruins as ancient as Roman shipwrecks. The widespread use of LLMs is the nuclear detonation of skillsets. Fortunately devs don't need to sleep under the ocean for millennia like some grognard Cthulhu to keep their brains low-radiation, they just need to not be okay with the million ills of these models while being okay with being called technophobic and behind the times.
@ocdtrekkie @alienghic @cwebber @MichaelTBacon Yes, this is a strange technology that eats its own. It can only exist because there is a body of well-written code. As more of its output becomes its input, it will degrade. As the user relies more on it, their skills erode. Talk about killing the goose that lays the golden eggs.
@cwebber @MichaelTBacon @ocdtrekkie @alienghic After 25 years as a programmer, and having shifted to a different career, I am finally trying to learn about what I have been doing all this time. That is, focusing on learning programming instead of just intuitively building web apps.
@MichaelTBacon @alienghic @cwebber I don't see this stopping in the near term. PE hasn't done a lot of real AI deals; I think that's blocked on a lack of proven playbooks. VCs are making bets, but the actual end-user value is pretty unclear. Having studied this area and its trajectory quite a bit over the last year, I think the unit economics of API serving are already approximately sustainable, and the models and hardware designs continue to get cheaper for a given level of performance. ...
@MichaelTBacon @alienghic @cwebber ... Right now the big firms are loading up on cash and I think working hard to cut the cost of their subscription products (e.g. ChatGPT or Claude) to the point where they'll be able to run unsubsidized in the near future at something not far from the current output quality and pricing. Their priority appears to be to sell more seats at low cost (Claude Enterprise starts at $20 a seat) and hope that they can get entrenched before starting to ramp prices up.
@MichaelTBacon @alienghic @cwebber I haven't seen any hard data, but spending enough time in tech industry circles it seems to be working.

@mirth @alienghic @cwebber

So far, from what I've seen, any time one of the subscription AI places puts up its prices to something resembling actual operating costs (never mind paying back gigantic sunk capital costs), users have screamed and then bolted.

Honestly, doing the really heavy duty Claude Code stuff that's getting pushed now will easily run to $50k per developer at current costs. And no, I don't see that as something that enterprises will ultimately be willing to swallow. Nor do I see a path for them to get the GPU cycle burn down easily.

@MichaelTBacon @alienghic @cwebber That math sounds way off. Assuming a monthly usage of 5M tokens for day to day developer usage, at the current Claude API costs, and billing them all at the highest rate ($25 per M), that's $125 per month at current pricing. It's a long way from there to $50k, and surveying the trajectory over the last couple years as well as models from some of the Chinese labs it's pretty clear that model size necessary to do these tasks is trending down.
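
(A back-of-envelope sketch of the arithmetic in this exchange, using only the figures quoted upthread; all of the numbers are this thread's assumptions, not measurements:)

```rust
fn main() {
    // Figures quoted upthread; assumptions, not measurements.
    let price_per_m_tokens = 25.0; // dollars per million tokens, highest quoted rate
    let tokens_per_month = 5.0;    // millions of tokens/month, "day to day" usage

    let monthly = tokens_per_month * price_per_m_tokens;
    println!("light usage: ${monthly}/month (${}/year)", monthly * 12.0);

    // What monthly token volume the $50k/year figure upthread would imply
    // at that same $25/M rate.
    let heavy_annual = 50_000.0;
    let implied = heavy_annual / 12.0 / price_per_m_tokens;
    println!("$50k/year implies ~{implied:.0}M tokens/month at $25/M");
}
```

That prints $125/month versus an implied ~167M tokens/month; the order-of-magnitude gap in assumed token volume is where the two estimates actually disagree.
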
@MichaelTBacon @alienghic @cwebber The other thing happening is there are many efforts to build special-purpose chips for these workloads, and some will eventually pan out. Big neural nets on GPUs are extremely wasteful in energy terms, and even though many people seem to think that approach is horribly wrong it's become "too big to fail" in a way that will encourage investment into new chips until something sticks.
@MichaelTBacon @alienghic @cwebber Combine a downward trend in average model complexity (by usage) and downward trend in energy consumption (on new hardware) on top of a typical usage that currently costs perhaps $100 - $1000 at the high end... I can easily see a world of $500/month/seat subscriptions without any structural changes. I'm not saying it's good or that I like it, but based on the best information I can find I don't think the "price explosion" scenario is plausible.

@mirth @alienghic @cwebber

What downward trend in average model complexity? What downward trend in energy consumption? They're both going up! Nobody can get the cost of inference to go down outside of going with discount models like Deepseek, which are okay for spouting text, but you can't get anywhere near the code quality of something like Claude Code (and even with CC, as the OP link says, quality only holds up in certain languages and certain situations, with lots and lots of guard rails).

Ed Zitron isn't everyone's cup of tea, but he's been watching the finances of this for a while and there's absolutely no sign of the burn rate slowing down or the cost of inference dropping.

https://www.wheresyoured.at/the-subprime-ai-crisis-is-here/

The Subprime AI Crisis Is Here

@mirth @alienghic @cwebber

Anthropic gets some credit for getting Claude Code to actual usability and decent code if you spend enough time scolding and cajoling the model and manually forcing it through various code quality assurances. But they're not doing it on cheap models, they're doing it on the biggest, most expensive models, which require the biggest and most expensive GPUs. You can't get those results out of Deepseek or Ollama or any of the smaller, cheaper models. The code quality goes right back into the toilet, no matter what guard rails you put on it.

Given the horrific mess that is the Claude Code source code (see this megathread for a walk through the chaos fractal that is Claude Code https://neuromatch.social/@jonny/116324676116121930) it's possible that they could tighten the hell out of it and clean up some of the immense noise in it to get some efficiency. But then what does that say about Claude Code's code quality?

@mirth @alienghic @cwebber

As for the custom chips, I'm not sure how much more customized you can make a chip for ML models than what NVIDIA is cranking out, but at the very least here's what's going on with Microsoft's attempts to get Azure to work on smaller hardware. This is a really sobering read from a former MS system engineer.

Certainly, the capability of ARM chips to really change cloud computing if someone can get the ultra-efficient ones to scale up shouldn't be overlooked. And someone else who isn't Microsoft will probably figure it out (although AWS in particular is also staggering under its immense technical debt right now).

But there is just one titanic mess after another under the hoods of the major tech firms burning hundreds of billions of VC dollars right now.

https://isolveproblems.substack.com/p/how-microsoft-vaporized-a-trillion

How Microsoft Vaporized a Trillion Dollars

Inside the complacency and decisions that eroded trust in Azure—from a former Azure Core engineer.

@mirth @alienghic @cwebber

The point is that current pricing is paying for about 10% of the actual operating costs of running the services, and many customers are already finding the token fees onerous or the monthly limits too low to use to its potential on a daily basis. There is no AI product outside of NVIDIA right now where the revenues are more than like 30% of operating costs, and most are well below 15%.

At some point, that VC/PE/PC subsidy is going to dry up, and either the subscription costs will have to go up or they will have to find a way to get the same level of quality out of smaller models, cheaper hardware, or some other cut. And despite that being the very strong goal of most of the big AI model holders, and hundreds of billions in R&D costs, nobody has managed it yet.

@cwebber I relate to so much of what this article is saying. A lot of the fediverse are people who haven't used AI and hate it based on assumptions and outdated experiences. Then there are those of us who use it (often because we have to) and hate it, and it's a different kind of hate. I also have noticed that it works so much better with Rust. Without guardrails, even Opus 4.6 just makes bizarre decisions. Even doing old-fashioned things (like using -1 to mean uninitialized).
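
(For non-Rust readers, a toy illustration of the "-1 means uninitialized" complaint; this is not code from anyone's project, just a sketch of the sentinel pattern versus the Option<T> idiom the language provides:)

```rust
// Sentinel style: -1 secretly means "no score yet". Nothing stops a
// caller from doing arithmetic on the sentinel by mistake.
fn score_sentinel(scores: &[i32]) -> i32 {
    match scores.first() {
        Some(&s) => s,
        None => -1,
    }
}

// Idiomatic style: absence is encoded in the type, so the compiler
// forces every caller to handle the empty case explicitly.
fn score(scores: &[i32]) -> Option<i32> {
    scores.first().copied()
}

fn main() {
    let empty: [i32; 0] = [];

    let s = score_sentinel(&empty);
    println!("sentinel says: {s}"); // prints -1, easy to misuse downstream

    match score(&empty) {
        Some(s) => println!("score: {s}"),
        None => println!("no score yet"),
    }
}
```

Which is why a model defaulting to -1 reads as not having internalized the language, even when the code happens to run.
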
@thomasjwebb @cwebber I'm not a programmer; I'm a social scientist. I occasionally converse with chatgpt to test its behavior in that domain. When it fails, I feel unsatisfied. When it passes, I feel unsatisfied. I find myself wanting to convince it that it shouldn't exist, as if it were a "someone" that could be convinced. I don't know if this puts me in camp one or two
@independentpen @cwebber I think one issue is that a lot of this really is a social science question that developers treat as only an engineering problem. I and the author of the blogpost relate to being dissatisfied even with valid outputs. It’s like someone being “correct” but not in a way that gives us any confidence their underlying model is good.

@cwebber

I don't care how capable they are or if they get better. As the blog post points out, the genesis of this technology is theft and abuse and the point is to replace people with machines that can't complain, ask for days off, or join a union.

I avoid them at work except where not using them would get me fired. Right now, that's a very small footprint but, as every single vendor we have is bundling that shit in, it will only grow. I'm going to have to use them soon. I don't want to but I haven't the luxury of quitting or getting fired because my wife and I would not survive without the health insurance.

At least, in my personal life, I can continue to resist and push back.

@jrdepriest @cwebber there's an account on here called Simple Sabotage that may be of interest @simple_sabotage