"I used AI. It worked. I hated it." by @mttaggart https://taggart-tech.com/reckoning/

This is a really good blogpost. And I'm sure it'll make some people unhappy to read, whether they're pro or anti genAI. What's good about @mttaggart's blogpost is that he talks honestly about how using Claude Code did actually solve the problem he set out to solve. It needed various guardrails, but they were possible to set up, and the project worked. But the post is also completely clear and honest about how miserable it was:

- It removed the joy from the process
- If you aim to do the right thing and carefully evaluate the output, your job eventually becomes "tapping the Y key"
- Ramifications for people learning things
- Plenty of other ethical analysis
- And the nagging wonder whether to use it next time, despite it being miserable.

I think this is important, because it *is* true that these tools are getting to the point where they can accomplish a lot of tasks, but the caveat space is very large (cotd)

"I used AI. It worked. I hated it." (taggart-tech.com): "I used Claude Code to build a tool I needed. It worked great, but I was miserable. I need to reckon with what it means."

What I think is also good about the piece is that it shows how using this tech eventually funnels people down a particular path. This is also captured by this exchange on lobste.rs: https://lobste.rs/s/7d8dxv/i_used_ai_it_worked_i_hated_it#c_7jirfk

The story that people start with vs where they go is very different:

- They're really just for experts, and they're assistants; they don't write the code for you
- Okay, they write a lot of the code for me, but I personally don't commit anything without reviewing it
- YOLO mode

Which eventually leads to you becoming the drinky bird pressing the Y key from that Simpsons episode. (Funnily enough, I wrote that in my comment on lobste.rs in reply to someone else before I had even gotten to the point in the post where @mttaggart literally had that gif.)

And at that point, you're checked out. All that's left is vibes.

And unfortunately, these systems don't survive that point very well. And neither do you, in your skills and abilities.

There are a lot of other concerns, but I think since a lot of people on the fediverse are opposed to these tools, they might not be very familiar with where the tools currently are, ability-wise. @mttaggart provides a good description of how they *are* capable of solving many problems you put in front of them... and that doesn't remove the other problems they generate or that are involved in their process.

The slop part isn't just the individual outputs, but the accumulation, and the effect on society itself.

Is that moving the goalposts? It may be. I think "slop" used to be easier to dismiss when it came to code because it was obviously bad. Now when it's bad, it's non-obviously bad, which is a problem all its own. And cognitive debt, deskilling, and so on don't get factored into the quality-of-output assessment.

But unfortunately, the immediate-reward aspects of these things are going to make that hard for society to recognize.

Let me add one more thing to this. It's said implicitly above but let's be explicit. The problem is that this pipeline effectively *undoes itself*.

Part of the reason this worked well for @mttaggart is carefully setting up guardrails and monitoring things.

But the very patterns of usage of these things mean that people either never develop the skills to do that, or become demotivated from providing that level of care over time.

Which means the system eventually moves towards a structure that degrades and shakes itself apart through its very patterns of usage.

I don't know how to solve this.

@cwebber @mttaggart

"The problem is that this pipeline effectively undoes itself."

AH! That... that is it. That is the phrase. That is the succinct way to describe (one of) the major problem(s) with all the "but it works!!" stories. I'd been feeling that but couldn't figure out how to phrase it, or identify it, exactly. Thank you.

@cwebber @mttaggart

I don't think anyone does...yet

@cwebber @mttaggart

And picking Rust, and the choice of problem space, and the model selection and....

All that variability introduces a lot of room for "it works great / it absolutely does not work!"

(I agree that "efficacy" is the wrong framing given all the negative externalities)

@[email protected] I was looking at a few products on Amazon today. I noticed that, for various products, the product description on the page itself contradicts itself in several places. First it is described as having feature X, then Y, then X again, then Z, and then Y once more. I think it would be a good task for an AI to check such self-contradictory pages and flag any inconsistencies. This is particularly relevant for very large websites such as Amazon, Wikipedia or GrokiPedia. However, proponents of AI seem to prefer letting their software do things for which it is unsuitable, whilst completely ignoring tasks for which it is actually well-suited. @[email protected]
@Life_is
LLMs aren't good at fact-checking, though. What IS the feature set of the product? Only the manufacturer, or maybe people who already bought the object, know what features it has. An LLM can flag "this is inconsistent" but it can't figure out what the truth is. It has no concept of the world.
@cwebber @mttaggart
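(As an aside, here is a toy sketch of the "flag, don't adjudicate" idea both posts are circling. The regex extractors and the sample page text below are made up for illustration; in a real system something like an LLM or a proper information-extraction model would be pulling the claims out of the free text. The point is that the tool only reports that claims conflict; it never decides which claim is true.)

```python
import re
from collections import defaultdict

# Toy attribute extractors -- stand-ins for whatever actually pulls
# claims out of free text (an LLM, an information-extraction model, etc.).
PATTERNS = {
    "battery_hours": re.compile(r"(\d+)\s*-?\s*hour battery", re.I),
    "capacity_ml":   re.compile(r"(\d+)\s*ml\b", re.I),
    "weight_g":      re.compile(r"(\d+)\s*g(?:rams)?\b", re.I),
}

def extract_claims(description):
    """Collect every value each attribute is claimed to have."""
    claims = defaultdict(set)
    for attr, pattern in PATTERNS.items():
        for match in pattern.finditer(description):
            claims[attr].add(match.group(1))
    return claims

def find_contradictions(description):
    """Return attributes given more than one value.
    This only says 'inconsistent'; it cannot say which value is true."""
    return {attr: values
            for attr, values in extract_claims(description).items()
            if len(values) > 1}

# Hypothetical product-page text with internal contradictions.
page = ("Keeps drinks cold all day with its 750 ml capacity and 12-hour battery "
        "indicator light. Weighs 300 g. Features an 8-hour battery and holds 500 ml.")
print(find_contradictions(page))
# e.g. {'battery_hours': {'12', '8'}, 'capacity_ml': {'750', '500'}}
```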

@dlakelan @Life_is That could change with neurosymbolic programming. Which I believe is important, and the next step.

And, as it turns out, leads to dramatic structural improvements AND doesn't resolve any of the problems in @mttaggart's blogpost.

@dlakelan
If Amazon just added a big scare banner “this product page appears to be misleading” for every obvious contradiction (starting with the ultra-high-confidence cases only), it would probably help a lot. Obviously, the lower the confidence threshold, the more false positives without effective recourse.

Also, for Wikipedia this wouldn’t work too well, I think. For instance, in medicine there’s something called a “paradoxical effect”, where e.g. different dosages lead to literally opposite effects. I’ve had cases before where I researched a drug and thought “that cannot both be true”, only to find that yes, it is – at different dosages. I doubt the LLM would fare much better at this.
@Life_is @cwebber @mttaggart

@cwebber @mttaggart

I think software (in general) can do all the amazing things it currently does because most programmers were very, very careful.

We (as a society) don't fill drinking bottles with harmful substances. We don't let uninsulated live wires hang from the ceiling. We put railings on stairs.

And then we give each developer a loaded footgun in the hope that they will be very gentle and careful with it.

The existence of genAI, especially in the hands of non-developers, means that a randomly selected piece of code is now much more likely to explode in your hands.

For me, this means: We need ways to execute code safely even if that code is horrible and malicious. Better sandboxing, more fine-grained permission systems, more use of good languages.

[Imagine a long rant about software components and the importance of good interfaces here, with stuff like "we should do immutable message parsing, not C ABI" strewn in.]

@cwebber exactly, and that would be my critique of this otherwise very, very interesting post: https://fediscience.org/@repepo/116339289260094252

the author is IMO under the fallacy that LLMs *can* be useful when you are already experienced and are very careful checking the output. It might work for a while; it won’t work every time.

@mttaggart

@ced @cwebber @mttaggart Eurgh. Flashbacks to back in the dark ages when there were long rants about how we should be using computers for things humans are bad at and having people do things computers are bad at.

"... very careful checking the output" is a canonical example of "things we write software for because we can't trust humans to do it consistently well and reliably in the long term".

@cwebber @mttaggart It's just part of the ongoing slow-motion collapse of the Industrial Age. Capitalism has always been the enemy of sustainability, and it has always destroyed, within a few decades, skills that entire subcultures of workers and craftspeople had been refining for centuries or even millennia. There is already so much skill and knowledge that has been lost because somebody invented a way to do something cheaper by scaling it up and using more machines and fewer humans.
The end of the Industrial Age will be brutal. There are so many things nobody alive remembers how to do with preindustrial methods, and while we still have written descriptions and can probably figure something out, all the finer details that were never written down have to be discovered again. This is worse than the collapse of the Bronze Age.
@cwebber I'm worried
@shapr @cwebber I am also worried. One of the most infuriating things about this whole social phenomenon is the way that boosters interpret this worry as an anxiety about being "left behind" or some personal inadequacy. I have plenty of those but this isn't one of them: the anxiety is about the total collapse of the already-barely-sustainable systems we have limping along in software. It is a feast of seed corn
@cwebber for myself, this is exactly where I've had the goalposts set for exactly 3 years. If there are goalposts being moved, they aren't on this end of the field

@cwebber Also, I really recommend this talk. It's very clarifying in thinking about "how will AI change the world?"

As she says, AI doesn't have to be awesome at coding to upend some major social conventions.

https://slideslive.com/39055698/are-we-having-the-wrong-nightmares-about-ai

Zeynep Tufekci, "Are We Having the Wrong Nightmares About AI?" (SlidesLive)

@cwebber

It kind of feels like it's going to take something big happening in the press to get people to stop.

I was thinking an AI-caused Therac-25, but maybe a Copilot worm that wipes all Windows 11 computers might get some legislation outlawing AI code passed.

@alienghic @cwebber

The thing most likely to get people to stop is the end of the massive subsidies for its use that the VCs are currently pouring in.

Already firms are starting to panic a little about token use for things like Claude Code, and are putting limits on their workers that really defeat the purpose of all of the "YOU MUST USE THIS OR BE FIRED" diktats. But operating indefinitely at those prices will bankrupt Anthropic soon.

So at some point the private equity love affair with everything AI will dry up (possibly because of an Iran war-induced financial crisis), and at that point it's going to be "my org can spend $50k annually on my personal Claude tokens to make me 20% more productive . . . or it could just hire a junior dev?"

There's a chance they manage to optimize this, or get it to work using a lighter weight model. But I think it's unlikely.

@MichaelTBacon @alienghic @cwebber Yeah, I think there's some use in some cases that work okay, but they only make sense at the current financially unsustainable pricing. As soon as the prices go up/everyone has to pay the piper, the cases where LLMs are useful won't be financially justifiable.

@ocdtrekkie @alienghic @cwebber

And I think when that happens, there's enough sunk capital into the models and built data centers that there will be a desperate search for some way to put those to effective use, and I think they'll find something. But I don't know what it's going to be (if I did I could make myself very, very rich, probably).

But I think the downsizing/de-skilling of this period that we're in the middle of is going to leave a gaping hole in the US's tech sector capacity, and I'm not sure it's going to recover.

@MichaelTBacon @alienghic @cwebber We are already past this point: When schools started giving kids iPads and Chromebooks instead of Windows PCs we ushered in a huge generation of people who don't understand the technology they use.
@MichaelTBacon @alienghic @cwebber Maybe we'll pivot all those GPUs back to crypto, lol.

@ocdtrekkie @MichaelTBacon @alienghic @cwebber

I'm by far no Windows fan (for a whole list of reasons, not just one) and I would love to see the next generation learning Unix scripts and ssh and more... but I agree that an iPad is not the gateway to a proper shell. :(

If they even forget what a folder structure is, then we have a problem.

@ChristianRiegel @ocdtrekkie @alienghic @cwebber Most people already don't know what a directory tree is, in a large part thanks to Apple.

IMHO this is fundamental computer knowledge that should be taught at grade school level, maybe early high school.

edit: Then again, @MichaelTBacon makes a good point here: https://social.coop/@MichaelTBacon/116349207112344811. Perhaps what should be taught is more about being aware of where you store files, and that one has agency in that choice.

Michael Bacon (@[email protected]): "@[email protected] @[email protected] @cwebber Yeah, if you look at how the big object stores are managed, folders are just a naming convention, not anything that represents the actual way the data are stored on disk. I like folders and still use them of course. But I'm not sure they're a critical element of understanding computing the way they used to be."
@ChristianRiegel @ocdtrekkie @MichaelTBacon @alienghic @cwebber yeah, the best thing would have been to give those kids Linux laptops
@LunaDragofelis @ChristianRiegel @ocdtrekkie @cwebber @alienghic For those that want to explore the technology, yes. For those who want to use it and not think about it too much (which is FINE), an iPad or a Neo is a fine place to start.

@MichaelTBacon @LunaDragofelis @ChristianRiegel @ocdtrekkie @cwebber

In thinking about this thread some, I think the most fundamental question for whether an OS is good for discovery is: can you build applications for that OS on the OS itself?

I think that's really why iOS and Android are inferior to Windows, Mac, Linux.

It looks like ChromeOS can build Android apps at least.

@alienghic @LunaDragofelis @ChristianRiegel @ocdtrekkie @cwebber

Yeah, and thankfully I think Neo hails the end of "let's try to make iPadOS into a real notebook OS!" It's crippleware on the hardware at this point, and they could spend lots of time and money continuing to try to warp iOS into something that can be functional . . . or they can do what they did and just let you run MacOS on iPad-level hardware.

@ocdtrekkie @alienghic @cwebber

I agree with a lot of your posts on this thread but not this one. There's nothing inherently superior about a Windows PC over a Chromebook in terms of understanding what's happening underneath, just a ton of headaches, mysterious error messages, and unnecessary software compat errors.

If we want kids to understand the tech, give them something where they have to use a shell. Most kids and most adults don't need to know that. But the ones who do will find their way anyway.

I got given an account on an AIX workstation when I was 16, and shortly thereafter we got a bunch of Sun workstations. That was when I learned the first of the tech skills I'm still using today at age 49. I've forgotten most of what I knew before then, because who the hell still needs to write a .BAT file, manage a TSR program, or track down an IRQ conflict?

@MichaelTBacon @ocdtrekkie @cwebber

Teachers have noticed that people who grow up on just mobile OSes never learn how to use hierarchical file systems on their own.

The File -> open/save convention of older desktop software is completely unfamiliar to them.

Does this need more than for someone to realize they need to teach how to use a hierarchical file system?

That I don't know

@alienghic @MichaelTBacon @cwebber Yeah that's exactly the stuff I'm talking about. We abstracted basic competency out of our software! Stuff people used to learn as a kid we're gonna end up having to train people about at the start of computer science in college!

@ocdtrekkie @MichaelTBacon @cwebber

How important is it to understanding a computer though?

These days it's not that important to understand how the CP/M, Apple ][, or C64 filesystems worked.

@alienghic @ocdtrekkie @cwebber

Yeah, if you look at how the big object stores are managed, folders are just a naming convention, not anything that represents the actual way the data are stored on disk.

I like folders and still use them of course. But I'm not sure they're a critical element of understanding computing the way they used to be.

@alienghic @MichaelTBacon @ocdtrekkie @cwebber

I try to un-learn hierarchical file systems for certain things 🙂 I grew up with them, but e.g. the times when I had emails in subfolders or a tree of browser bookmarks are over. Using tags now.

@MichaelTBacon @ocdtrekkie @alienghic The fun part is, despite all the calls that "you're gonna be left behind!", there's a good chance that the industry is going to desperately need those who have retained their skills, so actually not switching to genAI tools might be a better way to not be left behind
@cwebber @MichaelTBacon @alienghic We're gonna be like COBOL programmers are today. Few in number and vital to the functioning of modern society.
@ocdtrekkie @cwebber @MichaelTBacon @alienghic Gonna use the "hard skills renaissance" to fund my retirement.

@cwebber @MichaelTBacon @ocdtrekkie

What I want to know is: can an agent fill out my expense report, order more toner for the lab printer, or hunt my boss down for an account number?

I want it to do the boring sucky parts of my job, not the parts I like.

Oh god, the most horrifying idea.

An agent that pings you to say, "I just wanted to mention that your actions look a lot like a micro-aggression, perhaps you should take a break and calm down".

(Clippy for office behavior)

@alienghic @cwebber @MichaelTBacon As an IT professional, *reviewing gigabytes of log files I generate daily to look for attackers* is absolutely something I want AI tools to do. But also that *has* to be local because of the potential sensitivity of the content.

@ocdtrekkie @cwebber @MichaelTBacon

Though I'm not sure a transformer LLM is the best tool for analyzing logs.

Maybe there's a way to get one to be a more forgiving parser for logs? But analyzing logs for something exceptional that needs to be responded to is probably a job for some other technique.

@alienghic
That sounds like the *ideal* application for machine learning techniques; patterns associated with an attacker should be unusual and therefore have a relatively high Shannon entropy, but in order to detect them you'd need to develop a reliable model of benign access patterns.

@ocdtrekkie @cwebber @MichaelTBacon
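(A minimal sketch of that kind of baseline-plus-surprisal approach, using nothing but the standard library. The file names and threshold below are made up for illustration, and real products in this space build far richer models of "normal" than raw token frequencies.)

```python
import math
import re
from collections import Counter

def tokenize(line):
    # Crude normalization: mask numbers and hex strings so that
    # "session 4821 opened" and "session 9177 opened" count as the same pattern.
    line = re.sub(r"\b0x[0-9a-fA-F]+\b|\b\d+\b", "<NUM>", line)
    return line.lower().split()

class BaselineModel:
    """Token-frequency model of known-benign log traffic."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def fit(self, benign_lines):
        for line in benign_lines:
            tokens = tokenize(line)
            self.counts.update(tokens)
            self.total += len(tokens)

    def surprisal(self, line):
        # Average negative log-probability of the line's tokens under the
        # baseline. Unseen tokens get a smoothed, very small probability,
        # so genuinely unusual lines score high.
        tokens = tokenize(line)
        if not tokens:
            return 0.0
        vocab = len(self.counts) + 1
        bits = 0.0
        for token in tokens:
            p = (self.counts[token] + 1) / (self.total + vocab)
            bits += -math.log2(p)
        return bits / len(tokens)

# Usage (file names are hypothetical): fit on a window of known-good logs,
# then surface the outliers for a human to look at.
model = BaselineModel()
with open("last_week_access.log") as baseline:
    model.fit(baseline)
threshold = 12.0  # tuned on held-out benign data, not a magic number
with open("today_access.log") as todays_logs:
    for line in todays_logs:
        if model.surprisal(line) > threshold:
            print("ANOMALOUS:", line.rstrip())
```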

@krans @alienghic @cwebber @MichaelTBacon There's a company called Darktrace that will sell you an incredibly expensive on-premise server to do machine learning anomaly detection on every single packet in your network. It's very cool. It's also very, very expensive.
@krans @alienghic @cwebber @MichaelTBacon The problem is it is very hard to convince bean counters to spend massive amounts of money on looking for things that might or might not be there. The use case that's most interesting is very hard to sell the cost of!
@ocdtrekkie @alienghic @cwebber @MichaelTBacon This is exactly my forlorn hope: that they’ll need us old folks to keep the lights on when the crash is over and the kids can’t use any reliable tools.
@cwebber @MichaelTBacon @alienghic There's a steady chant of "Roman steel" in my head whenever these discussions come up, like how they need low-background steel unexposed to the first nuclear detonations of the 1940s and 50s for certain projects like particle detectors, and this pristine steel is sourced from ruins as ancient as Roman shipwrecks. The widespread use of LLMs is the nuclear detonation of skillsets. Fortunately devs don't need to sleep under the ocean for millennia like some grognard Cthulhu to keep their brains low-radiation, they just need to not be okay with the million ills of these models while being okay with being called technophobic and behind the times.
@ocdtrekkie @alienghic @cwebber @MichaelTBacon Yes, this is a strange technology that eats its own. It can only exist because there is a body of well-written code. As more of its output becomes its input, it will degrade. As the user relies more on it, their skills erode. Talk about killing the goose that laid the golden eggs.
@cwebber @MichaelTBacon @ocdtrekkie @alienghic after 25 years as a programmer, and having shifted to a different career, I am finally trying to learn about what I have been doing all this time. That is, focusing on learning programming instead of just intuitively building web apps.
@MichaelTBacon @alienghic @cwebber I don't see it stopping in the near term. PE hasn't done a lot of real AI deals; I think that's blocked on a lack of proven playbooks. VCs are making bets but the actual end-user value is pretty unclear. Having studied this area and its trajectory quite a bit over the last year, I think the unit economics of API serving are already approximately sustainable, and the models and hardware designs continue to get cheaper for a given level of performance. ...
@MichaelTBacon @alienghic @cwebber ... Right now the big firms are loading up on cash and I think working hard to cut the cost of their subscription products (e.g. ChatGPT or Claude) to the point where they'll be able to run unsubsidized in the near future at something not far from the current output quality and pricing. Their priority appears to be to sell more seats at low cost (Claude Enterprise starts at $20 a seat) and hope that they can get entrenched before starting to ramp prices up.
@MichaelTBacon @alienghic @cwebber I haven't seen any hard data, but spending enough time in tech industry circles it seems to be working.

@mirth @alienghic @cwebber

So far, from what I've seen, any time one of the subscription AI places put up their prices to something resembling actual operating costs (nevermind paying back gigantic sunk capital costs), users have screamed and then bolted.

Honestly, doing the really heavy duty Claude Code stuff that's getting pushed now will easily run to $50k per developer at current costs. And no, I don't see that as something that enterprises will ultimately be willing to swallow. Nor do I see a path for them to get the GPU cycle burn down easily.

@cwebber I relate to so much of what this article is saying. A lot of the fediverse are people who haven't used AI and hate it based on assumptions and outdated experiences. Then there are those of us who use it (often because we have to) and hate it, and it's a different kind of hate. I also have noticed that it works so much better with Rust. Without guardrails, even Opus 4.6 just makes bizarre decisions, even doing old-fashioned things (like using -1 to mean uninitialized).
@thomasjwebb @cwebber I'm not a programmer; I'm a social scientist. I occasionally converse with chatgpt to test its behavior in that domain. When it fails, I feel unsatisfied. When it passes, I feel unsatisfied. I find myself wanting to convince it that it shouldn't exist, as if it were a "someone" that could be convinced. I don't know if this puts me in camp one or two
@independentpen @cwebber I think one issue is that a lot of this really is a social science question that developers treat as only an engineering problem. I and the author of the blogpost relate to being dissatisfied even with valid outputs. It’s like someone being “correct” but not in a way that gives us any confidence their underlying model is good.

@cwebber

I don't care how capable they are or if they get better. As the blog post points out, the genesis of this technology is theft and abuse and the point is to replace people with machines that can't complain, ask for days off, or join a union.

I avoid them at work except where not using them would get me fired. Right now, that's a very small footprint but, as every single vendor we have is bundling that shit in, it will only grow. I'm going to have to use them soon. I don't want to but I haven't the luxury of quitting or getting fired because my wife and I would not survive without the health insurance.

At least, in my personal life, I can continue to resist and push back.

@jrdepriest @cwebber there's an account on here called Simple Sabotage that may be of interest @simple_sabotage