"I used AI. It worked. I hated it." by @mttaggart https://taggart-tech.com/reckoning/

This is a really good blogpost. And I'm sure it'll make some people unhappy to read, whether they're pro or anti genAI. What's good about @mttaggart's blogpost is that he talks honestly about how using Claude Code did actually solve the problem he set out to solve. It needed various guardrails, but they were possible to set up, and the project worked. But the post is also completely clear and honest about how miserable it was:

- It removed the joy from the process
- If you aim to do the right thing and carefully evaluate the output, your job eventually becomes "tapping the Y key"
- Ramifications for how people learn things
- Plenty of other ethical analysis
- And the nagging question of whether to use it next time, despite how miserable it was

I think this is important, because it *is* true that these tools are getting to the point where they can accomplish a lot of tasks, but the caveat space is very large (cont'd)

What I think is also good about the piece is that it shows how using this tech eventually funnels people down a particular direction. This is also captured by this exchange on lobste.rs: https://lobste.rs/s/7d8dxv/i_used_ai_it_worked_i_hated_it#c_7jirfk

The story people start with vs. where they end up is very different:

- They're really just for experts, and they're assistants; they don't write the code for you
- Okay, they write a lot of the code for me, but I personally don't commit anything without reviewing it
- YOLO mode

Which eventually leads to you becoming the drinking bird pressing the Y key from that Simpsons episode. (Funnily enough, I wrote that in my comment on lobste.rs in reply to someone else before I had even gotten to the point where I saw that @mttaggart literally had that gif.)

And at that point, you're checked out. All that's left is vibes.

And unfortunately, these systems don't survive that point very well. And neither do you, in your skills and abilities.

There are a lot of other concerns, but since a lot of people on the fediverse are opposed to these tools, they might not be very familiar with where these tools currently are ability-wise. @mttaggart provides a good description of how they *are* capable of solving many problems you put in front of them... and that doesn't remove the other problems they generate or that are involved in their process.

The slop part isn't just the individual outputs, but the accumulation, and the effect on society itself.

Is that moving the goalposts? It may be. I think "slop" used to be easier to dismiss when it came to code because it was obviously bad. Now when it's bad, it's non-obviously bad, which is a problem of its own. And cognitive debt, deskilling, and so on don't get factored into assessments of output quality.

But unfortunately, the immediate rewards of these things are going to make that hard for society to recognize.

Let me add one more thing to this. It's implicit in the above, but let's be explicit. The problem is that this pipeline effectively *undoes itself*.

Part of the reason this worked well for @mttaggart is carefully setting up guardrails and monitoring things.

But the very patterns of usage of these tools mean that people either never develop the skills to provide that level of care, or become demotivated to sustain it over time.

Which means the system eventually moves toward a structure that degrades and shakes itself apart through its very patterns of usage.

I don't know how to solve this.

@cwebber @mttaggart

"The problem is that this pipeline effectively undoes itself."

AH! That... that is it. That is the phrase. That is the succinct way to describe (one of) the major problem(s) with all the "but it works!!" stories. I'd been feeling that but couldn't figure out how to phrase it, or identify it, exactly. Thank you.

@cwebber @mttaggart

I don't think anyone does...yet

@cwebber @mttaggart

And picking Rust, and the choice of problem space, and the model selection and....

All that variability introduces a lot of room for "it works great / it absolutely does not work!"

(I agree that "efficacy" is the wrong framing given all the negative externalities)

@[email protected] I was looking at a few products on Amazon today. I noticed that, for various products, the product description on the page itself contradicts itself in several places. First it is described as having feature X, then Y, then X again, then Z, and then Y once more. I think it would be a good task for an AI to check such self-contradictory pages and flag any inconsistencies. This is particularly relevant for very large websites such as Amazon, Wikipedia or GrokiPedia. However, proponents of AI seem to prefer letting their software do things for which it is unsuitable, whilst completely ignoring tasks for which it is actually well-suited. @[email protected]
@Life_is
LLMs aren't good at fact checking, though. What IS the feature set of the product? Only the manufacturer, or maybe people who already bought the object, know what features it has. An LLM can flag "this is inconsistent," but it can't figure out what the truth is. It has no concept of the world.
@cwebber @mttaggart

@dlakelan @Life_is That could change with neurosymbolic programming. Which I believe is important, and the next step.

And, as it turns out, leads to dramatic structural improvements AND doesn't resolve any of the problems in @mttaggart's blogpost.

@dlakelan
If Amazon just added a big scare banner ("this product page appears to be misleading") for every obvious contradiction, starting with only the ultra-high-confidence cases, it would probably help a lot. Obviously, the lower the confidence threshold, the more false positives without effective recourse.

Also, for Wikipedia this wouldn't work too well, I think. For instance, in medicine there's something called a "paradoxical effect", where e.g. different dosages lead to literally completely opposite effects. I've had cases before where I researched a drug and thought "that cannot both be true", only to find that yes, it is – at different dosages. I doubt the LLM would fare much better at this.
@Life_is @cwebber @mttaggart

@cwebber @mttaggart

I think software (in general) can do all the amazing things it currently does because most programmers were very, very careful.

We (as a society) don't fill harmful substances into drinking bottles. We don't let unisolated live wires hang from the ceiling. We put railings on stairs.

And then we give each developer a loaded footgun in the hope that they will be very gentle and careful with it.

The existence of genAI, especially in the hands of non-developers, means that a randomly selected piece of code is now much more likely to explode in your hands.

For me, this means: We need ways to execute code safely even if that code is horrible and malicious. Better sandboxing, more fine-grained permission systems, more use of good languages.
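A minimal sketch of one thing "execute code safely" can mean in practice, assuming a POSIX system: run an untrusted snippet in a child process with CPU and memory ceilings. The function name `run_untrusted` and the specific limits are illustrative, and this is only coarse resource isolation, not the fine-grained permission system asked for above:

```python
import resource
import subprocess
import sys

def run_untrusted(code: str, timeout: float = 5.0,
                  mem_bytes: int = 256 * 1024 * 1024) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process under CPU, memory, and
    wall-clock limits. Coarse isolation only: this caps resource abuse
    but does NOT restrict filesystem or network access."""
    def limit():
        # Applied in the child just before exec: cap CPU seconds
        # and the total address space it may map.
        resource.setrlimit(resource.RLIMIT_CPU, (2, 2))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores env/site
        capture_output=True, text=True, timeout=timeout, preexec_fn=limit,
    )
```

A well-behaved snippet runs normally, while one that allocates past the ceiling exits nonzero; real sandboxing would layer seccomp, namespaces, or a container on top of this.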

[Imagine a long rant about software components and the importance of good interfaces here, with stuff like "we should do immutable message parsing, not C ABI" strewn in.]

@cwebber exactly, and that would be my critique of this otherwise very, very interesting post: https://fediscience.org/@repepo/116339289260094252

the author is IMO under the fallacy that LLMs *can* be useful when you are already experienced and you are very careful checking the output. It might work for a while; it won't work every time.

@mttaggart

@ced @cwebber @mttaggart Eurgh. Flashbacks to back in the dark ages when there were long rants about how we should be using computers for things humans are bad at and having people do things computers are bad at.

"... very careful checking the output" is a canonical example of "things we write software for because we can't trust humans to do it consistently well and reliably in the long term".

@cwebber @mttaggart It's just part of the ongoing slow motion collapse of the Industrial Age. Capitalism has always been the enemy of sustainability, and it has always been destroying the skills entire subcultures of workers and craftspeople had been refining for centuries or even millennia within a few decades. There is already so much skill and knowledge that has been lost because somebody invented a way to do something cheaper by scaling it up and using more machines and fewer humans.

The end of the Industrial Age will be brutal. There are so many things nobody alive remembers how to do with preindustrial methods, and while we still have written descriptions and can probably figure something out, all the finer details that were never written down have to be discovered again. This is worse than the collapse of the Bronze Age.
@cwebber @mttaggart this is also a well-known pattern in industries from aviation to nuclear engineering. I'm not sure why anyone would think programmers are somehow exempt.

@wronglang @cwebber @mttaggart

Framing it as "we are opposed to those tools" is a bit weird. I think we are mostly opposed to the societal project behind them.

I also kind of feel that they are being used to add a new layer on top of something we could not "rein in", and that will not help fix what was the issue in the first place.

@defuneste @cwebber @mttaggart yeah I agree it's opposition to the project, and tbh opposition to the attempt to collapse the space between "handy little prediction generator" and "planet heating plagiarism machine".

@cwebber I've been trying to make similar claims, spelling it out that in open source work (which is what I get paid to do) the burden it puts on maintainers is extremely high.

It also makes the entire project implicitly dependent on a commercial product (and that's if you ignore all of the potential legal issues that come with the licensing implications, which so many folks in the open source world have been conveniently ignoring), which is unlike any other "tool" we've ever used in the past.

But neither of these thoughts have seemed to faze folks who use them (and less surprisingly, folks in management), despite being irreconcilable with the goals of building and maintaining open source software 😓

@cwebber @mttaggart the way to "solve" it is to write the slop generator off as a bad job. It's Keurig machines for programming. You can make a cup of coffee that way and it's fine. If you want to make coffee for a crew? It's an abject nightmare that will almost surely jam your machine. If you're an espresso bar? The thing is just garbage.

@cwebber thanks for this. /me opens @mttaggart's post in a tab to read later.

I haven't finished reading it yet, but this post about how it's externalising/moving-the-work rather than removing it comes at things from a similar but adjacent perspective (I think)

https://mstdn.social/@rysiek/116332755277747901

Of course, shifting costs onto not-me is something that made lots of capitalists wealthy, so that might not help sway society 😕

@cwebber
There is no solution, fundamentally: the machine spits out correct patterns correctly, but it doesn't know whether these are the right patterns, the right problem, the right solution.

You have to do that.

Now, I don't know about you, but the way I get to the answers to the above is by implementing and recognising the feel...
@mttaggart

@cwebber
Analogy: the automation augmentation most people use every day, driving a vehicle.
1. Literacy: you take lessons, learning to steer and direct a car in certain situations, modes, and applications (personal car, vans, heavy goods vehicles (HGVs), etc.). There are different licenses for each capability.
2. License: you're tested and credentialed as a "safe driver". Not the fastest or most productive, but "safe on the road".
3. Lifetime journeys: practice doing.

Vibe code skips 1 & 2, doing 3 hands-free!

@cwebber @mttaggart I think there are ways of using the semantic-space structure of LLMs to empower human creativity, but chatbot genies are not it, and I don't know how to get to some hypothetical better world from here. the chatbot genies are a greased path that suck everything down toward their careless-slop future

@cwebber I'm seeing a lot of this at work. The team members who are using chatbots are certainly generating code that runs (maybe even correctly), but they do seem somewhat miserable. But my impression is that unlike @mttaggart, they're *not* reviewing every line of code.

What's worse, the CTO is pushing hard for some miraculous "fully agentic" future, where the developers will be even less familiar with "their" code. 😑

I don't know how to solve this either, but I think it's going to end badly on this trajectory.