The key weakness in AI agents is that they're a lie. They don't work. They just don't fuckin' work. You can't set a hallucination engine to work doing tasks. It's pants on head stupid. The hype pretends this isn't the case and hypothesises a fabulous future where they work *at all*. This is a lie.

A useful model for "AI agents" is that they're the current excuse meme for AI. They're not a thing that works at all, now or in the fabulous future. But they're *such* good material for hypecrafting. No sausage at all, but *my god* that sizzle.

@davidgerard

But say this to the believers and they respond "It simply isn't credible to criticize this technology without acknowledging that it is useful to many people"...

As if they get to demand that we believe the lie when we criticize the lie.

The only thing we gotta acknowledge is that many people are utter tools who want to be lied to.

@davidgerard

People who were losing patience are like “ah, agents, now we’ll surely get what we were promised!” And then it takes a few months or a year for them to figure out, nope it still doesn’t work. By which time the AI grifters will have another silver bullet to pitch.

@jonhendry @davidgerard The real problem here are poor bosses. They’re not interested in work we do, they don’t understand problems we struggle with and they’re just happy to believe there’s a golden hammer that removes any responsibilities from them.
@aemstuz @jonhendry @davidgerard Ah, but the responsibility will be removed after the company goes bankrupt because the AI sent all the money to a 419 scammer.
@davidgerard Maybe I'm misunderstanding something, but from what I understood, it's basically the same LLM stuff but in the background?
Basically, if you roll the dice enough times, you might get something that passes all the unit tests?
(And burn a whole bunch of tokens in the process...)
@art_codesmith you say "do a thing" and it goes and does the thing! Or what it hallucinates as the thing. This turns out to have a disastrously high failure rate. Also, it's hilariously easy to prompt-inject.

@davidgerard @art_codesmith It seems to me that we've built a system that has a chance of getting the right answer, but we've given up on finding ways to improve the chances of it getting the right answer. Instead we've wrapped the system in a loop, to check if it has the right answer after each iteration.

It's Bogosort.

The greatest achievement of humanity, worth boiling the oceans for, is Bogosort.
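
(For anyone who hasn't run into it, bogosort really is this. A rough sketch in Python:)

```python
import random

def is_sorted(xs):
    # True if xs is in non-decreasing order.
    return all(a <= b for a, b in zip(xs, xs[1:]))

def bogosort(xs):
    # Shuffle at random until the list happens to come out sorted.
    # Expected shuffles for n distinct items: n! -- do not use in anger.
    while not is_sorted(xs):
        random.shuffle(xs)
    return xs
```

Swap "shuffle" for "reprompt the model" and "is it sorted" for "do the tests pass" and you have the loop.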

@welbog @davidgerard It's... kind of that, yeah.
It's a bit more iterative (maybe) but still. It's kind of like an attempt to throw a ball into a hole by throwing it vaguely down the gradient.

@art_codesmith @welbog @davidgerard

Bogosort! An accurate way of describing what I saw by idly testing google to tell me the number of business days between days X and Y.

It kept correcting itself over and over before finally arriving at what it thought might be the correct number of days (still didn't account for a public holiday in that timeframe).

@welbog @davidgerard @art_codesmith

I had to look this up... "Also known as stupid sort" 🤣🤣🤣

https://en.wikipedia.org/wiki/Bogosort


@art_codesmith @davidgerard

It's when you give the LLM access to APIs.

So instead of asking 'propose some code to do x using API y', it gets to run the proposed code directly.

@Zamfr @davidgerard I feel like having a hallucination machine do anything with an API without human oversight is significantly less than ideal.
@art_codesmith @davidgerard Yes indeed, but if you call it 'agent' it will be much better of course.

@davidgerard

The use case for 'agents' isn't that they do useful things unattended, it's that they can consume (billed for) tokens unattended.

@davidgerard

Software Agents never made much sense. In order for people to trust them to act on their behalf, the tasks they do have to be so well defined that for all practical purposes it'd be better to just automate them.

"AI Agents" make even less sense. Has anyone even suggested one that's more than just an automation wrapper around a sequence of LLM calls and service APIs?
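
For what it's worth, every one I've seen reduces to something like this. A hypothetical sketch (the DONE/CALL protocol, the tool names, and the call_llm function are made up for illustration, not any real library):

```python
def run_agent(task, call_llm, tools, max_steps=10):
    # Generic "AI agent": an automation wrapper that alternates LLM calls
    # with service-API calls until the model claims it is finished.
    # call_llm: function from transcript text to the model's next reply.
    # tools: dict mapping a tool name to a callable wrapping some API.
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)
        if reply.startswith("DONE:"):
            return reply[len("DONE:"):].strip()
        if reply.startswith("CALL "):  # e.g. "CALL search: some query"
            name, _, arg = reply[len("CALL "):].partition(":")
            result = tools[name.strip()](arg.strip())
            transcript += f"{reply}\nResult: {result}\n"
        else:
            transcript += reply + "\n"
    return None  # step budget exhausted: the loop never converged
```

Whether it "works" is entirely down to whether the model's replies happen to be right; the wrapper itself is plain old automation.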

@davidgerard that reminds me... I have to do my company's mandatory agentic AI training course this week... wish me luck 🤮
@davidgerard having to sit through unbelievably painful "podcast" style audio - which is obviously generated voices speaking generated text. Horrible word-salad cringefest so far. (Thumbs down in my review each time.) When a video with a human comes along it is REMARKABLE how easily I can actually listen and understand the point compared to the bland incessant garbage the "podcast" puts out...

@davidgerard "You can't set a hallucination engine to work doing tasks."

You can if your goal is to produce a lot of material that is not correct, or it doesn't matter if the material is correct.

I think that is what people tend to miss about the drive to get AI into the world. The people pushing it don't care if it's accurate, it might even be better for them if it's not, they just want a lot of material that looks passable to some people. They want filler and propaganda and misinformation.

@davidgerard So when you say "they don't work", keep in mind that AI _does_ work as intended. AI agents just aren't very useful for most people. Those statements aren't contradictions; they're a sign of whom AI works for.
@distrowatch @davidgerard That's the high end of the spectrum of grifters here. On the low end, they're high on their supply, and actually believe the bullshit.

@jmax @davidgerard The people the hype worked on probably do believe, unfortunately.

The AI industry seems to work in parallel with the social media and entertainment industries. This conversation brings to mind a quote from an article about Spotify: "Its goal isn't to help you discover new music, its goal is simply to keep you listening for as long as possible. It serves up the safest songs possible to keep you from pressing stop."

@distrowatch @jmax AI boosters tend overwhelmingly to be people who were one-shotted by a really impressive demo, and no mere numbers on how shitty this stuff is at scale will ever convince them.

also. over and over. I find that AI boosters are literally unable to tell good from bad. they are literally unaware that their slop is actually shitty. they think you're *lying* when you say you can tell good from bad. they think you're having a go at them.

@davidgerard @distrowatch @jmax This is like what we were warned at when I worked at Bloomberg -- you never ever *ever* want to hit it big with your first investment, because if you do you'll inevitably be convinced you were Right or Had Luck, or whatever, rather than accepting that random chance chanced in your direction and will move on as it always does.

Better to be burned first time when investing, and I suspect with AI too, lest the brainworms of Being Special lodge in your head.

@wordshaper @davidgerard @distrowatch @jmax

I have a friend who works for a law firm and said “we had this task that took 8 hours, now we run it through the slop machine and it only takes someone 2.5 hours to check and fix it”

Reminds me a bit of a place 20 years ago that bought below-spec ball bearings from another company; they could fix them and get them to spec for less than making good ball bearings from scratch.

Hmm

@glasspusher @davidgerard @distrowatch @jmax Except the problem with this is that people are surprisingly good at not fucking up in the first place but *abysmal* at reliably catching fuckups. So if you do it yourself you may make two errors and catch them with 80% accuracy, but if the slop machine does it then you'll be checking 20 errors... still with 80% accuracy.
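
The arithmetic there, as a toy sketch (the error counts and the 80% catch rate are the illustrative numbers from the post, not measurements):

```python
def expected_missed(num_errors, catch_rate):
    # Expected errors that survive review, assuming each error is
    # caught independently with probability catch_rate.
    return num_errors * (1 - catch_rate)

# Write it yourself: ~2 errors, 80% of them caught in review.
yourself = expected_missed(2, 0.8)   # ~0.4 errors slip through
# Review the slop machine's output: ~20 errors, same catch rate.
slop = expected_missed(20, 0.8)      # ~4 errors slip through
```

Ten times the errors in means ten times the errors out, at the same reviewing skill.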

@wordshaper @davidgerard @distrowatch @jmax

Quite. I don’t want something that I’ll have to check thoroughly to see if it screwed up, either.

I’m not in the mood to become a slop machine’s editor/fact checker

@glasspusher @wordshaper @davidgerard @distrowatch @jmax

Alertness to the dangers of AI slop has forced us to do checks we should have been doing in the first place, when repeating stuff we've found on the Internet, read in news media, heard from friends etc.

@glasspusher @wordshaper @davidgerard @distrowatch @jmax

In fact doing so is only free labor to further train the slop machine

@davidgerard @jmax I notice this with developers in particular. AI bots tend to generate terrible code, so the people who say it writes well enough or that it saves them time... Make me wonder how bad their normal code is.

@distrowatch @davidgerard @jmax

It's built into my editor at work and I've found that it helps with repetitive tests on legacy codebases.

I still have to replace a lot so how much better it is than copy, paste, and modify is still debatable but it feels easier which helps with the motivation hurdle.

I've also noticed that I don't mentally register when it gives me pure crap that I need to delete, unless I deliberately make a mental note, so it's easy to remember it as better than it is.

@distrowatch @davidgerard @jmax

A recent study showed that AI coding tools made developers slower, but crucially, they believed themselves to be faster.

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/


@JustinH @distrowatch @davidgerard @jmax

You know what?

People who push LLM coding tools don't care that they slow devs down or don't work that well, as long as it's cheaper.

If they can pay devs 50% less in the end, they're fine with them being 30% slower.

@Aedius @JustinH @distrowatch @davidgerard Especially since they can always blame the developer when the LLM screws up.
@distrowatch @jmax @davidgerard I see you have met my boss
@serinde @distrowatch @jmax and everyone else's
@distrowatch @jmax @davidgerard I have such a story that’s grist to your mill about the AI agent my boss is ramming through even as we speak, and the extremely red flag vendor it’s from; but I’d probably get fired

@davidgerard I don't know what you base this on. It most definitely works for some things, at least some of the time.

For example, someone I know needed to do some stuff with an Arduino to make it show a pretty wave pattern using unevenly distributed LED lights. This person was a crafter, not a coder. However, by uploading a hand-drawn picture of where the LEDs were placed, it generated a web-based simulator with sliders to tweak parameters. Then, when he was happy with the result after tuning the sliders, it generated code for the Arduino that compiled and ran perfectly. First try.

It might have been an incredible amount of luck, but this non-technical person got his art project to work without needing to learn anything about code.

@gigantos @davidgerard "works for some things, at least some of the times" is NOT the way these LLM tools are being pitched. I think there would be much less of a backlash if OpenAI and co were like "hey, here's an occasionally useful tool for generating text and here are the use cases it's actually good at," rather than "fire all your employees and replace them with AI, who cares if it's fit for purpose!"

@Avner @gigantos "i can use the radioactive anthrax bomb that was specifically funded and constructed as a weapon of mass destruction as a hammer" is also an unconvincing argument, though drive-bys keep not having any other argument

note how in this case it's not even the drive-by himself, it's an anecdote about someone else who allegedly had success at using the radioactive anthrax bomb as a hammer

and not in a way that has any relevance to "AI Agents" either, so it's just a driveby promptfondler pasting a random excuse

@davidgerard @Avner so now a long time follower of you is a drive by. Thanks for that.

@gigantos @davidgerard @Avner I mean, your story kind of discredits itself. Maybe it happened, maybe it didn't, but a useful tool is one where you know when it will succeed or fail. Someone randomly having it work out does not change the calculus; stopped clocks are right twice a day, after all.

Useful tools are ones where you can know when, where and how to apply them to get consistently useful results. LLMs cannot pass that most basic of tests.

@[email protected] @[email protected] @[email protected]

AI: It most definitely works for some things some of the time; or, 60% of the time it works every time
@Avner @gigantos @davidgerard if they would be honest there would be no money to make and no bubble to hype.
@Avner @gigantos @davidgerard and you don't see that that's exactly the problem. Not that technology doesn't work, or that it's being used inappropriately, although it is. The problem is that it's been marketed as something it's not. Fed proper data and used appropriately, it is actually highly effective at certain specific jobs. Fed barely moderated data off the internet, which is 90% garbage it spews 90% garbage. This AI is not Artificial Intelligence, it is Automated Incompetence as it is being sold. The basic technology is fine. It's the use case and the marketing that is borderline criminal.

@gigantos @davidgerard If they had searched for their exact scenario they probably would have found a blog post with something very similar to what they asked for.

Someone at work showed off their usage of AI so I searched on their question and got an article (from before GPT) with nearly identical results.

Maybe it's enabling some folks to stop thinking inside the box, or self-limiting their capabilities, but that's a different problem than it's being sold as solving.

@zimzat @davidgerard sure. All I’m saying is that it is basically the modern day excel, where everyone and their uncle can get their hacky one time project to do something that resembles what they want.

So it does generate value for some, some of the time.

I’m not arguing it is a replacement for hiring a human

@gigantos @davidgerard I know AI boosters aren’t gonna pay attention to things like details or comprehension, but OP said “agents”.

@jason @davidgerard and coding agents aren't included in your definition of agents?

Anyway, I've already been declared both a drive-by and an AI booster. So I'm out. I refuse to deal with bullies on the Fediverse.

@gigantos @jason @davidgerard

I fully commiserate with you. Hatred for all things related to LLMs is a bit of a religion in the Fediverse.

And without fail you get derided for pointing out that they have their uses, and that if they are indeed utterly useless, then the detractors have nothing to worry about because the "problem with LLMs" they keep hand-wringing about will solve itself eventually.

@eejalab @gigantos @davidgerard yes, the bubble will crash.

This isn't "bullying". It's existing in a society, get with it.

@davidgerard I have multiple examples of Copilot failing to provide accurate information about MS products, which, apparently, Copilot can configure FOR you!

How's that possible?

@davidgerard Tried chaining LLMs for tasks—ends up in loops or wrong outputs. Better for ideation than execution.

@davidgerard - Id love to see AI crash.

@davidgerard

The sound you hear won't be champagne corks. So...what do you plan to wear for the next, worldwide #GreatDePrAIssion ?

@davidgerard They do this thing where they cite a percentage of success, such as "we got 60% of questions right on the SAT". Implicitly they are tricking your mind into thinking there is some sort of progress bar slowly loading. That the wrinkles are about to be ironed out soon™.

They are not.

@davidgerard The problem is they sort-of work, just well enough and just often enough to convince people they'll work all the time. Then the moment you trust them, they go "ebola-contaminated diarrhea in pants on head"-stupid. By then it's too late, backing out isn't an option, so everyone who proposed them has to pretend this wasn't normal so they don't look like complete idiots.
@tknarr @davidgerard people are finding good results in using them as a sort of better fuzzy search. There, hallucinations don't matter so much. https://terrytao.wordpress.com/2025/11/05/mathematical-exploration-and-discovery-at-scale/ Using them to write legal documents, legal briefs, or customer or employee support seems insane.
@davidgerard we live in "any day now" news. it's the news for future news, hopefully, invest in us.
@davidgerard I once saw a Potential Man meme describing ChatGPT. I’ll post it here if I ever find it