I've never been opposed to the word "hallucinating" for describing how AI makes mistakes ... until now.

I just talked to someone who thought AI hallucinations would be obvious because it would be obvious if you talked to a *person* who was hallucinating.

In other words, they equated "hallucination" with "sounds wacko" and accepted AI output as true because it sounded level headed.

1/2

The word "hallucination" isn't going away — it's a widely used industry term — but we need to explain it better for beginners:

"Hallucination" is just a fancy word for "confidently makes mistakes":

"Remember: AI hallucinates, and you need to confirm all facts" should be something like "Remember: AI confidently makes mistakes, and you need to confirm all facts" or "AI tells you things that are wrong in a way that sounds completely believable. Confirm all facts!"

@grammargirl This is a good example of why that term is so dangerous. Thank you for posting it.

That said, while I have zero hope of making that term go away, we also have the word "slop" as a counter.

"Ugh. It had a hallucination..."

"Yup. And the results are now slop."

That said, I don't use "hallucination" myself in the "AI" context. I refer to the error rate, which, last I checked, hovered around 40%.

@orionkidder @grammargirl
The explanation has to include that if you believe what the AI tells you then you are hallucinating
@AccordionBruce @orionkidder @grammargirl
Exactly this.
Hallucination is an act of cognition. The machine doesn't

@RnDanger @AccordionBruce @orionkidder @grammargirl

It seems such a pointless, minor nuance that will make no difference whatsoever in practice 😅

(yes I am aware talking about this kind of minor nuance is your day job, but still, someone's gotta say it)

@gotofritz @RnDanger @AccordionBruce @orionkidder @grammargirl
Language can be used as one of the most dangerous tools we have because it shapes the way we think (and thus our future) mostly on a subconscious level. The more subtly a word misleads, the more difference it can make in practice.

@elfburgerman @gotofritz @RnDanger @AccordionBruce @grammargirl I think this is true. Like I said above, I have zero expectation that my language use is going to make a damn bit of difference at scale, but in individual conversations, refusing the metaphor of consciousness can help reframe.

It's just an error. The machine is faulty. It makes errors a lot.

@elfburgerman @gotofritz @AccordionBruce @orionkidder @grammargirl
I agree.
"Hallucination" is a great marketing term to make people want to trust a machine, but it's a pretty poor choice of words to convey any understanding of what the machine does or how it does it
@RnDanger @elfburgerman @gotofritz @AccordionBruce @grammargirl Exactly. Making machines seem like magic, seem like they have no internal mechanism, is a common tactic. It's why we refer to external hard drives that we don't own or control as "the cloud."

@orionkidder @RnDanger @elfburgerman @AccordionBruce @grammargirl

Sounds all a bit conspiracy theory to me.

There is nothing positive about "hallucinating", I wouldn't ride a bus if I knew the driver was prone to hallucinating

@gotofritz @RnDanger @elfburgerman @AccordionBruce @grammargirl It's a marketing tactic.

And the problem with the metaphor of hallucination was explained at the top of the thread.

I'll be blocking you if you keep playing ignorant.

@RnDanger
And hallucinations are almost always pathological. A sign that a person needs help - possibly very urgently and/or for a very long time.

@AccordionBruce @orionkidder @grammargirl

@orionkidder Good point.

Also, the error rate now highly depends on which model you're talking about, but I think that's the rate for those that are most widely used -- e.g., the free models.

@grammargirl I'm seeing people claim the error rate is lower with other models, and I'm not sure I believe that since this industry just piles lies on top of lies, but the only plausible explanation of the lowered error rate I've seen is for Claude Code.
@grammargirl If I understand correctly, it shoves every query through the "AI" multiple times and tests whether it does the thing it's asked to do, but of course, it hides all of that from the user.
@grammargirl To me, that feels like a brute-force workaround, a kludge, not an improvement in the tech itself. It's like saying, my car is too slow, so I'll attach a second engine to the hood.

@orionkidder No, that's probably how human brains do it. The genAI loop is wacky in other ways, but testing its results is not a wacky part of it.

@grammargirl

@orionkidder @grammargirl

I'm obliged to use LLMs at work.

In my limited experience, the error rate depends on whether the question you ask is covered by the model's training data. If so, the error rate will be fairly low (though not so low that the model becomes trustworthy). Otherwise, the error rate will approach 100% as the model just makes something up.

Of course, you never know what was in the training data, so you don't even know how reliable you can expect the model to be. In my experience, asking an LLM about material you can't find with a careful Web search is a good way to produce a screenful of friendly, grammatical, plausible rubbish.

@CppGuy @orionkidder @grammargirl the error rate also depends on what is in the training data.
That is no doubt part of the problem with Grok, as its training data contains many unreliable statements garnered from X as well as deliberate falsifications added.
If the training data was just Wikipedia you would get more reliable results.
For other AI vendors adding random chats from Facebook or Instagram or AI generated websites will also lower the accuracy.
Claude Code may be slightly better, for now, because it is just plagiarising code. This won't last as the code repositories fill up with AI slop and these are flagged up as such and excluded by the crawlers.
Likewise if you ask GenAI to summarise a document it may well incorporate data from its training data as well as the text you supply.
The other reason is that GenAI doesn't simply reproduce single sources, whatever their accuracy. It acts as a stochastic mixer: if you see an AI-generated legal case reference, some of it may come from one citation and some from another, and the legal inference drawn may be from something entirely different.

@marjolica @orionkidder @grammargirl

I can't comment on Grok: I've never had an X account.

Claude Code has its problems. I use it not to generate code but to explore ways of working with unfamiliar libraries and languages when I can't find answers on the Web. (A library is a body of code packaged up by one developer or organisation for others to use.) I find it's wrong more often than it's right.

As an experiment, I once used Jira's AI to summarise a detailed comment that I'd written myself. The result was shorter, sure, but it was meaningless and unusable. After that experience, I never use an AI to summarise or rewrite anything.

@marjolica FWIW, I kind of suspect that a big reason why the YouTube recommender has been so problematic for years now is that it treats its output as its input. Autoplay amplifies this effect.

@CppGuy @orionkidder @grammargirl

@orionkidder @grammargirl I’ve heard the Spanish science communicator Ignacio Crespo argue that “hallucination” is misleading in this context, because it imports a human mental-state metaphor into a statistical text-generation error. “Confabulation” may be closer: a plausible-sounding reconstruction that fills gaps. Still, it also comes from human cognition, so it can anthropomorphise the model too.
@orionkidder @grammargirl I think the deeper problem with “hallucination” is that it imports a human mental-state metaphor into a statistical text-generation error. That can make people expect obviously bizarre output, when the real danger is often confident, plausible-sounding falsehoods. “Confabulation” has a similar problem, though. But, I don’t know, it sounds better to me.
@danielmunoz @grammargirl This is why I refer to its "error rate." It's a machine that produces false answers to such a large degree that it shouldn't be trusted. It's simply faulty.

@orionkidder @grammargirl I refuse to use it and anthropomorphize a computer failing.

It’s like “pro life.” No, it’s “give birth or die trying.” We need to call things what they are.

@grammargirl It's an unfortunate trend in MANY consumer electronics and tech conversations now. There just aren't many resources for folks to find better educational materials on the products and services they use.
@grammargirl I don’t think we need to accept it just yet. The word is deceptive—intentionally so. What needs to be explained is this: chatbots and LLMs can't "hallucinate” because they have no minds or senses. They routinely depart from factuality because that's how they’re programmed: to generate plausible streams of text without regard to reality. (https://around.com/dont-trust-them/)

@gleick @grammargirl

the consistent trend of anthropomorphizing badly written programs, and the machines the programs run on, is used to set tech CEOs up as a religious ruling class.

they create these facsimiles of truth and reality then prop themselves up as the sole interpreters and arbiters. like any religious hierarchy.

they're relying on humans' ingrained need to assign importance to random objects and events and an interpreter to hand out judgement in return for taking all their money.

@gleick @grammargirl

IMO "confabulation" is more accurate than "hallucination" because the former indicates a lack of intent. Given that LLMs are not sentient, they lack intention. At most, they are reflexively responding to a reward function that optimizes towards producing text roughly resembling the pattern of their training data, but that's different from intent.

@grammargirl
Hallucinations can only happen to a mind. An LLM has no more mind than a slot machine.

The people making this stuff fell in love with their own convincing automatons, so attributed ‘hallucination’ as happening to their little babies.

It’s a much different thing if you say ‘this brainless machine is constantly making errors and spitting incorrect data’.

If you say that, it means back to the drawing board: this demo tech has failed.

@grammargirl
The industry named its own mistakes ‘hallucinations.’
Hallucinations is a forgiving term.
‘Delusions’ would be more accurate.

@grammargirl these folks are stealing language to whitewash a con. In my opinion.

Hallucination is a deviation from the normal way healthy human minds work. The confident incorrectness presented by the companies shilling AI is working as designed.

@grammargirl But what actually is the point of using it if I have to confirm all facts? Can’t I just skip the middleman?
@feisty_lemming It depends on what you're using it for. If you're fact checking, it can be faster to put in a document and say something like "Fact check this piece. Show your sources," which gives you a list of links to click and check. It's faster than putting each thing you want to check into Google and then sorting through the links (and now the AI slop too). It will also surface relevant links you may have missed that don't show up in the first 10 or 20 on Google.

@feisty_lemming You can also specify the sources you want it to use with something like "These are the 20 sites I usually use. Check there first and add anything else that seems relevant."

But I'm sure there are lots of other use cases where it's more in the way than helpful.

@grammargirl @feisty_lemming

I've done that and it generates ballpark-but-not-accurate information with fake citations.

@eestileib Fake citations (and fake quotations) are a huge problem. And sometimes it’s not even that the citation is fully fake, but a real source has been transmogrified so the details are wrong—authors are in the wrong order, title is modified, etc. @grammargirl

@eestileib @feisty_lemming I check everything and haven't had that problem. I find errors in maybe 1 in 50 links (for example, the page doesn't say what the model says it does). It's so rare that the rate is just a total guess.

I'm not asking it to find new information -- just to check existing info. Not sure if that would be the difference. I also don't use the free models. They are dramatically worse.

@grammargirl @feisty_lemming

I haven't used any of the commercial ones for obvious reasons, I was farting around with Mistral on my home computer and lost interest pretty fast.

It made me get up and actually pull a book off the shelf to verify that the quotation it gave me was fake, 'cause it read like a grad student summarizing Shapin, not Shapin.

If I'm looking for poetic truth I can get it from a novel is my opinion.

I used to be/coordinate engineers and nothing made me lose trust in someone faster than being confidently wrong. Even if they were usually right.

@grammargirl Maybe it would be faster. I object to the mass illegality of the content theft, the environmental destruction, and all the other terrible things that come with it. So I can’t bring myself to use it in order to possibly do stuff faster. And I’m fortunate that for work at least, so far I’m not being forced to. Many who object are not that lucky.
@feisty_lemming @grammargirl
Indeed, there is no point in using #GenAI if you have to confirm all the facts; you'll do better to just skip the middleman
@feisty_lemming If your boss doesn't demand the use, you can live and work perfectly without LLMs and GPTs. A life without is possible. (And better for the environment and climate). @grammargirl

@grammargirl like when medical people call someone "confused", AI "hallucination" is a more precise term than common parlance. it basically means the bot couldn't find a plausible answer and is for some reason blocked from saying "I don't know", so it makes stuff up.

that's a bit different from "confidently makes mistakes" because it's "confidently making stuff up entirely".

I have no idea what would be a good replacement for "hallucinate" in this context, I agree that it feels deceptive as is though.

I'm iffy on the term. But I don't have anything better.
But this: GenAI doesn't sometimes hallucinate. It always hallucinates. It only ever hallucinates.
Sometimes, what it hallucinates is plausible.
@draNgNon @grammargirl

@BenAveling @draNgNon @grammargirl

The AI is generating language from some matrix algebra that regurgitates transforms of the test data or mirages of it. Only users can hallucinate and believe the mirages are real while a whirring vortex of vectors can't believe in anything.

@grammargirl I agree it’s not going away. I still find it constructive to point out it’s misleading, though, because it’s a good framing device for talking about what these technologies are and are not actually doing.
@grammargirl Would "delusional" be more apt?

@mpjgregoire I'm guessing no. Some people don't like any human condition applied to AI, and I imagine the person I talked to who thought they could recognize a hallucinating person/AI would also think they could recognize a delusional person/AI.

It takes more words, but I think it's better to explain that it makes errors that don't sound like errors.

@grammargirl "AI tells you things that are wrong in a way that sounds completely believable."

Ah, so AI is like Wally Cox on Hollywood Squares! (Use this analogy on old people. We'll understand.)

@grammargirl I think it's funny that people who object to the use of 'hallucinate' because it anthropomorphises AI are nonetheless happy with their use of the word 'confident', as in 'confidently makes mistakes', in the same context.
@grammargirl
I'm opposed to your use of 'AI'. An LLM is not an intelligence, even though that is what people call it.
Every word the industry likes for its own products probably helps to mislead the public.
Every form of anthropomorphisation of LLMs should be banned.

@grammargirl
> The word "hallucination" ... it's a widely used industry term

It is a widely used industry lie that regurgitators do not lie but somehow are slightly mistaken.

While technically it is a "less expected but still possible words rehashing output" or "imperfect probability glitch" or like, the "lie" term has the accurate and precise definiens of what the output is factually. So that term should be used. I hope it soon will be obligatory for "the industry" to use in the EU.

:))

@grammargirl Definitely the latter, but with a slight addition:

"AI tells you things that are wrong in a way that sounds completely believable - which is the system functioning as designed. Confirm all facts!"

@grammargirl
I think of this as the nines imbalance.

In a datacenter there is talk of nines of uptime. Going from two nines (99%) uptime to three nines (99.9%) requires an order of magnitude investment. Another again for four nines (99.99%).

The AI nines imbalance is that
It is one nine accurate (90%)
but four nines eloquent (99.99%)

@grammargirl I appreciate "bullshit" as a better term per this article: https://www.psypost.org/scholars-ai-isnt-hallucinating-its-bullshitting/
Scholars: AI isn’t “hallucinating” — it’s bullshitting

A team of scholars argue that AI inaccuracies should be called "bullshit" instead of "hallucinations" because AI doesn't perceive or intend truth; it just generates plausible text based on patterns, without concern for factual accuracy.

@grammargirl
Everything LLM based "AI" generates is hallucination. It's just that in more than 50% of cases those hallucinations resemble facts.

@grammargirl

I'd like to suggest that the core of the problem here is that the pace of technological development is outstripping the pace of the evolution of our language to adequately describe it. I think we will have to come up with new words, or at least appropriate some more obscure ones to the cause with updated definitions.

(All that said, I think "botfarts" or, in less polite company, "botshit" kind of works, no?)

@grammargirl a lot of people used to say "no it's not *hallucination* it's *confabulation*." Confabulation is the thing the human brain does that is somewhat analogous to what AI does: confidently believing in something that we just made up and sounds right but is entirely fiction. Confabulation is how many people explain their own behavior when questioned, but also how many people get through interviews, or make business deals, or mansplain …

@grammargirl @photovotary

none of the people using this word have ever experienced an hallucination