The use of “hallucinate” is a stroke of true evil genius in the AI world.

In ANY other context we’d just call them errors & the fail rate would be crystal clear.

Instead, “hallucinate” implies genuine sentience & the *absence* of real error.

Aw, this software isn’t shit! Boo’s just dreaming!

@Catvalente the sad thing is, the term has been used since the 1980s, first in connection with computer vision. But it got co-opted more recently as a way of convincing people that AI is human-like and self-aware. A poor choice of words, but only in retrospect.
@Catvalente yes! "bullshit" is a more appropriate term for what they generate as it is disconnected from any sense of reality. I believe @ct_bergstrom is one of the early promoters of the term with his book and site: https://thebullshitmachines.com/
Modern-Day Oracles or Bullshit Machines: Introduction

A free online humanities course about how to learn and work and thrive in an AI world.

@canacar @Catvalente @ct_bergstrom Should be required reading in every high school
@Catvalente But wouldn't "real error" require an attempt to be right? AI is just a probabilistic tool. You don't consider dice to make "errors" even when a throw gives a different result than you hoped for. (Not that you would claim dice "hallucinate" either, but as another commenter already pointed out, that word has some prior history in the AI context)

@cazfi @Catvalente

MacKay (2003, _Information Theory, Inference, and Learning Algorithms_, Cambridge University Press, <https://web.archive.org/web/20170610174915/https://www.inference.org.uk/itprnn/book.pdf>), calls them "spurious stable states": does that nicely avoid both sets of pitfalls?


@Catvalente I think a more correct expression would be “pulling stuff out of its ass, if it had one” or “making shit up”.

@bloodymirova

> In ANY other context we’d just call them errors & the fail rate would be crystal clear.

The reason for using "error" and "fail rate" is that it doesn't anthropomorphize the AI.

@Catvalente

@Catvalente is it an actual error though? The programming says to give a result that looks like other results, looks genuine, not to give a truthful, factual result based on analysis. An error would be a failure to give a result as instructed, and that's not what the program does when hallucinations occur.
@Mimesatwork @Catvalente I'd agree with this - I mean, I certainly agree hallucinate is also the wrong word, but error to me implies something more fixable and programmatic, and less inherent and random, than what's going on.
@JubalBarca @Mimesatwork @Catvalente I tend to start with slot-machine analogies myself, but there's also the complex, somewhat technically correct variant of an old saying: "infinite monkeys at typewriters with keys for each and every word Shakespeare ever wrote, organized by how often he used them, might sometimes get his plays right; how lucky". Hallucinations are the ordinary state and what they are built to do; anything resembling reality is often luck and happenstance.
@Mimesatwork @Catvalente This is actually a really good point. The purpose of LLMs is not, and never has been, to give out real information, it has always been to approximate human language by way of statistical modeling. Even calling factual inaccuracies "failures" cedes ground to people peddling these things, however unintentionally.
@Catvalente Confabulation is a better term. We're the ones hallucinating here, hallucinating that there is any "thinking" going on.

@shanecelis @Catvalente
Confabulation has an existing technical meaning relating to creating false memories in response to brain damage or dementia. I don't think we want to use that to describe AI "hallucinations". I would call them "fabrications" or just "bullshit".

https://en.wikipedia.org/wiki/Confabulation


@Catvalente
"Look at his little feet twitching! Aw! He just knocked over your entire case! How ADORABLE!"

@Catvalente I keep telling people that they're just helping the scam when they use terms like "AI" to describe a thing that, by definition, can never ever become anything remotely close to AI. You're right. Using "hallucination" and any other terms the actual scammers come up with just helps their scam too.

This is why I almost always tell people not to call it "AI." Preferably refer to the actual individual tech involved specifically like "LLM" or "image diffusion." And yeah, don't refer to the inevitable errors as "hallucinations" because that term doesn't even make sense in context of how they function...

(To the people saying "technically it's not actually an error" it is a logic/correctness error, but, putting that aside, it sure AF is not a "hallucination")

@Catvalente Yes, I couldn’t agree more. In terms of mental phenomenology (if you’re going to use human terms to describe these things) the AI mistakes cannot be hallucinations. These mistakes are not based on a sensory experience.

A confabulation is a more accurate term - inventing things to fill the gaps

@Catvalente
while that's not all that's at stake, there's more: they make plenty of errors, wherever in the process those come from, because they're not sentient:
e. g. https://pouet.chapril.org/@johnrogers.[email protected]/116375460992563788

@Catvalente "Hallucinate" might not be the best choice of words, but LLMs don't error out like traditional programs and apps. LLMs make mistakes, they misremember, they confabulate, they bullshit; which is something we're NOT used to seeing from computers; it may not (necessarily) imply consciousness, but it's something uncomfortably ... familiar.
Computer programs don't make mistakes; they may encounter scenarios not accounted for by their programming. Assuming there's no hardware issue or data corruption, a calculator app will NEVER make an arithmetic error. You can put in a calculation a million times, and it will always give you the same answer.

@DanDan420 @Catvalente In other words, AI hallucinations are not errors; that's the system working as expected.

The error is in between the chair and the keyboard.

@slotos @Catvalente AI doesn't put out errors in the way we're used to computers putting out 404-type errors; instead they make mistakes.
In order for AI to work the way it does (to be able to read between the lines, and to organically learn from a dataset), its programming requires a departure from the strict binary logic of traditional computing; but that also invites ambiguity.
The user error comes in if you take an LLM at its word when you need factual accuracy.

@DanDan420 @Catvalente You're just describing the difference between a deterministic computer program and one with simulated probability (which are sometimes called "models").

LLMs still run in the same way that other programs run on a computer...deterministically. However, the simulated randomness gives folks the impression that it is somehow different. It isn't different. If you play just about any computer game you'll have encountered what's going on here conceptually.

@avocado_toast @Catvalente Yes, procedurally generated worlds and levels in video games have been around for a very long time, but they involve a top-down approach where a number of rules are explicitly hard-coded and used with random number generators to produce unique diversity.
AI, however, uses a bottom-up approach, where the "rules" are organically inferred from the dataset it's trained on, while the dataset itself is not stored within the model in any way that can be directly retrieved.
When an LLM "hallucinates" it has a sort of intuitive understanding of what the answer should look like, and fills in the blanks based on the data it's trained on.

@DanDan420 @Catvalente Forget generated worlds, and think of the fundamentals: a pseudo random dice roll in an RPG. It's still deterministic ultimately (the randomness isn't really random), but it's close enough to real that it might as well be.

This randomness makes things wavy when you interact with the models because it's incorporated all over. That waviness (or temperature as they call it in some APIs) is what gives you unpredictable results (hallucinations) instead of consistent errors.
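To make the "waviness" concrete, here's a toy sketch of temperature-scaled sampling over a made-up three-word vocabulary. It assumes nothing about any real model's internals; the scores and the function name are invented for illustration. The point is that the "randomness" comes from an ordinary seeded PRNG, so it is deterministic under the hood, just as the post says.

```python
import math
import random

def sample_next_token(logits, temperature, rng):
    """Temperature-scaled softmax sampling over a toy vocabulary.

    Higher temperature flattens the distribution (more 'waviness');
    temperature near zero approaches greedy argmax.
    """
    scaled = [x / temperature for x in logits.values()]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # rng is an ordinary seeded PRNG: deterministic, but "random enough"
    return rng.choices(list(logits.keys()), weights=probs, k=1)[0]

logits = {"Paris": 4.0, "London": 2.0, "Narnia": 0.5}  # made-up scores
rng = random.Random(42)  # same seed -> same "random" choices every run
picks = [sample_next_token(logits, temperature=1.5, rng=rng) for _ in range(5)]
print(picks)
```

Run it twice with the same seed and you get the identical sequence of picks; crank the temperature up and the implausible token shows up more often. That's the whole "hallucinations instead of consistent errors" mechanism in miniature.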

@avocado_toast

"However, the simulated randomness gives folks the impression that it is somehow different."

That is a really good point. I need to incorporate this when I talk with my students about LLMs.

@Catvalente
I hate using this kind of language when describing LLMs! I always want to post this diagram about de-anthropomorphising the output from chatbots (source = https://www.researchgate.net/publication/370842240_Mirages_On_Anthropomorphism_in_Dialogue_Systems)

The original response may seem okay at first but then you realize it's using first person language.

#LLM #AI #AIHype

@ahimsa_pdx @Catvalente Or just say "it." No need to avoid pronouns. Unless, you know, AI is allergic to pronouns in the way many of its strongest advocates are.

@callisto Nothing wrong with using the pronoun it but in the example I shared I think the word it might be confusing?

The phrase "This generative language model" might be a bit long, but replacing that whole phrase with just "It" seems confusing to me. What does it refer to?

But absolutely, using the pronoun it seems fine in many situations.

@ahimsa_pdx Yes, the phrase makes sense the first time in that example. But after that, there's no need to say "this model" every time.

@Catvalente From now on, my code has no bugs. It is designed to hallucinate, for which I should be made a billionaire.

Sure. I could get behind this.

@Catvalente yes! So much BS.

You may appreciate this: https://berryvilleiml.com/blog/

See esp the Anthropic entries

@Catvalente If a coworker said the things the chatbots were saying, you would have to ask why they were lying to you.

@Catvalente @jwz see also.

TL;DR given the way all outputs are generated, if one is a hallucination, they all are.

https://social.europlus.zone/@europlus/116191458412032034

europlus :autisminf: (@[email protected])

I’m sure many others have made this observation, but even just reading this post without reading the linked article made me realise (or remember) that... *All* LLM output is, in fact, a hallucination. Because the way it formulates a “hallucination” *is exactly the same* as how it formulates a response *we don’t consider* a hallucination. Same with “good” vs “bad” summaries (and whatever the relative occurrence of each is). #NoAI #HumanMade

@Catvalente The word I use is "fabricate" to describe what LLMs actually do. It describes that they build sentences which are also lies and untruths.

@Catvalente

Even the term 'lie' or 'prevaricate' (IT IS A PIE) is so fundamentally inaccurate to what the program does, though we've used that anthropomorphization in other, non-chatbot ways through the years.

Of course, the robot does not lie: it outputs what it has been told to output. The programmers and executives, on the other hand...

@theogrin @Catvalente

I call them lies.

I wanted to check on a detail about an event in a book because I wasn't sure it was appropriate for a minor. Instead of rereading the book, I foolishly asked ChatGPT. The LLM said that event didn't occur.

I knew it did, I just couldn't remember a specific detail. I kept pressing and eventually it responded that it knew about the event but said it didn't occur because it didn't want to upset me. (Because gifting a minor a book with traumatizing content wouldn't be upsetting!?)

So, yeah, that's a lie in my book.

@Catvalente The other thing with this is that "hallucinate" implies that producing accurate output without fabrications is the normal way an LLM works, and "hallucination" is some failure of this that may be treatable.

No. Fabricating linguistically plausible output is exactly what we expect a "language model" to do. The technology, with some tuning, may do surprisingly well at producing accurate output, but it still is fundamentally a model of language, and there's no particular reason to believe it's possible to make a version that isn't prone to hallucination without a completely different methodology than how LLMs work.

@Catvalente it's called bullshitting. And I call it that.

@Catvalente

I've simply called it "lies" forever.

The argument always is the same - it can't be lies, it's just wrong.

Well then, it can't be a hallucination. If we're going to pretend it's intelligent, and it tells me things that aren't true *with confidence and assurance* - that's a lie.

(In reality, now, when using it for code assist, I just call it stupid. Nobody argues about that.)

@Catvalente They play these games all of the time with language and it's horrible to watch everyone get sucked in and start using the lingo without knowing the obvious design behind it.

Another example of this is "hyperscaler".

And perhaps more relatedly "agent"...which as Ed Zitron points out...means a chatbot.

@Catvalente that's funny, I remember "hallucination" being used against AI as a worse condition than "error", as in "it's not even wrong, it's just randomly stumbling about like a thousand dice in a trenchcoat hooked up to a Markov chain". Error implies it was cranking through some instructions which may have been flawed, but what AI was doing was as unfathomable as trying to understand someone else's nonsensical ramblings. I think what you're describing is motivated reasoning on behalf of the AI boosters, who will twist any epithet critics apply. Like Magats rallying behind "deplorables". They're so deep in, they cannot consider the opposite point of view, because they very badly want not to.

@Catvalente

Yep, it's probabilistic computing, using statistical methods over billions of parameters.

@Catvalente If they really insist on anthropomorphizing their fancy auto-complete, I'd absolutely get behind making "hallucinate" take a dirt nap and start calling the errors "psychotic breaks". At least that gives it a danger vibe. But yeah, 100% agree!
@Catvalente users, and I mean the huge majority in the computerized world who are only users, believe that this thing is cleverer than them. Including managers. And they think it saves time and brings money. The holy grail.
And all the discussions here in the nerd bubble about how to name it are absolutely useless. The cat is out of the box. It should never have been let out.

@Catvalente The so-called hallucinations are in fact a product of beam search.

Beam search is like taking, say, the top 10 most likely next words, and doing that several steps deep to construct the most likely phrase.
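The idea described above can be sketched in a few lines. This is a toy beam search over a made-up next-word table (the words and log-probabilities are invented for illustration; real decoders work over a model's full vocabulary): keep only the highest-scoring partial phrases at each step.

```python
# Toy next-word log-probabilities (made-up numbers, purely illustrative).
NEXT = {
    "<s>":  {"the": -0.4, "a": -1.1},
    "the":  {"cat": -0.7, "moon": -0.9},
    "a":    {"cat": -0.5, "dog": -1.0},
    "cat":  {"sat": -0.3, "ran": -1.2},
    "dog":  {"ran": -0.4, "sat": -1.1},
    "moon": {"sat": -1.0, "ran": -1.0},
}

def beam_search(start, steps, beam_width):
    """Keep the beam_width highest-scoring partial phrases at each step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for word, logp in NEXT.get(tokens[-1], {}).items():
                candidates.append((tokens + [word], score + logp))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # prune to the beam
    return beams

best_tokens, best_score = beam_search("<s>", steps=3, beam_width=2)[0]
print(" ".join(best_tokens[1:]), round(best_score, 2))  # → the cat sat -1.4
```

Note the search only ever optimizes "most likely phrase given the table"; nothing in it checks the phrase against reality, which is the thread's larger point.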

@Catvalente in all fairness, I think that these really are not errors, stemming from the fact that LLMs don't have any concept of what truth (or really anything) is and therefore there is no implicit metric with which to rank the responses in this regard. Due to the way that they just statistically predict what might fit in the next blank spot, "hallucination" actually describes the underlying process better 🤔
@Catvalente basically, saying an LLM made an error is analogous to placing an important decision on a coin flip, having the coin land on the "wrong" side, and proclaiming that the coin made an error.
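The "statistically predict the next blank" idea from the posts above can be shown with a toy bigram model. Everything here is invented for illustration (tiny corpus, no real tokenizer): the model just counts which word follows which, then samples the next blank in proportion to those counts. There is no notion of truth anywhere in it.

```python
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()  # made-up training text

# Count bigrams: how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(prev, rng):
    """Fill the next blank by sampling in proportion to observed counts."""
    counts = follows[prev]
    return rng.choices(list(counts), weights=counts.values(), k=1)[0]

rng = random.Random(0)
print([predict("the", rng) for _ in range(5)])
```

"the" is followed by "cat" twice and "mat" once in the corpus, so "cat" comes out more often, but "mat" still shows up sometimes. When the sample happens to match reality, that's the coin landing the way you hoped, not the model "trying" to be right.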
@Catvalente So it creates mirages then or something akin to a pareidolic effect where it uses the fuzziness of human perception to create something that has the shape of the thing but is not?
@Catvalente good point. It's not hallucinating. It's just plain straight wrong.
@Catvalente in colloquial use it veers into sanism real fast
@Catvalente
Alternate interpretation:
LLMs do not actually "make mistakes". Every time, they do exactly what they are supposed to do, without error.
Problem is, what they are supposed to do is "hallucinating". Every single LLM output is a "hallucination". Sometimes, by random chance, the "hallucination" given matches objective reality, but that is not at all relevant for the LLM's function.

@painting_squirrel @Catvalente yes. this.

But worse: I wouldn't say "sometimes by random chance." The results do _frequently_ match reality, because the statistics driving the output generation were based on text that matched reality. But when they don't match, they still sound indistinguishable from the ones that do, short of checking against objective reality.

So it is a lull-you-and-kill-you game.

@poleguy @Catvalente
I'll concede your point but only because "sometimes" and "random chance" are very badly defined terms that are not very intuitively understood by humans (myself included).

To wit: just because something is statistically well tuned to give good results most of the time does not mean that the results are not inherently random. "random" != "everything has the same probability"

@Catvalente When AUTOCORRECT Was One by David Gerrold.

@Catvalente

The best term, IMHO, is baloney (or BS).

It's not a lie because the AI cannot have intent to deceive.

It's not a hallucination because the AI cannot be deceived.

It's not an error or bug or mistake because it's doing what the creator intended.

When you input a question to an AI, the output of an AI is baloney. And that's OK if you're willing to eat baloney.

@Catvalente
And if you didn't know better, the answer could seem correct. The intelligence is on this side of the screen, hopefully.
@Catvalente
The de rigueur anthropomorphising sets off my bullshit detector constantly.