I don't like the term "hallucinations" when we talk about AI. Sure, LLMs can get things wrong, but a hallucination is an error in perception, and you can't have an error in perception when there's no one there to perceive. The only hallucinations that are happening are on your side of the keyboard.

@maxleibman

This is precisely so. However, hallucination *instantly* gets the point across to people who don't know/care how LLMs work.

It's like describing evolution as Natural Selection. In reality, there is no 'selection'. Nobody is selecting anything. But people instantly grasp the concept, without having to deep-dive into evolutionary forces.

@maxleibman That's a great point. What do we call them then? Just "errors"?
@VE3RWJ That I don’t have a good answer to.
@maxleibman great point about these things not perceiving anything. It's so hard not to anthropomorphize.

@maxleibman @VE3RWJ - The error, as you point out, is in anthropomorphizing AI.

However, if one insists on doing that, the best analogous human behavior is "Bullshitting".

Confidently giving an answer, without regard to correctness, by regurgitating stuff you've heard. [edit to add] Which is, of course, what it's doing all the time; it's just that this time it happens to be factually incorrect.

So my best so far is "incorrect bullshitting."

@jmax @maxleibman @VE3RWJ This tech (as has happened many times before) is teaching us about the way our brains work.

Even at our most methodical, there’s a level of “bullshitting” we have to engage in when performing a professional task. Eventually, fundamentally, we have to trust our senses and trust our memories. If we can replicate results — well, good: that sounds like the scientific method. It’s up to us to design procedures and protocols around our actions to prevent mistakes.

To err is human. And LLM’an.

@whophd @maxleibman @VE3RWJ Stop shilling for con artists.

@maxleibman @VE3RWJ Yes, it’s a (deliberately) difficult position!

I think part of the trickiness here is that the “hallucinations” aren’t materially different from what they do the rest of the time. It’s just that this response is so obviously wrong that we classify it as an error. But it’s not like something broke _that one time_. All responses are “hallucinations.” They vary by proximity to accuracy. The term is pure marketing.

@corners_plotted @maxleibman @VE3RWJ

It has some relationship to reality: a model outputting false positives even when ground truth contradicts them; a bias toward seeing patterns that don't exist.

But I agree with your assessment that it's not really something different than all the other output. It's just wrong. The AI makes EVERYTHING up, it's just that often it turns out to be similar to reality.

@ThreeSigma @corners_plotted @VE3RWJ Exactly. If you're guessing a statistically plausible next word, you're going to line up with reality often (maybe even shockingly often), because what's likely to come next will be something that makes sense, and making sense is often correlated with reality. But it's a correlation, not knowledge. There's no amount of grounding that makes it something other than a guess. Grounding is just the process of changing to the question to be, "Ok, given THIS context, what's NOW the most likely next word?"

@maxleibman @corners_plotted @VE3RWJ

I had someone try to convince me in another thread that LLMs didn't work word-to-word, but composed answers hierarchically in paragraphs or whatever. My understanding is that that's wrong, and they work only on the next word, but maybe my understanding is a year or two out of date?
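For concreteness, here's the shape of the loop as I understand it (toy stand-in probabilities, not a real model): one word is sampled, appended, and the whole longer sequence becomes the context for the next guess; any apparent paragraph-level structure has to come out of repeating that single step.

```python
import random

# Pretend "model": maps a context to a distribution over next words. A real
# transformer conditions on the entire context; this toy only peeks at the
# last word, but the generation loop has the same shape either way.
TABLE = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {".": 1.0},
    "ran": {".": 1.0},
}

def next_word_distribution(context):
    return TABLE.get(context[-1], {".": 1.0})

def generate(prompt, max_tokens=10):
    context = prompt.split()
    for _ in range(max_tokens):
        dist = next_word_distribution(context)                    # condition on what's there so far
        words, probs = zip(*dist.items())
        context.append(random.choices(words, weights=probs)[0])   # pick exactly ONE next word
        if context[-1] == ".":                                    # then loop again
            break
    return " ".join(context)

print(generate("the"))  # e.g. "the cat sat ." -- assembled strictly one word at a time
```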

@maxleibman @VE3RWJ confabulations is sometimes used

@VE3RWJ @maxleibman Wellllll here’s where I generally have to remind people that LLMs aren’t like computers or calculators, not like the ones we’ve personally interacted with for 50 years. They’re not sticklers for syntax or numeric accuracy.

In fact they’re built on errors, large piles of measured human divergence. It’s errors all the way down.

Not a spreadsheet.

@maxleibman I hate how people decided to use humanizing language to discuss LLMs.
@aleen @maxleibman it’s … well … the analogies hold strong, though
@maxleibman then what do you suggest?
@Amoshias Anything less anthropomorphized would be an improvement, but it’s a losing battle because "hallucinations" has become the term of art.
@maxleibman Like, constantly, man.

@maxleibman @petrillic I had a realisation related to this a month ago.

Given the way all genAI outputs are generated, if one is a hallucination, they all are.

https://social.europlus.zone/@europlus/116191458412032034

europlus (@[email protected])

I’m sure many others have made this observation, but even just reading this post without reading the linked article made me realise (or remember) that... *All* LLM output is, in fact, a hallucination. Because the way it formulates a “hallucination” *is exactly the same* as how it formulates a response *we don’t consider* a hallucination. Same with “good” vs “bad” summaries (and whatever the relative occurrence of each is). #NoAI #HumanMade


@maxleibman @europlus @petrillic yep if you run different LLMs at home and dumb them down to smaller faster models, that’s pretty much it

There are some interesting takes on how to quantify this stuff (easily and really quickly, even though an industry around benchmarking turned up out of nowhere, complete with a glossary of jargon), and while I couldn’t find a way to do it myself a year ago, I saw something last week that was rather good. Alex Ziskind, probably.

@maxleibman

💯 agree. I wrote this expressing the same opinion:

A hallucination is “an experience involving the apparent perception of something not present” according to the OED.

An LLM neither experiences nor perceives anything. It’s lazy to anthropomorphise LLMs.

https://stewart123579.github.io/blog/posts/emacs/importing-kindle-clippings-in-emacs/#incorrect-information

Importing Kindle Clippings in Emacs

I wrote ebook-notes.el, an Emacs Lisp package, to streamline the process of importing highlights and notes from an Amazon Kindle’s “My Clippings.txt” file directly into Org mode files. It automatically handles the association of notes with their corresponding highlights and prevents the import of duplicate entries. To make life interesting, I decided to try using an LLM to “help”. I used Google’s Gemini 2.5 Flash model. Don’t judge me. This was research!


@maxleibman @europlus I think the layperson definition is closer to “spontaneously random imaginary vision” and the “error” in perception is directly related to having an expectation of a measured observation of reality. Whereas if you shut your eyes and try, you can hallucinate on purpose — there’s no error but it’s still hallucinating.

But, the layperson definition might need it to be vivid before it could get that label.

@maxleibman I was thinking about this just today when someone was talking about AI "hallucinations." (They were kind enough to put it in scare quotes.) I couldn't think of a better term, though.

Perhaps "fabrication" would work, but then everything an LLM does is a fabrication. It just so happens that some of its fabrications correspond with reality. So to be precise it might have to be called something like "Inaccurate fabrications." That's not very catchy, though.

@bodhipaksa @maxleibman

The term I like is "bullshitting", which I got from https://undark.org/2023/04/06/chatgpt-isnt-hallucinating-its-bullshitting/

See also https://thebullshitmachines.com/ , an expansion on this idea into a small course.

ChatGPT Isn’t ‘Hallucinating.’ It’s Bullshitting.

Opinion | Artificial Intelligence models will make mistakes. We need more accurate language to describe them.
