This is precisely so. However, hallucination *instantly* gets the point across to people who don't know/care how LLMs work.
It's like describing evolution as Natural Selection. In reality, there is no 'selection'. Nobody is selecting anything. But people instantly grasp the concept, without having to deep-dive into evolutionary forces.
@maxleibman @VE3RWJ - The error, as you point out, is in anthropomorphizing AI.
However, if one insists on doing that, the best analogous human behavior is "Bullshitting".
Confidently giving an answer, without regard to correctness, by regurgitating stuff you've heard. [edit to add] Which is, of course, what it's doing all the time; it's just that this time it happens to be factually incorrect.
So my best so far is "incorrect bullshitting."
@jmax @maxleibman @VE3RWJ This tech (as has happened many times before) is teaching us about the way our own brains work.
Even at our most methodical, there’s a level of “bullshitting” we have to engage in when we’re performing a professional task. Eventually, fundamentally, we have to trust our senses and trust our memories. If we can replicate results — well, good: that sounds like a scientific method. It’s up to us to design procedures and protocols around our actions to prevent mistakes.
To err is human. And LLM’an.
@maxleibman @VE3RWJ Yes, it’s a (deliberately) difficult position!
I think part of the trickiness here is that the “hallucinations” aren’t materially different from what they do the rest of the time. It’s just that this response is so obviously wrong that we classify it as an error. But it’s not like something broke _that one time_. All responses are “hallucinations.” They vary by proximity to accuracy. The term is pure marketing.
@corners_plotted @maxleibman @VE3RWJ
The term has some relationship to reality: a model that outputs false positives even when ground truth denies them, a bias toward seeing patterns that don't exist.
But I agree with your assessment that it's not really something different than all the other output. It's just wrong. The AI makes EVERYTHING up, it's just that often it turns out to be similar to reality.
@maxleibman @corners_plotted @VE3RWJ
I had someone in another thread try to convince me that LLMs don't work word by word, but compose answers hierarchically in paragraphs or whatever. My understanding is that that's wrong, and that they work only on the next word (token), but maybe my understanding is a year or two out of date?
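For what it's worth, that understanding still matches the standard description of decoder-only LLMs: they emit one token at a time, each conditioned on everything generated so far. Here's a toy Python sketch of that autoregressive loop, with a made-up bigram table standing in for the actual network (the real model conditions on the whole context window, but the loop has the same shape):

```python
import random

# Hypothetical stand-in for an LLM: given the last token, return a
# probability distribution over possible next tokens.
BIGRAMS = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a":   {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 1.0},
    "dog": {"sat": 1.0},
    "sat": {"</s>": 1.0},
}

def generate(seed=0, max_tokens=10):
    """Autoregressive decoding: sample the next token, append, repeat."""
    rng = random.Random(seed)
    tokens = ["<s>"]
    for _ in range(max_tokens):
        dist = BIGRAMS[tokens[-1]]
        choices, weights = zip(*dist.items())
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == "</s>":  # end-of-sequence token stops generation
            break
        tokens.append(nxt)
    return tokens[1:]  # drop the start token

print(" ".join(generate()))
```

Nothing in the loop "plans" a paragraph; any higher-level structure in real LLM output emerges from what the distribution assigns at each single step.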
@VE3RWJ @maxleibman Wellllll here’s where I generally have to remind people that LLMs aren’t like computers or calculators, not like the ones we’ve personally interacted with for 50 years. They’re not sticklers for syntax or numeric accuracy.
In fact they’re built on errors, large piles of measured human divergence. It’s errors all the way down.
Not a spreadsheet.
@maxleibman @petrillic I had a realisation related to this a month ago.
Given the way all genAI outputs are generated, if one is a hallucination, they all are.
I’m sure many others have made this observation, but even just reading this post without reading the linked article made me realise (or remember) that... *All* LLM output is, in fact, a hallucination. Because the way it formulates a “hallucination” *is exactly the same* as how it formulates a response *we don’t consider* a hallucination. Same with “good” vs “bad” summaries (and whatever the relative occurrence of each is). #NoAI #HumanMade
@maxleibman @europlus @petrillic Yep, if you run different LLMs at home and dumb them down to smaller, faster models, that’s pretty much it.
There are some interesting takes on how to quantify this stuff easily and quickly (even though a whole benchmarking industry turned up out of nowhere, complete with its own glossary of jargon). I couldn’t find a way to do it myself a year ago, but I saw something last week that was rather good. Alex Ziskind, probably.
💯 agree. I wrote this expressing the same opinion:
A hallucination is “an experience involving the apparent perception of something not present” according to the OED.
An LLM neither experiences nor perceives anything. It’s lazy to anthropomorphise LLMs.
I wrote ebook-notes.el, an Emacs Lisp package, to streamline the process of importing highlights and notes from an Amazon Kindle’s “My Clippings.txt” file directly into Org mode files. It automatically handles the association of notes with their corresponding highlights and prevents the import of duplicate entries. To make life interesting, I decided to try using an LLM to “help”. I used Google’s Gemini 2.5 Flash model. Don’t judge me. This was research!
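For anyone curious about the shape of the problem (this is not the actual elisp, just an illustrative Python sketch), the core of a clippings importer is a parser plus duplicate detection, assuming the usual My Clippings.txt layout: title line, metadata line, blank line, clipped text, then a separator line of ten equals signs:

```python
def parse_clippings(text):
    """Split a Kindle "My Clippings.txt" dump into structured entries.

    Assumes the common layout: each entry is a title line, a metadata
    line ("- Your Highlight ..." / "- Your Note ..."), a blank line,
    the clipped text, then a separator line of ten '=' characters.
    """
    entries = []
    seen = set()  # (title, body) pairs, used to skip duplicate imports
    for chunk in text.split("=========="):
        lines = [l for l in chunk.strip().splitlines() if l.strip()]
        if len(lines) < 3:
            continue  # empty or malformed chunk
        title, meta = lines[0], lines[1]
        body = "\n".join(lines[2:])
        key = (title, body)
        if key in seen:
            continue  # duplicate entry, skip it
        seen.add(key)
        kind = "note" if "Your Note" in meta else "highlight"
        entries.append({"title": title, "meta": meta,
                        "kind": kind, "text": body})
    return entries
```

Associating a note with its highlight is the fiddly part the package handles (Kindle records a note as a separate entry at a nearby location), and it's the kind of mostly-mechanical string wrangling people reach for an LLM to draft.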
@maxleibman @europlus I think the layperson definition is closer to “spontaneously random imaginary vision” and the “error” in perception is directly related to having an expectation of a measured observation of reality. Whereas if you shut your eyes and try, you can hallucinate on purpose — there’s no error but it’s still hallucinating.
But, the layperson definition might need it to be vivid before it could get that label.
@maxleibman I was thinking about this just today when someone was talking about AI "hallucinations." (They were kind enough to put it in scare quotes.) I couldn't think of a better term, though.
Perhaps "fabrication" would work, but then everything an LLM does is a fabrication. It just so happens that some of its fabrications correspond with reality. So to be precise it might have to be called something like "Inaccurate fabrications." That's not very catchy, though.
The term I like is "bullshitting", which I got from https://undark.org/2023/04/06/chatgpt-isnt-hallucinating-its-bullshitting/
See also https://thebullshitmachines.com/ , an expansion on this idea into a small course.