@maxleibman @petrillic I had a realisation related to this a month ago.
Given the way all genAI outputs are generated, if one is a hallucination, they all are.
I’m sure many others have made this observation, but even just reading this post (without reading the linked article) made me realise, or remember, that... *all* LLM output is, in fact, a hallucination. The way it formulates a “hallucination” *is exactly the same* as how it formulates a response *we don’t consider* a hallucination. Same with “good” vs “bad” summaries, whatever the relative occurrence of each turns out to be. #NoAI #HumanMade
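A minimal sketch of the point: decoder LLMs produce every token the same way, by sampling from a probability distribution over the vocabulary. The logit values below are made up for illustration, but the procedure is the standard temperature-softmax sampling step, and nothing in it branches on whether the eventual text is factual.

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Sample one token id from a softmax over logits.

    This same procedure runs whether the resulting text turns out
    to be accurate or a "hallucination": the model has no separate
    code path for either outcome.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling: pick a token in proportion to its probability.
    r = random.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# Hypothetical logits for three candidate tokens; the sampler neither
# knows nor cares which continuation would be "true".
token_id = sample_next_token([2.0, 0.5, 0.1])
```

Whether the output reads as a good answer or a fabrication depends only on which tokens happened to be drawn, not on any distinct "hallucination mode" inside the model.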
@maxleibman @europlus @petrillic yep, if you run different LLMs at home and dumb them down to smaller, faster models, that’s pretty much it
There are some interesting takes on how to quantify this stuff easily and really quickly, even though a whole benchmarking industry turned up out of nowhere, complete with its own glossary of jargon. I couldn’t find a way to do it myself a year ago, but I saw something last week that was rather good. Alex Ziskind, probably.