Hallucinate – Massively Multiplayer Online Rave
#HackerNews #Hallucinate #MMO #Rave #VirtualEvents #OnlineCommunity #TechInnovation
> We never had to invent words like #hallucinate to evade accountability for problems in LLM and AI
The word hallucination was not invented for talking about LLMs. It was well established in the English language long before the current hype.
Personally, I try not to anthropomorphize this stochastic process.
Hallucination is a word used for humans, and I refuse to sing the song of the AI hype corporations, but feel free to do it differently.
⬆️ @saxnot
>> we learned about #GIGO in both statistics and computer science.
We never had to invent words like #hallucinate to evade accountability for problems in LLM and AI software itself. So AI hallucination is GIGO from the other end, not from bad user input.
I chose translation software as an example. It does an admirable job of figuring out parts of speech, but not so well at figuring out figures of speech, e.g. the Godfather makes an "offer you can't refuse."
1. #Authors shouldn't be using #LLMs to create #reference #lists, bcz LLMs #hallucinate often.
2. Every #science #journal should run authors' ref lists through the @retractionwatch Database. https://retractiondatabase.org/RetractionSearch.aspx
Doing both things would improve #science #trustworthiness.
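A rough sketch of what point 2 could look like in practice, assuming the journal already has a set of retracted DOIs on hand (for example, loaded from an export of the database). The function names, the DOI regex, and the example DOIs are all made up for illustration; this does not query retractiondatabase.org, and references without a DOI are simply not checked.

```python
import re

# Loose DOI matcher (an assumption, not an official grammar): "10.", a
# registrant code, a slash, then anything up to whitespace or a quote.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+', re.IGNORECASE)


def normalize_doi(doi):
    """Lowercase a DOI and strip trailing punctuation picked up from prose."""
    return doi.strip().rstrip(".,;)").lower()


def flag_retracted(reference_lines, retracted_dois):
    """Return the reference lines that contain a DOI from the retracted set."""
    retracted = {normalize_doi(d) for d in retracted_dois}
    flagged = []
    for line in reference_lines:
        found = [normalize_doi(m) for m in DOI_PATTERN.findall(line)]
        if any(doi in retracted for doi in found):
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    # Hypothetical reference list and retracted set, for illustration only.
    refs = [
        "Doe J. A made-up paper. doi:10.1234/example.001.",
        "Roe R. Another made-up paper. https://doi.org/10.5678/example.002",
    ]
    retracted = {"10.1234/EXAMPLE.001"}
    for line in flag_retracted(refs, retracted):
        print("Possibly retracted:", line)
```

A flagged line is only a prompt for a human to look the entry up in the database; a clean result says nothing about references that lack a DOI.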
A cool test of how much different #AI models #hallucinate: the #BullshitBenchmark
The #Claude and #Qwen models seem to push back more when confronted with nonsensical questions. The #OpenAI models do not fare well.
Blog post: https://adam.holter.com/bullshitbench-v2-claude-and-qwen-are-the-only-models-that-push-back/
Results: https://petergpt.github.io/bullshit-benchmark/viewer/index.v2.html
BullshitBench v2 is out. Peter Gostev tested 70+ model variants across 100 questions spanning coding, medical, legal, finance, and physics. The benchmark measures one specific thing: whether a model will push back against a plausible-sounding but factually wrong statement, or just go along with it. Only two model families score meaningfully above 60% on bullshit […]
A Large Language Model (LLM) is a deep-learning algorithm, often using a transformer architecture, that is trained on massive amounts of text data to understand, process, and generate human-like text.
A major shortcoming of LLMs is their tendency to "#Hallucinate" or confidently generate false or nonsensical information, along with the risk of perpetuating #Biases present in their training data.
https://knowledgezone.co.in/trends/browser?topic=Language-Model