Mastodawn

I've never been opposed to the word "hallucinating" for describing how AI makes mistakes ... until now.

I just talked to someone who thought AI hallucinations would be obvious because it would be obvious if you talked to a *person* who was hallucinating.

In other words, they equated "hallucination" with "sounds wacko" and accepted AI output as true because it sounded level-headed.

1/2

Mignon Fogarty 4d ago

The word "hallucination" isn't going away — it's a widely used industry term — but we need to explain it better for beginners:

"Hallucination" is just a fancy word for "confidently makes mistakes":

"Remember: AI hallucinates, and you need to confirm all facts" should be something like "Remember: AI confidently makes mistakes, and you need to confirm all facts" or "AI tells you things that are wrong in a way that sounds completely believable. Confirm all facts!"

Orion (he/him) 4d ago

@grammargirl This is a good example of why that term is so dangerous. Thank you for posting it.

That said, while I have zero hope of making that term go away, we also have the word "slop" as a counter.

"Ugh. It had a hallucination..."

"Yup. And the results are now slop."

That said, I don't myself use "hallucination" in the "AI" context. I refer to the error rate, which, last I checked, hovered around 40%.

Mignon Fogarty 4d ago

@orionkidder Good point.

Also, the error rate now highly depends on which model you're talking about, but I think that's the rate for those that are most widely used -- e.g., the free models.

Orion (he/him) 4d ago

@grammargirl I'm seeing people claim the error rate is lower with other models, and I'm not sure I believe that since this industry just piles lies on top of lies, but the only plausible explanation of the lowered error rate I've seen is for Claude Code.

Orion (he/him) 4d ago

@grammargirl If I understand correctly, it shoves every query through the "AI" multiple times and tests whether it does the thing it's asked to do, but of course, it hides all of that from the user.

Orion (he/him) 4d ago

@grammargirl To me, that feels like a brute-force workaround, a kludge, not an improvement in the tech itself. It's like saying, "My car is too slow, so I'll attach a second engine to the hood."

Riley S. Faelan

@orionkidder No, that's probably how human brains do it. The genAI loop is wacky in other ways, but testing its results is not a wacky part of it.

@grammargirl