A cool new paper on detecting "hallucinations" (or "bullshitting") in LLMs.
The idea is deceptively simple: sample several candidate answers, cluster them by bidirectional semantic entailment (two answers share a cluster if each entails the other), and then compute the entropy over those clusters to measure how "unsure" the model is. The higher this semantic entropy, the likelier the generated answer is a confabulation.
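A minimal sketch of the idea. The paper uses a learned natural-language-inference model to judge entailment; here a case-insensitive string match stands in for it, and cluster probabilities are estimated from sample counts, so this is an illustration of the entropy computation rather than the paper's actual pipeline.

```python
import math


def entails(a: str, b: str) -> bool:
    # Stand-in for an NLI model's entailment check; the paper uses a
    # learned model, here we approximate with case-insensitive equality.
    return a.strip().lower() == b.strip().lower()


def semantic_clusters(answers):
    # Greedy clustering: an answer joins a cluster iff it and the
    # cluster's representative entail each other (bidirectional entailment).
    clusters = []
    for ans in answers:
        for cluster in clusters:
            rep = cluster[0]
            if entails(ans, rep) and entails(rep, ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters


def semantic_entropy(answers):
    # Shannon entropy over cluster probabilities estimated from counts.
    clusters = semantic_clusters(answers)
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    return -sum(p * math.log(p) for p in probs)
```

For a confident model all samples fall into one cluster and the entropy is 0; `semantic_entropy(["Paris", "paris", "Lyon", "Paris"])` splits into two clusters with probabilities 0.75 and 0.25, giving entropy of about 0.56.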
"Detecting hallucinations in large language models using semantic entropy" (Nature)
Hallucinations (confabulations) in large language model systems can be tackled by measuring uncertainty about the meanings of generated responses rather than the text itself to improve question-answering accuracy.

