Why do language models hallucinate? In this post, I argue that they are "uncertain simulators": they divide probability across possible outcomes instead of acting conservatively when uncertain. I also give five high-level strategies for avoiding this mismatch.
https://www.danieldjohnson.com/2023/03/27/uncertain_simulators/
The key point: a sample from a language model is a prediction about what *some other agent* will say, and it's hard to predict the behavior of someone who knows more than you do!

Uncertain Simulators Don't Always Simulate Uncertain Agents
I argue that hallucinations are a natural consequence of the language modeling objective, which rewards simulating confident behavior even when that behavior is hard to predict, rather than acting conservatively under uncertainty.
Daniel D. Johnson [11/11]
Overall, we are excited about incorporating user interaction into minimum-Bayes-risk objectives to mitigate harms of model hallucinations. We see our work as a step toward ML assistants that empower users by giving conservative predictions in the presence of uncertainty.
[10/11]
Furthermore, our system is independent of the model architecture and does not require any fine-tuning, making it applicable to any pretrained generative model of code. And an open-source implementation is coming soon!
[9/11]
Empirically, we find that R-U-SURE is better than baselines at identifying the regions that differ between model suggestions and ground truth intents from our test set. The utility of our suggestions against the ground-truth intent is also high, and improves with more samples.
[8/11]
We can even invert the meaning of the annotations, and use our system to identify the most useful parts of a long generated sample! This could be used to preemptively show documentation or usage examples instead of directly suggesting code.
[7/11]
An advantage of our approach is that it gives a lot of flexibility to define the utility function. We can adapt the edit distance calculation to use AST structure, annotate locations of possible insertions, and allow truncating suggestions if the uncertainty is too high.
[6/11]
This is still a difficult optimization problem, so we adapt two tricks from combinatorial optimization: dual decomposition, which breaks our problem into a set of message-passing subproblems, and decision diagrams, which let us solve subproblems efficiently.
[5/11]
Our key observation is that samples from a well-trained generative model can be interpreted as plausible goal states for the user's code! We can thus use these samples to approximate the expected utility of a suggestion, similar to sample-based minimum Bayes risk decoding.
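The sample-based minimum Bayes risk idea can be sketched in a few lines (a toy version, not the paper's implementation): treat each model sample as a plausible user intent, and pick the candidate whose average utility against the samples is highest, with utility taken here as negative edit distance.

```python
# Toy sketch of sample-based minimum Bayes risk decoding (not the paper's
# implementation): model samples stand in for plausible user intents, and we
# select the candidate suggestion with the highest average utility over them.

def edit_distance(a, b):
    # Classic Levenshtein distance via dynamic programming.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def mbr_select(candidates, samples):
    # Expected utility of a candidate is approximated by averaging the
    # (negative) edit distance to each sampled intent.
    def expected_utility(c):
        return -sum(edit_distance(c, s) for s in samples) / len(samples)
    return max(candidates, key=expected_utility)

samples = ["print(x)", "print(x + 1)", "print(x)"]
best = mbr_select(candidates=samples, samples=samples)  # -> "print(x)"
```

A common simplification, used here, is to draw the candidate set from the same pool of samples, so the selected suggestion is the sample that agrees best with the others.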
[4/11]
Formally, our goal is to find an annotated suggestion that maximizes our edit-distance based utility metric for the (unknown) code that the user wants to write. Since we don't know the user's intent exactly, we maximize the expected value of this metric over possible intents.
[3/11]
In contrast, our system produces annotations by explicitly approximating the utility of a suggestion for a user with a particular intent. We focus on edit distance, and assume that identifying regions as uncertain makes them easier to edit, but less useful if they are correct.
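That trade-off can be illustrated with a toy utility function (the constants and token-level scoring are assumptions for illustration, not the paper's exact metric): confident tokens earn full credit when they match the intent but cost a full edit when they don't, while tokens flagged as uncertain earn less credit when correct but are cheaper to fix.

```python
# Toy annotation-aware utility (illustrative constants, not the paper's
# values): marking a region uncertain lowers its payoff when correct but
# also lowers its editing cost when wrong.

REWARD = {False: 1.0, True: 0.5}     # credit for a correct token
PENALTY = {False: -1.0, True: -0.3}  # cost of editing a wrong token

def utility(suggestion, intent):
    # suggestion: list of (token, is_uncertain) pairs; intent: list of tokens.
    total = 0.0
    for (tok, uncertain), target in zip(suggestion, intent):
        total += REWARD[uncertain] if tok == target else PENALTY[uncertain]
    return total

confident = [("x", False), ("=", False), ("foo()", False)]
hedged    = [("x", False), ("=", False), ("foo()", True)]
intent    = ["x", "=", "bar()"]
```

With these numbers, hedging on the third token pays off when it turns out wrong: the hedged suggestion scores 1.7 against this intent versus 1.0 for the confident one, which is exactly the pressure that makes the optimizer flag genuinely uncertain regions.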