Calling LLMs "next-token predictors" is a category mistake. In this piece, Scott Alexander argues that next-token prediction is a training objective, not a description of what the resulting system does, much as survival and reproduction were the optimisation pressures that shaped human evolution without being a summary of human behaviour.
> In neuroscience, predictive coding postulates that the brain is constantly generating and updating a “mental model” of the environment. According to the theory, such a mental model is used to predict input signals from the senses that are then compared with the actual input signals from those senses.
In short, the brain organises itself and learns by constantly trying to predict the "next sense-datum", a close analogue of how LLMs are trained on next-token prediction.
The difference is that we don't frame ordinary cognition, such as doing maths, in those terms. Mechanistic interpretability shows that next-token training can yield internal machinery that is structured, algorithmic, and nontrivial, rather than a simple token-to-token lookup.
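To make the "training objective" framing concrete, here is a minimal sketch of the next-token cross-entropy loss in NumPy. The function name, toy vocabulary size, and token IDs are all made up for illustration; real LLM training computes essentially this quantity, averaged over huge corpora.

```python
import numpy as np

def next_token_loss(logits, targets):
    """Average cross-entropy of predicting each next token.

    logits:  (T, V) array of unnormalised scores, one row per position.
    targets: (T,) array of the actual next-token IDs.
    """
    # Softmax over the vocabulary, numerically stabilised.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    # Negative log-probability assigned to the true next token, averaged.
    return -np.log(probs[np.arange(len(targets)), targets]).mean()

# Hypothetical toy example: vocabulary of 5 tokens, sequence of 3 positions.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
targets = np.array([2, 0, 4])
loss = next_token_loss(logits, targets)
```

Minimising this scalar is the whole objective; everything else, including whatever structured internal machinery emerges, is a by-product of the optimisation.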
https://www.astralcodexten.com/p/next-token-predictor-is-an-ais-job
