Anthropic has developed an AI 'brain scanner' to understand how LLMs work and it turns out the reason why chatbots are terrible at simple math and hallucinate is weirder than you thought
Anthropic has developed an AI 'brain scanner' to understand how LLMs work and it turns out the reason why chatbots are terrible at simple math and hallucinate is weirder than you thought
It really doesn’t. You’re just describing the “fancy” part of “fancy autocomplete.” No one was ever really suggesting that they only predict the next word. If that was the case they would just be autocomplete, nothing fancy about it.
What’s being conveyed by “fancy autocomplete” is that these models ultimately operate by combining the most statistically likely elements of their dataset, with some application of random noise. More noise creates more “creative” (meaning more random, less probable) outputs. They do not actually “think” as we understand thought. This can clearly be seen in the examples given in the article, especially to do with math. The model is throwing together elements that are statistically proximate to the prompt. It’s not actually applying a structured, logical method the way humans can be taught to.
We also check to see if the word that popped into our heads actually rhymes by saying it out loud. Actual validation steps we can take is a bigger difference than being a little more robust.
We also have non-list based methods like breaking the word down into smaller chunks to try to build up hopefully more novel rhymes. I imagine professionals have even more tools, given the complexity of more modern rhyme schemes.