@vitloksbjorn

Knowing and hallucinating are the same process for an LLM. LLMs are built to just predict the most probable next token for the given input, based on what they learned. They knew the fact? Then their prediction was good. They hallucinated? Then their prediction was bad, maybe because the learned knowledge is thin or wrong at that point.
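
A minimal sketch of that single prediction step, assuming PyTorch and the Hugging Face transformers library, with GPT-2 as a stand-in model and an illustrative prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is just a small, convenient example model here.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}  p={prob:.3f}")

# Whether the top candidate happens to be " Paris" (a "known" fact) or
# something wrong (a "hallucination"), it is the same prediction step
# either way.
```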

About creating something new: that's one of the really fascinating aspects of GenAI. They were only tasked with predicting tokens, but along the way, they learned to abstract! This can be shown:

- on visual tasks, see https://arxiv.org/abs/2305.18354
- by detecting abstract linguistic concepts, see https://arxiv.org/abs/2404.15848

Now, abstraction is the key to recombining existing knowledge and applying it to new domains. To me, that explains how LLMs can solve new coding tasks, explain strange topics to 5-year-olds, compose poems about the most absurd subjects, or do other things that can't simply be copied from somewhere.
