I tested 80 models on two simple grid-based problems, asking them to locate 2026 and compute the sum of its neighbors when the numbers are placed in a spiral on the grid. The results surprised me: models performed better on the problem I expected to be harder, and I also caught some of them cheating. Read more at https://mihai.page/ai-2026-1/
Testing 80 LLMs on spatial reasoning on grids

How do LLMs see 2D grids? Would it be harder for them to work on square grids or on hexagonal ones? I'm expanding the Kaggle benchmark from my previous article and testing 80 different LLMs. The results will surprise you.
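To make the square-grid task concrete, here is a minimal Python sketch of what a correct answer involves. It assumes one common convention (the Ulam spiral: 1 at the origin, first step to the right, turning counterclockwise) and, by default, counts all eight surrounding cells as neighbors. The benchmark's actual prompts may use a different starting cell, direction, or neighborhood. The function names `spiral_positions` and `neighbor_sum` are hypothetical.

```python
# Hypothetical sketch: the spiral convention below is an assumption,
# not necessarily the one used in the benchmark prompts.

# Counterclockwise square spiral in (x, y) coordinates with y pointing up:
# move right, up, left, down, and repeat.
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def spiral_positions():
    """Yield the (x, y) cell of 1, 2, 3, ... on a square spiral starting at the origin."""
    x, y = 0, 0
    yield (x, y)
    step, d = 1, 0
    while True:
        # Run lengths go 1, 1, 2, 2, 3, 3, ...: each length is used for two turns.
        for _ in range(2):
            dx, dy = DIRECTIONS[d % 4]
            for _ in range(step):
                x, y = x + dx, y + dy
                yield (x, y)
            d += 1
        step += 1


def neighbor_sum(target, diagonal=True):
    """Return (position of target, sum of the numbers in its neighboring cells).

    diagonal=True uses all 8 surrounding cells; False uses only the 4 orthogonal ones.
    """
    if target < 1:
        raise ValueError("the spiral starts at 1")
    offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    if diagonal:
        offsets += [(1, 1), (1, -1), (-1, 1), (-1, -1)]

    number_at = {}  # (x, y) -> number placed in that cell
    target_pos, wanted = None, None
    for n, pos in enumerate(spiral_positions(), start=1):
        number_at[pos] = n
        if n == target:
            target_pos = pos
            wanted = [(pos[0] + dx, pos[1] + dy) for dx, dy in offsets]
        # Keep filling the spiral until every neighbor cell has a number.
        if wanted is not None and all(cell in number_at for cell in wanted):
            break
    return target_pos, sum(number_at[cell] for cell in wanted)


if __name__ == "__main__":
    print(neighbor_sum(2026))                  # ((23, -22), 16952)
    print(neighbor_sum(2026, diagonal=False))  # ((23, -22), 8472)
```

Under these assumptions, 2025 = 45² closes a ring at (22, -22), so 2026 starts the next ring at (23, -22). Its eight neighbors sum to 16952, and its four orthogonal neighbors sum to 8472. A different spiral convention moves 2026 and changes both answers. That is part of why a model has to track the layout carefully rather than recall a memorized pattern.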