I tested 80 models on two simple grid-based problems, asking them to locate 2026 and compute the sum of its neighbors when the numbers are placed in a spiral on the grid. The results surprised me: the models performed better on the problem I expected to be harder, and I also got to see them cheating. Read more at https://mihai.page/ai-2026-1/
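
To make the task concrete, here is a minimal sketch of one way to build such a spiral and sum the neighbors of a number. Everything about the layout is an assumption made for illustration, not taken from the linked write-up: 1 sits at the origin, the first step goes right, the spiral turns counterclockwise, and "neighbors" means the 8 surrounding cells unless `diagonal=False`. The names `spiral_positions` and `neighbor_sum` are hypothetical.

```python
def spiral_positions(limit):
    """Map each number 1..limit to (x, y) on a square spiral.

    Assumed layout: 1 at the origin, first step to the right,
    then counterclockwise turns (up, left, down, right, ...).
    """
    positions = {1: (0, 0)}
    x, y, n = 0, 0, 1
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # right, up, left, down
    run, turn = 1, 0
    while n < limit:
        # Each run length is walked twice before it grows by one.
        for _ in range(2):
            dx, dy = directions[turn % 4]
            for _ in range(run):
                if n == limit:
                    break
                x, y, n = x + dx, y + dy, n + 1
                positions[n] = (x, y)
            turn += 1
        run += 1
    return positions


def neighbor_sum(target, diagonal=True):
    """Sum the numbers in the cells adjacent to `target` on the spiral."""
    # Find the odd side length of the ring holding `target`, then build
    # one extra ring so every neighbor is guaranteed to exist.
    side = 1
    while side * side < target:
        side += 2
    positions = spiral_positions((side + 2) ** 2)
    grid = {pos: n for n, pos in positions.items()}
    tx, ty = positions[target]
    offsets = [
        (dx, dy)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0) and (diagonal or dx == 0 or dy == 0)
    ]
    return sum(grid[(tx + dx, ty + dy)] for dx, dy in offsets)


if __name__ == "__main__":
    print(spiral_positions(2026)[2026])  # where 2026 lands under these assumptions
    print(neighbor_sum(2026))            # sum of its 8 neighbors
    print(neighbor_sum(2026, diagonal=False))  # sum of its 4 edge neighbors
```

A different starting corner, direction of rotation, or neighbor definition changes the answer, so the output is only meaningful under the assumptions stated above.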
