Such a great example of how multimodal AI models do not form a world model or reason, but are stochastic models over (compressed) training data.
How many circles are in this image?
1st image: "5, no problem"
Other versions: "5? 9? 10? What about 5?"
https://techcrunch.com/2024/07/11/are-visual-ai-models-actually-blind/

