Such a great example of how multimodal AI models do not form a world model or reason, but are stochastic models over (compressed) training data.

How many circles are in this image?

1st image: "5, no problem"
Other versions: "5? 9? 10? What about 5?"
https://techcrunch.com/2024/07/11/are-visual-ai-models-actually-blind/

'Visual' AI models might not see anything at all | TechCrunch

Although these companies' claims are artfully couched, it's clear that they want to express that the model sees in some sense of the word.