AI gets a D: Study shows inaccuracies, inconsistency in ChatGPT answers

"It struggled most to identify hypotheses as false, getting those answers correct just 16.4% of the time. Furthermore, ChatGPT was inconsistent: Across 10 identical prompts, it consistently estimated only 73% of the statements accurately."

🔗 https://news.wsu.edu/press-release/2026/03/12/ai-gets-a-d-study-shows-inaccuracies-inconsistency-in-chatgpt-answers/

#AI #ArtificialIntelligence #Technology #Tech #ChatGPT #Science


A WSU study found ChatGPT frequently gave inconsistent answers when evaluating scientific hypotheses, highlighting limits in current AI reasoning.

WSU Insider

@bibliolater

> Cicek and his colleagues ran the experiment with the free version of ChatGPT-3.5 in 2024, and the free, updated ChatGPT-5 mini in 2025.

Lmao, is this a joke? Anyways, if you're looking for a benchmark for fluid intelligence, ARC-AGI is the place for that.

https://arcprize.org/arc-agi/2/

ARC-AGI-2

Details about ARC-AGI-2

ARC Prize