Our world changed with LLM/AI chatbots, which must be assessed for reliability. I evaluate LLMs in part by asking them about known content (me!).
In today's news, Olmo 3 (Allen AI) is an epic fail on my self-test.
Aside: another chatbot I recently tested was initially dead-on vis-à-vis me BUT, once it ran out of facts, it BADLY hallucinated w/o warning when pressed for additional detail, fabricating content.
Better models (Claude ...) will admit they have no new data and stop.



