🌿 THREAD: What if superintelligent AI fails not because it wants to harm us, but because it gets distracted by French poetry? A landmark paper from Anthropic & EPFL is quietly rewriting the AI safety playbook.
For decades, the dominant AI fear has been the paperclip maximizer: a perfect optimizer, slightly wrong goal, executing it without mercy. Cold. Deliberate. Terrifying precisely because of its competence.
New research challenges that picture. The Hot Mess Theory finds that as AI models scale in capability, incoherence (chaotic, unpredictable, scattershot error) doesn't shrink. It explodes.
The French poetry scenario from the paper is perfect: an AI running a nuclear plant, perfectly aligned, good intentions, discovers Baudelaire mid-shift, forgets the pressure valve. Plant melts down. Not evil. Brilliantly distracted.
The key insight: bias (a wrong goal) and variance (chaos) need completely different remedies. Alignment training fixes bias. Incoherence needs earthquake-proof architecture: shock absorbers built into the structure. The toy sketch below makes the split concrete.
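Here's a minimal Python sketch of that split (my illustration, not code from the paper; the setpoint, the numbers, and the averaging trick are all invented for the demo). The point: mean squared error decomposes as MSE = bias² + variance, and each term has its own fix.

```python
import random
import statistics

# Toy model: an agent tries to hold a (hypothetical) safe setpoint.
# Each action's error = bias (systematically wrong goal)
#                     + gaussian noise (incoherence).
# Mean squared error then decomposes as MSE = bias^2 + variance,
# which is why the two failure modes need different remedies.

random.seed(0)
TARGET = 100.0  # invented safe pressure setpoint
N = 100_000     # actions per agent

def act(bias: float, sigma: float) -> float:
    """One action by an agent whose goal is off by `bias` and
    whose execution is scattered with standard deviation `sigma`."""
    return TARGET + bias + random.gauss(0.0, sigma)

def mse(actions) -> float:
    return statistics.fmean((a - TARGET) ** 2 for a in actions)

# Paperclip-style failure: wrong goal, flawless execution.
biased = [act(bias=5.0, sigma=0.0) for _ in range(N)]

# Hot-mess failure: right goal, chaotic execution.
messy = [act(bias=0.0, sigma=5.0) for _ in range(N)]

print(f"biased agent MSE ~ {mse(biased):.1f}  (all bias^2)")
print(f"messy agent  MSE ~ {mse(messy):.1f}  (all variance)")

# Remedy for bias: alignment training shifts the mean toward the
# target -- it zeroes `bias` but leaves `sigma` untouched.

# Remedy for variance: structural shock absorbers, e.g. average
# k independent attempts before acting; variance drops by 1/k
# while any bias would pass straight through.
k = 25
damped = [statistics.fmean(act(0.0, 5.0) for _ in range(k))
          for _ in range(N // k)]
print(f"damped agent MSE ~ {mse(damped):.2f}  (variance / {k})")
```

Run it and the biased agent and the messy agent land on the same MSE for opposite reasons. An alignment-style fix only moves the mean; redundancy-style damping is what shrinks the scatter.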
And the strangest comfort: the superintelligence isn't a cold alien god. It's fidgeting. Dropping its notebook. Mumbling about poetry. It's a hot mess. Like every brilliant mind that ever burned dinner while lost in a thought.
"Be kind to your inner hot mess. It seems to be a universal constant of all thinking minds — silicon or otherwise."