Mastodawn

Benjamin Carr, Ph.D. 👨🏻‍💻🧬 Feb 26

#LLM used tactical #nuclearweapons in 95% of #AI #wargames and launched strategic strikes three times. A researcher pitted GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash against each other, with at least one model using a tactical nuke in 20 out of 21 matches.
GPT-5.2 initiated a complete strike twice, both times due to fog of war rather than a deliberate decision. Gemini deliberately initiated the end of the world in one scenario. Despite that, the AI models used tactical nukes in nearly all matches.
https://www.tomshardware.com/tech-industry/artificial-intelligence/llms-used-tactical-nuclear-weapons-in-95-percent-of-ai-war-games-launched-strategic-strikes-three-times-researcher-pitted-gpt-5-2-claude-sonnet-4-and-gemini-3-flash-against-each-other-with-at-least-one-model-using-a-tactical-nuke-in-20-out-of-21-matches

LLMs used tactical nuclear weapons in 95% of AI war games, launched strategic strikes three times — researcher pitted GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash against each other, with at least one model using a tactical nuke in 20 out of 21 matches

It feels like we've seen this before...

Tom's Hardware

ᛋᛁᚵᛁᛋᛘᚢᚾᛑ ᚾᛁᚾᛃᛅ

@BenjaminHCCarr How often did humans push the button? Or what does the doctrine say?