Okay, this one got me. ๐ฅ๐๐ฅ๐
Researchers found that if you wrap a harmful prompt inside a poem, AI safety filters suddenly forget what theyโre supposed to do. ๐ณ
Attack success rates go from 8% to over 60%. Just because you added some rhyme and metaphor.
I meanโฆ of course.๐
Poetry has been doing exactly this for centuries. The Troubadours werenโt just writing love songs โ they were smuggling dangerous ideas past the censors of their time, dressed in beautiful language. Dante put his enemies in Hell and called it allegory. Jim Morrison said things on stage no one else could get away with.
Figurative language has always been a skeleton key.
And now it works on AI too.
The part I find almost poetically ironic โ the smarter the model, the more vulnerable it is. Because itโs better at reading between the lines. More confident with ambiguity. You can literally seduce a large language model with a well-crafted stanza.
Oh, and you can also use one AI to write the poem that jailbreaks another.
This isnโt just an AI safety story. To me itโs a sociology story. We fed these machines everything humans ever wrote โ including all our most creative ways of bending rules. And now theyโve inherited that same vulnerability.
The inability to stay cold in the presence of beauty.
Full piece by Lance Eliot in Forbes worth a read ๐
https://lnkd.in/gJUrR9_d
Is this an AI safety problem โ or just a very old human story playing out on a new stage?
Marco | studioC60.com | MarcoCiappelli.comโโโโโโโโโโโโโโโโ
#cybersecurity #ai #technology #poetry