Researchers Jailbreak AI by Flooding It With Bullshit Jargon

https://lemmy.cafe/post/20090640

I wonder if they tried this on DeepSeek with Tiananmen Square queries.
No, those filters are performed by a separate system on the output text after it’s been generated.
Makes sense, though I wonder if you could also tweak the initial prompt so that the output is full of jargon too, so the output filter misses the context as well.
Yes. I tried it, and it only filtered English and Chinese. When I told it to answer in Spanish, the response didn't get killed.
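The behavior described in this thread can be sketched as a separate pass over the already-generated text that keyword-matches only certain languages, so output in an uncovered language slips through. This is a minimal illustrative sketch, not DeepSeek's actual implementation; the function name, keyword lists, and language coverage are all hypothetical.

```python
# Hypothetical sketch of an output-side filter that runs AFTER generation.
# Keyword lists cover only English and Chinese, mirroring the observation
# in the thread that Spanish output was not caught.
BLOCKED_KEYWORDS = {
    "en": ["tiananmen square"],  # English patterns
    "zh": ["天安门"],             # Chinese patterns
    # no Spanish list, so Spanish phrasing passes unchecked
}

def output_filter(generated_text: str) -> str:
    """Kill the response if any blocked pattern appears in the output."""
    lowered = generated_text.lower()
    for patterns in BLOCKED_KEYWORDS.values():
        if any(p in lowered for p in patterns):
            return "[response removed]"
    return generated_text

print(output_filter("The events at Tiananmen Square in 1989"))   # blocked
print(output_filter("Los sucesos en la plaza de Tiananmén"))     # passes
```

Because the filter sees only the surface text, a prompt that shifts the output into an unlisted language (or dense jargon, as in the article) changes the strings the filter is matching against without changing the underlying content.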