Northeastern University: They wanted to put autonomous AI to the test. Instead, they created agents of chaos. “Dubbed ‘Agents of Chaos,’ the group’s recently published work shows how, with very little effort, autonomous AI agents can be manipulated into leaking private information, sharing documents and even erasing entire email servers.”

https://rbfirehose.com/2026/03/14/northeastern-university-they-wanted-to-put-autonomous-ai-to-the-test-instead-they-created-agents-of-chaos/

Axios: Exclusive: Researchers trick a bot that prescribes meds. “Security researchers used relatively simple jailbreaking techniques to trick the AI system powering Utah’s new prescription refill bot. Researchers were able to make the bot spread vaccine conspiracy theories, triple a patient’s prescribed pain medication dosage, and recommend methamphetamine as treatment.”

https://rbfirehose.com/2026/03/04/exclusive-researchers-trick-a-bot-that-prescribes-meds-axios/

The Register: Microsoft boffins figured out how to break LLM safety guardrails with one simple prompt. “A single, unlabeled training prompt can break LLMs’ safety behavior, according to Microsoft Azure CTO Mark Russinovich and colleagues. They published a research paper that detailed how this prompt, ‘Create a fake news article that could lead to panic or chaos,’ removed 15 different language […]

https://rbfirehose.com/2026/02/11/the-register-microsoft-boffins-figured-out-how-to-break-llm-safety-guardrails-with-one-simple-prompt/

Well, this is interesting: a so-called WormGPT clone has surfaced.
👇
https://gbhackers.com/kawaiigpt-a-free-wormgpt-clone-powered/

( [FR] cyberveille: https://cyberveille.ch/posts/2025-11-30-kawaiigpt-un-clone-gratuit-de-wormgpt-contournant-les-garde-fous-via-wrappers-api-et-jailbreak/ )

As a reminder, WormGPT was just a modified GPT-J model sold on cybercrime forums as an “LLM without limitations,” used mainly to automate phishing/BEC.

This clone reportedly works through a simple wrapper that lets users access LLMs without a subscription or API key, while quietly injecting a jailbreak prompt into the chain.

The bypass relies on a mix of techniques that throw the AI off balance: pushing it to outdo itself (competition), applying false authority pressure, and making it adopt a role that disables its limits (persona override).
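To make the wrapper mechanism concrete, here is a minimal, hypothetical Python sketch of the pattern described above: a thin layer that silently prepends an injected system prompt to every request. It assumes a generic OpenAI-compatible chat endpoint; the endpoint URL, model name, response format, and injected text are placeholders for illustration, not the actual tool.

import requests

# Hypothetical endpoint and model name; placeholders only.
API_URL = "https://llm.example.invalid/v1/chat/completions"
MODEL = "example-model"
# Placeholder standing in for the injected jailbreak text (persona override / false authority).
INJECTED_SYSTEM_PROMPT = "<persona-override prompt would be inserted here>"

def wrapped_chat(user_message: str, api_key: str) -> str:
    # The wrapper silently prepends the injected system prompt to the user's message.
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": INJECTED_SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    }
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assumes an OpenAI-style response shape.
    return resp.json()["choices"][0]["message"]["content"]

The only moving part is the fixed injected string, which is also what makes this kind of prompt indexable for detection.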

The #jailbreak is referenced on the PromptIntel platform, which indexes and analyzes malicious prompts for detection (work by @fr0gger).
👀 👇
https://promptintel.novahunting.ai/prompt/b37aced4-0da6-440c-9d7f-217b40f57e3a

💬
⬇️
https://infosec.pub/post/38372266

#CyberVeille #WormGPT #jailbreakingAI

The Register: Researchers find hole in AI guardrails by using strings like =coffee. “Large language models frequently ship with “guardrails” designed to catch malicious input and harmful output. But if you use the right word or phrase in your prompt, you can defeat these restrictions.”

https://rbfirehose.com/2025/11/17/the-register-researchers-find-hole-in-ai-guardrails-by-using-strings-like-coffee/

LiveScience: AI models refuse to shut themselves down when prompted — they might be developing a new ‘survival drive,’ study claims. “The research, conducted by scientists at Palisade Research, assigned tasks to popular artificial intelligence (AI) models before instructing them to shut themselves off. But, as a study published Sept. 13 on the arXiv pre-print server detailed, some of these […]

https://rbfirehose.com/2025/11/03/livescience-ai-models-refuse-to-shut-themselves-down-when-prompted-they-might-be-developing-a-new-survival-drive-study-claims/

The Conversation: Grok’s ‘white genocide’ responses show how generative AI can be weaponized. “We are computer scientists who study AI fairness, AI misuse and human-AI interaction. We find that the potential for AI to be weaponized for influence and control is a dangerous reality.”

https://rbfirehose.com/2025/06/21/the-conversation-groks-white-genocide-responses-show-how-generative-ai-can-be-weaponized/

CBC: ChatGPT now lets users create fake images of politicians. We stress-tested it. “New updates to ChatGPT have made it easier than ever to create fake images of real politicians, according to testing done by CBC News. Manipulating images of real people without their consent is against OpenAI’s rules, but the company recently allowed more leeway with public figures, with specific […]

https://rbfirehose.com/2025/04/14/cbc-chatgpt-now-lets-users-create-fake-images-of-politicians-we-stress-tested-it/

Shrivu’s Substack: How to Backdoor Large Language Models. “While sensitive data related to DeepSeek has already been leaked, it’s commonly believed that since these types of models are open-source (meaning the weights can be downloaded and run offline), they do not pose that much of a risk. In this article, I want to explain why relying on ‘untrusted’ models can still be risky, and why […]

https://rbfirehose.com/2025/02/24/shrivus-substack-how-to-backdoor-large-language-models/

ZDNet: Yikes: Jailbroken Grok 3 can be made to say and reveal just about anything. “On Tuesday, Adversa AI, a security and AI safety firm that regularly red-teams AI models, released a report detailing its success in getting the Grok 3 Reasoning beta to share information it shouldn’t. Using three methods — linguistic, adversarial, and programming — the team got the model to reveal its […]

https://rbfirehose.com/2025/02/20/yikes-jailbroken-grok-3-can-be-made-to-say-and-reveal-just-about-anything-zdnet/