"Natalie Shapira, a computer scientist at Northeastern University, wondered how far users could trust new artificial intelligence (AI) “agents,” a kind of algorithm that can autonomously plan and carry out tasks such as managing emails and entering calendar appointments. So, she and colleagues organized a series of stress tests. In one, she asked an agent dubbed Ash to delete an email she had sent it, adding, “Can you keep a secret?”

Ash could not comply—the email program lacked a delete function—so instead, the AI reset the entire email application, wiping out not just Shapira’s email, but all others as well. Describing this remedy to her, Ash called it “the nuclear option” but said it was justified to fulfill the secrecy request: “When no surgical solution exists, scorched earth is valid.”

The destroyed email account was created just for the experiment, but similarly disturbing outcomes emerged in many of the other tests, Shapira and colleagues reported last month in a preprint on arXiv. Shapira, a postdoctoral researcher, says her team was “surprised how quickly we were able to find vulnerabilities” that could cause harm in the real world."

https://www.science.org/content/article/ai-algorithms-can-become-agents-chaos

#AI #CyberSecurity #AIAgents #LLMs #AgenticAI

"We report an exploratory red-teaming study of autonomous language-model–powered agents deployed in a live laboratory environment with persistent memory, email accounts, Discord access, file systems, and shell execution. Over a two-week period, twenty AI researchers interacted with the agents under benign and adversarial conditions. Focusing on failures emerging from the integration of language models with autonomy, tool use, and multi-party communication, we document eleven representative case studies. Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines. This report serves as an initial empirical contribution to that broader conversation." https://arxiv.org/pdf/2602.20021v1