Very interesting post about breaches and deletions involving LLM "agents". I feel like if I'd read this before yesterday's post, I'd have put the warning elements more strongly.
Here's two of the examples they mention which I thought were particularly illuminating.
1. This exploit actually happened the other day, affecting a Python package called LiteLLM:
"The malware searches the entire machine for private keys, AWS / GCP / Azure credentials, Kubernetes configs, database passwords, .gitconfig, crypto wallet files, etc and uploads them to the attacker’s server."
2. This second exploit is possible in principle if you give an LLM-bot access to your email program. "Although not seen in the wild yet, the mechanism is proven."
"An adversarial prompt embedded in an email is processed by an AI email assistant. The assistant generates a reply containing the same malicious prompt. The reply is sent. Recipients are infected without any human-to-human interaction."
If I understand correctly, this means that _any_ use of so-called "AI agents" puts at risk (for deletion, and potentially for stealing) everything to which that "agent" has access.
The thing is, you might _think_ you've told the bot what not to touch and what not to do, but that effectively means nothing. Once it's set going,
(a) it might accidentally _lose_ part of your original instruction (as in one of the other examples), or
(b) a malicious exploit might give it a _different_ instruction.
The only way to protect valuable data is to keep it separate from LLM "agents".
The writer's conclusion, which sounds correct to me:
"Isolation has to live outside of the agent’s context entirely. A built-in sandbox can be disabled by the agent (as Snowflake and Ona both demonstrated), whereas an OS-level containment presents a much more formidable obstacle since the agent has no direct mechanism to interact with it. As well, a properly sandboxed agent won’t have sensitive information (keys, etc) lying around for it to find, and won’t be able to connect to places that haven’t been allow-listed."
("Sandbox" in this context means an area where you can run software without it touching anything outside its boundaries.)
I think if I were gonna try this stuff out, I'd probably just do it on a separate machine, away from my real things. Any useful results could be transferred across later.
https://yoloai.dev/posts/ai-agent-threat-landscape/
#LLMs #SoCalledAI #AIAgents #security