Seriously, a large percentage of these attacks boil down to downloading untrusted content, mangling it ever so slightly, and then hoping that the AI decides to blindly eval all of it
WHICH
IT
THEN
FUCKING
DOES
I am losing it. What are these absurdly overpaid devs doing with their lives?
Are you an AI vendor and you wanna prevent most attacks on the internet? All you need to do is:
1. Make your config files _READ ONLY_ during agent invocation
2. Use Content Security Policies correctly
3. Sanitize + normalize unicode input and output
4. Scan *both* the input and output
… bruh??
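To show how low that bar is, here is a rough Python sketch of items 1, 3 and 4. It is a sketch under assumptions, not anybody's production filter: `read_only`, `sanitize`, `scan`, `guarded_call` and the `SUSPICIOUS` patterns are all names I made up for illustration, the model is assumed to be any callable that takes a string and returns a string, and a real scanner needs a lot more than three regexes.

```python
import os
import re
import stat
import unicodedata
from contextlib import contextmanager

# Hypothetical deny-list, purely illustrative. Real scanners need far more.
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"curl[^\n|]*\|\s*(ba)?sh", re.I),
    re.compile(r"\beval\s*\(", re.I),
]

KEEP_CONTROLS = {"\n", "\t"}


@contextmanager
def read_only(path):
    """Item 1: clear the write bits on a config file while the agent runs.

    Assumption: the agent runs as a different, non-root user. A process that
    owns the file (or root) can undo this, so a read-only mount is stronger.
    """
    old_mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, old_mode & ~0o222)
    try:
        yield
    finally:
        os.chmod(path, old_mode)


def sanitize(text: str) -> str:
    """Item 3: NFKC-normalize, then drop control (Cc) and format (Cf) chars.

    Cf covers zero-width characters, bidi overrides and Unicode tag
    characters. It also covers the joiner used in some emoji sequences,
    so this is deliberately blunt.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in normalized
        if ch in KEEP_CONTROLS or unicodedata.category(ch) not in {"Cc", "Cf"}
    )


def scan(text: str) -> list[str]:
    """Item 4: return the deny-list patterns that match the sanitized text."""
    clean = sanitize(text)
    return [p.pattern for p in SUSPICIOUS if p.search(clean)]


def guarded_call(model, prompt: str) -> str:
    """Scan *both* directions: the prompt going in and the reply coming out."""
    if scan(prompt):
        raise ValueError("suspicious input")
    reply = sanitize(model(prompt))
    if scan(reply):
        raise ValueError("suspicious output")
    return reply
```

Usage would be something like `with read_only("agent.toml"): guarded_call(model, untrusted_text)`, where `agent.toml` is a placeholder path. None of this is clever, which is the whole point.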
What is this, are we back in the 90s? And holy shit, the "OWASP 10 Tricks To Not Get Pwned By A Script Kiddie" listicle is somehow shockingly novel and innovative??
I am begging you, foundation model code monkeys. Y'all are paid WAY too much to write code this stupid. Get your shit together, PLEASE
I do concede that a stronger fix of building a symbolic execution engine, constructing an AST of tool invocations, and then ensuring that all reachable traversal paths have appropriate policies on them to prevent data integrity violations... is difficult.
But the bar for *competence* is far lower
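For a sense of what the "stronger fix" is even checking, here is a toy Python version of just the policy question: given a plan of tool invocations, is there any path from untrusted input to a dangerous tool that skips the scanner? To be loud about it, this is plain graph reachability over a hand-written plan, not symbolic execution and not a real AST. `ToolCall`, `unguarded_paths`, the `kind` labels and the tool names in `PLAN` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    kind: str  # "source" (untrusted), "sanitizer", "sink", or "plain"
    feeds: list[str] = field(default_factory=list)  # names of downstream calls


def unguarded_paths(plan: dict[str, ToolCall]) -> list[list[str]]:
    """Return every source-to-sink path that never crosses a sanitizer."""
    violations: list[list[str]] = []

    def walk(name: str, path: list[str]) -> None:
        node = plan[name]
        path = path + [name]
        if node.kind == "sanitizer":
            return  # data is considered clean past this point
        if node.kind == "sink":
            violations.append(path)
            return
        for nxt in node.feeds:
            if nxt not in path:  # cycle guard
                walk(nxt, path)

    for name, node in plan.items():
        if node.kind == "source":
            walk(name, [])
    return violations


# Hypothetical plan: one guarded path, one unguarded path.
PLAN = {
    "fetch_url": ToolCall("fetch_url", "source", ["summarize", "scan_output"]),
    "summarize": ToolCall("summarize", "plain", ["run_shell"]),
    "scan_output": ToolCall("scan_output", "sanitizer", ["write_file"]),
    "run_shell": ToolCall("run_shell", "sink"),
    "write_file": ToolCall("write_file", "sink"),
}
```

On `PLAN`, `unguarded_paths` flags the `fetch_url` to `summarize` to `run_shell` path and leaves the scanned one alone. The genuinely hard part is everything this skips: deriving the plan from what the agent actually does, and handling data-dependent branches.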