This is fun. Google Gemini’s “Summarize email” function is vulnerable to invisible prompt injection utilized to deceive users, including with fake security alerts.
This is fun. Google Gemini’s “Summarize email” function is vulnerable to invisible prompt injection utilized to deceive users, including with fake security alerts.
SANITIZE YOUR INPUTS.
Everyone rushing to LLM-ify everything forgot every lesson about input sanitization.
smdh.
@neurovagrant I'm pretty sure "sanitizing" inputs is fundamentally impossible, as in you must solve the Halting Problem in order to accomplish it.
If you don't want hostile inputs, you need to implement much more aggressive models of what input can even be, and you need to enforce those. Cf. the entire field of language-theoretic security https://langsec.org/ . tl;dr: "be liberal in what you accept" is a plan that has been extensively tested and comprehensively debunked.
@davidfetter @neurovagrant The halting problem is decidable for any finite computer. Just limit how much RAM and compute time can be used.
Beyond that, though, why is the model taking instructions from an email at all?
@bob_zim this is pretty much equivalent to the argument made by the langsec folks. I get the impulse to have an argument. I have it myself on occasion, as @neurovagrant can doubtless attest.
Maybe we should instead engage with the question of validating rather than sanitizing, that former perforce rejecting a lot of inputs that attempts to sanitize would accept. This rapidly runs into thought-terminating clichés like "the customer is always right," and that in turn leads directly into the political economy of software development, a generative discussion.
NOP
over some SUBQ
to avoid decrementing lives in games 🤣)