"Disregard That" Attacks
"Disregard That" Attacks
The hypothetical approach I've heard of is to have two context windows, one trusted and one untrusted (usually phrased as separating the system prompt and the user prompt).
I don't know enough about LLM training or architecture to know if this is actually possible, though. Anyone care to comment?