"Disregard that!" attacks

Why you shouldn't share your context window with others

calpaterson.com

I didn’t see the article talk specifically about this, or at least not in enough detail, but isn’t the de-facto standard mitigation for this to use guardrails which lets some other LLM that has been specifically tuned for these kind of things evaluate the safety of the content to be injected?

There are a lot of services out there that offer these types of AI guardrails, and it doesn’t have to be expensive.

Not saying that this approach is foolproof, but it’s better than relying solely on better prompting or human review.

The article does mention this and a weakness of that approach is mentioned too.
Perhaps they asked AI to summarize the article for them and it stopped after the first "disregard that" it read into its context window.