This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
I didn’t see the article talk specifically about this, or at least not in enough detail, but isn’t the de-facto standard mitigation for this to use guardrails which lets some other LLM that has been specifically tuned for these kind of things evaluate the safety of the content to be injected?
There are a lot of services out there that offer these types of AI guardrails, and it doesn’t have to be expensive.
Not saying that this approach is foolproof, but it’s better than relying solely on better prompting or human review.