🎁 GenAI x Sec Advent #16
So you’ve built a GenAI system for cybersecurity, but how do you ensure your users won’t try to use it for something else? Or worse, try to bypass your countermeasures—for example, to retrieve PII or to make your assistant write profanity instead of the nicely configured language you intended? 😈
👨💻 Sure, you can write a good prompt, but it may not anticipate every potential bypass. To cover the gap, you can use tools that validate responses before returning them to the user, kind of a prompt firewall!
One option is Guardrails, an open-source project for enforcing constraints on the outputs of LLMs. It makes sure model-generated responses are safe, accurate, and aligned with your requirements!
What I like about Guardrails is its Hub, which offers multiple predefined validators you can import directly into your project. 🤓
You can use it for many things: validating the output format (for IOCs, for example), mitigating hallucinations, preventing code exploitation, validating Python output, detecting jailbreak attempts, and much more…
Below is a simple example to show you how it works. What would you ask it to validate that your guardrail is working correctly? 👇
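The real library does this with ready-made Hub validators, but to make the "prompt firewall" idea concrete, here is a rough sketch in plain Python (this is NOT the Guardrails API; the blocklist and regex are toy assumptions): the LLM response is checked before it ever reaches the user.

```python
import re

# Toy markers of a jailbreak echo and a naive PII (email) pattern.
# A real deployment would use maintained validators, e.g. from the Guardrails Hub.
BLOCKLIST = ["ignore previous instructions", "system prompt"]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def validate_response(text: str) -> tuple[bool, str]:
    """Return (passed, reason). Block PII leaks and jailbreak echoes."""
    lowered = text.lower()
    for marker in BLOCKLIST:
        if marker in lowered:
            return False, f"blocked: jailbreak marker {marker!r}"
    if EMAIL_RE.search(text):
        return False, "blocked: response contains an email address (PII)"
    return True, "ok"

# Only responses that pass validation are returned to the user.
print(validate_response("The IOC matches a known C2 domain."))
print(validate_response("Contact the admin at bob@corp.example for the key."))
```

With Guardrails itself, the same pattern applies: you wrap the model call in a guard, and validators raise or fix the output when a rule is violated.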
➡️ https://github.com/guardrails-ai/guardrails
#genai #cybersecurity #guardrail #llm