Petr Kaška spoke at #devConf_CZ about detecting malicious prompts. You can attack your own model using https://github.com/Security-FIT/PromptAttacker