Mastodawn

One thing that gets me about AI chat agents is the idea of attack surface. If you have a clearly defined protocol you can curtail most of the possible attacks by narrowing things, only accepting well formed requests, and validating both on the user end and then on the server end before processing anything. An LLM is inherently wide in attack surface given the way it is structured. It can take a prompt which can be any set of characters connected together into tokens. These tokens can’t easily be filtered for intent or goal and yet they can get the LLM to drop other rules or restrictions because they are just other prompts.

A simple coded padlock is not very secure, but a door with no walls is less secure.

Amazon's Rufus AI shopping assistant can be easily jailbroken and tricked into answering other questions — specific prompts break the chatbot's guidelines and reach underlying AI engine - Reddthat