@rayckeith let that be a lesson to sanitize your inputs
@fullywoolly @rayckeith except that is literally impossible in LLMs
@Yuvalne @rayckeith are we only considering the whole internet hoovering LLMs? Isn't that why poisoning them is so effective?
@fullywoolly @rayckeith in any LLM. by their very structure there is no difference between commands and input, it's all just input. you can't sanitize input to make sure it doesn't become a command because there's no separation between input and command. "system prompts" are just input that's not shown to the user.
@Yuvalne @rayckeith ahh I see what you mean. I was thinking more in terms of training sets and like using public domain books or internal projects documentation. Relatively benign and still building a decent probability model.
@fullywoolly @rayckeith doesn't matter what you train your model on, this issue will still exist. so long as the model is capable of interpreting commands in its input, well, it's capable of interpreting commands in its input. and since there's no difference between the commands and the input (it's all input), either your model has prompt injection, or it has no system prompts at all.
that's how Anthropic's Mythos got "jailbroken" despite their testing: there's no system prompt, it's all input.
@Yuvalne @rayckeith hadn't heard about mythos jailbreak. That's hilarious. Thanks for the info. I obviously only have a vague understanding of LLMs and definitely miss the nuances you're talking about. My original comment was specifically a nod to the end of the Bobby Tables xkcd.
@fullywoolly @rayckeith
yeah, of course. and you're correct, you always should sanitize your inputs, which is why the fact these companies are deploying these bots into every tool we use before they've solved the question of how to sanitize input when all input is command and all command is input is really quite worrying.

@benjamineskola The server seems to not be active.

@rayckeith @mhoye

@mkj that's weird, it loaded for me earlier. or maybe it was that my server had it cached or something? apologies for being unhelpful!

@rayckeith @mhoye

@rayckeith Bobby Tables and Iggy Disregard