Mastodawn

Foxbrush Tailwag 23h ago

Third Foundation

@rayckeith let that be a lesson to sanitize your inputs

Show thread

Talya (she/her) 🏳️‍⚧️✡️1d ago

@fullywoolly @rayckeith except that is literally impossible in LLMs

Show thread

Ryan 20h ago

@Yuvalne @rayckeith are we only considering the whole internet hoovering LLMs? Isn't that why poisoning them is so effective?

Show thread

Talya (she/her) 🏳️‍⚧️✡️20h ago

@fullywoolly @rayckeith in any LLM. by their very structure there is no difference between commands and input, it's all just input. you can't sanitize input to make sure it doesn't become a command because there's no separation between input and command. "system prompts" are just input that's not shown to the user.

Show thread

Ryan 19h ago

@Yuvalne @rayckeith ahh I see what you mean. I was thinking more in terms of training sets and like using public domain books or internal projects documentation. Relatively benign and still building a decent probability model.

Show thread

Talya (she/her) 🏳️‍⚧️✡️19h ago

@fullywoolly @rayckeith doesn't matter what you train your model on, this issue will still exist. so long as the model is capable of interpreting commands in its input, well, it's capable of interpreting commands in its input. and since there's no difference between the commands and the input (it's all input), either your model has prompt injection, or it has no system prompts at all.
that's how Anthropic's Mythos got "jailbroken" despite their testing: there's no system prompt, it's all input.

Show thread

Ryan 19h ago

@Yuvalne @rayckeith hadn't heard about mythos jailbreak. That's hilarious. Thanks for the info. I obviously only have a vague understanding of LLMs and definitely miss the nuances you're talking about. My original comment was specifically a nod to the end of the Bobby Tables xkcd.

Show thread

Talya (she/her) 🏳️‍⚧️✡️19h ago

@fullywoolly @rayckeith
yeah, of course. and you're correct, you always should sanitize your inputs, which is why the fact these companies are deploying these bots into every tool we use before they've solved the question of how to sanitize input when all input is command and all command is input is really quite worrying.

Show thread

ben 1d ago

@rayckeith @mhoye the original post is at https://dan.mastohon.com/@danhon/112691548112257631 btw

Show thread

mkj 1d ago

@benjamineskola The server seems to not be active.

@rayckeith @mhoye

Show thread

ben 1d ago

@mkj that's weird, it loaded for me earlier. or maybe it was that my server had it cached or something? apologies for being unhelpful!

@rayckeith Bobby Tables and Iggy Disregard