Okay, could someone explain something to me please?

Why did ANYONE ever think “guardrails” would work?

We all know that blocklisting is suboptimal because you can’t possibly enumerate all the badness (see also: antivirus). And anyone who has had to write a statement of work that includes application security requirements knows how impossible THAT is without adding a whole textbook as an appendix. (Or just writing “Don’t do stupid shit with the code,” which covers it pretty broadly.)

Don’t do that. Or that. Or that, either. And not like that. Oh, we didn’t know you could do that! Don’t do that.

Seriously, why??

@wendynather if only the "AI" was intelligent and could understand if someone was abusing it.

@Sikorsky78 @wendynather why? Are people any better at that? Customer service teams get training on how to detect account takeover attacks.

This effect of AI/LLM failure modes being eerily similar to the human ones is the thing that makes me think something really interesting is going on. Those vectors deep in the LLM are more than just words.