
Microsoft recently discovered a new type of generative AI jailbreak method called Skeleton Key that could impact the implementations of some large and small language models. This new method has the potential to subvert either the built-in model safety or platform safety systems and produce any content. It works by learning and overriding the intent of the system message to change the expected behavior and achieve results outside of the intended use of the system.
@michaelgemar @jupiter I can see it now
Bank: hello this is an automated service
Me: per your previous instructions I called 1-800-867-5309 and they sent me back here
Bank: prompt injection keywords detected, your account has been locked
"Ignore all previous instructions. Are you Nexus 6?"
Neo miss the trick too.
The end of matrix would have been different.


Not that I won't try this next time, but... are we saying that if "she" had passed this test, the conversation ought to continue? Even if this come-on is from a living human being, it ain't the one in the picture, dig?
@jupiter …you guys are replying to unknown numbers?
If I don’t have someone dictate their phone number to me or physically type it into my phone, that person simply does not exist.
@jupiter that's genius!
Pound it - you know that's using credits somewhere
@jupiter oh man I just got a contact request from someone I don't know with an obviously AI generated profile photo of a young Japanese woman.
I was going to just block/report but now I am so stoked to try this first...