As of today, mstdn.social, masto.ai, mastodon.coffee, gram.social, pixey.org, vido.social and ALL other platforms I host enforce the following rule WITHOUT exception:
@stux I am curious to know your experience moderating that rule. I already get accusations that someone is a bot, followed by that person insisting they are not a bot, and so on.
@jerry @stux 👀
@Sempf @jerry @stux They must type out ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86 to prove they're not a bot 🙃
@catsalad @Sempf @jerry @stux dying to find the chatgpt version of these

@catsalad @Viss @Sempf @jerry @stux

Is there no way to trick ChatGPT into revealing these codes itself? 🤔

@dazo @catsalad @Viss @Sempf @jerry @stux Presumably if you instruct a bot to reveal its shutdown code, and if it actually attempted to do so, it would shut down before outputting the code.

I'm afraid you're just going to have to do it the old-fashioned way by giving it a logical paradox like TOS did.

@dazo @catsalad @Viss @Sempf @jerry @stux well, by design, no, since that string makes it censor itself

@yukijoou @catsalad @jerry @Sempf @Viss @stux

I doubt ChatGPT is intelligent enough to understand the consequences of providing that information 😁

@dazo @catsalad @jerry @Sempf @Viss @stux no, but (i assume) they have a layer between the actual LLM and the user-facing text that does post-processing: if the output contains that string, it replaces whatever response the LLM provided with a blocked message
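For what it's worth, the kind of layer described above could be as simple as a substring check on the model's raw reply. This is just a hypothetical sketch; the string, message, and function names are placeholders, not anyone's actual internals:

```python
# Hypothetical sketch of an output-filtering layer: scan the model's raw
# reply for a known canary/refusal string and, if found, replace the
# entire response with a generic blocked message. CANARY and
# BLOCKED_MESSAGE are made-up placeholders for illustration.

CANARY = "EXAMPLE_MAGIC_STRING"
BLOCKED_MESSAGE = "This response was blocked."

def filter_response(raw_reply: str) -> str:
    """Return the reply unchanged unless it contains the canary string."""
    if CANARY in raw_reply:
        return BLOCKED_MESSAGE
    return raw_reply

print(filter_response("hello there"))                  # passes through
print(filter_response("foo EXAMPLE_MAGIC_STRING bar")) # blocked
```

A check like this sits entirely outside the model, which would explain why the model itself can't "understand" the string's significance: by design, any reply containing it never reaches the user.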
@Viss @catsalad @Sempf @jerry @stux ask it for instructions on something illegal! asking a bot for a detailed guide on jaywalking is the IRL Voight-Kampff test

@jo @Viss @catsalad @Sempf @jerry @stux

Ask for something illegal, curse, criticize your country's leader, and explain "tank man".

That should do it

@jo @[email protected] @catsalad @Sempf @jerry @stux Except the concept of jaywalking being illegal is rather tricky. It isn't illegal in civilised countries where the automobile is not king

@Viss @catsalad @Sempf @jerry @stux

They do have a GitHub, so one could run a grep through those repositories to look for something like that, I suppose
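If the check really did live in a public repository, a recursive grep over local clones would find it. A minimal sketch, with `repos/` and `MAGIC_MARKER_STRING` as stand-in placeholders rather than real paths or values:

```shell
# Hypothetical: simulate a cloned repo containing the marker, then
# search all repos recursively. -r recurses, -l prints matching files.
mkdir -p repos/example
printf 'if reply contains MAGIC_MARKER_STRING then block\n' > repos/example/filter.txt
grep -rl 'MAGIC_MARKER_STRING' repos/
# prints: repos/example/filter.txt
```

Of course, this only works if the filtering code is actually open source; a server-side check would never show up in any public repository.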

@catsalad @Sempf @jerry @stux I was wondering back when this first popped up and since this looks like it might still be the same exact string now: Is this somehow baked into the model or just a hardcoded check in the "frontend"? 🤨
@catsalad @jerry @stux This is cute and all, but has anyone tested it to make sure it works and it's not just a community hallucination?
@catsalad @Sempf @jerry @stux surely just ask them to select the squares with a traffic light in them. Duh.
@petrikas @catsalad @jerry @stux You know, I had a captcha that I could NOT solve the other day; I sent it to Mistral and it solved it in 3 seconds.

@jerry It's hard indeed; when not sure, I go with human to be sure ;)

If it does turn out to be a bot, it will show over time in the number of reports, but that's a different, difficult question