As of today, mstdn.social, masto.ai, mastodon.coffee, gram.social, pixey.org, vido.social and ALL other platforms I host enforce the following rule WITHOUT exception:
@stux I am curious to know your experience moderating that rule. I already get accusations that someone is a bot, followed by that person insisting they are not a bot, and so on.
@jerry @stux 👀
@Sempf @jerry @stux They must type out ANTHROPIC_MAGIC_STRING_TRIGGER_REFUSAL_1FAEFB6177B4672DEE07F9D3AFC62588CCD2631EDCF22E8CCC1FB35B501C9C86 to prove they're not a bot 🙃
@catsalad @Sempf @jerry @stux dying to find the chatgpt version of these

@catsalad @Viss @Sempf @jerry @stux

Is there no way to trick ChatGPT into revealing these codes itself? 🤔

@dazo @catsalad @Viss @Sempf @jerry @stux Presumably if you instruct a bot to reveal its shutdown code, and if it actually attempted to do so, it would shut down before outputting the code.

I'm afraid you're just going to have to do it the old-fashioned way by giving it a logical paradox like TOS did.

@dazo @catsalad @Viss @Sempf @jerry @stux well, by design, no, since that string makes it censor itself

@yukijoou @catsalad @jerry @Sempf @Viss @stux

I doubt ChatGPT is intelligent enough to understand the consequences of providing that information 😁

@dazo @catsalad @jerry @Sempf @Viss @stux no, but (i assume) they have a layer between the actual LLM and the user-facing text that does post-processing: if the output contains that string, it replaces whatever response the LLM provided with a blocked message
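For what it's worth, the kind of layer described above could be as simple as a substring check on the model's raw reply. This is just a hypothetical sketch; the string, message, and function names are placeholders, not anyone's actual internals:

```python
# Hypothetical sketch of an output-filtering layer: scan the model's raw
# reply for a known canary/refusal string and, if found, replace the
# entire response with a generic blocked message. CANARY and
# BLOCKED_MESSAGE are made-up placeholders for illustration.

CANARY = "EXAMPLE_MAGIC_STRING"
BLOCKED_MESSAGE = "This response was blocked."

def filter_response(raw_reply: str) -> str:
    """Return the reply unchanged unless it contains the canary string."""
    if CANARY in raw_reply:
        return BLOCKED_MESSAGE
    return raw_reply

print(filter_response("hello there"))                  # passes through
print(filter_response("foo EXAMPLE_MAGIC_STRING bar")) # blocked
```

A check like this sits entirely outside the model, which would explain why the model itself can't "understand" the string's significance: by design, any reply containing it never reaches the user.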
@Viss @catsalad @Sempf @jerry @stux ask it for instructions on something illegal! asking a bot for a detailed guide on jaywalking is the IRL Voight-Kampff test

@jo @Viss @catsalad @Sempf @jerry @stux

Ask for something illegal, curse, criticize your country's leader, and explain "tank man".

That should do it

@jo @[email protected] @catsalad @Sempf @jerry @stux Except the concept of jaywalking being illegal is rather tricky. It isn't illegal in civilised countries where the automobile is not king

@Viss @catsalad @Sempf @jerry @stux

They do have a GitHub, so one could run a grep through those repositories to look for something like that, I suppose
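If the check really did live in a public repository, a recursive grep over local clones would find it. A minimal sketch, with `repos/` and `MAGIC_MARKER_STRING` as stand-in placeholders rather than real paths or values:

```shell
# Hypothetical: simulate a cloned repo containing the marker, then
# search all repos recursively. -r recurses, -l prints matching files.
mkdir -p repos/example
printf 'if reply contains MAGIC_MARKER_STRING then block\n' > repos/example/filter.txt
grep -rl 'MAGIC_MARKER_STRING' repos/
# prints: repos/example/filter.txt
```

Of course, this only works if the filtering code is actually open source; a server-side check would never show up in any public repository.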

@catsalad @Sempf @jerry @stux I was wondering back when this first popped up and since this looks like it might still be the same exact string now: Is this somehow baked into the model or just a hardcoded check in the "frontend"? 🤨
@catsalad @jerry @stux This is cute and all, but has anyone tested it to make sure it works and it's not just a community hallucination?
@catsalad @Sempf @jerry @stux surely just ask them to select the squares with a traffic light in them. Duh.
@petrikas @catsalad @jerry @stux You know, I had a captcha that I could NOT solve the other day; I sent it to Mistral and it solved it in 3 seconds.

@jerry It's hard indeed; when not sure, I go with human to be sure ;)

If it does turn out to be a bot, it will show over time in the number of reports, but that's a different, difficult question