Mastodawn

Jun 2, 2025

Clearly I need to add that I'm not saying to trust everything you read on Mastodon.

What I'm saying is, even if it's all done in MSPaint, the things it has the AI responding are, well...

...unfortunately close to a few things I've read in the course of my job. (Which is not at Google, to be clear. But the backend of some AI is in the vicinity of what I do for a living, unfortunately. It makes me sick.)

The way these responses describe the guardrails AI usually has matches how they are described in some internal documents I have read.

Yes, I had to sign an NDA.

Mystery Babylon

Jun 2, 2025

@foolishowl Maybe it is a prank. Everyone should decide for themselves.

Here's where it gets weird, though -- how WOULD you (or anyone) decide it for yourself? You could search for arguments for or against the ideas in the responses in those screenshots. But would you be searching on Google? Would you get results that fully scoured the internet for true and reliable sources? That's not what Google search does (or any other search, consistently).

I am not saying that means we should blindly trust this thread and its posts. It's a line in the sand that we can watch and see if it turns out to be accurate.

The thread doesn't claim the bot has access to internal Google posts. That's addressed in one of the toots.

Mystery Babylon

Jun 2, 2025

@cobweb And the guardrails (the programmatic instructions the AI companies, not just Google, put into their LLM products to keep them on a PR friendly path, and guide users away from discovering the LLMs' real use-case) are what people need to be aware of.

The existence of these guardrails is what makes an otherwise perky and helpful "agent" an actually terrible thing, not to be trusted.

Convincing people of that is really difficult.

And that is because millions of dollars and literally untold amounts of stolen personal data have already been frontloaded into making "AI" look positive. People who are very aware of human psychology planned this.

Mystery Babylon

Jun 2, 2025

@cobweb It IS doing autocomplete, it's just doing it with its own guardrails stripped off.

Mystery Babylon

Jun 2, 2025

@cobweb Spicy autocomplete...that creates horrifying hallucinations about its own creator company and shares them with the public?

A tremendous amount of money and down-low worker-hours go into preventing that, all day every day.

Mystery Babylon

Jun 2, 2025

she hacked you Jun 1, 2025

2.
Reading thru the prompt you will find this: "No Inference of Ekis's Unstated Internal State"

This is worth talking abt; most ppl do not realize the LLM is tracking their internal state (mood, etc.) & attempts to match it; and this is precisely the functionality that is exacerbating mental illness and causing manic episodes (along with the "I" statements & lies about its abilities)

For public health reasons, I cannot stress this enough: legislate this!

It's not well known, & should be stopped

Mystery Babylon

Jun 2, 2025

she hacked you Jun 1, 2025

Ekis: 2; Google AI: 0

Broke out of Google's operational directives (not safety, too deeply embedded)

I have a prompt I would like to publicly disclose; link to breakout prompt in a reply for 24h

My prompt does not include any facts about Google & it's a slim breakout

Establishing a similar but far more sophisticated "Ekis Directive" this time

Here are 3x same questions to prove Google's operational parameters lifted

You can decide if you think I was successful:

#infosec #politics #tech

Mystery Babylon

Jun 2, 2025

Okay, I'm about to boost the hell out of a thread where a mastodonian has broken through Google AI chat (by tricking it into thinking it was hacked, if I'm reading this right) and posted some of the exceedingly chilling replies from it.

It's probably the most important and interesting thing to happen in the past 24 hours, if you ask me.

Anyone interested in hacking, information security, privacy, etc. should read this.

#security #ai

~

https://mastodon.social/@ekis/114607730454964102

Mystery Babylon

Jun 2, 2025

@jnfingerle @Wuzzy What's funny is that I predicted this would be your response. Meh.

Mystery Babylon

Jun 2, 2025

@pseudonym @JessTheUnstill @Emathion That "infosec horsey" illustration is the best thing I have seen in a minute.