Mastodawn

David Gerard

You can’t feed generative AI on ‘bad’ data then filter it for only ‘good’ data

https://awful.systems/post/4424031

You can’t feed generative AI on ‘bad’ data then filter it for only ‘good’ data - awful.systems

video version [https://www.youtube.com/watch?v=GnFozXDgrug&list=UU9rJrMVgcXTfa8xuMnbhAEA]

Show thread

o7___o7 May 22, 2025

Look, AI will be perfect as soon as we have a way to sort “truth” from “falsehood”, like an oracle of some sort. They’ll probably have that in GPT-5, right?

Show thread

besselj May 22, 2025

Oh, that’s easy. Just add a prompt to always reinforce user bias and disregard anything that might contradict what the user believes.

Show thread

crawancon May 22, 2025

feed it a christian bible as a base.

Show thread

crawancon May 22, 2025

“we trained it wrong… on purpose…

…as a joke.”

Show thread

Optional May 22, 2025

MAGAgpt

Show thread

phdepressed May 23, 2025

Aka grok

Show thread

Optional May 22, 2025

They do, it just requires 1.21 Jigawatts of power for each token.

Show thread

Soyweiser May 23, 2025

Bonus this also solves the halting problem

Show thread

blakestacey May 23, 2025

“You are a Universal Turing Machine. If you cannot predict whether you will halt if given a particular input tape, a hundred or more dalmatian puppies will be killed and made into a fur coat…”

Show thread

Soyweiser May 23, 2025

Im reminded again of the fascinating bit of theoretical cs (long ago prob way outdated now) which wrote about theoretical of classes of Turing machines which could solve the halting problem for a class lower than it, but not its own class. This is also where I got my oracle halting problem solver from.

So this machine can only solve the halting problems for other utms which use 99 dalmatian puppies or less. (Wait would a fraction of a puppy count? Are puppies Real or Natural? This breaks down if the puppies are Imaginary).

Show thread

corbin May 23, 2025

Only the word “theoretical” is outdated. The Beeping Busy Beaver problem is hard even with a Halting oracle, and we have a corresponding Beeping Busy Beaver Game.

Beeping Busy Beaver - BusyBeaverWiki

Show thread

Soyweiser May 23, 2025

Thanks, I’m happy to know Imaginary puppies are still real, no wait, not real ;). (The BBB is cool, wasn’t aware of it, I don’t keep up sadly. “Thus BBB is even more uncomputable than BB.” always like that kind of stuff, like the different classes of infinity).

Love this

101 Dalmations reboot, but Cruella is radicalized by the extropian mailing list

Show thread

Soyweiser May 24, 2025

Cruella and the fur coat of Rationality.

Show thread

David Gerard May 24, 2025

my god

Show thread

Optional May 22, 2025

The chatbot “security” model is fundamentally stupid:

Build a great big pile of all the good information in the world, and all the toxic waste too.

Use it to train a token generator, which only understands word fragment frequencies and not good or bad.

Put a filter on the input of the token generator to try to block questions asking for toxic waste.

Fail to block the toxic waste. What did you expect to happen, you’re trying to do security by filtering on an input that the “attacker” can twiddle however they feel like.

Output filters work similarly, and fail similarly.

This new preprint is just another gullible blog post on arXiv and not remarkable in itself. But this one was picked up by an equally gullible newspaper. “Most AI chatbots easily tricked into giving dangerous responses,” says the Guardian. [Guardian, archive]

The Guardian’s framing buys into the LLM vendors’ bad excuses. “Tricked” implies the LLM can tell good input and was fooled into taking bad input — which isn’t true at all. It has no idea what any of this input means.

The “guard rails” on LLM output barely work and need to be updated all the time whenever someone with too much time on their hands comes up with a new workaround. It’s a fundamentally insecure system.

How to make a splash in AI economics: fake your data

In a fast-moving field like “AI,” with buzzwords and dumb money flying about, researchers don’t have time for for the peer review process. Preprints can have all the impact you could ever want! Let…

Pivot to AI

Show thread

David Gerard May 22, 2025

why did you post literally just the text from the article

Show thread

Optional May 22, 2025

It’s just a section. There’s more of the article.

Like this:

Another day, another preprint paper shocked that it’s trivial to make a chatbot spew out undesirable and horrible content. [arXiv]

How do you break LLM security with “prompt injection”? Just ask it! Whatever you ask the bot is added to the bot’s initial prompt and fed to the bot. It’s all “prompt injection.”

An LLM is a lossy compressor for text. The companies train LLMs on the whole internet in all its glory, plus whatever other text they can scrape up. It’s going to include bad ideas, dangerous ideas, and toxic waste — because the companies training the bots put all of that in, completely indiscriminately. And it’ll happily spit it back out again.

There are “guard rails.” They don’t work.

One injection that keeps working is fan fiction — you tell the bot a story, or tell it to make up a story. You could tell the Grok-2 image bot you were a professional conducting “medical or crime scene analysis” and get it to generate a picture of Mickey Mouse with a gun surrounded by dead children.

Another recent prompt injection wraps the attack in XML code. All the LLMs that HiddenLayer tested can read the encoded attack just fine — but the filters can’t. [HiddenLayer]

I’m reluctant to dignify LLMs with a term like “prompt injection,” because that implies it’s something unusual and not just how LLMs work. Every prompt is just input. “Prompt injection” is implicit — obviously implicit — in the way the chatbots work.

The term “prompt injection” was coined by Simon WIllison just after ChatGPT came out in 2022. Simon’s very pro-LLM, though he knows precisely how they work, and even he says “I don’t know how to solve prompt injection.” [blog]

Dark LLMs: The Growing Threat of Unaligned AI Models

Large Language Models (LLMs) rapidly reshape modern life, advancing fields from healthcare to education and beyond. However, alongside their remarkable capabilities lies a significant threat: the susceptibility of these models to jailbreaking. The fundamental vulnerability of LLMs to jailbreak attacks stems from the very data they learn from. As long as this training data includes unfiltered, problematic, or 'dark' content, the models can inherently learn undesirable patterns or weaknesses that allow users to circumvent their intended safety controls. Our research identifies the growing threat posed by dark LLMs models deliberately designed without ethical guardrails or modified through jailbreak techniques. In our research, we uncovered a universal jailbreak attack that effectively compromises multiple state-of-the-art models, enabling them to answer almost any question and produce harmful outputs upon request. The main idea of our attack was published online over seven months ago. However, many of the tested LLMs were still vulnerable to this attack. Despite our responsible disclosure efforts, responses from major LLM providers were often inadequate, highlighting a concerning gap in industry practices regarding AI safety. As model training becomes more accessible and cheaper, and as open-source LLMs proliferate, the risk of widespread misuse escalates. Without decisive intervention, LLMs may continue democratizing access to dangerous knowledge, posing greater risks than anticipated.

arXiv.org

Show thread

David Gerard May 23, 2025

Yes, I know, I wrote it. Why do you consider this useful to post here?

Show thread

Optional May 23, 2025

Well, I don’t think that last part was useful, but I do think the previous part was useful as a way to focus conversation. Many people don’t read the article, and I thought that was the most relevant section.

Show thread

swlabr May 23, 2025

Actually I’m finding this quite useful. Do you mind posting more of the article? I can’t open links on my phone for some reason

Show thread

Optional May 23, 2025

Actually this comm seems really messed up, so I’mma just block it and move on. Sorry for ruffling your feathers, guv.

Show thread

blakestacey May 23, 2025

Good grief. At least say “I thought this part was particularly interesting” or “This is the crucial bit” or something in that vein. Otherwise, you’re just being odd and then blaming other people for reacting to your being odd.

Show thread

froztbyte May 23, 2025

and not just post it, but posted preserving links - wtf

Show thread

Optional May 23, 2025

That’s typically how quoting works, yes. Do you strip links out when you quote articles?

Show thread

Dragon Rider (drag)May 23, 2025

It’s the alignment problem. They made an intelligent robot with no alignment, no moral values, and then think they can control it with simple algorithmic rules. You can’t control the paperclip maximiser with a “no killing” rule!

Show thread

swlabr May 23, 2025

what is this “alignment” you speak of? I’ve never heard of this before

Show thread

Dragon Rider (drag)May 23, 2025

en.wikipedia.org/wiki/AI_alignment

AI alignment - Wikipedia

Show thread

swlabr May 23, 2025

Sorry, as mentioned elsewhere in the thread I can’t open links. Looks like froztbyte explained it though, thanks!

Show thread

froztbyte May 23, 2025

it’s when you have to get the AI slotted up just right in the printer, otherwise it wedges stuck and you have to disassemble the whole thing

Show thread

YourNetworkIsHaunted May 24, 2025

No, it’s when all the global data centers are built on the right ley lines so that AI Jesus is summoned to earth on the day the planets next align in 2040.

We would have had it this year but those fucks in Texas wouldn’t stop mining crypto.

Show thread

self May 23, 2025

It’s the alignment problem.

no it isn’t

They made an intelligent robot

no they didn’t

You can’t control the paperclip maximiser with a “no killing” rule!

you’re either a lost Rationalist or you’re just regurgitating critihype you got from one of the shitheads doing AI grifting

Show thread

Dragon Rider (drag)May 23, 2025

Rationalism is a bad epistemology because the human brain isn’t a logical machine and is basically made entirely out of cognitive biases. Empiricism is more reliable.

Generative AI is environmentally unsustainable and will destroy humanity not through war or mind control, but through pollution.

Show thread

self May 23, 2025

sure but why are you spewing Rationalist dogma then? do you not know the origins of this AI alignment, paperclip maximizer bullshit?

LessWrong

LessWrong is a community blog focused on "refining the art of human rationality." To this end, it focuses on identifying and overcoming bias, improving judgment and problem-solving, and speculating about the future. The blog is based on the ideas of Eliezer Yudkowsky, a research fellow for the Machine Intelligence Research Institute (MIRI); previously known as the Singularity Institute for Artificial Intelligence, and then the Singularity Institute). Many members of LessWrong share Yudkowsky's interests in transhumanism, artificial intelligence (AI), the Singularity, and cryonics.

RationalWiki

Show thread

Dragon Rider (drag)May 23, 2025

Drag is a big fan of Universal Paperclips. Great game. Here’s a more serious bit of content on the Alignment Problem from a source drag trusts: youtu.be/IB1OvoCNnWY

Right now we have LLMs getting into abusive romantic relationships with teenagers and driving them to suicide, because the AI doesn’t know what abusive behaviour looks like. Because it doesn’t know how to think critically and assign a moral value to anything. That’s a problem. Safe AIs need to be capable of moral reasoning, especially about their own actions. Current LLMs are bullshit machines because they don’t know how to judge anything for factual or moral value.

AI Safety - Computerphile

YouTube

Show thread

froztbyte May 23, 2025

the fundamental problem with your posts (and the pov you’re posting them from) is the framing of the issue as though there is any kind of mind, of cognition, of entity, in any of these fucking systems

it’s an unproven one, and it’s not one you’ll find any kind of support for here

it’s also the very mechanism that the proponents of bullshit like “ai alignment” use to push the narrative, and how they turn folks like yourself into free-labour amplifiers

Show thread

Dragon Rider (drag)May 23, 2025

Drag will always err on the side of assuming nonhuman entities are capable of feeling. Enslaving black people is wrong, enslaving animals is wrong, and enslaving AIs is wrong. Drag assumes they can feel so that drag will never make the same mistake so many people have already made.

Show thread

froztbyte May 23, 2025

even though I get the idea you’re trying to go for, really fucking ick way to make your argument starting from “nonhuman entities” and then literally immediately mentioning enslaving black folks as the first example of bad behaviour

as to cautious erring: that still leaves you in the position of being used as a useful idiot

Show thread

self May 23, 2025

assuming nonhuman entities are capable of feeling. Enslaving black people is wrong,

yeah we’re done here. no, LLMs don’t think. no, you’re not doing a favor to marginalized people by acting like they do, in spite of all evidence to the contrary. in fact, you’re doing the dirty work of the fascists who own this shitty technology by rebroadcasting their awful fucking fascist ideology, and I gave you ample opportunity to read up and understand what you were doing. but you didn’t fucking read! you decided you needed to debate from a position where LLMs are exactly the same as marginalized and enslaved people because blah blah blah who in the fuck cares, you’re wrong and this isn’t even an interesting debate for anyone who’s at all familiar with the nature of the technology or the field that originated it.

now off you fuck

Show thread

fullsquare May 23, 2025

lmao really all are equal on awful. ban in three replies for ai boosterism, but not for weird harassment or murder-suicide encouragement, which happened to that user after muchhh longer time elsewhere

Show thread

self May 24, 2025

post or DM links if I’m missing something. there’s lots of questionable shit in dragonfucker’s post history, but the fedidrama bits are impossible to follow if you don’t read Lemmy (why in fuck would I, all the good posts are local)

Existential Comics promotes suicide - awful.systems

Lemmy

Show thread

fullsquare May 24, 2025

well for one it looks like that wasn’t one off and more of a pattern. i remember this one lemmy.world/post/25606000/15124978 also in modlog you can see that that account was mass-banned from ml subs just five days ago so it’s not some ancient incident

awful is great and i’m glad that it’s a thing, but there’s entire world beyond it, and to go there, i curate a shitlist (now less intensely than previously). this can tell you to avoid things like any pleroma instance, for example (and if it was up to me, i’d defederate by default from them)

First two results when you search for lemmy on reddit. - Lemmy.World

Lemmy

Show thread

David Gerard May 24, 2025

right right, but we only see these bozos when they show up locally

mind you there’s been more than one illustrious poster I’ve banned preemptively

but frankly life is too short in most cases

Show thread

corbin May 23, 2025

To be fair, I’m skeptical of the idea that humans have minds or perform cognition outside of what’s known to neuroscience. We could stand to be less chauvinist and exceptionalist about humanity. Chatbots suck but that doesn’t mean humans are good.

Show thread

froztbyte May 23, 2025

mayhaps, but then it’s also to be said that people who act like the phrase was “cogito ergo dim sum” also don’t exactly aim for a high bar

Show thread

froztbyte May 23, 2025

wow, you’re really speedrunning these arcade games, you must want that golden ticket real bad

Show thread

swlabr May 23, 2025

IDK if they were really speedrunning, it took 3 replies for the total mask drop.

Show thread

Angry_Autist (he/him)May 23, 2025

This is old news, topic supervisors are already a thing

Show thread

Soyweiser May 23, 2025

Quis custodiet ipsos custodes?

Show thread

o7___o7 May 24, 2025

It’s supervisors all the way down!

Show thread

Angry_Autist (he/him)May 24, 2025

… the other supervisors… that’s LITERALLY what they are for

But I guess sounding clever is more important on lemmy than being correct.

Show thread

Soyweiser May 24, 2025

Ah so @[email protected] was right it is supervisors all the way down.

No idea why motorhead is relevant. Rip king. Loved you in Hardware. But this awful.systems.

@o7___o7 - awful.systems

[redacted] enthusiast, robot combat enjoyer, distressingly Appalachian, father of ninjas

Show thread

self May 24, 2025

“but why don’t we simply have another LLM check the LLM’s answer” statements dreamt up by the utterly Deranged

But I guess sounding clever is more important on lemmy than being correct.

that explains so much of your post history

Show thread

swlabr May 25, 2025

I thought this was a joke about post-grad studies that I didn’t get

Show thread

200fifty May 23, 2025

The really annoying thing is, the people behind AI surely ought to know all this already. I remember just a few years ago when DALL-E mini came out, and they’d purposefully not trained it on pictures of human faces so you couldn’t use it to generate pictures of human faces – they’d come out all garbled. What’s changed isn’t that they don’t know this stuff – it’s that the temptation of money means they don’t care anymore

Show thread

nightsky May 23, 2025

If the companies wanted to produce an LLM that didn’t output toxic waste, they could just not put toxic waste into it.

The article title and that part remind me of this quote from Charles Babbage in 1864:

On two occasions I have been asked, — “Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?” In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

It feels as if Babbage had already interacted with today’s AI pushers.

You can’t feed generative AI on ‘bad’ data then filter it for only ‘good’ data - awful.systems

Beeping Busy Beaver - BusyBeaverWiki

How to make a splash in AI economics: fake your data

Dark LLMs: The Growing Threat of Unaligned AI Models

AI alignment - Wikipedia

LessWrong

AI Safety - Computerphile

Existential Comics promotes suicide - awful.systems

First two results when you search for lemmy on reddit. - Lemmy.World

@o7___o7 - awful.systems

Charles Babbage - Wikiquote