“CSAM generated by AI is still CSAM,” DOJ says after rare arrest
Check my other comments. My thought was compared to a hammer.
Hammers aren’t trained to act or respond on their own from millions of user inputs.
What it’s able and intended to do is beside the point, if it’s also capable of generating inappropriate material.
Let me spell it more clearly. AI wouldn’t know what a pussy looked like if it was never exposed to that sort of data set. It wouldn’t know other inappropriate things if it wasn’t exposed to that data set either.
Do you see where I’m going with this? AI only knows what people allow it to learn…
You realize that there are perfectly legal photographs of female genitals out there? I've heard it's actually a rather popular photography subject on the Internet.
Do you see where I'm going with this? AI only knows what people allow it to learn...
Yes, but the point here is that the AI doesn't need to learn from any actually illegal images. You can train it on perfectly legal images of adults in pornographic situations, and also perfectly legal images of children in non-pornographic situations, and then when you ask it to generate child porn it has all the concepts it needs to generate novel images of child porn for you. The fact that it's capable of that does not in any way imply that the trainers fed it child porn in the training set, or had any intention of it being used in that specific way.
As others have analogized in this thread, if you murder someone with a hammer that doesn't make the people who manufactured the hammer guilty of anything. Hammers are perfectly legal. It's how you used it that is illegal.
Yes, I get all that, duh. Did you read the original post title? CSAM?
I thought you could catch a clue when I said inappropriate.
Yes. You're saying that the AI trainers must have had CSAM in their training data in order to produce an AI that is able to generate CSAM. That's simply not the case.
You also implied earlier on that these AIs "act or respond on their own", which is also not true. They only generate images when prompted to by a user.
The fact that an AI is able to generate inappropriate material just means it's a versatile tool.
Alright, well let’s play an innocent hypothetical here.
Let’s pretend you only know some magic word model (doesn’t exist without thousands or millions of images by the way).
But anyways, let’s say you’re the AI. Now, with no vision of the world, what would you, as an AI, say if I asked you about how crescent wrenches and channel locks reproduced?
Now try the same hypothetical question again. This time, you actually have a genuine set of images of clean new tools, plus information that tools can’t reproduce.
And now let’s go to the modern day. Where AI has zillions of images of rusty redneck toolboxes, and a bunch of janky dialogue…
After all that, then where do crowbars come from?
AI is just as dumb as the people using it.
The AI had CSAM in its training model:
An image generator is able to create novel images that are not directly taken from its training data. That’s the whole point of image AIs.
I just want to clarify that you’ve bought the Silicon Valley hype for AI, but that is very much not the truth. It can create nothing novel - it can merely combine concepts and themes and styles in an incredibly complex manner… but it can never create anything novel.
AI hasn’t exactly kicked out a Picasso with a naked young girl missing an ear yet has it?
I sure hope not!
But if it can, then that seriously indicates it must have some bad training data in the system…
I won’t be testing these hypotheses.
…no
That’d be like outlawing hammers because someone figured out they make a great murder weapon.
Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.
That’s not the point. You don’t train a hammer from millions of user inputs.
You gotta ask, if the AI can produce inappropriate material, then where did the developers get the training data, and what exactly did they train those AI models for?
Do… Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Or are you arguing that we should be allowed to do what’s been done in the article? (arrest and charge the individual responsible for training their copy of an AI model to generate CSAM)
One, AI image generators can and will spit out content vastly different than anything in the training dataset (this ofc can be influenced greatly by user input). This can be fed back into the training data to push the model towards the desired outcome. Examples of the desired outcome are not required at all. (i.e., you don’t have to feed it CSAM to get CSAM, you just have to consistently push it more and more towards that goal)
Two, anyone can host an AI model; it’s not reserved for big corporations and their server farms. You can host your own copy and train it however you’d like on whatever material you’ve got. (that’s literally how Stable Diffusion is used) This kind of explicit material is being created by individuals using AI software they’ve downloaded/purchased/stolen and then trained themselves. They aren’t buying a CSAM generator ready to use off the open market… (nor are they getting this material from publicly operating AI models)
They are acquiring a tool and moulding it into a weapon of their own volition.
Some tools you can just use immediately, others have a setup process first. AI is just a tool, like a hammer. It can be used appropriately, or not. The developer isn’t responsible for how you decide to use it.
Do… Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Yes. Because they did
Sadly that’s what most of the gun laws are designed around. Book banning and anti-abortion laws both limit tools because of what a small minority choose to do with the tool.
AI image generation shouldn’t be considered in obscenity laws. His distribution of pornography to a minor should be the issue, because not everyone stuck with that disease should be deprived of tools that can be used to keep them away from hurting others.
Using AI images to increase charges should be wrong. A pedophile contacting and distributing pornography to children should be all that it takes to charge a person. This will just set up a new precedent that is beyond the scope of the judiciary.
That’d be like outlawing hammers because someone figured out they make a great murder weapon.
Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.
Not exactly. This would be more akin to a company that will 3D print metal parts and assemble them for you. You use this service and have them create and assemble a gun for you. Then you use that weapon in a violent crime. Should the company have known better that you were having them create an illegal weapon on your behalf?
A person (the arrested software engineer from the article) acquired a tool (a copy of Stable Diffusion, available on github) and used it to commit crime (trained it to generate CSAM + used it to generate CSAM).
That has nothing to do with the developer of the AI, and everything to do with the person using it.
I stand by my analogy.
Reading that article:
Given it’s a public dataset not owned or maintained by the developers of Stable Diffusion, I wouldn’t consider that their fault either.
I think it’s reasonable to expect a dataset like that should have had screening measures to prevent that kind of data being imported in the first place. It shouldn’t be on users (here meaning the devs of Stable Diffusion) of that data to ensure there’s no illegal content within the billions of images in a public dataset.
That’s a different story now that users have been informed of the content within this particular data, but I don’t think it should have been assumed to be their responsibility from the beginning.
There’s CSAM in the training set[1] used for these models so some elephants have been murdered to make this piano.
So at best we don’t know whether or not AI CSAM without CSAM training data is possible. “This AI used CSAM training data” is not an answer to that question. It is even less of an answer to the question “Should AI generated CSAM be illegal?” Just like “elephants get killed for their ivory” is not an answer to “should pianos be illegal?”
If your argument is that yes, all AI CSAM should be illegal whether or not the training used real CSAM, then argue that point. Whether or not any specific AI used CSAM to train is an irrelevant non sequitur. A lot of what you’re doing now is replying to “pencils should not be illegal just because some people write bad stuff” with the equivalent of “this one guy did some bad stuff before writing it down”. That is completely unrelated to the argument being made.
Camera makers and pencil makers (and the users of those devices) aren’t making massive server farms that spy on every drop of information they can get ahold of.
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
Now when that’s the case, well where did the devs get the training data?.. 🤔
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
That's not how generative AI works. It's capable of creating images that include novel elements that weren't in the training set.
Go ahead and ask one to generate a bonkers image description that doesn't exist in its training data and there's a good chance it'll be able to make one for you. The classic example is an "avocado chair", which an early image generator was able to produce many plausible images of despite only having been trained on images of avocados and chairs. It understood the two general concepts and was able to figure out how to meld them into a common depiction.
Yes, I’ve tried similar silly things. I’ve asked AI to render an image of Mr. Bean hugging Pennywise the clown. And it delivered, something randomly silly looking, but still not far off base.
But when it comes to inappropriate material, well the AI shouldn’t be able to generate any such thing in the first place, unless the developers have allowed it to train from inappropriate sources…
Who is responsible then? Cuz the devs basically gotta let the AI go to town on many websites and documents for any sort of training set.
So you mean to say, you can’t blame the developers, because they just made a tool (one that scrapes data from everywhere possible), can’t blame the tool (don’t mind that AI is scraping all your data), and can’t blame the end users, because some dirty minded people search or post inappropriate things…?
So where’s the blame go?
First, you need to figure out exactly what it is that the "blame" is for.
If the problem is the abuse of children, well, none of that actually happened in this case so there's no blame to begin with.
If the problem is possession of CSAM, then that's on the guy who generated them since they didn't exist at any point before then. The trainers wouldn't have needed to have any of that in the training set, so if you want to blame them you're going to need to do a completely separate investigation into that; the ability of the AI to generate images like that doesn't prove anything.
If the problem is the creation of CSAM, then again, it's the guy who generated them.
If it's the provision of general-purpose art tools that were later used to create CSAM, then sure, the AI trainers are in trouble. As are the camera makers and the pencil makers, as I mentioned sarcastically in my first comment.
You obviously don’t understand squat about AI.
AI only knows what has gone through its training data, both from the developers and the end users.
Hell, back in 2003 I wrote an adaptive AI for optical character recognition (OCR). I designed it for English, but also with a crude ability to learn.
I could have taught that thing hieroglyphics if I wanted to. But AI will never generate things that it’s never seen before.
Funny that AI has an easier time rendering inappropriate material than it does human hands…
You obviously don't understand squat about AI.
Ha.
AI only knows what has gone through its training data, both from the developers and the end users.
Yes, and as I've said repeatedly, it's able to synthesize novel images from the things it has learned.
If you train an AI with pictures of green cars and pictures of red apples, it'll be able to figure out how to generate images of red cars and green apples for you.
The only example I can think of with what you said is just a couple brief innocent scenes from The Blue Lagoon.
Short of that, I don’t know (nor care for any references to) any other legal public images or video of anything as such.
I dunno, I’m just bumfuzzled how AI, whether public or private, could have sufficient information to generate such things these days.
Do a Google Image search for "child" or "teenager" or other such innocent terms, you'll find plenty of such.
I think you're underestimating just how well AI is able to learn basic concepts from images. A lot of people imagine these AIs as being some sort of collage machine that pastes together little chunks of existing images, but that's not what's going on under the hood of modern generative art AIs. They learn the underlying concepts and characteristics of what things are, and are able to remix them conceptually.
And conceptually, if I had never seen my cousin in the nude, I’d never know what young people look like naked.
No that’s not a concept, that’s a fact. AI has seen inappropriate things, and it doesn’t fully know the difference.
You can’t blame the AI itself, but you can and should blame any and all users that have knowingly fed it bad data.
I don't believe you're fully arguing in good faith here.
I'm assuming you've seen a naked adult, and if you had never seen a naked young person, I don't believe for one second you would be unable to infer what a naked young person might look like. You might not know for certain, but your best guess would likely be very accurate.
Generative AI can absolutely make those same inferences, so it does not need inappropriate training material for it to generate it.
The AI knows what a young person looks like.
It knows what a clothed adult looks like.
It knows what an unclothed adult looks like.
An AI trained on 100% legal material could make that inappropriate inference without even trying.
Now, have all the popular AI models actually been trained on 100% legal material? I have no way of knowing that answer, but you're incorrect to assume that just because it can output inappropriate images, that absolutely 100% proves that data was also included in its training input.
Is an image of a child inappropriate? Fully clothed, nothing going on.
Is the image of an adult engaging in sexual activity inappropriate?
Based on those two concepts, it can generate inappropriate child sexual imagery.
You may have done OCR work a while ago, but that is not the same type of machine learning that goes into typical generative AI systems in the modern world. It very much seems as though you are profoundly misunderstanding how this technology operates if you think it can’t generate a novel combination of previously trained concepts without a prior example.