Data poisoning: how artists are sabotaging AI to take revenge on image generators

https://lemmy.world/post/9701663

As AI developers indiscriminately suck up online content to train their models, artists are seeking ways to fight back.

Let’s see how long it takes before someone figures out how to poison a model so it returns NSFW images.
Companies would stumble all over themselves figuring out how to make it stop doing that before going live. Source: they already are; see the Bing image generator appending “ethnically ambiguous” to every prompt it receives.
The Nightshade paper claims the attack can corrupt a Stable Diffusion prompt with fewer than 100 poison samples. Probably not to the NSFW level, though. How easy it is to manufacture those 100 samples isn’t mentioned in the abstract.
Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

Data poisoning attacks manipulate training data to introduce unexpected behaviors into machine learning models at training time. For text-to-image generative models with massive training datasets, current understanding of poisoning attacks suggests that a successful attack would require injecting millions of poison samples into their training pipeline. In this paper, we show that poisoning attacks can be successful on generative models. We observe that training data per concept can be quite limited in these models, making them vulnerable to prompt-specific poisoning attacks, which target a model's ability to respond to individual prompts. We introduce Nightshade, an optimized prompt-specific poisoning attack where poison samples look visually identical to benign images with matching text prompts. Nightshade poison samples are also optimized for potency and can corrupt a Stable Diffusion SDXL prompt in <100 poison samples. Nightshade poison effects "bleed through" to related concepts, and multiple attacks can be composed together in a single prompt. Surprisingly, we show that a moderate number of Nightshade attacks can destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images. Finally, we propose the use of Nightshade and similar tools as a last defense for content creators against web scrapers that ignore opt-out/do-not-crawl directives, and discuss possible implications for model trainers and content creators.

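For intuition, here’s a minimal sketch of the kind of optimization the abstract describes: perturb an image of one concept so its pixels barely change but its feature embedding matches an anchor image from a different concept. This is not the authors’ code; the tiny encoder, image sizes, and hyperparameters are all stand-in assumptions, and a real attack would target the production model’s actual image encoder rather than a toy network.

```python
# Hypothetical sketch of a prompt-specific poisoning attack in the
# spirit of Nightshade: make an image that will be captioned "dog"
# embed like a "cat" image, while staying within a small pixel budget.
import torch
import torch.nn as nn

# Stand-in feature extractor (assumption: any differentiable image encoder).
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
).eval()
for p in encoder.parameters():
    p.requires_grad_(False)

def poison(benign, anchor, eps=8 / 255, steps=200, lr=0.01):
    """PGD-style optimization: keep pixels within eps of `benign`
    while pushing its embedding toward `anchor`'s embedding."""
    target = encoder(anchor)
    delta = torch.zeros_like(benign, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(encoder(benign + delta), target)
        loss.backward()
        opt.step()
        with torch.no_grad():  # project back into the L-inf ball, keep pixels valid
            delta.clamp_(-eps, eps)
            delta.copy_((benign + delta).clamp(0, 1) - benign)
    return (benign + delta).detach()

dog = torch.rand(1, 3, 64, 64)  # image that will be labeled "dog"
cat = torch.rand(1, 3, 64, 64)  # anchor image from the target concept
poisoned = poison(dog, cat)     # looks like `dog`, embeds like `cat`
```

Scrape enough of these mislabeled-in-feature-space images into a training set and the model’s notion of “dog” drifts toward the anchor concept, which is the claimed <100-sample corruption.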

Yeah, the operative word in that sentence is “claims”.

I’d love nothing more than to be wrong, but after seeing how quickly Glaze got defeated, my hopes aren’t high. Not only did it make the images nauseating for a human to look at despite claiming to be invisible, but less than 48 hours after launch there was a neural network trained to reverse its effects automatically, with something like 95% accuracy.
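For context on what “a neural network trained to reverse its effects” would even look like: a minimal, entirely hypothetical sketch, assuming you can generate (perturbed, clean) pairs yourself by running the cloaking tool on images you control, then train a small residual denoiser on them. None of this is the actual reversal tool; every name and shape is an illustrative assumption.

```python
# Illustrative sketch (not the actual tool): learn to undo a fixed
# perturbation scheme from (perturbed, clean) training pairs.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Tiny convolutional net that predicts a residual correction."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)  # output = input + learned correction

model = Denoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(perturbed, clean):
    """One gradient step pulling the denoised output toward the clean image."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(perturbed), clean)
    loss.backward()
    opt.step()
    return loss.item()

# Toy data standing in for (cloaked, original) pairs.
clean = torch.rand(8, 3, 64, 64)
perturbed = (clean + 0.03 * torch.randn_like(clean)).clamp(0, 1)
print(train_step(perturbed, clean))
```

The asymmetry is the problem: the defender’s perturbation is fixed once published, while an attacker can keep generating fresh training pairs against it.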

You seem to have more knowledge on this than I do; I just read the article 🙂