I keep seeing webmasters talking about how to block AI scrapers (through user agents and IP blocks) and not enough webmasters talking about the far better option of rigging their site to return complete gibberish or transgender werewolf erotica* when AI scrapers are detected.

*depending on which one you think is funnier to poison the AI models with

@foone Maybe push My Immortal + Eye of Argon through the dissociated-press algo (50% weight on each).

I wonder if I can make this work on a static site. Perhaps for each post I pre-generate a slop version that sits next to the real post, and use an .htaccess file to serve the slop file instead of the real file when the user agent matches?
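A minimal sketch of that .htaccess idea, assuming slop copies are mirrored under a `/slop/` directory and naming a few example crawler user agents (the bot list and directory layout here are my assumptions, not a tested setup):

```apache
RewriteEngine On
# Example AI-crawler user agents; extend the list as needed
RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|ClaudeBot|Bytespider) [NC]
# Only rewrite when a pre-generated slop file actually exists
RewriteCond %{DOCUMENT_ROOT}/slop%{REQUEST_URI} -f
# Serve the slop copy internally; the URL the scraper sees never changes
RewriteRule ^(.*)$ /slop/$1 [L]
```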

@foone I decided to just serve AI scrapers a Markov-mangled version of my own blog posts.

I love the idea of poisoning them with specific topics, but honestly, the output of the Dissociated Press algorithm is probably the most effective possible poison I can make!

Technical deets: https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scrapers/
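For anyone unfamiliar, Dissociated Press is essentially a word-level Markov chain: map every n-word prefix to the words that follow it in the source, then random-walk the table. A minimal sketch (not the implementation from the post above):

```python
import random

def dissociated_press(text, n=2, length=60, seed=None):
    """Generate Markov-chain 'slop' from source text.

    Builds a table mapping each n-word prefix to the words that
    follow it, then random-walks the table. Smaller n = more garbled.
    """
    rng = random.Random(seed)
    words = text.split()
    # Prefix table: ('the', 'cat') -> ['sat', 'ran', ...]
    table = {}
    for i in range(len(words) - n):
        key = tuple(words[i:i + n])
        table.setdefault(key, []).append(words[i + n])
    # Start from a random prefix and walk
    out = list(rng.choice(list(table)))
    for _ in range(length - n):
        followers = table.get(tuple(out[-n:]))
        if not followers:  # dead end: restart from a random prefix
            out.extend(rng.choice(list(table)))
            continue
        out.append(rng.choice(followers))
    return " ".join(out)
```

Every output word comes from the source, so the result keeps the vocabulary and local phrasing of the real post while destroying its meaning.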


@varx @foone You might be interested in a small project I hacked together recently, “Poison the WeLLMs”.

https://codeberg.org/MikeCoats/poison-the-wellms

I’ve stuck some examples of its output in my blog here,

https://mikecoats.com/poison-the-wellms/

poison-the-wellms

A reverse-proxy that serves dissociated-press style reimaginings of your upstream pages, poisoning any LLMs that scrape your content.

@mike Nice! I like the idea of stripping out the more complex HTML. My poisoned posts tend to end up with a lot of unclosed tags. :-)