I keep seeing webmasters talking about how to block AI scrapers (through user agents and IP blocks) and not enough webmasters talking about the far better option of rigging their site to return complete gibberish or transgender werewolf erotica* when AI scrapers are detected.

*depending on which one you think is funnier to poison the AI models with

@foone i’ve been returning a 302 to a 10GB binary file from hetzner’s speedtest page, but honestly.. maybe I should?

all of my pages already contain prompt injection (in multiple places, even)

@domi @foone My approach uses the wonderful "100GB of zeros compressed into 10MB and served with transport compression headers" which usually makes most poorly-written bots fuck off in short order when they OOM...
@becomethewaifu @foone @domi This makes me wonder if Firefox actually has anything in consideration of that issue.
@lispi314 it does not. ask me how i know 🙃
@arisunz Disappointing but unsurprising.
@lispi314 i mean tbf neither does chromium
@arisunz I did somewhat expect that too.
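
For anyone curious what the "zeros served with transport compression headers" trick from a few posts up looks like in practice, here is a minimal Python sketch. The function names (`build_zero_bomb`, `bomb_headers`) and the 10 MiB demo size are made up for illustration, not anyone's actual setup. It gzips a stream of zero bytes without ever holding the uncompressed data in memory, then builds the headers that tell a client to inflate it. One caveat: gzip's deflate format tops out at roughly 1000:1, so hitting the numbers quoted above would presumably take a denser encoding or nested compression; that part is a guess.

```python
import gzip
import io


def build_zero_bomb(uncompressed_bytes, chunk_size=1024 * 1024):
    """Gzip `uncompressed_bytes` zero bytes, one chunk at a time."""
    buf = io.BytesIO()
    zeros = bytes(chunk_size)  # a reusable block of zero bytes
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        remaining = uncompressed_bytes
        while remaining > 0:
            n = min(remaining, chunk_size)
            gz.write(zeros[:n])
            remaining -= n
    return buf.getvalue()


def bomb_headers(payload):
    """Headers that make a client treat the payload as a gzip-encoded page."""
    return {
        "Content-Encoding": "gzip",  # the client is expected to inflate the body
        "Content-Type": "text/html",
        "Content-Length": str(len(payload)),
    }


# Demo size only (10 MiB); scale up at your own risk.
DEMO_SIZE = 10 * 1024 * 1024
payload = build_zero_bomb(DEMO_SIZE)
print(f"{DEMO_SIZE} bytes of zeros -> {len(payload)} bytes on the wire")
print(bomb_headers(payload))
```

The payload would normally be generated once, written to disk, and served as a static file with those headers, so the server never pays the decompression cost itself.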

@domi @foone I think I like the idea about returning data that's more likely to be incorporated in the training sets because it poisons the well and it's harder to detect than someone trying to punish scrapers with GBs of gibberish

P.S. is GBerish a thing? I feel like it should be a thing...

@nicr9 @foone

that’ll just teach scrapers to avoid your site.

cool! they can all gtfo

@domi @foone short term thinking! There's always going to be more scrapers who haven't learned their lesson.

If we can poison the well at scale, we can collapse the business model  

...

Who am I kidding?... I'm sure they'll have models selecting which data is "legit" and it will get better at detecting the "transgender werewolf erotica" over time... The stupidest of arms races

@domi @foone any idea how to do this with nginx?

@nathanu @foone

# known scraper / crawler user agents: redirect them to a 10GB file
if ($http_user_agent ~ 'GPTBot|ChatGPT\-User|Google\-Extended|CCBot|PerplexityBot|anthropic\-ai|Claude\-Web|ClaudeBot|Amazonbot|FacebookBot|Applebot\-Extended|semrush|barkrowler|PetalBot|meta-externalagent|meta-externalfetcher|facebookexternalhit|facebookcatalog') {
    return 308 https://nbg1-speed.hetzner.com/10GB.bin;
}

# anything probing for WordPress paths gets the same treatment
if ($request_uri ~ 'wp-content|wp-login\.php|wp\-includes') {
    return 308 https://nbg1-speed.hetzner.com/10GB.bin;
}

it’s a bit opinionated: the list of bots covers not only AI but also other BS. I keep this in its own file and include it from my other configs, inside server { }
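
Before deploying a list like that, it can help to check which requests it would actually catch. Below is a rough Python sketch that reuses the two patterns from the nginx snippet; `would_redirect` and the sample user-agent strings are hypothetical and only there for illustration. Python's `re` is close enough to nginx's PCRE for these simple alternations, and like nginx's `~` operator the match is case-sensitive (nginx's `~*` is the case-insensitive variant), so a bot that capitalizes its name differently from the list slips through.

```python
import re

# Same alternations as the nginx snippet, matched case-sensitively like `~`.
BOT_PATTERN = re.compile(
    r"GPTBot|ChatGPT\-User|Google\-Extended|CCBot|PerplexityBot|anthropic\-ai|"
    r"Claude\-Web|ClaudeBot|Amazonbot|FacebookBot|Applebot\-Extended|semrush|"
    r"barkrowler|PetalBot|meta-externalagent|meta-externalfetcher|"
    r"facebookexternalhit|facebookcatalog"
)
PROBE_PATTERN = re.compile(r"wp-content|wp-login\.php|wp\-includes")


def would_redirect(user_agent, request_uri):
    """True if either nginx `if` block would fire for this request."""
    return bool(
        BOT_PATTERN.search(user_agent) or PROBE_PATTERN.search(request_uri)
    )


# Illustrative samples only; real user-agent strings vary.
samples = [
    ("Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)", "/"),
    ("Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0", "/blog/"),
    ("Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0", "/wp-login.php"),
    ("Barkrowler/0.9", "/"),  # capital B: not caught by the lowercase entry
]
for user_agent, uri in samples:
    print(would_redirect(user_agent, uri), uri, user_agent)
```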