Should you be wondering why @LWN #LWN is occasionally sluggish... since the new year, the DDoS onslaughts from AI-scraper bots have picked up considerably. Only a small fraction of our traffic is serving actual human readers at this point. At times, some bot decides to hit us from hundreds of IP addresses at once, clogging the works. They don't identify themselves as bots, and robots.txt is the only thing they *don't* read off the site.

This is beyond unsustainable. We are going to have to put time into deploying some sort of active defenses just to keep the site online. I think I'd even rather be writing about accounting systems than dealing with this crap. And it's not just us, of course; this behavior is going to wreck the net even more than it's already wrecked.

Happy new year :)

@corbet @LWN

"Any kind of active defense is going to have to figure out how to block subnets rather than individual addresses, and even that may not do the trick. "

If you're using iptables, ipset can block individual IPs (hash:ip) and subnets (hash:net).

Just set it up last night for my much-smaller-traffic instances, feel free to DM

https://ipset.netfilter.org/
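
For anyone who wants to script this, here is a rough sketch of turning a list of addresses and subnets into input for `ipset restore`. The set name `scraper_block` and the sample addresses are made up for illustration, and everything goes into a single hash:net set (single addresses become /32 entries) to keep it short.

```python
import ipaddress

def ipset_restore_lines(entries, set_name="scraper_block"):
    """Yield `ipset restore` input for one hash:net set.

    `set_name` is a hypothetical name; pick whatever fits your setup.
    """
    yield f"create {set_name} hash:net family inet"
    seen = set()
    for entry in entries:
        # strict=False lets "203.0.113.77/24" normalize to 203.0.113.0/24
        net = ipaddress.ip_network(entry.strip(), strict=False)
        if net.version != 4 or net in seen:
            continue  # IPv6 would need its own 'family inet6' set
        seen.add(net)
        yield f"add {set_name} {net}"

if __name__ == "__main__":
    # Documentation-range addresses, not real offenders
    sample = ["192.0.2.17", "198.51.100.0/24", "192.0.2.17", "203.0.113.77/24"]
    print("\n".join(ipset_restore_lines(sample)))
```

The output can be piped into `ipset restore`, and a rule along the lines of `iptables -I INPUT -m set --match-set scraper_block src -j DROP` then drops matching traffic. Treat both as a starting point to check against the ipset and iptables man pages, not a tested deployment.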


@adelie @LWN Blocking a subnet is not hard; the harder part is figuring out *which* subnets without just blocking huge parts of the net as a whole.

@corbet @adelie @LWN I have been using pyasn to block entire subnets. It's effective, but only in the same way carpet bombing is. I'm sure I've blocked legitimate systems, but c'est la vie.
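
One possible middle ground between single addresses and whole-AS carpet bombing is to block a subnet only once several distinct offenders have shown up inside it. The sketch below does that with the standard library alone; it is not the pyasn approach, and the /24 grouping and the threshold of three hosts are arbitrary assumptions to tune.

```python
import ipaddress
from collections import defaultdict

def busy_subnets(addresses, prefix_len=24, min_hosts=3):
    """Return IPv4 networks of size `prefix_len` that contain at least
    `min_hosts` distinct offending addresses.

    Both defaults are guesses for illustration, not recommendations.
    """
    hosts = defaultdict(set)
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        if ip.version != 4:
            continue  # IPv6 would need a different prefix length
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        hosts[net].add(ip)
    return sorted(net for net, ips in hosts.items() if len(ips) >= min_hosts)

if __name__ == "__main__":
    # Documentation-range addresses standing in for log hits
    offenders = ["203.0.113.5", "203.0.113.9", "203.0.113.200",
                 "198.51.100.7", "203.0.113.5"]
    for net in busy_subnets(offenders):
        print(net)
```

A lone offender in an otherwise quiet subnet stays an individual block this way, which limits collateral damage but will also miss bots that spread thinly across many networks.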

@corbet @LWN

Probably a good question for the fedi as a whole. I started with any 40x response in my logs, added any Spamhaus hits from my mail server, and any user-agents with "bot" in the name. Plus Facebook in particular has huge IPv4 blocks just for scraping, also easy to block.
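
The first and third of those heuristics can be sketched roughly as follows, assuming access logs in the common "combined" format. The regular expression and function name are illustrative, and the Spamhaus step is left out since it depends on the mail setup.

```python
import re

# Assumes Apache/nginx "combined" log lines; adjust for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)

def suspect_addresses(lines):
    """Collect client addresses that got a 40x response or whose
    user-agent contains "bot" (case-insensitive)."""
    suspects = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like combined format
        agent = (m.group("agent") or "").lower()
        if m.group("status").startswith("40") or "bot" in agent:
            suspects.add(m.group("ip"))
    return suspects

if __name__ == "__main__":
    # Made-up log lines using documentation-range addresses
    sample = [
        '192.0.2.10 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
        '192.0.2.11 - - [01/Jan/2025:00:00:02 +0000] "GET /b HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
        '192.0.2.12 - - [01/Jan/2025:00:00:03 +0000] "GET /c HTTP/1.1" 200 99 "-" "ExampleBot/1.0"',
    ]
    print(sorted(suspect_addresses(sample)))
```

Both tests are blunt: a single mistyped URL earns a human reader a 404, and the "bot" match only catches crawlers that identify themselves, so the result is better treated as a candidate list than as a blocklist.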

ASRG (@[email protected])

## **Sabot in the Age of AI**

A list of offensive methods & strategic approaches for facilitating (algorithmic) sabotage, framework disruption, & intentional data poisoning.

### **Selected Tools & Frameworks**

- **Nepenthes**: [Endless crawler trap.](https://zadzmo.org/code/nepenthes)
- **Babble**: [Standalone LLM crawler tarpit.](https://git.jsbarretto.com/zesterer/babble)
- **Markov Tarpit**: [Traps AI bots & feeds them useless data.](https://git.rys.io/libre/markov-tarpit)
- **Sarracenia**: [Loops bots into fake pages.](https://github.com/CTAG07/Sarracenia)
- **Antlion**: [Express.js middleware for infinite sinkholes.](https://github.com/shsiena/antlion)
- **Infinite Slop**: [Garbage web page generator.](https://code.blicky.net/yorhel/infinite-slop)
- **Poison the WeLLMs**: [Reverse proxy for LLM confusion.](https://codeberg.org/MikeCoats/poison-the-wellms)
- **Marko**: [Dissociated Press CLI/lib.](https://codeberg.org/timmc/marko/)
- **django-llm-poison**: [Serves poisoned content to crawlers.](https://github.com/Fingel/django-llm-poison)
- **konterfAI**: [Model-poisoner for LLMs.](https://codeberg.org/konterfai/konterfai)
- **Quixotic**: [Static site LLM confuser.](https://marcusb.org/hacks/quixotic.html)
- **toxicAInt**: [Replaces text with slop.](https://github.com/portasynthinca3/toxicaint)
- **Iocaine**: [Defense against unwanted scrapers.](https://iocaine.madhouse-project.org)
- **Caddy Defender**: [Blocks bots & pollutes training data.](https://defender.jasoncameron.dev)
- **GzipChunk**: [Inserts compressed junk into live gzip streams.](https://github.com/gw1urf/gzipchunk)
- **Chunchunmaru**: [Go-based web scraper tarpit.](https://github.com/BrandenStoberReal/Chunchunmaru)
- **IED**: [ZIP bombs for web scrapers.](https://github.com/NateChoe1/ied)
- **FakeJPEG**: [Endless fake JPEGs.](https://github.com/gw1urf/fakejpeg)
- **Pyison**: [AI crawler tarpit.](https://github.com/JonasLong/Pyison)
- **HalluciGen**: [WP plugin that scrambles content.](https://codeberg.org/emergentdigitalmedia/HalluciGen)
- **Spigot**: [Hierarchical Markov page generator.](https://github.com/gw1urf/spigot)

---

*This is a living resource, regularly updated to reflect the shifting terrain of collective techno-disobedience and algorithmic Luddism.*


@corbet @LWN You know, what we need is a clearinghouse for this like there are for piholes and porn and such. Could someone with some followers get #AIblacklist trending?

Post your subnets with that hashtag. If we get any traction, I'll host the list.
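
If such a list does come together, merging submissions could look something like the sketch below: one subnet per line, `#` for comments, and overlapping or adjacent ranges collapsed. The input format and function name are assumptions for illustration; nothing here reflects an agreed-upon list format.

```python
import ipaddress

def merge_blocklists(*lists):
    """Merge several iterables of CIDR strings into a collapsed list.

    Assumes one entry per item, with optional '#' comments.
    """
    nets = []
    for entries in lists:
        for entry in entries:
            entry = entry.split("#", 1)[0].strip()
            if entry:
                nets.append(ipaddress.ip_network(entry, strict=False))
    # collapse_addresses() rejects mixed IP versions, so handle them apart
    v4 = [n for n in nets if n.version == 4]
    v6 = [n for n in nets if n.version == 6]
    return (list(ipaddress.collapse_addresses(v4))
            + list(ipaddress.collapse_addresses(v6)))

if __name__ == "__main__":
    # Documentation-range prefixes standing in for submitted subnets
    a = ["198.51.100.0/25", "198.51.100.128/25"]
    b = ["198.51.100.64/26  # already covered", "2001:db8::/32"]
    for net in merge_blocklists(a, b):
        print(net)
```

Collapsing keeps the hosted list short, but it also hides who submitted what, so a real clearinghouse would probably want to keep the raw submissions around for review.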