It had to happen, eventually. My AI crawler antagoniser, https://www.ty-penguin.org.uk/~auj/spigot/ has been seeing sustained traffic of between 300 and 500 thousand hits per hour. I've not been particularly bothered by that, but a couple of days ago, my provider, @bitfolk, sent me a bandwidth warning: I'm on track to hit 2TBytes of outbound bandwidth this month and end up paying for the excess.

So I've added firewalling - if more than 5% of machines in a /23 network hit spigot within an hour, then the entire network gets a temporary block until it completely stops hitting my server. Hopefully that will cut things back enough to avoid charges.
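For anyone wanting to try the same idea, here is a rough Python sketch of the rule described above. It is not spigot's actual code: the `NetworkBlocker` class and its method names are made up for illustration, and it assumes "completely stops" means no hits from the network for a full hour. A real setup would push the blocked networks into the firewall rather than keep them in a Python set.

```python
import ipaddress
from collections import defaultdict

PREFIX_LEN = 23     # network size from the post
THRESHOLD = 0.05    # "more than 5% of machines"
WINDOW = 3600       # one hour, in seconds


class NetworkBlocker:
    """Hypothetical tracker: block a /23 once too many of its hosts show up."""

    def __init__(self):
        self.seen = defaultdict(dict)   # network -> {address: last-seen time}
        self.blocked = set()

    def record_hit(self, ip, now):
        """Record one request; return True if the source network is blocked."""
        net = ipaddress.ip_network(f"{ip}/{PREFIX_LEN}", strict=False)
        hits = self.seen[net]
        hits[ip] = now
        # Forget addresses that have not been seen within the last hour.
        for addr in [a for a, t in hits.items() if now - t > WINDOW]:
            del hits[addr]
        # 5% of a /23 is 25.6 addresses, so the 26th distinct host trips it.
        if len(hits) > THRESHOLD * net.num_addresses:
            self.blocked.add(net)
        return net in self.blocked

    def expire(self, now):
        """Unblock networks that have been silent for a whole window (assumed)."""
        for net in list(self.blocked):
            if all(now - t > WINDOW for t in self.seen[net].values()):
                self.blocked.discard(net)
                del self.seen[net]
```

Hits from a blocked network still need to be fed into `record_hit` (for example from firewall drop logs), otherwise the network would look quiet and get unblocked while it is still hammering the server.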

The thing that amazes me is that the list has already accumulated nearly 10,000 entries. Put another way, I'm already blocking 0.12% of the whole IPv4 address space because it's being used for web crawling.
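The 0.12% figure can be checked with a few lines of Python. This assumes every list entry is a full /23 (512 addresses) and rounds "nearly 10,000" up to 10,000, so it is an upper estimate rather than an exact count.

```python
ENTRIES = 10_000                      # "nearly 10,000" blocked networks (rounded up)
ADDRESSES_PER_NET = 2 ** (32 - 23)    # a /23 holds 512 addresses
IPV4_SPACE = 2 ** 32                  # 4,294,967,296 addresses in total

blocked = ENTRIES * ADDRESSES_PER_NET
share = blocked / IPV4_SPACE

print(f"{blocked:,} addresses, {share:.2%} of IPv4")
# prints "5,120,000 addresses, 0.12% of IPv4"
```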

An infinite maze of twisty little pages

Well that seems to have annoyed them.

Over the past few hours, request rates have been ramped up to nearly 900,000 hits per hour from nearly 700,000 distinct IP addresses. This is not including the many thousands that are firewalled off, but still trying their best. I'm turning page generation off for a bit while I ponder what to do next.

@pengfold have you tried linkmaze and quixotic?

https://marcusb.org/hacks/quixotic.html

Quixotic

Quixotic is a nonsense generator designed to help static website operators confuse and confound bots and content-stealing LLM scrapers.

@mxfraud spigot is the one I wrote which does a very similar job. It's been a fun project. The obvious solution, if I wanted to stop tinkering, would be to turn it off completely. Almost everything else on my site is static, and I doubt I'd even notice the load caused by a million requests an hour!

@pengfold I didn't realise, nice one!

I also have go-pot running, which instead of giving the attacker shitty pages faster, gives fake secrets slower: https://github.com/ryanolee/go-pot

I've seen people using crowdsec, and it seems to work well for them.
I looked at the config and didn't quite manage to get it running.
It does seem like it would help with the 700k unique IP problem tho.
https://github.com/crowdsecurity/crowdsec

GitHub - ryanolee/go-pot: A service for giving away secrets to bots ...Probably slightly too many!

@mxfraud it's a fine balancing act. I want to poison their well by supplying garbage, but that does mean engaging with these abusive bots. Over the past 18 months, it's generally been relatively easy. But they're all engaging in increasingly DDoS-like behaviour and it's getting less easy to provide the garbage while maintaining service.

At least this is just a toy that I can turn off. I pity folks trying to cope with this sort of thing professionally. As it happens, I was doing exactly that until about 4 years ago, when I was given a chance at playing in a different field.

@pengfold mine seem to have changed their algorithm and lost interest, so I get no more DDoS.
That or my ISP blocked it without telling me.

Since my server is here, I don't really pay for the bandwidth, so I was just happy to waste their CPU cycles.

I guess the gap between hobby and professional (e.g. cloud) solutions is wider than before, and I'm happier doing the hobby part than the professional part, that's for sure.
I agree that being able to ignore it is good. But pushing it to maximum nonsense until it breaks is just so liberating :)