Ugh, I might need to put part of my website behind something like Anubis after all 
@dressupgeekout i think we all will, eventually. Is it AI scraper bots? Far from a permanent solution, but so far I've gotten by with blocking user agents and IPs.
@gordoooo_z Yes, it's ClaudeBot et al. super-aggressively scraping my new cgit instance. The host runs NetBSD, so I've also been looking into blacklistd(8).

@dressupgeekout
If you haven't tried it yet, there are regularly updated blocklists for many webservers available here:
https://github.com/ai-robots-txt/ai.robots.txt

I have this on my webserver and I can see from the logs that the server is responding with a few thousand 403s every day to crawlers and bots, so it's helping a bit at least.
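To give an idea of what that blocking looks like, here's a minimal nginx sketch (the agent names are just a few illustrative examples; the actual list in the repo is much longer and kept up to date):

```nginx
# Goes in the http{} context: flag requests whose User-Agent
# matches a known AI crawler. Agent names here are examples only.
map $http_user_agent $ai_bot {
    default     0;
    ~*ClaudeBot 1;
    ~*GPTBot    1;
    ~*CCBot     1;
}

server {
    listen 80;
    server_name example.org;

    # Refuse flagged crawlers with a 403, as seen in the logs above.
    if ($ai_bot) {
        return 403;
    }
}
```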

@82mhz @gordoooo_z This looks to be a valuable resource. Thank you for sharing. What a hassle. I've never needed to take these kinds of measures before... I guess this will force me to get better at server administration, heh

@dressupgeekout
It's infuriating. And there is no guarantee that this will keep them out as they are actively working on finding ways around blocks. But at least we can make it a little harder. Anubis is good, but has the downside that it annoys the users too, and sometimes even makes the site inaccessible, which is too much collateral damage for me.

Anyway, good luck implementing it, I hope it helps a little!

@gordoooo_z

@82mhz Thank you for sharing ai.robots.txt with me, I think it's made a difference on my website!
@dressupgeekout
That's awesome, happy to hear it! 😊
@dressupgeekout According to at least one source, ClaudeBot respects robots.txt. I'm not prepared to take one random source's word for it, but it's simple enough to implement and find out. Assuming you haven't already, that is?
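For what it's worth, you can simulate how crawlers are supposed to interpret a robots.txt using Python's stdlib before deploying it (a sketch; the rules below are an assumed example, not anyone's real config):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out ClaudeBot but allows everyone else.
rules = """\
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved ClaudeBot should now stay out of the cgit paths,
# while ordinary browsers remain unaffected.
print(rp.can_fetch("ClaudeBot", "/cgit/repo.git"))             # False
print(rp.can_fetch("Mozilla/5.0 (compatible)", "/cgit/repo.git"))  # True
```

Of course this only tells you what a compliant crawler would do; the server logs are the real test.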