@announcements

Blocking Aggressive Crawlers

I just updated the blocklist for the Mastodon and BookWyrm instances. The logs showed meta-webindexer getting stuck in ridiculous infinite /rss/rss/ loops — a total waste of server resources pushing BookWyrm to 100% CPU utilization.

Additional Bans:

Meta-Webindexer/ExternalAgent: Recursive scraping gone rogue.

ClaudeBot: Keeping local content out of AI training sets.

Semrush & SERanking: Commercial SEO bots have no business here.

If you’re self-hosting and notice weird CPU spikes or odd path patterns, I highly recommend auditing your User Agents. It’s an easy way to protect your performance and your users' privacy. 🛠️
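One quick way to do that audit, assuming nginx's default "combined" log format (where the User-Agent is the sixth quote-delimited field), is a small awk pipeline. Shown here against made-up sample lines; on a real server, point it at /var/log/nginx/access.log:

```shell
# Count User-Agent strings in an nginx access log ("combined" format:
# the UA is the 6th double-quote-delimited field).
# sample_log stands in for your real access log.
sample_log='203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /rss/rss/ HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"
203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /rss/rss/rss/ HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"
198.51.100.9 - - [01/Jan/2025:00:00:02 +0000] "GET /feed HTTP/1.1" 200 512 "-" "Mozilla/5.0"'

# Extract the UA field, then rank by frequency.
printf '%s\n' "$sample_log" | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn
```

Anything unfamiliar near the top of that ranking is worth a closer look.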

#SelfHosted #MastodonAdmin #BookWyrm #Nginx #Privacy #SysAdmin

@Moritz @announcements are you blocking in nginx?

@nwcs @announcements

Yes, exactly

@Moritz @announcements

```
# ai bots
if ($http_user_agent ~* ".*(ChatGPT|ChatGPT-User|openai|OAI-SearchBot|Google-Extended|GPTBot|ClaudeBot|Claude-Search|meta-externalagent|PerplexityBot|anthropic|TerraCotta).*") {
return 404;
}
```
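Side note in case it's useful: the same blocklist can also be written with `map`, which nginx evaluates a bit more cheaply than a regex `if` in the request phase. A sketch using the same user agents (`$block_ai` is just my variable name):

```
# http{} context: flag matching user agents
map $http_user_agent $block_ai {
    default 0;
    ~*(ChatGPT|ChatGPT-User|openai|OAI-SearchBot|Google-Extended|GPTBot|ClaudeBot|Claude-Search|meta-externalagent|PerplexityBot|anthropic|TerraCotta) 1;
}

# server{} context
if ($block_ai) {
    return 404;
}
```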

@nwcs @announcements

Nice, thank you. That list is more comprehensive than what I have so far.

@Moritz @announcements yeah, I built it up from some of the sites that I run

@Moritz @announcements

".*(RecordedFuture|lua-resty-http|AhrefsBot|SemrushBot|BLEXBot|DotBot|DataForSeoBot|Engine|IonCrawl|InternetMeasurement|Barkrowler|Cloud|masscan|Detection|l9scan|l9tcpid|l9explore|tchelebi|EdgeWatch|ltx71|AwarioSmartBot|Pandalytics|Scrapy|t3versionsBot|CensysInspect|Python|python-httpx|Python-urllib|Java|Go-http-client|wpbot|Odin|Custom-AsyncHttpClient|Download\ Demon|Cortex-Xpanse|zgrab|AliyunSecBot|node|2ip\ bot|okhttp|CCBot|ALittle\ Client|Wappalyzer|axios).*"

@Moritz @announcements ymmv, some that I use on my end
@Moritz @announcements Single-user instance here had the same bots hammering away, so I added a redirect to my AI tarpit server running https://iocaine.madhouse-project.org/
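Not necessarily how that setup is wired, but a self-contained nginx sketch for sending matched bots to a tarpit instead of 404ing them; the `127.0.0.1:8180` upstream and the `/tarpit` path are placeholders for wherever your tarpit actually listens:

```
# server{} context: divert AI crawlers to a local tarpit
if ($http_user_agent ~* "(GPTBot|ClaudeBot|meta-externalagent|PerplexityBot)") {
    rewrite ^ /tarpit$uri last;
}

location /tarpit/ {
    internal;                          # not reachable by direct request
    proxy_pass http://127.0.0.1:8180/; # placeholder tarpit upstream
}
```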

@anne

Cool, I need to try that, thanks for sharing. I've already noticed the crawlers backing off after a few minutes of 4xx responses.