@announcements

Blocking Aggressive Crawlers

I just updated the blocklist for the Mastodon and BookWyrm instances. The logs showed meta-webindexer getting stuck in ridiculous infinite /rss/rss/ loops — a total waste of server resources pushing BookWyrm to 100% CPU utilization.

Additional Bans:

Meta-Webindexer/ExternalAgent: Recursive scraping gone rogue.

ClaudeBot: Keeping local content out of AI training sets.

Semrush & SERanking: Commercial SEO bots have no business here.

If you’re self-hosting and notice weird CPU spikes or odd path patterns, I highly recommend auditing your User Agents. It’s an easy way to protect your performance and your users' privacy. 🛠️
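One quick way to do that audit, assuming nginx's default "combined" log format (where the User-Agent is the sixth quote-delimited field), is a small awk pipeline. Shown here against made-up sample lines; on a real server, point it at /var/log/nginx/access.log:

```shell
# Count User-Agent strings in an nginx access log ("combined" format:
# the UA is the 6th double-quote-delimited field).
# sample_log stands in for your real access log.
sample_log='203.0.113.7 - - [01/Jan/2025:00:00:00 +0000] "GET /rss/rss/ HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"
203.0.113.7 - - [01/Jan/2025:00:00:01 +0000] "GET /rss/rss/rss/ HTTP/1.1" 200 512 "-" "meta-externalagent/1.1"
198.51.100.9 - - [01/Jan/2025:00:00:02 +0000] "GET /feed HTTP/1.1" 200 512 "-" "Mozilla/5.0"'

# Extract the UA field, then rank by frequency.
printf '%s\n' "$sample_log" | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn
```

Anything unfamiliar near the top of that ranking is worth a closer look.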

#SelfHosted #MastodonAdmin #BookWyrm #Nginx #Privacy #SysAdmin

@Moritz @announcements are you blocking in nginx?

@nwcs @announcements

Yes, exactly

@Moritz @announcements

```
# ai bots
if ($http_user_agent ~* ".*(ChatGPT|ChatGPT-User|openai|OAI-SearchBot|Google-Extended|GPTBot|ClaudeBot|Claude-Search|meta-externalagent|PerplexityBot|anthropic|TerraCotta).*") {
return 404;
}
```
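Side note in case it's useful: the same blocklist can also be written with `map`, which nginx evaluates a bit more cheaply than a regex `if` in the request phase. A sketch using the same user agents (`$block_ai` is just my variable name):

```
# http{} context: flag matching user agents
map $http_user_agent $block_ai {
    default 0;
    ~*(ChatGPT|ChatGPT-User|openai|OAI-SearchBot|Google-Extended|GPTBot|ClaudeBot|Claude-Search|meta-externalagent|PerplexityBot|anthropic|TerraCotta) 1;
}

# server{} context
if ($block_ai) {
    return 404;
}
```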

@nwcs @announcements

Nice, thank you. That list is more comprehensive than what I have so far.

@Moritz @announcements yeah, I built it up from some of the sites that I run

@Moritz @announcements

".*(RecordedFuture|lua-resty-http|AhrefsBot|SemrushBot|BLEXBot|DotBot|DataForSeoBot|Engine|IonCrawl|InternetMeasurement|Barkrowler|Cloud|masscan|Detection|l9scan|l9tcpid|l9explore|tchelebi|EdgeWatch|ltx71|AwarioSmartBot|Pandalytics|Scrapy|t3versionsBot|CensysInspect|Python|python-httpx|Python-urllib|Java|Go-http-client|wpbot|Odin|Custom-AsyncHttpClient|Download\ Demon|Cortex-Xpanse|zgrab|AliyunSecBot|node|2ip\ bot|okhttp|CCBot|ALittle\ Client|Wappalyzer|axios).*"

@Moritz @announcements ymmv, some that I use on my end
@Moritz @announcements Single-user instance here had the same bots hammering away, so I added a redirect to my AI tarpit server running https://iocaine.madhouse-project.org/
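Not necessarily how that setup is wired, but a self-contained nginx sketch for sending matched bots to a tarpit instead of 404ing them; the `127.0.0.1:8180` upstream and the `/tarpit` path are placeholders for wherever your tarpit actually listens:

```
# server{} context: divert AI crawlers to a local tarpit
if ($http_user_agent ~* "(GPTBot|ClaudeBot|meta-externalagent|PerplexityBot)") {
    rewrite ^ /tarpit$uri last;
}

location /tarpit/ {
    internal;                          # not reachable by direct request
    proxy_pass http://127.0.0.1:8180/; # placeholder tarpit upstream
}
```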

@anne

Cool, I need to try that, thanks for sharing. I've already noticed the crawlers backing off after a few minutes of 4xx responses.