Yesterday my VPS raised an alert: it was hit by a huge spike in incoming traffic, peaking at 55GB at 2:15pm and lasting about an hour.

Upon investigating, I found it was my PeerTube instance that was targeted.

Where did the traffic come from?

meta-externalagent (aka Meta's web crawler, which is used to grab content to train its AI systems).

I feel a little bit violated thinking my Fediverse promo video was grabbed by it, sigh.

#AIcritic #NoAI

@_elena Would you be able to use a user-agent block list like ai.robots.txt? I have a cron job that updates it daily from their git repo and then reloads nginx.

Except I strip out the part that refers known agents to robots.txt and just give them a 403, because none of them ever honor the robots file anyway.

https://github.com/ai-robots-txt/ai.robots.txt
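The setup described above can be sketched roughly like this. Note this is a hedged example, not the actual config: the user-agent names are just a few known crawlers (the real list comes from the repo), and the variable name and placement are assumptions:

```
# Hypothetical nginx fragment: match known AI crawlers by user agent
# and return 403 instead of pointing them at robots.txt.
# Agent names here are examples; the full list lives in the
# ai.robots.txt repo and would be regenerated by the cron job.

# Goes in the http {} context:
map $http_user_agent $is_ai_bot {
    default              0;
    ~*meta-externalagent 1;
    ~*GPTBot             1;
    ~*CCBot              1;
}

# Goes in the relevant server {} block:
server {
    # ... existing server config ...
    if ($is_ai_bot) {
        return 403;
    }
}
```

The cron job would then just pull the updated list, regenerate the map entries, and run `nginx -s reload` so the new blocklist takes effect without downtime.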
