My entire website is 44 MB in size. Most of that is images, of course.

Yesterday 1.2 GB of data was transferred from my webserver, including EVERY image several dozen times.

Either a lot of people discovered my blog last week and spent the whole day reading ALL of my posts, or there's some AI scraping going on again.

I'd hate to do it, but I'm seriously considering putting some Anubis or Cloudflare protection in front of it. Stuff like this really pisses me off...

@82mhz I think I saw your website recently in some Mastodon post. The infoboxes in Fediverse software are downloaded by every participating server on its own (and depending on followers, hashtags, etc., that can be a wide net), which can amount to a lot of traffic.

I documented my Mastodon-stampede-optimization-with-a-side-of-AI-blockage at https://patrick.georgi-clan.de/posts/caching-mastodon-preview-card-responses/, although that still won't reduce the traffic, just the CPU overhead for creating the same response a thousand times...
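For the curious, the shape of that caching idea as a rough, generic sketch (not the exact setup from the post; all names here are illustrative):

```python
import time

class TTLCache:
    """Keep rendered responses around for a short time so a stampede of
    identical requests is served from memory instead of re-rendering."""

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self.store = {}  # path -> (timestamp, body)

    def get_or_render(self, path, render):
        now = time.time()
        hit = self.store.get(path)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # cache hit: no render cost
        body = render(path)
        self.store[path] = (now, body)
        return body

# Usage: cache.get_or_render("/posts/foo", expensive_render_function)
```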

For further optimization, the server would have to send Fediverse servers a trimmed response that only contains the OpenGraph information (which is all they care about), but that is more involved...
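A minimal sketch of what that could look like, assuming a small Flask app and matching on the fetchers' User-Agent strings (the route, UA hints, and content below are made up for illustration):

```python
# Hedged sketch, not from the linked post: serve a stripped-down page
# containing only OpenGraph tags when the requester looks like a
# fediverse preview fetcher.
from flask import Flask, request

app = Flask(__name__)

# Mastodon's card fetcher identifies itself with a User-Agent like
# "http.rb/5.1.1 (Mastodon/4.2.0; +https://instance.example/)";
# other fediverse software uses similar identifiers.
FEDI_UA_HINTS = ("Mastodon", "Pleroma", "Akkoma", "Misskey")

OG_STUB = """<!doctype html>
<html><head>
  <meta property="og:title" content="Caching Mastodon Preview Card Responses">
  <meta property="og:description" content="Reducing the fediverse preview-card stampede.">
  <meta property="og:image" content="https://example.org/preview.png">
</head><body></body></html>"""

@app.route("/posts/<slug>")
def post(slug):
    ua = request.headers.get("User-Agent", "")
    if any(hint in ua for hint in FEDI_UA_HINTS):
        # Preview fetchers only parse the metadata, so a few hundred
        # bytes of OpenGraph tags is all they need.
        return OG_STUB
    return render_full_post(slug)

def render_full_post(slug):
    # Placeholder for the normal, full-weight page render.
    return f"<html><body><h1>Post: {slug}</h1></body></html>"
```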

Caching Mastodon Preview Card Responses

I ran into two instances recently where people remarked that the Fediverse can act like a distributed denial-of-service attack: when a post links to a URL, some Fediverse software helpfully collects metadata from the page to show a preview card, like any modern social media software is supposed to do. The problem is that in the Fediverse, the post gets replicated to all servers that are supposed to receive it, through subscriptions or reposts, and every single one of those servers downloads the same page for the same data, usually within a very short period of time.


@patrick
Hi, thanks for the suggestion! I've heard this phenomenon called the "Mastodon hug of death" 😄

I don't know why they don't just implement a random delay before fetching the metadata; that would immediately mitigate the problem of hundreds or thousands of instances hammering a server at the same time, but whatever.
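Illustratively (this is not actual Mastodon code), the fetcher side would only need a jittered sleep:

```python
# Illustrative only: spread card fetches over a random window instead of
# having thousands of instances hit the origin in the same second.
import random
import time
import urllib.request

def fetch_preview_metadata(url: str, max_delay: float = 60.0) -> bytes:
    time.sleep(random.uniform(0.0, max_delay))  # randomized delay
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()
```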

This hasn't been an issue for me so far; I guess my web host has enough capacity, and my account is quite small, so there aren't too many Mastodon instances showing up. I think it was fewer than 200 last time I checked.

Andre suggested blocking AI bots via .htaccess, which is similar to what you're doing, as far as I can tell:

https://fedi.jaenis.ch/@andre/statuses/01JY8TCDWF2PA4QFC0NQMF4CWH
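For non-Apache setups, the same idea can be sketched as WSGI middleware in Python (the bot list below is a small illustrative sample, not Andre's actual rules):

```python
# Hedged sketch: return 403 when the User-Agent matches known AI crawlers.
AI_BOT_HINTS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Amazonbot")

class BlockAIBots:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if any(bot in ua for bot in AI_BOT_HINTS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return self.app(environ, start_response)

# Usage: wrap your existing WSGI app, e.g. app = BlockAIBots(app)
```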

@82mhz @patrick You know, I was just thinking about this phenomenon today. Wouldn't it be super easy to generate the link preview once on the origin server of the post and have all other instances grab it from there or link directly to it? Somewhat similar to how pictures are handled.
It wouldn't be compatible with all servers right away, but it sounds like the obvious solution to me and shouldn't be too hard to implement.
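Purely hypothetically (nothing like this exists in Mastodon today, and the field name is invented), the origin could generate the card once and attach it to the federated post:

```python
import json

def make_card(url, title, description, image):
    # Generated once by the origin server at posting time.
    return {"url": url, "title": title,
            "description": description, "image": image}

card = make_card(
    "https://example.org/article",
    "Example article",
    "Card generated once by the origin.",
    "https://example.org/preview.png",
)

# Receiving instances would read this from the activity itself instead
# of each scraping the linked page. "preview_card" is a made-up name.
activity_fragment = {"preview_card": card}
print(json.dumps(activity_fragment, indent=2))
```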
@irgndsondepp @patrick
Could be easy, I guess, but this problem has existed in Mastodon for years and nothing has been done to fix it, so I suppose it's not much of a priority -.-