https://jwz.org/b/yj6w
Mastodon stampede

"Federation" now apparently means "DDoS yourself." Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops responding. The server is basically unusable for 30 to 60 seconds until the stampede of Mastodons slows down. Presumably each of those IPs is an instance, none of which ...

@jwz yeah I often get your 503 page if I click the link if it's been recently posted.
@jwz MariaDB/MySQL cope so badly under high load it's insane. Maybe having some sort of "staticizing" mechanism to snapshot the dynamic content and then serve it through nginx with some fine tuning would help? (compression? connection reuse? cache-instructing headers?)
@lucent Again, I don't really need you leaning over my desk and saying "You know what you OUGHTA do", thanks.
@jwz @lucent we're only doing this because it's so shocking. Like, you're a bit of a legend, and we didn't expect these issues to knock your blog over, let alone for you to be so salty over it.
nah, the stress is understandable. @jwz copypasted the same message to everyone who replied because everyone was pushing the same solution in different flavours.
the issue of fediverse instances pinging as soon as the post gets forwarded is legitimate, but hard to tackle (e.g., would you trust a pre-crawled preview coming from another server?).

@lucent right, so if he has a lot of followers, he's on a lot of home feeds so these are at least mostly legitimate. I know I clicked as soon as I saw it.

My problem isn't that he's surprised by a traffic spike, but that he's trying to make it Mastodon's problem. He should own his own setup and stop pretending like 1000 hits is some sort of DDoS. It's ok to say "I got knocked over, I need to consider caching" instead of "I got knocked over, fuck you for visiting" which is how this comes off.

@lucent Also MariaDB/MySQL default to an un-tuned state, so if he changed a few defaults he'd probably get another result.

Imagine a MariaDB getting knocked over by 1K queries/sec, that's a sad as shit MariaDB.

But he blocked me out of butthurt so 🤷 good luck have fun

You surely know too that it always depends on the "weight" of those queries. Resource-heavy stuff like WordPress getting hammered by bots and exhausting your query pool is quite the bad experience.

The last website I knew that had to cope with massive peak traffic, I just asked the other people working on it to build some sort of "static exporter" instead of having yet another WP instance, so I could have very aggressive caching in place. I still saturated my port at the provider; thank god it's not 2010 anymore, when bandwidth was metered.
I get your point of view, but as I said, I also get being pissed at stuff crashing because Fediverse software pings you back as soon as your post lands on an instance. Fortunately nothing bad happened, I'm not hurt, and I apologized for being "yet another guy who posted the same solution".
Needless to say, though, complaining to the bubble without bringing the issue up to the mentioned software's devs or to the W3C, pushing for a standard to deal with this situation, makes the whole rant moot. Bigger websites or more aggressive setups would easily cope with the average Fedi requests.

@lucent yeah I mean who among us is above getting pissed off and blaming that stupid hyperactive microservice for our problems.

But as a Sr. SRE, I don't have time for recriminations. I assess the situation and find ways *I* can move forward rather than shouting at the clouds. It's up to *me* to answer for why there's no backpressure strategy.

Anyway, if you're not blocked, maybe mention the database thing. It's not caching, so maybe it'll be helpful. That's abnormally weak for MariaDB.

I mean, "mysqltuner.pl" is one search away in any search engine, and it surely points out fixes good enough to counter performance issues *that* bad. I think and hope it wasn't ignored as a point of failure.
@lucent hope against hope, amiright? do they still do mysqlbouncer?
@alexhammy209 probably not. Anyways mysqltuner is still actively maintained (last commit 25 days ago) and supports MariaDB and Percona too, including their specific DB engines. Always worth a shot when fine tuning on a lazy day.
@lucent oh shoot a bit of googling confirms, mysqlproxy is ⚰️
@alexhammy209 @lucent you’re reading into it. Your assessment says more about you than him. “I don’t need anyone to tell me the solution thanks” is not the same as “fuck you for visiting”
@mitka not for commenting here, but for visiting jwz.com. I can definitely sympathize and understand being mad at people pointing out simple solutions I already knew about but neglected to implement. @lucent
@alexhammy209 @lucent ig what’s the diff between mastodon and 100 RSS feed readers? ig it’s just that it’s a thundering herd, rather than spaced out by random chance of polling..

@calebjasik We call it the "single throat to choke" principle. If you have to blame someone for your problems, it's best for it to be centralized.

@lucent

@calebjasik @alexhammy209 @lucent seems like a cluster of Masto instances fetch in a manner that a cluster of feed readers aren’t https://better.boston/@crschmidt/109412294646370820
Christopher Schmidt (@[email protected])

Fun fact: sharing this link on Mastodon caused my server to serve 112,772,802 bytes of data, in 430 requests, over the 60 seconds after I posted it (>7 r/s). Not because humans wanted them, but because of the LinkFetchWorker, which kicks off 1-60 seconds after Mastodon indexes a post (and possibly before it's ever seen by a human). Every Mastodon instance fetches and stores their own local copy of my 750kb preview image. (I was inspired to look by @[email protected]'s post: https://mastodon.social/@jwz/109411593248255294.)
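The numbers in the quoted post hang together; a quick sanity check (all figures taken from the post above, nothing else assumed):

```python
# Sanity-check the quoted stats: 430 requests serving
# 112,772,802 bytes over the 60 seconds after posting.
total_bytes = 112_772_802
requests = 430
window_s = 60

rate = requests / window_s          # requests per second
avg_size = total_bytes / requests   # average response size in bytes

print(f"{rate:.1f} r/s")                      # just over 7 r/s, as the post says
print(f"{avg_size / 1024:.0f} KiB average")   # roughly a 750kb image every third hit
```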

@alexhammy209 @lucent no, it's saying "I only have 4k followers; rendering a preview and no other page views shouldn't even knock over a Raspberry Pi."
There's a clear fix, though: the preview should be generated by the server that published it, in about 5 sizes or so, and the ActivityPub server should serve it with the post metadata.
@lucent @jwz @alexhammy209 Large parts of Twitter run on MySQL, I doubt it cannot handle your load ;-)
yeah, that has been brought up already with all the "mysql-tuner" chat; some fine-tuning would for sure help. It doesn't change the fact that each Mastodon instance pings the link to crawl the preview, and that causes most of the spike, ultimately making MariaDB fail.
@lucent @jwz @alexhammy209 Isn’t federation built on (conditional) trust? Yes, trusting server A to properly represent the words of users on server A is slightly different from trusting server A to give you a preview of third party server B. But not THAT different.
Sorry for adding stress to the situation. I just checked the various other replies reaching my instance, and I've been yet another one adding to the pile of workarounds for an issue that should be tackled on the instances' side too.
@jwz User engagement is such a curse. But seriously, better caching might help?

@steve @jwz On the Mastodon side, since instances don’t share cache (they can’t, it’s not centralized), the best thing they could do is schedule the job to fetch data about a URL with a small random amount of delay.

On jwz’s side, request collapsing, rate limiting, or caching would solve this problem. Rate limiting is probably the easiest, because then the randomized backoff algorithms will take effect and delay appropriately.
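The "request collapsing" idea above can be sketched roughly like this: when many identical requests arrive at once, only the first one hits the database, and the rest wait for and share its result. This is an illustrative sketch of the general technique, not jwz's setup or any particular proxy's implementation:

```python
import threading

class RequestCollapser:
    """Collapse concurrent identical requests: the first caller for a key
    does the real (expensive) work; concurrent callers for the same key
    wait and reuse its result instead of hitting the backend again."""

    def __init__(self, fetch):
        self._fetch = fetch          # the expensive backend call, e.g. a DB query
        self._lock = threading.Lock()
        self._inflight = {}          # key -> (done event, result holder)

    def get(self, key):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # No one is fetching this key yet: become the leader.
                entry = (threading.Event(), {})
                self._inflight[key] = entry
                leader = True
            else:
                leader = False
        event, holder = entry
        if leader:
            try:
                holder["value"] = self._fetch(key)
            finally:
                event.set()
                with self._lock:
                    del self._inflight[key]
        else:
            event.wait()             # wait for the leader's result
        return holder["value"]
```

With a stampede of N simultaneous hits on one URL, the backend sees one query instead of N; the other N-1 callers are served the same response once it arrives.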

@steve @jwz One could run a transparent proxy and share it among Mastodon instances, but this has abuse potential all around.

@jwz This is precisely why WebSub (aka: PubSubHubbub) was created. Polling sucks. Blog updates should be pushed, not polled.

We've been here before.

See: https://www.w3.org/TR/websub/

WebSub provides a common mechanism for communication between publishers of any kind of Web content and their subscribers, based on HTTP web hooks. Subscription requests are relayed through hubs, which validate and verify the request. Hubs then distribute new and updated content to subscribers when it becomes available. WebSub was previously known as PubSubHubbub.

@bobwyman @jwz why does it only use symmetric HMAC tags? A node which fetches HMAC tagged data from the original server can only verify it locally, it can't prove authenticity to other nodes which it may relay the cached data to, they instead have to trust the relaying node.
@jwz are you suggesting the load came from Mastodon servers crawling your syndicated post?
@ronaldwidha @jwz The load came because every Mastodon server crawls the content for OpenGraph metadata (to provide a preview card) 0-60 seconds after the post is created. And since the post is pushed to all Mastodon servers "as quickly as it can be", they all do the crawl themselves.
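The metadata each instance crawls for is a handful of `<meta property="og:...">` tags in the page head. A minimal sketch of that extraction, using only the standard library and a made-up sample page (the URLs and titles here are illustrative, not Mastodon's actual parser):

```python
from html.parser import HTMLParser

class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags — roughly the data a link-fetch
    worker needs to build a preview card (title, image, description)."""
    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        prop = a.get("property", "")
        if prop.startswith("og:") and "content" in a:
            self.og[prop] = a["content"]

# Sample page standing in for a fetched blog post (hypothetical values).
sample = """
<html><head>
<meta property="og:title" content="Mastodon stampede">
<meta property="og:image" content="https://example.com/preview.jpg">
</head><body>...</body></html>
"""
p = OpenGraphParser()
p.feed(sample)
print(p.og["og:title"])   # the preview card's title
```

Note the second request crschmidt mentions: after parsing, the worker also downloads whatever `og:image` points at, which is where most of the bytes go.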
@crschmidt @jwz that makes sense. maybe a good excuse to put some caching in front. CDNs like cloudfront is very affordable
@ronaldwidha @crschmidt aaaaannnd *plonk*
@jwz have you considered being good at computers instead of bad at them? I know you’ve been doing this stuff since I was in diapers, but I know some words about things, and maybe if I recite them, it will help you be better at computers! /s
@jwz yes! this seems built into the spec! and there's a notion of a `shared inbox` but it's not clear that it's useful for fixing this
@jwz everyone posts graphs about the "number of mastodon users" but it'd be interesting to see graphs of "the number of cross-instance follows" over time

@arthegall @jwz the shared inbox is for a single instance.

The issue is that when jwz writes something, the post is propagated across the network. Each instance will fetch OpenGraph / other metadata from the links in a post.

ActivityPub is RDF and thus this metadata _could_ be looked up by the source server and transmitted once… but you have to trust the source server.

@arthegall @jwz the naive approach is simply for every instance to do the lookup themselves. This is what Mastodon does and is the issue. But it does this with no delay and because the network is pretty fast, it results in a spike.

Some are suggesting to use WebPub/PubSubHubbub/etc. but it’s not really the solution - the posts are already being pushed and not polled. It’s the OpenGraph metadata that’s not pushed.

@arthegall @jwz So given that, I’d say that Mastodon should do the easy first step and schedule these jobs to run randomly 500-5000ms in the future rather than ASAP (this may not be possible in vanilla Sidekiq but it is definitely possible). Then consider passing along OpenGraph metadata in the <a/> so lookups don’t always need to occur.
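The effect of that randomized delay is easy to quantify. A small simulation under the assumptions above (1000 instances, delays drawn uniformly from the suggested 500-5000 ms window; the numbers are illustrative, not measurements of Mastodon):

```python
import random

def jittered_delay(min_s=0.5, max_s=5.0):
    """Pick a random delay before running the link-fetch job, so thousands
    of instances don't all hit the origin in the same second."""
    return random.uniform(min_s, max_s)

# Without jitter, ~1000 fetches land in roughly the same second.
# Spread uniformly over a 4.5 s window, the expected worst second sees
# about 1000 / 4.5, i.e. a bit over 220 requests instead of 1000.
delays = [jittered_delay() for _ in range(1000)]
worst_second = max(
    sum(1 for d in delays if s <= d < s + 1) for s in range(6)
)
print(worst_second)  # well under 1000
```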
@ZiggyTheHamster @arthegall You are already trusting the source server to tell you that this was my post and what URL was in it.
@jwz So, you want us to unfollow you? Won't happen. You're not boring.
@jwz Interesting. Do you have an estimate of how many of those might be "organic" hits: actual people following the link in the toot?
@Kazinator Zero. He's talking about the Sidekiq job in Mastodon which fetches the URL to (potentially) create a preview card.
@nemobis I see. Does that usefully respect robots.txt? If not, it could be blocked at the request level. Easily so if it announces itself as a particular agent type.
@Kazinator @nemobis Blocking the traffic would make the post broken in all those Mastodon instances. The post preview would be missing.

@rcade @nemobis

That hardly qualifies as "broken". The link would still be there, right?

Most hyperlinks on the internet do not come with previews. They're the exception rather than the rule.

@Kazinator @nemobis Preview cards with images make links look better and get far more clicks.

@rcade @nemobis

Even if that is true (where are the numbers to prove it?), it seems to me that since toots can include images, you can have that cake and eat it too: you can attach your own screenshot of the page to accompany the link, and block the preview requests.

The image travels through the fediverse without generating redundant hits on your server.

Proof of concept:

Here is hackernews: https://news.ycombinator.com


@Kazinator @nemobis I don't know anything about Mastodon's design yet. Couldn't the Twitter card data be pulled in to the post when it is published to avoid the need to manually add an image?

@rcade @Kazinator @nemobis I don't think "Click through rate" is something we should honestly be caring about.

This isn't an advertising network.

@ubergeek @Kazinator @nemobis You're arguing that Mastodon should not support link previews. It does, so the debate is whether it is supporting them sustainably.

@rcade @Kazinator @nemobis Hardly.

What I said is that "Click through rate" isn't even a metric we should consider.

Again, this isn't an advertising network, and we don't care about CTR or total impressions. We care about socializing with humans.

My suggestion elsewhere in this thread is that link previews should be an optional toggle in the admin UI, to disable it instance-wide, disabled by default. And/or, users should also have that toggle, for their experiences, defaulted to off.

Reason being? Several. One is the huge traffic surge we're causing to other network citizens. The other is... frankly, goatse, or other gore/shock porn. Shit, even making it so people who may have content sensitivities wouldn't get blasted by things they didn't ask to see.

Bottom line: the metric you suggested above is a bad one that should have no place in discussing features present in social media software.

@ubergeek @Kazinator @nemobis It isn't contrary to the goals of Mastodon to want people to click the links in your posts.

That's one of the only two reasons everybody undertook the hassle of implementing Twitter-style preview cards on their websites.

The other is to give users more information before they decide whether to click the link.

Giving instances and users more control on whether to show link previews is a good idea.

@rcade Yes, many sites undertook the effort to do preview cards, because of capitalist drives for economic return.

There's very little social value in preview cards, really. Sure, the argument could be made "Decide on clicking the link", but most of that decision should be made because of who is providing the link to you, not because of what a preview card shows you.

I.e., gargron shares a link about new Masto changes, I don't need a preview card to give me info: I have it from the person sharing the link. Same with the local antifascist groups. They provided the link, I trust them, and I don't need a preview card to tell me more, truth be told.

Random person sharing link? I probably will start to consider the sources, or open in an incognito window. I don't trust the preview card, anyways, and at worst, it's preloading data in my browser from a potentially malicious site looking to track people.

@Kazinator @jwz In the first 60 seconds, from skimming the logs, it's very obvious that it's almost none of them. For my account (658 followers), it was 41 servers, each making two requests (one for the page, one for the opengraph preview image), triggering 82 requests... and one human.

And of course, this happens if _anyone_ posts one of my links.

@jwz CDN?

@eludom Out of curiosity, did you think to yourself:

"I know this guy's been running a blog for decades, but if I post these three letters, a lightbulb is going to go on. The first thing that popped into my head is not only going to be the solution, but I'm the first one who thought of it. I'm the one who fixed the problem. Go me."

Is that how you thought this would go?