We ran hachyderm.io, which is Ruby on Rails + Postgres, off of 32 vCPUs, 128 GB of RAM, and 2 failing hard drives RAID 10'd and served over international NFS, with IOPS performance measuring in the *dozens*

And 45,000 active users

Had the storage been faster than a 1997 floppy disk, we could've cut compute and RAM by 80% and not suffered much at all

Y'all really don't understand how far a single laptop can push things anymore, because we waste computers so hard
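
For a rough sense of what "dozens of IOPS" does to a database-backed app, here's a back-of-the-envelope sketch. The 50 IOPS figure and the 20-random-reads-per-request workload are illustrative assumptions, not hachyderm measurements:

```python
# Back-of-the-envelope: what "dozens of IOPS" means for a Rails + Postgres app.
# All numbers are illustrative assumptions, not measurements from hachyderm.

slow_iops = 50            # "dozens" of random I/O operations per second
ssd_iops = 50_000         # a very ordinary SATA SSD
reads_per_request = 20    # cache-miss random reads one request might trigger

for label, iops in [("failing RAID over NFS", slow_iops), ("plain SSD", ssd_iops)]:
    wait_ms = reads_per_request / iops * 1000
    print(f"{label:>22}: ~{wait_ms:g} ms of I/O wait per request")

# ~400 ms vs ~0.4 ms: with storage that slow, you end up buying CPUs that sit
# in iowait and RAM to cache around the disks, which is where the waste goes.
```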

https://mastodon.social/@GeePawHill/112613403362243583

@hazelweakly the amount of compute available in your electronic toothbrush today would destroy a Victorian schoolchild, etc

@atax1a @hazelweakly Tired: A monkey at a typewriter.

Wired: A Victorian schoolchild with an abacus.

@hazelweakly @GeePawHill Current processors are so powerful that programmers look for tasks to fill idle cycles, such as prefetching, AI prediction, etc.

@gglockner @GeePawHill it's just wild when you think about it. "we're gonna do 300% of the work because, statistically, it'll be a very very tiny bit faster than doing 100% of the work and you wouldn't even notice the other 200% anyways"

And they're right, you don't notice

@hazelweakly one time at $bigtechco I worked on a service that had a custom-written event sourcing architecture deployed on k8s to a cluster of five very expensive cloud VMs. The userbase was 10 people who each used it maybe once every day. It was an internal app. It was slow as shit.
@ahelwer but a database request to another machine (that has the data in RAM already) is about as fast as fetching the data from a spinning disk! Clearly, we just need to store everything in RAM on remote machines - that's the lesson, right? @hazelweakly
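
There is a grain of truth behind the joke, which is roughly why remote in-memory caches exist at all. Using the classic order-of-magnitude latency figures (approximate textbook numbers, not measurements from this thread):

```python
# Rough order-of-magnitude latencies (approximate figures, not measurements):
ram_access_s = 100e-9      # ~100 ns: read from local RAM
dc_round_trip_s = 500e-6   # ~0.5 ms: network round trip within a datacenter
disk_seek_s = 10e-3        # ~10 ms: random seek on a spinning disk

remote_ram_fetch = dc_round_trip_s + ram_access_s
print(f"remote RAM fetch: ~{remote_ram_fetch * 1e3:.1f} ms")
print(f"local disk seek:  ~{disk_seek_s * 1e3:.0f} ms")
# A hop to another machine's RAM can genuinely beat a local spinning-disk seek,
# which is the kernel of truth the sarcasm is built on.
```
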
@hazelweakly that storage setup! I thought "international" was autocorrect going wild at first, then realized with horror what you meant
@hazelweakly I see some info in the slides from your SRECon '23 talk, but is it public what you all run on now, like what size of deployment?

@rf yes, unfortunately "international" was not a typo 😅

We don't have a fully public writeup yet, but that's mostly out of laziness more than anything else. I'm happy to write something up soonish, but from memory: vaguely about 64 cores and 256 GB of RAM spread across 8 machines, and it could likely be solidly less at this point? It's been a while since I looked and I bet I am way off somewhere!

@hazelweakly @rf The current performance improvement push is to get the end user closer to a CDN node (especially in South and East Asia), but the main Mastodon stuff is generously provisioned.
@hazelweakly @mhoye Yes! A lot of CPU is held by IO wait states. Lower latency storage makes stateful apps faster. But the payoff isn't just speed but also freed-up CPU. (Old school storage nerd here)
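
One way to see how much CPU is actually parked in those IO wait states on a Linux box is to sample /proc/stat; a minimal sketch, assuming the standard aggregate "cpu" line layout (fifth value is iowait):

```python
# Sketch: estimate the fraction of CPU time spent in iowait on Linux by
# sampling the aggregate "cpu" line of /proc/stat twice.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq ..."
        return [int(v) for v in f.readline().split()[1:]]

def iowait_fraction(interval_s=1.0):
    before = cpu_times()
    time.sleep(interval_s)
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    return (delta[4] / total) if total else 0.0   # index 4 = iowait

if __name__ == "__main__":
    print(f"iowait: {iowait_fraction() * 100:.1f}% of CPU time over the last second")
```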

@bplein @mhoye oh absolutely! During the heyday of the hachyderm migration we once got the CPU clogged up to 1700%+ load!

I have literally never seen that much before or since. It was WILD

@hazelweakly @mhoye The growth of hachyderm was incredible, and the enterprise cloud approach was very cool to watch (I was admiring the work from afar). Great transparency, which helped smaller instances learn from the bumps along the way.

@hazelweakly since taking a break from working in Big Tech, I am still doing some small consulting things on the side.

I'm actually seeing the over-engineering thing happen even at the small-business level. People ask about building some form of phone app or custom database. Then we start talking about data sizes in the hundreds or possibly thousands of rows. With a dozen users. 🙄

"You know, a shared spreadsheet can handle that many rows and users".

@hazelweakly

Not exactly misunderstanding the capacity of a single computer. But in this case, misunderstanding the capacity of a pretty common tool.

It feels like the same type of "brain worms" though.

@hazelweakly self-hosting #SharePoint Server doesn't seem that bad anymore 😅

@hazelweakly yeah.

I did use AWS for my product.

It also runs on... a really tiny machine there and a bit of other things. None of the big crap. And honestly that made sense. I will defend that over the crappy VPS. But also, we are talking pennies for all the customers we could serve, so eh.

@hazelweakly reminds me of the time we were sizing DNS resolvers for a large national ISP. "You could run this whole country's DNS on a Raspberry Pi. Maybe two for redundancy."

@hazelweakly I spent way too much time trying to figure out what you could have typed that ended up as "international NFS." What kind of nonsense word salad is this?

Then I realized that's exactly what you typed. And then I bet it was small-file or small random I/O, and it was probably NFSv3 so you didn't even have compound operations. Throw in some TCP with exponential backoff over a saturated/oversubscribed residential link. Good on you for getting dozens of IOPS. I likely have many wrong guesses.

@mgerdts somehow the residential link wasn't oversubscribed, but the hard drives were broken as fuck hahaha

Right? Good on us for getting dozens of IOPS

I can't believe Postgres kept up with things somehow. What a miracle of modern computing

@mgerdts you nailed the rest of the assumptions though :)

Oh! It was also on an experimental branch of ZFS with some performance stuff not tuned correctly! And the NFS stuff also wasn't tuned for performance!

What a mess. But it was a hacking project in a playground, not a world-class production instance. Whoops