We ran hachyderm.io, which is Ruby on Rails + Postgres, off of 32 vCPUs, 128 GB of RAM, and 2 failing hard drives RAID 10'd and served over international NFS, with IOPS performance measuring in the *dozens*

And 45,000 active users

Had the storage been faster than a 1997 floppy disk, we could've cut compute and RAM by 80% and not suffered much at all

Y'all really don't understand how far a single laptop can push things anymore, because we waste computers so hard
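
For a rough sense of what "dozens of IOPS" does to a database-backed app, here's a back-of-the-envelope sketch. The 50 IOPS figure and the 20-random-reads-per-request workload are illustrative assumptions, not hachyderm measurements:

```python
# Back-of-the-envelope: what "dozens of IOPS" means for a Rails + Postgres app.
# All numbers are illustrative assumptions, not measurements from hachyderm.

slow_iops = 50            # "dozens" of random I/O operations per second
ssd_iops = 50_000         # a very ordinary SATA SSD
reads_per_request = 20    # cache-miss random reads one request might trigger

for label, iops in [("failing RAID over NFS", slow_iops), ("plain SSD", ssd_iops)]:
    wait_ms = reads_per_request / iops * 1000
    print(f"{label:>22}: ~{wait_ms:g} ms of I/O wait per request")

# ~400 ms vs ~0.4 ms: with storage that slow, you end up buying CPUs that sit
# in iowait and RAM to cache around the disks, which is where the waste goes.
```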

https://mastodon.social/@GeePawHill/112613403362243583

@hazelweakly the amount of compute available in your electronic toothbrush today would destroy a Victorian schoolchild, etc

@atax1a @hazelweakly Tired: A monkey at a typewriter.

Wired: A Victorian schoolchild with an abacus.

@hazelweakly @GeePawHill Current processors are so powerful that programmers look for tasks to fill idle cycles, such as prefetching, AI prediction, etc.

@gglockner @GeePawHill it's just wild when you think about it. "we're gonna do 300% of the work because, statistically, it'll be a very very tiny bit faster than doing 100% of the work and you wouldn't even notice the other 200% anyways"

And they're right, you don't notice

@hazelweakly one time at $bigtechco I worked on a service that had a custom-written event sourcing architecture deployed on k8s to a cluster of five very expensive cloud VMs. The userbase was 10 people who each used it maybe once every day. It was an internal app. It was slow as shit.
@ahelwer but a database request to another machine (that has the data in RAM already) is about as fast as fetching the data from a spinning disk! Clearly, we just need to store everything in RAM on remote machines - that's the lesson, right? @hazelweakly
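
There is a grain of truth behind the joke, which is roughly why remote in-memory caches exist at all. Using the classic order-of-magnitude latency figures (approximate textbook numbers, not measurements from this thread):

```python
# Rough order-of-magnitude latencies (approximate figures, not measurements):
ram_access_s = 100e-9      # ~100 ns: read from local RAM
dc_round_trip_s = 500e-6   # ~0.5 ms: network round trip within a datacenter
disk_seek_s = 10e-3        # ~10 ms: random seek on a spinning disk

remote_ram_fetch = dc_round_trip_s + ram_access_s
print(f"remote RAM fetch: ~{remote_ram_fetch * 1e3:.1f} ms")
print(f"local disk seek:  ~{disk_seek_s * 1e3:.0f} ms")
# A hop to another machine's RAM can genuinely beat a local spinning-disk seek,
# which is the kernel of truth the sarcasm is built on.
```
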
@hazelweakly that storage setup! I thought "international" was autocorrect going wild at first, then realized with horror what you meant
@hazelweakly I see some info in the slides from your SRECon '23 talk, but is it public what you all run on now, like what size of deployment?

@rf yes, unfortunately "international" was not a typo 😅

We don't have a fully public writeup yet, but that's mostly out of laziness more than anything else. I'm happy to write something up soonish, but from memory: vaguely about 64 cores and 256 GB of RAM spread across 8 machines, and it could likely be solidly less at this point? It's been a while since I looked and I bet I am way off somewhere!

@hazelweakly @rf The current performance improvement push is to get the end user closer to a CDN node (especially in South and East Asia), but the main Mastodon stuff is generously provisioned.
@hazelweakly @mhoye Yes! A lot of CPU is held by IO wait states. Lower latency storage makes stateful apps faster. But the payoff isn't just speed but also freed-up CPU. (Old school storage nerd here)
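
One way to see how much CPU is actually parked in those IO wait states on a Linux box is to sample /proc/stat; a minimal sketch, assuming the standard aggregate "cpu" line layout (fifth value is iowait):

```python
# Sketch: estimate the fraction of CPU time spent in iowait on Linux by
# sampling the aggregate "cpu" line of /proc/stat twice.
import time

def cpu_times():
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq ..."
        return [int(v) for v in f.readline().split()[1:]]

def iowait_fraction(interval_s=1.0):
    before = cpu_times()
    time.sleep(interval_s)
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    total = sum(delta)
    return (delta[4] / total) if total else 0.0   # index 4 = iowait

if __name__ == "__main__":
    print(f"iowait: {iowait_fraction() * 100:.1f}% of CPU time over the last second")
```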

@bplein @mhoye oh absolutely! During the heyday of the hachyderm migration we once got the CPU clogged up to 1700%+ load!

I have literally never seen that much before or since. It was WILD

@hazelweakly @mhoye The growth of hachyderm was incredible, and the enterprise cloud approach was very cool to watch (I was admiring the work from afar). Great transparency, which helped smaller instances learn from the bumps along the way.

@hazelweakly since taking a break from working in Big Tech, I am still doing some small consulting things on the side.

I'm actually seeing the over-engineering thing happen even at the small-business level. People ask about building some form of phone app or custom database. Then we start talking about data sizes in the hundreds or possibly thousands of rows. With a dozen users. 🙄

"You know, a shared spreadsheet can handle that many rows and users".

@hazelweakly

Not exactly misunderstanding the capacity of a single computer. But in this case, misunderstanding the capacity of a pretty common tool.

It feels like the same type of "brain worms" though.

@hazelweakly self-hosting #SharePoint Server doesn't seem that bad anymore 😅

@hazelweakly yeah.

I did use AWS for my product.

It also runs on... a really tiny machine there and a bit of other things. None of the big crap. And honestly that made sense. I will defend that over the crappy VPS. But also, we are talking pennies for all the customers we could serve, so eh.

@hazelweakly reminds me of the time we were sizing DNS resolvers for a large national ISP. "You could run this whole country's DNS on a Raspberry Pi. Maybe two for redundancy."

@hazelweakly I spent way too much time trying to figure out what you could have typed that ended up as "international NFS." What kind of nonsense word salad is this?

Then I realized that's exactly what you typed. And then I bet it was small-file or small random I/O, and it was probably NFSv3 so you didn't even have compound operations. Throw in some TCP with exponential backoff over a saturated/oversubscribed residential link. Good on you for getting dozens of IOPS. I likely have many wrong guesses.

@mgerdts somehow the residential link wasn't oversubscribed, but the hard drives were broken as fuck hahaha

Right? Good on us for getting dozens of IOPS

I can't believe Postgres kept up with things somehow. What a miracle of modern computing

@mgerdts you nailed the rest of the assumptions though :)

Oh! It was also on an experimental branch of ZFS with some performance stuff not tuned correctly! And the NFS stuff also wasn't tuned for performance!

What a mess. But it was a hacking project in a playground, not a world-class production instance. Whoops