So, here's what happened: one of the web servers (host 2) was struggling, and I decided to upgrade it the same way I upgraded host 1 yesterday: https://mastodon.social/@mastohost/109292313550726464
Yesterday's upgrade went really smoothly, and moving to the cloud instance caused only 90 seconds of downtime.
Today, everything went wrong:
- the network interface for the private IP changed
- after I updated the configuration to use the new network interface and restarted networking, the instance just stopped responding
So I got on the phone with OVH to debug the problem, and together we were able to bring the instance back online.
As a result, the service was partially down for a little over an hour (host 1 was still serving anyone whose DNS pointed to it).
I'm really sorry for any trouble this may have caused.
Traffic grew 10x in one week, and there was no way for me to predict that growth (or afford to scale for it) before it happened.
Hope you understand.
@mastohost Totally understand. We've still had almost no downtime in the several years we've been on masto.host.
With the week we've had on the network I'd say you're doing great!
Keep it up!
(pun intended 😉 )
@mastohost Hey, with the explosion you're seeing, I'm amazed things have gone as smoothly as they have. Don't stress it too much.
Heck, early Twitter was down way more than this.