I'm very late to comment on this, but holy shit I'm shocked at Cloudflare's recent postmortem.

First off, the finger-pointy tone of the post doesn't sit well with me. While their provider clearly made mistakes in terms of communications, you have to own your own availability in the way you engineer around your providers' limitations and mistakes.

1/2

Second, I would expect Cloudflare to do a MUCH better job insulating their architectural decisions from assumptions about the underlying DC infra and power grid.

Finally, for the love of glob, why the FUCK would you locate three datacenters within such a tight geographic radius AND WITHIN A SUBDUCTION ZONE and claim this protects you from natural disasters AND meets the definition of HA???

2/2

@obfuscurity "Unfortunately, we discovered that a subset of services that were supposed to be on the high availability cluster had dependencies on services exclusively running in PDX-04."
@obfuscurity the power stuff sucked, but those hidden dependencies are what they should be focusing on.
@mattray I'm genuinely surprised at how much detail was shared that reveals just how bad they are at engineering HA infra.