I'm very late to comment on this, but holy shit I'm shocked at Cloudflare's recent postmortem.

First off, the finger-pointy tone of the post doesn't sit well with me. While their provider clearly made mistakes in terms of communications, you have to own your own availability in the way you engineer around your providers' limitations and mistakes.

1/2

Second, I would expect Cloudflare to do a MUCH better job insulating(?) their architectural decisions from assumptions about the underlying DC infra and power grid.

Finally, for the love of glob, why the FUCK would you locate three datacenters within such a tight geographic radius AND WITHIN A SUBDUCTION ZONE and claim this protects you from natural disasters AND meets the definition of HA???

2/2

@obfuscurity

It fits a pattern I've seen many times. (Not saying this is what happened, just that it fits.)

The initial engineering is done by someone competent, who understands the overview.

Bits and pieces get pushed down to less knowledgeable engineers. The competent folks leave or get promoted.

You're left with folks who lack vision of the whole. They might be competent engineers, but without the overview they make bad decisions.

Sigh.

@mwl @obfuscurity exactly this, a thousand times this. I personally observed this pattern in six different national scale banks in previous jobs, and I'm watching it happen again in slow motion at my current job. (Only s/leave/are being laid off in descending order of seniority/g)