I was reading this Tumblr post on how cars have changed over the decades.

There’s some perception that older cars, made with steel, are safer because they can crash and have little visible damage.

People who understand physics know that the intentional crumpling of modern cars sheds momentum to protect the squishy meat sack inside the car. A car looking *terrible* after a crash these days often means the people inside walked away (as long as they were wearing their seat belts).

So, when it comes to computers and networks, other than things like tar pits, how do we “shed momentum”? What are the technical means we can use to ensure the safety of human users, even when the system “crashes”? What kicks in when all of your preventative measures and network resilience ultimately fail?

And is anyone hiring people who like to think about these sorts of problems in about 7 months? :)

@TindrasGrove Lots of topics shape different parts of this conversation. You can't prevent failure, so the focus is on preventing a service disruption. Recovery time and recovery point objectives (RTO and RPO) for data. Error budgets for application reliability. Broadcast domains, STP, routing protocol design, and ECMP for networking. The architectural considerations for infrastructure differ from those for logical systems design. You really have to pick one piece and drill down, but you can't focus only on that one area. Zoom out and everyone will tell you it's a disaster recovery discussion. But that's short-sighted. A better approach is disaster avoidance: design so the disaster doesn't happen in the first place. Go one step further and the topic widens to business continuity. This is really the starting point. You have to define what's critical, what the interdependencies are, and how much loss you can sustain, then build the mitigation strategies. Not having this defined, along with a playbook for business continuity, is exactly what should keep CxOs up at night.
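To make "how much loss you can sustain" a bit more concrete, here is a minimal Python sketch of the arithmetic behind an error budget and an RTO/RPO check. Everything in it is an assumption for illustration: the function names, the 30-day window, and the 99.9% target are hypothetical, not something from this thread or from any particular tool.

```python
from datetime import datetime, timedelta


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime an availability target allows over the window.

    `slo` is a fraction such as 0.999 (hypothetical target, for illustration).
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after the downtime already spent; negative means overspent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """RPO bounds data loss: the gap between last good copy and the failure."""
    return failure - last_backup <= rpo


def meets_rto(failure: datetime, restored: datetime, rto: timedelta) -> bool:
    """RTO bounds downtime: the gap between the failure and service restoration."""
    return restored - failure <= rto


if __name__ == "__main__":
    print(f"budget: {error_budget_minutes(0.999):.1f} min per 30 days")
    print(f"remaining after 50 min down: {budget_remaining(0.999, 50):.1f} min")
```

With these assumed numbers, a 99.9% target over 30 days leaves roughly 43 minutes of allowed downtime, so a single 50-minute outage already overspends it. The point of writing it down is the same as the playbook: the acceptable loss is decided before the crash, not during it.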