A single point of failure triggered the Amazon outage affecting millions
A DNS manager in a single region of Amazon's sprawling network touched off a 16-hour debacle.
https://arstechnica.com/gadgets/2025/10/a-single-point-of-failure-triggered-the-amazon-outage-affecting-millions/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

@arstechnica

This is not some AI failure, nor the fault of some hapless manager.

This was a failure to recognize the potential for error handling itself to cascade.

It's DNS, for sure. If I had a dime for every hour of DNS-related outage, I would be living a whole different life.

The issue is not that bad DNS data got pushed out. That is a recoverable situation.

The issue is that other automated tools were unable to recognize that reversion was actually recovery.

Anyway, good luck with vibe-coding.