Next afternoon talk: "Beyond Blanket Freezes: Enabling Safe Innovation During Critical Events at Netflix" by Prachi Jain and Sandhya Narayan, Netflix.
Show of hands: "Who has been in a no-deploys-for-a-week code freeze before? Oh, all of you, excellent!"
The same problem occurs during every blanket freeze: people don't stop building, they just stop shipping. Changes pile up, risk builds, and critical patches that really ought to be pushed out just sit there.
Freezes don't remove risk, they just reschedule and concentrate it.
What if we tuned our controls to the real impact for each service:
1. blast radius
2. impact to customers
3. acceptable risk tradeoffs
This lets us tune tiered responses to the actual level of risk.
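To make the idea concrete for myself, here is a minimal sketch of mapping those three inputs to a tiered control. Everything in it is my own assumption, not something shown in the talk: the 1-5 scales, the `ServiceProfile` and `Control` names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Control(Enum):
    FREEZE = "freeze"   # no deploys during the event
    GATED = "gated"     # deploys allowed, but with extra safeguards
    NORMAL = "normal"   # business as usual


@dataclass(frozen=True)
class ServiceProfile:
    name: str
    blast_radius: int     # hypothetical scale: 1 (isolated) .. 5 (platform-wide)
    customer_impact: int  # hypothetical scale: 1 (invisible) .. 5 (customer-facing outage)
    risk_tolerance: int   # hypothetical scale: 1 (none) .. 5 (high)


def control_for(service: ServiceProfile) -> Control:
    """Pick a control level from how far exposure exceeds tolerated risk."""
    exposure = max(service.blast_radius, service.customer_impact)
    margin = exposure - service.risk_tolerance
    if margin >= 3:       # hypothetical threshold
        return Control.FREEZE
    if margin >= 1:       # hypothetical threshold
        return Control.GATED
    return Control.NORMAL
```

With these made-up numbers, a service scored `(5, 5, 1)` stays frozen while an internal tool scored `(1, 1, 3)` keeps deploying normally, which is the point of the talk: the control follows the service's risk, not the calendar.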
Very curious to know how they decide what tier each service belongs in. In my experience, everybody thinks their own service is tier 0.
They appear to have a set of automated tools that look at a variety of signals to determine if a change is safe to ship.
Who can bypass these freezes? A bypass decision is based on "event type + service tiering + risk signals + resilience data (canary/staggered rollouts)"
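A rough sketch of how those four inputs might combine into a yes/no decision. This is my guess at the shape of it, not Netflix's tooling: the `ChangeRequest` fields, the `MAX_RISK` table, the event labels, and the numeric limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRequest:
    event_type: str          # hypothetical label, e.g. "live_event"
    service_tier: int        # 0 = most critical
    risk_score: float        # 0.0 (safe) .. 1.0 (risky), from automated signals
    canary_passed: bool      # resilience data: canary analysis result
    staggered_rollout: bool  # resilience data: change ships in stages


# Hypothetical policy: risk score a change must stay below, per
# (event type, service tier). A limit of 0.0 means "never bypass".
MAX_RISK = {
    ("live_event", 0): 0.0,
    ("live_event", 1): 0.2,
    ("live_event", 2): 0.5,
    ("holiday", 0): 0.1,
    ("holiday", 1): 0.4,
    ("holiday", 2): 0.7,
}


def may_bypass(change: ChangeRequest) -> bool:
    """Return True if the change may ship despite the freeze."""
    limit = MAX_RISK.get((change.event_type, change.service_tier))
    if limit is None:
        return False  # unknown combination: default to staying frozen
    if not (change.canary_passed and change.staggered_rollout):
        return False  # no resilience evidence, no bypass
    return change.risk_score < limit
```

The design choice worth noting is that every missing piece of information fails closed: an unlisted event or tier, or a change without canary and staggered-rollout evidence, stays frozen.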