Tired of postmortems that fix nothing? Learn how to run blame-free, human-centered incident reviews that drive real change in this talk by Agata Skorupka at DevOpsDays London. This talk shares practical techniques to foster psychological safety, improve collaboration, and move beyond ineffective root cause analysis. Build stronger teams through postmortems that actually make a difference.

#Postmortems #DevOps #Leadership

Program: https://devopsdays.org/events/2025-london/program/agata-skorupka

Tickets: https://ti.to/devopsdays-london/2025

The neverending griefing discussion

I've been doing MMOs and online worlds a long time. And that means that I've written and said a lot of things on the Internet over the years, about designing them.

One of the funny things about reactions to the various vision blogs for Stars Reach is the number of people who have

https://www.raphkoster.com/2024/08/07/the-neverending-griefing-discussion/

#GameTalk #Gamemaking #GameDesign #griefing #postmortems #StarsReach #starsreach #VwDesign

Later this month, we'll have the recording of our second episode of #OfficeOfTheITGuy. I am seeking a seasoned guest to talk about incidents and #postMortems. Extra credit if you have something to showcase on the show. #IT #DevOps #Podcast

Just 1 #primary in and I'm already 🤮 of the #PostMortems...

@thisismissem @sgf "blame" is not the same thing as "assigning responsibility".

A good red flag for this is teams that say "We do #blameless #postmortems by not naming anyone in the postmortem".

No! You know you have a blameless postmortem culture when you *can* name people in postmortems without it causing problems.

This can be exceptionally hard to achieve, but it's worth it.

Edit: see also @danslimmon https://blog.danslimmon.com/2023/04/20/its-fine-to-use-names-in-post-mortems/

It’s fine to use names in post-mortems

The purpose of the blameless post-mortem is not to make everyone feel comfortable. Discomfort can be healthy and useful. The purpose of the blameless post-mortem is to let us find explanations deep…

Dan Slimmon

heard that Twitter is DDoSing itself (?)

This is a good opportunity to announce I specialize in software perf & scalability. Reducing hosting costs. And parachuting in to solve hard bugs or otherwise "rescue" sites or projects farked up by a prior approach

as a paid consultant

#DDoS
#Twitter
#performance
#scalability
#scaling
#tuning
#CostReduction
#ResourceMinimization
#troubleshooting
#rescues
#rewrites
#systems
#RootCauseAnalysis
#regressions
#postmortems
#architecture
#efficiency
#SRE


"Eventually this customer has had enough. They leave. This represents both a sizable blow to revenue and a scathing indictment of your product’s reliability at scale. But, on the bright side, both MTTR and MTBF benefit enormously! That’ll look great on the quarterly slide deck." (~700w)

https://blog.danslimmon.com/2023/04/04/incident-metrics-tell-you-nothing-about-reliability/ #sre #devops #incidentresponse #postmortems

Incident metrics tell you nothing about reliability

When an incident response process is created, there arise many voices calling for measurement. “As long as we’re creating standards for incidents, let’s track Mean-Time-To-Recover…

Dan Slimmon
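
To make the arithmetic in that quote concrete, here is a rough sketch. Everything in it is a made-up assumption for illustration: the customer names, the recovery times, the 30-day window, and the simple "uptime divided by incident count" definition of MTBF. None of it comes from the linked post.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical 30-day observation window, in hours (an assumption).
WINDOW_HOURS = 720.0


@dataclass(frozen=True)
class Incident:
    customer: str            # hypothetical customer label
    hours_to_recover: float  # time from detection to recovery


def mttr(incidents):
    """Mean time to recover: the average recovery duration."""
    return mean(i.hours_to_recover for i in incidents)


def mtbf(incidents, window_hours=WINDOW_HOURS):
    """Mean time between failures, approximated as uptime / incident count."""
    downtime = sum(i.hours_to_recover for i in incidents)
    return (window_hours - downtime) / len(incidents)


# Invented data: one large customer hits slow, painful incidents at scale.
before = [
    Incident("big_customer", 6.0),
    Incident("big_customer", 8.0),
    Incident("big_customer", 10.0),
    Incident("small_customer", 1.0),
    Incident("small_customer", 2.0),
]

# The large customer leaves, taking its incidents out of the data set.
after = [i for i in before if i.customer != "big_customer"]

print(f"MTTR before: {mttr(before):.1f} h, after: {mttr(after):.1f} h")
print(f"MTBF before: {mtbf(before):.1f} h, after: {mtbf(after):.1f} h")
```

With these invented numbers, MTTR falls and MTBF rises after the customer churns, even though nothing about the product's reliability changed.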

We do that anyway after incidents with #postmortems, but it's a good time to reflect on the procedures we typically follow.

I can absolutely recommend this practice. It is also a great time for the team to share past #incident stories with each other...
[4/6]

There's still value in low-technical postmortems.

What made this incident low impact? Has your team implemented various safety nets to reduce harmful effects?

How did you know that a rollback was the right thing to do?
Could you have implemented a fix-forward instead?

Who else did you need to involve? Or were you able to fully execute the incident and any runbooks by yourself without disrupting anyone else?

#Incidents #IncidentResponse #IncidentManagement #Postmortems #ICM #IRM