Last talk of this conference is "The Power of Stories" by Lorin Hochstein, aka @norootcause

#srecon

What kind of muppet are you?

#srecon

Chaos muppet
50%
Order muppet
50%

When something negative, surprising, or unexpected (like an incident) happens, the human response is "How could this bad thing have happened????????"

Stories are important because they are a tool that humans have developed to make sense of senseless events.

#srecon

The incident prevention grand challenge is "How do you get the right information into the heads of the right people at the exact moment when they need it?"

#srecon

"Stories" stick in your head, which is helpful for getting information into the heads of people who need it.

#srecon

"Only a fool learns from his own mistakes, a wise man learns from the mistakes of others" -- unknown

We've also talked a lot about vicarious learning, i.e., learning by watching or listening to someone else.

Example: Shoulder-surfing while someone is looking up dashboards.

#srecon

"Stories" are good when you can't shoulder-surf.

#srecon

We're comparing "nursing" to "SRE". For example, a patient is deteriorating, but it isn't showing up in their vitals yet. In engineering, this is like a system that is in bad shape before any alerts have fired.

#srecon

Claim: to be "useful" (for some nebulous definition of "useful"), good stories need two properties:

#srecon

1. The story needs to be anomalous -- something in the story needs to disrupt your mental model of the world. There needs to be a disconnect between your belief and reality. Example: "This should never happen!" which means "I never expected that to happen!"

#srecon

2. The story needs to be immutable (this is kind of a weird term to use): important details are preserved as the story gets passed on.

#srecon

We're discussing how this applies to the Therac-25 story.

The simple story is that "it's about race conditions", but the real story is a lot more complicated.

One example: the machine errored frequently, and the errors were usually harmless. But in this case, the error meant "the patient's dose is too high," and the operators didn't have that information.

#srecon

When it comes to incidents, there are different kinds or styles of stories you can tell:

1. "The horror story": we failed over and the problem followed us to the new region! 😱
2. "The morality tale": the engineer ignored the failing test and the bad code made it to production 😡

#srecon

The details of a story depend a lot on the perspective of the storyteller.

Example: the Challenger disaster. Feynman wrote an appendix to the "official" report on the accident. He said this was a story about "management underestimating risks."

#srecon

Another story, told by Edward Tufte (of The Visual Display of Quantitative Information fame), said this was about "poor information presentation and bad visualization."

Not surprising that an info-vis guy said it was an info-vis problem 🤣

#srecon

A third story, told by Diane Vaughan, said that Challenger was a story about the normalization of deviance.

If you've ever tuned a noisy alert, that's normalization of deviance: the alert is firing, but the system is healthy, so we make the alert less noisy! We do this all the time.

#srecon

"So what, Lorin, you told us all this stuff, what am I supposed to do with it?"

One takeaway: when you write up your incidents, tell them as stories.

#srecon

(Is this a good time to plug ACRL's one public postmortem? 🤣 🤣 🤣 🤣 🤣)

https://blog.appliedcomputing.io/p/postmortem-intermittent-failure-in

#srecon


How do you get better at storytelling? Lots of practice! If you're involved in incidents, you'll get plenty of it, and you can tell your stories to people at SRECon :D

#srecon

"No one ever made a decision because of a number. They need a story." -- Daniel Kahneman

#srecon