Last talk of this conference is "The Power of Stories" by Lorin Hochstein, aka @norootcause
What kind of muppet are you?
When something negative, surprising, or unexpected (like an incident) happens, the human response is "How could this bad thing have happened?!"
Stories are important because they are a tool that humans have developed to make sense of senseless events.
The incident prevention grand challenge is "How do you get the right information into the heads of the right people at the exact moment when they need it?"
"Stories" stick in your head, which is helpful for getting information into the heads of people who need it.
"Only a fool learns from his own mistakes, a wise man learns from the mistakes of others" -- unknown
We've also talked a lot about vicarious learning, i.e., learning by watching or listening to someone else.
Example: Shoulder-surfing while someone is looking up dashboards.
"Stories" are good when you can't shoulder-surf.
We're comparing "nursing" to "SRE" -- for example, a patient is deteriorating but it isn't showing up in their vitals yet. In engineering, this is like a system that's in bad shape before any alerts have fired.
Claim: good stories have two properties to be "useful" for some nebulous definition of "useful"
1. The story needs to be anomalous -- something in the story needs to disrupt your mental model of the world. There needs to be a disconnect between your belief and reality. Example: "This should never happen!" which means "I never expected that to happen!"
2. The story needs to be immutable (kind of a weird term to use here): important details are preserved as the story gets passed on.
We're discussing how this applies to the Therac-25 story.
The simple story is that "it's about race conditions", but the real story is a lot more complicated.
One example: the machine frequently threw error messages that were usually harmless, so operators learned to ignore them. But in this case the error meant "the patient's dose is too high" -- and the operators didn't have that info.
When it comes to incidents, there are different kinds or styles of stories you can tell:
1. "The horror story": we failed over and the problem followed us to the new region! 😱
2. "The morality tale": the engineer ignored the failing test and the bad code made it to production 😡
The details of a story depend a lot on the perspective of the storyteller.
Example: the Challenger disaster. Feynman wrote an appendix to the "official" report on the accident. He said this was a story about "management underestimating risks"
Another story, told by Edward Tufte (of The Visual Display of Quantitative Information fame), is that this was about "poor information presentation and bad visualization".
Not surprising that an info-vis guy said it was an info-vis problem 🤣
A third story, told by Diane Vaughan, is that Challenger was about the normalization of deviance.
If you've ever tuned a noisy alert, that's normalization of deviance: the alert fires but the system is healthy, so we make the alert less noisy. We do this all the time!
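The alert-tuning move above might look something like this in a Prometheus-style alerting rule (a hypothetical sketch -- the metric name, threshold, and duration are made up for illustration). Raising the threshold or adding a `for:` delay to quiet a flapping alert is a small, locally reasonable change, which is exactly why deviance gets normalized:

```yaml
# Hypothetical Prometheus alerting rule; names and numbers are illustrative.
groups:
  - name: latency
    rules:
      - alert: HighRequestLatency
        # Originally: request_latency_seconds > 0.5, with no "for:" clause.
        # After "tuning" the noisy alert, we raised the threshold and now
        # require the condition to hold for 10 minutes before paging.
        expr: request_latency_seconds > 1.0
        for: 10m
        labels:
          severity: page
        annotations:
          summary: "Request latency is high"
```

Each individual edit like this looks like sensible hygiene; the drift only shows up in aggregate.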
"So what, Lorin, you told us all this stuff, what am I supposed to do with it?"
One takeaway: when you do your incident writeups, tell them as a story.
(Is this a good time to plug ACRL's one public postmortem? 🤣 🤣 🤣 🤣 🤣)
https://blog.appliedcomputing.io/p/postmortem-intermittent-failure-in