A theory-heavy way of looking at reliability issues, which also includes suggestions on how to write about incidents: https://www.usenix.org/publications/loginonline/evolution-sre-google
It gives an introduction to STAMP (System-Theoretic Accident Model and Processes), which I had never heard of before reading this.
The book they reference, Engineering a Safer World by MIT professor Nancy Leveson, is available free online! https://direct.mit.edu/books/oa-monograph/2908/Engineering-a-Safer-WorldSystems-Thinking-Applied
I have some reading to do!