@matrix 🥺
That must have been rough and tough.
We love you! 🧡
Congratulations on the recovery, @matrix
While the postmortem should focus on what went wrong and how a likely recurrence can be mitigated at acceptable cost, be sure to celebrate the successful recovery from a catastrophic production failure *without loss of data*, including meaningful communication to us.
Many organisations with far more resources and responsibilities fail to achieve even a fraction of this.

@interru Good point. Google's Bigtable picked P: when two replicas can't communicate for some time, the replication log grows on both sides, and the two sides eventually get synced under some policy (e.g. highest timestamp wins).
While horrifying for a banking system, it's probably a fine compromise for an IM.
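To make "highest timestamp wins" concrete, here is a minimal sketch of merging two replication logs after a partition heals. It is not how Bigtable is actually implemented; the `Write` record and the `merge_lww` function are hypothetical names made up for illustration, and ties are broken arbitrarily in favour of the first log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """One entry in a replica's replication log (hypothetical shape)."""
    key: str
    value: str
    timestamp: float


def merge_lww(log_a, log_b):
    """Merge two logs, keeping the highest-timestamp write for each key.

    On equal timestamps the write seen first (from log_a) is kept; a real
    system would need a deterministic tie-breaker such as a replica ID.
    """
    merged = {}
    for write in [*log_a, *log_b]:
        current = merged.get(write.key)
        if current is None or write.timestamp > current.timestamp:
            merged[write.key] = write
    return merged


# Both replicas accepted writes to the same keys while partitioned.
log_a = [Write("room1", "hello", 1.0), Write("room2", "a", 5.0)]
log_b = [Write("room1", "hi", 2.0), Write("room2", "b", 3.0)]
merged = merge_lww(log_a, log_b)
```

The losing writes ("hello" and "b" here) are silently dropped, which is exactly why this would be horrifying for account balances and tolerable for chat.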
@interru Also, I don't know much about how Matrix federation works, but doesn't matrix.org need to store all messages for all "xyx:matrix.org" rooms, and also cache messages of rooms hosted elsewhere if at least one local user has joined them?
Sounds like every large server will eventually process almost every large room in the entire network...
@matrix good luck on the remediation actions 🫡
@matrix as an advertisement for decentralization this is a bit harsh, but definitely effective!
(J/k, of course. Good luck with the recovery and thanks!)