This is a great writeup of a DB corruption bug and its detection and resolution. Much respect to Claire: these "things that should always happen in the right order have happened in the wrong order because of some particular set of extreme conditions, with surprise downstream consequences" bugs are absolutely the worst. "Let's reason backwards from effects to causes, with the caveat that causality maybe sometimes doesn't exist" is so hard.

https://thomasp.vivaldi.net/2023/07/28/what-happened-to-vivaldi-social/

What happened to Vivaldi Social? | Thomas Pike’s other blog

A deep dive into the events of Saturday 8 July 2023, when user accounts started disappearing from the Vivaldi Social Mastodon instance.


@mhoye this is not stated explicitly in the postmortem, but it appears to me that the root cause is that their database setup uses _asynchronous_ replication while performing read queries on the secondary (slave) server.

If I'm right, this wouldn't have happened with synchronous replication (at the expense of a huge drop in write performance).
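
To make the hypothesis above concrete, here is a toy in-memory model in Python of a primary and a read replica. It is only a sketch: `Primary`, `Replica`, `ship` and `stale_read_demo` are made-up names, nothing here is taken from the postmortem or from any real database, and real replication is far more involved. It only shows why a read served by the secondary can miss a write the primary has already acknowledged, and why waiting for the secondary closes that window.

```python
class Replica:
    """Read-only copy that only sees what the primary has shipped to it."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows.get(key)


class Primary:
    """Accepts writes and forwards them to one replica (hypothetical model)."""

    def __init__(self, replica, synchronous):
        self.rows = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []  # changes applied locally but not yet on the replica

    def write(self, key, value):
        self.rows[key] = value
        self.pending.append((key, value))
        if self.synchronous:
            # Synchronous mode: do not acknowledge until the replica has it.
            self.ship()

    def ship(self):
        """Apply every pending change to the replica, in order."""
        for key, value in self.pending:
            self.replica.rows[key] = value
        self.pending.clear()


def stale_read_demo(synchronous):
    """Return what the replica sees right after a write, then after catch-up."""
    replica = Replica()
    primary = Primary(replica, synchronous)
    primary.write("account:42", "created")
    seen_right_after_write = replica.read("account:42")
    primary.ship()  # in asynchronous mode this happens "some time later"
    seen_after_catch_up = replica.read("account:42")
    return seen_right_after_write, seen_after_catch_up
```

With `synchronous=False` the first read comes back empty even though the write has already been acknowledged. With `synchronous=True` both reads see the row, and the price is that every write now waits on the replica.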

Having deployed this kind of setup for a customer, an event like this was my #1 fear, and I asked my customer to explicitly accept the risks.

@dek @mhoye that, and the fact that creating the account and setting its URL are not done in the same transaction, which is an application design issue.
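
A minimal sketch of the "same transaction" point, in Python with the standard `sqlite3` module standing in for the real database. The `accounts` table, the `create_account` helper and the URL format are all invented for illustration and are not Mastodon's actual schema or code. The idea is only that the insert and the URL update commit together or not at all, so no other reader can ever observe an account row that has no URL yet.

```python
import sqlite3


def create_account(conn, username, domain):
    """Insert an account and set its URL atomically (hypothetical schema)."""
    # `with conn:` commits if the block succeeds and rolls back if it raises,
    # so the INSERT and the UPDATE become visible together or not at all.
    with conn:
        cur = conn.execute(
            "INSERT INTO accounts (username) VALUES (?)", (username,)
        )
        account_id = cur.lastrowid
        url = f"https://{domain}/users/{username}"  # assumed URL format
        conn.execute(
            "UPDATE accounts SET url = ? WHERE id = ?", (url, account_id)
        )
    return account_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, username TEXT NOT NULL, url TEXT)"
)
conn.commit()

account_id = create_account(conn, "alice", "example.social")
row = conn.execute(
    "SELECT username, url FROM accounts WHERE id = ?", (account_id,)
).fetchone()
```

This alone would not make reads from a lagging replica current, but it removes the intermediate state in which the account exists without its URL.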