https://kmcd.dev/posts/irc-log-rm-rf-/var/opt/gitlab/postgresql/data/
#Humor #Irc #Rm #Incident

IRC Log: rm -rf /var/opt/gitlab/postgresql/data
The Incident Log: January 31, 2017

[23:00] *** Topic: DB Replication Lag | Status: 🔴 Critical
[23:05] *** tired_sysadmin has joined
[23:10] <tired_sysadmin> Replication is stuck again. The secondary node (db2) is refusing to sync.
[23:11] <tired_sysadmin> I'm going to wipe the data directory on db2 and let it pull a fresh copy from master.
[23:12] <tired_sysadmin> rm -rf /var/opt/gitlab/postgresql/data
[23:12] <tired_sysadmin> Weird. It's taking a while. Usually empty directories delete instantly.
[23:13] <helper_dev> Hey, why did the website just go 500?
[23:13] <tired_sysadmin> …
[23:13] <tired_sysadmin> I'm looking at my terminal prompt.
[23:14] <tired_sysadmin> It says root@db1.
[23:14] <helper_dev> db1 is Prod. You are deleting Prod.
[23:15] <tired_sysadmin> CTRL+C CTRL+C CTRL+C
[23:15] <tired_sysadmin> Okay, I stopped it. How much is left?
[23:16] <helper_dev> Checking… The directory is 4.5KB.
[23:16] <tired_sysadmin> We had 300GB of data.
[23:17] <helper_dev> Okay, don't panic. We have 5 different backup mechanisms. Let's check S3.
[23:20] <helper_dev> S3 bucket is empty. The backup script has been failing silently since version 8.1.
[23:21] <tired_sysadmin> Check the Azure disk snapshots.
[23:22] <helper_dev> Not enabled.
[23:23] <tired_sysadmin> …LVM snapshots?
[23:24] <helper_dev> We take them every 24 hours. We just lost 6 hours of data.
[23:25] <tired_sysadmin> I am going to live stream the restoration on YouTube so people don't kill us.

Postmortem of database outage of January 31
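The whole disaster hinges on one mistake: running a destructive command without confirming which host the terminal is attached to. A minimal sketch of a guard for that (the function name `confirm_host` and the host names are made up for illustration; this is not from the actual incident tooling):

```shell
#!/bin/sh
# Hypothetical guard: succeed only when the current hostname matches the
# host we intend to wipe. Host names db1/db2 are assumptions from the log.
confirm_host() {
    current="$1"
    expected="$2"
    if [ "$current" = "$expected" ]; then
        echo "ok: safe to proceed on $current"
        return 0
    else
        echo "refusing: this is $current, not $expected" >&2
        return 1
    fi
}

# Real usage would gate the destructive command on the live hostname:
#   confirm_host "$(hostname)" db2 && rm -rf /var/opt/gitlab/postgresql/data
# Demo with the log's scenario: sitting on db1 while meaning to wipe db2.
confirm_host db1 db2 || echo "guard blocked the wipe"
```

Two lines of paranoia like this would have turned the 23:12 message into a refused command instead of a deleted production database.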