I discovered tonight that two tiers of my three-tier personal ZFS backup strategy, of which I am very proud, haven't been working for over 6 months.

Tier 1 - local replication of all datasets to a server at the opposite end of the house. ✅ working fine

Tier 2 - ZFS replication to my old UK server, horribly broken. Checksum errors on 2 of 4 drives. Proxmox keeps kernel panicking after a few hours, and I'm not quite sure why yet (thank goodness for IPMI via Tailscale).

Tier 3 - Restic to Minio running on a Synology box, which completely shit the bed after an update, in two ways. 1. Minio has upgraded their data architecture and provided no clear upgrade path - unacceptable! 2. The DSM web UI will not load. Systemd errors are coming out the wazoo, and after several hours of triage the only option is a factory restore.

A great opportunity to double-check that past assumptions are still valid and that monitoring is not totally made of cheese moving forward.

What do you do for your personal backups?

@ironicbadger
It sounds like the monitoring piece is the main shortcoming of your approach. I use the free Grafana plan and Prometheus-style alerts. I use sanoid and syncoid, so I submitted some code to make them spit out the right type of metrics. One lesson I've learnt is to have something record the last time each thing you rely on successfully completed, or was confirmed OK, so you can alert if that is too long ago -- it finds issues you don't think to check for.
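The sanoid/syncoid metrics code mentioned above isn't shown here. The following is only a minimal Python sketch of the general "last successful run" pattern, assuming a Prometheus textfile collector such as node_exporter's, which picks up `*.prom` files from a configured directory. The metric name `backup_last_success_timestamp_seconds`, the `tier` label, the `backup_<tier>.prom` file layout and the helper functions are all hypothetical.

```python
import os
import tempfile
import time
from typing import Dict, List, Optional

# Hypothetical metric name; pick whatever fits your own naming scheme.
METRIC = "backup_last_success_timestamp_seconds"


def render_metric(tier: str, timestamp: float) -> str:
    """Render one gauge sample in the Prometheus text exposition format."""
    return (
        f"# HELP {METRIC} Unix time of the last successful backup run.\n"
        f"# TYPE {METRIC} gauge\n"
        f'{METRIC}{{tier="{tier}"}} {timestamp:.0f}\n'
    )


def mark_success(directory: str, tier: str, now: Optional[float] = None) -> str:
    """Call this only after a backup job has fully succeeded.

    Writes <directory>/backup_<tier>.prom via a temp file plus rename, so a
    scraper never reads a half-written file. The temp file ends in .tmp, so a
    collector that only reads *.prom files ignores it.
    """
    timestamp = time.time() if now is None else now
    path = os.path.join(directory, f"backup_{tier}.prom")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(render_metric(tier, timestamp))
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
    return path


def stale_tiers(last_success: Dict[str, float], now: float,
                max_age_seconds: float) -> List[str]:
    """Return the tiers whose last success is older than max_age_seconds."""
    return sorted(t for t, ts in last_success.items() if now - ts > max_age_seconds)
```

On the alerting side, a PromQL expression along the lines of `time() - backup_last_success_timestamp_seconds > 2 * 86400` would fire when any tier hasn't reported success in two days. Because the timestamp only advances on success, a job that silently stops running trips the same alert as one that fails loudly, which is exactly the kind of issue you don't think to check for.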