I have discovered tonight that two tiers of my three-tier personal ZFS backup strategy, of which I am very proud, haven't been working for over 6 months.

Tier 1 - local replication of all datasets to a server at the opposite end of the house. ✅ working fine

Tier 2 - ZFS replication to my old UK server. ❌ horribly broken. Checksum errors on 2 of 4 drives. Proxmox keeps kernel panicking after a few hours, not quite sure why yet (thank goodness for IPMI via Tailscale)

Tier 3 - Restic to Minio running on a Synology box, which has completely shit the bed after an update in two ways. 1. Minio has changed their data architecture and provided no clear upgrade path - unacceptable! 2. The DSM web UI will not load. Systemd errors are out the wazoo and, after several hours of triaging, the only option is a factory restore.

A great opportunity to double check that past assumptions are still valid and that monitoring is not totally made of cheese moving forward.
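
One way to catch a replication tier that has silently stopped is to check how old the newest snapshot on the target is. Below is a minimal Python sketch of that idea, not anyone's actual setup. It assumes the input is the tab-separated output of `zfs list -H -p -t snapshot -o name,creation` (creation time as epoch seconds); the dataset name `tank/data` and the 24-hour threshold are made up for illustration.

```python
from typing import Optional


def newest_snapshot_age(listing: str, dataset: str, now: float) -> Optional[float]:
    """Return the age in seconds of the newest snapshot of `dataset`.

    `listing` is assumed to be tab-separated "name<TAB>creation" lines,
    with creation given as epoch seconds. Returns None if the dataset
    has no snapshots at all, which is itself worth alerting on.
    """
    newest = None
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, creation = line.split("\t")
        snapshot_dataset = name.partition("@")[0]
        if snapshot_dataset != dataset:
            continue
        timestamp = int(creation)
        if newest is None or timestamp > newest:
            newest = timestamp
    return None if newest is None else now - newest


def is_stale(age_seconds: Optional[float], max_age_hours: float = 24.0) -> bool:
    """Treat a missing snapshot or one older than the threshold as stale."""
    return age_seconds is None or age_seconds > max_age_hours * 3600
```

To use it, feed in the command output captured on the backup server (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) together with `time.time()`, and alert whenever `is_stale` returns True.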

What do you do for your personal backups?

@ironicbadger
0 - various local scripts and cron jobs to a single location on primary server
1 - local pull ZFS replication to dedicated backup server
2 - remote ZFS replication - testing at the moment
3 - restic to Backblaze B2 for very important stuff

All of the above is monitored by CheckMK. Started with Nagios, which is really enough, but CheckMK is just a bit nicer.

Monitoring is key: I'll know within hours at most if a backup is not working OR ZFS has checksum errors.
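
For the checksum part, here is a rough Python sketch of what such a check could look like. It is only an illustration, not the actual CheckMK plugin in use here. It assumes the input is the text of `zpool status -p` (exact numeric counters) and that the output line follows the CheckMK local check convention of `<status> <service name> <metrics> <details>`; the service name `ZFS_checksums` and the metric name `cksum_errors` are hypothetical.

```python
from typing import Dict

HEADER = ["NAME", "STATE", "READ", "WRITE", "CKSUM"]


def checksum_errors(status_text: str) -> Dict[str, int]:
    """Map device name -> CKSUM count for every device with a nonzero count.

    Only rows inside the NAME/STATE/READ/WRITE/CKSUM table are inspected;
    rows with fewer than five columns (spares, section labels) are skipped.
    """
    errors: Dict[str, int] = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == HEADER:
            in_table = True
            continue
        if not fields:
            in_table = False  # a blank line ends the device table
            continue
        if not in_table or len(fields) < 5:
            continue
        try:
            cksum = int(fields[4])
        except ValueError:
            continue  # abbreviated counters (e.g. "1.2K") are not handled here
        if cksum > 0:
            errors[fields[0]] = cksum
    return errors


def local_check_line(errors: Dict[str, int]) -> str:
    """Format one result line: status 0 means OK, status 2 means CRITICAL."""
    total = sum(errors.values())
    if not errors:
        return "0 ZFS_checksums cksum_errors=0 No checksum errors"
    detail = ", ".join(f"{dev}: {count}" for dev, count in sorted(errors.items()))
    return f"2 ZFS_checksums cksum_errors={total} Checksum errors on {detail}"
```

Printing the result of `local_check_line(checksum_errors(output))` from a script run by the monitoring agent would be one way to wire this in; any Nagios-style check that maps nonzero counts to a critical state does the same job.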

@amatashkin how's the CheckMK free tier for you?
@ironicbadger works well, auto-discovery really makes life easier and removes some Nagios pain. Experience with Nagios really helps, but it is doable from scratch. I went with the RAW edition, which uses the actual Nagios engine under the hood, as opposed to the free tier, which uses a proprietary engine under the same GUI, is a little bit faster and has some nice features. Conditions are similar to the old Tailscale free plan: enough for a small deployment, like 5-10 servers.