I have discovered tonight that two tiers of my three-tier personal ZFS backup strategy, of which I am very proud, haven't been working for over 6 months.

Tier 1 - local replication of all datasets to a server at the opposite end of the house. ✅ working fine

Tier 2 - zfs replication to my old UK server, horribly broken. Checksum errors on 2 of 4 drives. Proxmox keeps kernel panicking after a few hrs, not quite sure why yet (thank goodness for IPMI via Tailscale)

Tier 3 - Restic to Minio running on a Synology box which has completely shit the bed after an update, in two ways. 1. Minio has upgraded their data architecture and provided no clear upgrade path - unacceptable! 2. The DSM web UI will not load. Systemd errors are out the wazoo and the only option is a factory restore after several hours of triaging.

A great opportunity to double-check that past assumptions are still valid and that monitoring is not totally made of cheese moving forward.
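One cheap sanity check I'm adding: alert when the newest snapshot on a backup target gets too old. A sketch only, assuming the tab-separated name/epoch output of `zfs list -t snapshot -o name,creation -Hp` piped in; the pool and dataset names are placeholders:

```shell
#!/bin/sh
# Sketch: print the age in seconds of the newest snapshot, reading the
# tab-separated "name<TAB>unix-epoch" lines that
# `zfs list -t snapshot -o name,creation -Hp` emits on stdin.
newest_snapshot_age() {
    now=$(date +%s)
    latest=$(sort -k2 -n | tail -n 1 | cut -f2)   # newest = largest epoch
    echo $(( now - latest ))
}

# usage (not run here; names are placeholders):
#   zfs list -t snapshot -o name,creation -Hp -r tank/backup | newest_snapshot_age
```

Compare the result against a threshold like 86400 in a cron job and a dead replication tier shows up in a day, not six months.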

What do you do for your personal backups?

@ironicbadger I'm still rolling old school with several removable hard drives I rotate and I physically carry to my office every day. The monitoring is the interesting point. I WAS using #Graylog for log aggregation and alerts but the #elasticsearch process was eating RAM. I keep meaning to try #Nagios and/or #Grafana
@ironicbadger I recently switched my Restic backups from Minio to the Restic REST server and all that Minio upgrade headache is gone.
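If it helps anyone making the same switch: restic addresses a rest-server repository with a `rest:` URL, and rest-server listens on port 8000 by default. A tiny sketch, with the host and path as placeholders:

```shell
#!/bin/sh
# Sketch: build a restic REST-backend repository URL.
# The "rest:" prefix and default rest-server port 8000 are real;
# the host and path here are placeholders.
rest_repo() {
    echo "rest:https://${1}:8000/${2}"
}

# usage (not run here): restic -r "$(rest_repo backup.example myrepo)" backup ~/data
```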
@minisculevestibule this was my plan tonight! I knew Minio was broken. But that DSM update from Synology has added a whole new load of fun to that situation.
@ironicbadger A mix of Proxmox backups and the excellent Kopia (onsite and offsite via Tailscale), with Uptime Kuma push monitors for all

@johnny5w not heard of Kopia before, looks interesting.

Why this over something like restic or borg?

@ironicbadger In doing lots of research with the various backup solutions it ticked more of the boxes for me and has a great GUI. It's WAY faster and more efficient than my borg setup(s). Haven't tried restic but from what I heard Kopia is better at compression/dedup and faster as well. Honestly at this point it's an embarrassment of riches whichever solution you go with.
@johnny5w your last sentence is *chef's kiss*

@ironicbadger Oof… I'm glad you found out before a disaster struck. A while ago, backups stopped working for me, which prompted me to make a "every month, manually check on backups" calendar reminder to ensure my and the family's systems are working as expected. More monitoring to come in the future.

As to my backups, while I don't have anything nearly as comprehensive, I do have some level of higher and lower priority differentiation while working with a tight budget for three machines - desktop, laptop, and phone:

1. Dropbox (…not great, but I'm on a free account with referral space) for high-priority data sync and quasi-backup that isn't dependent on my own sysadmin'ing, including some paths with Dropsync on Android

2. Déjà Dup to a local NixOS backup server with two NAS drives, using BTRFS "RAID1" - my most recent project, replacing a USB drive plugged into a router, and my first time using NixOS (opting for impermanence at that)

3. Nextcloud on a server at a friend's house in another state, providing a weekly Android app backup target (just recovered using this last week after an accidental factory reset), VPS rclone backup target, and some syncing

4. Unison, selectively opting in various paths in my home directory, manually syncing in a star topology between that server at a friend's house and my two computers

Things fall through the cracks, which sucks for me as a data hoarder (and a data-fox), but I've at least avoided the worst of disasters (often self-inflicted) over my years of computer tinkering.

@ironicbadger I actually run a pull from my remote backups first, which sends a notification to my desktop; that way I know there's a connection, and it'll also send me the last 6 hours' worth of logs. I tell myself I'll check those at least weekly, not to say that always happens.
But I also use rsync, which at its last line will tell you whether it succeeded or not.
@ironicbadger I… haha… pray?
I aggressively started a data tiering campaign. My main NAS is the brain; it holds everything. But I segment into "internet-acquired items" and irreplaceable data. All the datasets get locally replicated to another backup NAS. The irreplaceable tier then goes as follows: cold data goes into zip files and up to the cloud to Filen, warm data goes to iCloud (and at some point will go to Filen too), and I do have two USB disks that rotate with borg
@Prozak filen?
@ironicbadger Filen.io, a German E2E-encrypted cloud storage startup. They started like 3 years ago and still don't disappoint. Feels like what SpiderOak should have achieved

@ironicbadger
0 - various local scripts and cron jobs to a single location on the primary server
1 - local pull ZFS replication to dedicated backup server
2 - remote ZFS replication - testing at the moment
3 - restic to Backblaze B2 for very important stuff

All of the above is monitored by CheckMK. I started with Nagios, which is really enough, but CheckMK is just a bit nicer.

Monitoring is key; I'll know within hours at most if a backup isn't working or ZFS has checksum errors.

@amatashkin how’s the checkmk free tier for you?
@ironicbadger works well; auto-discovery really makes life easier and removes some of the Nagios pain. Prior Nagios experience helps, but it's doable from scratch. I went with the Raw edition, which is the actual Nagios engine under the hood, as opposed to the free tier, which uses a proprietary engine under the same GUI that's a little faster and has some nice features. The conditions are similar to the old Tailscale free plan - enough for a small deployment, like 5-10 servers.
@ironicbadger
I'm running @borgbackup locally and to a remote host with cron jobs, which also report success to healthchecks.io so that I get notified immediately if a backup fails.
In addition, I also run monthly borgbackup checks that verify the validity of the backups. That part is also watched by healthchecks.io.
This setup hasn't failed for around 4 years on 40ish servers, desktops and laptops.
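For anyone wanting to copy this pattern: healthchecks.io gives each check a ping URL, and appending /fail to it signals a failure. A minimal wrapper sketch; the UUID is a placeholder, and the actual curl call is stubbed with echo so the sketch is self-contained:

```shell
#!/bin/sh
# Sketch: run a backup command and report the outcome to a healthchecks.io
# check. The "/fail" endpoint is real; in real use, replace each echo with:
#   curl -fsS --retry 3 "$url"
report() {
    url=$1; shift
    if "$@"; then
        echo "ping $url"
    else
        echo "ping $url/fail"
    fi
}

# usage (not run here): report https://hc-ping.com/UUID borg create ::daily ~/data
```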
@jurgenhaas @borgbackup would love some more detail on the backup verification checks you perform!
@ironicbadger @borgbackup
There is a `borg check` command which verifies the backup structure and consistency. There are several levels, i.e. repository, archives, data, or even extract. The latter takes a lot of bandwidth, though, for remote backups. We're checking the first two by default.
If any issues are reported, there is then a repair command. I don't know exactly how that works, just that it helps repair the inconsistencies, if any have been detected.
@jurgenhaas @ironicbadger There are quite good docs about what borg check --repair does (older docs had a description that is quite close to the code, more recent docs have a rewritten description).

@jurgenhaas @ironicbadger borg check has no extraction-like option, but I guess you mean --verify-data (which cryptographically verifies the file content chunks).

if one wants an extraction-like check, there is borg extract --dry-run which does most of the steps except writing data to disk.
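To put the levels from this subthread side by side, here's a sketch that just prints the corresponding invocations; the repository path and archive name are placeholders:

```shell
#!/bin/sh
# Sketch: the borg consistency-check levels discussed above, cheapest first.
# --repository-only, --archives-only, and --verify-data are real borg flags;
# ARCHIVE is a placeholder (borg has no "latest" keyword, name one explicitly).
check_cmd() {
    case "$2" in
        repository) echo "borg check --repository-only $1" ;;
        archives)   echo "borg check --archives-only $1" ;;
        data)       echo "borg check --verify-data $1" ;;
        extract)    echo "borg extract --dry-run ${1}::ARCHIVE" ;;
    esac
}
```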

@ironicbadger Borg backup, which is local and replicated off to the cloud. Backups are on ZFS mirrored pairs locally.
I also do a full copy every few months to a removable drive.
@ironicbadger I use syncthing for everyday syncing and my server is set to receive-only, then I recently introduced Borg on NixOS after years of Duplicati - love the simplicity of the Borg + Borgbase combo on nix! https://github.com/jnsgruk/nixos-config/blob/main/host/thor/extra.nix
My home server backs up to Backblaze via Duplicacy.

All the family laptops have the UrBackup client backing up to an HP ProDesk Mini PC running UrBackup server on Ubuntu. It does a nightly offsite backup to S3, also with Duplicacy.
@ironicbadger local backup to a desktop via syncoid. Remote backup to zfs.rent using syncoid.
I do monitoring with Telegraf + InfluxDB + Grafana and notifications with Grafana + Pushover. If my backups are more than 6 hours old I get a notification on my phone.
@GrayDanceOutfit ah I always forget about zfs.rent! Such a great option.
@ironicbadger zfs replication + restic. Restic takes my Minio data and sends it to B2. I highly recommend adding the "onSuccess" and "onFailure" flags to restic and having it ping something like Uptime Kuma so you can quickly see when backups fail.
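For anyone unfamiliar with the push-monitor pattern: an Uptime Kuma push monitor is just a URL you hit with a status query parameter, so any wrapper around the backup command can report in. A sketch, with the host and token as placeholders and the HTTP request stubbed with echo:

```shell
#!/bin/sh
# Sketch: run a command and report the result to an Uptime Kuma push monitor.
# Kuma push URLs accept ?status=up|down&msg=... query parameters; the base
# URL here is a placeholder. Replace echo with `curl -fsS` in real use.
kuma_push() {
    base=$1; shift
    if "$@"; then
        echo "${base}?status=up&msg=backup-ok"
    else
        echo "${base}?status=down&msg=backup-failed"
    fi
}

# usage (not run here): kuma_push https://kuma.example/api/push/TOKEN restic backup ~/data
```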

@ironicbadger
I have 3 layers as well.

A collection of old, unlabeled, flash drives of random sizes.

A stack of spiral bound notebooks.

Hope.

(All that and a bunch of other computers, cloud services, and the fact that I don’t have much to back up)

@ministerofimpediments hope. It’s the hope that gets ya! lol
@ironicbadger o shit, that sucks... I guess at some point I should set up some monitoring beyond checking the logs every now and then
@ironicbadger six months? You gotta up that active monitoring game, fam
@ironicbadger I use a shell script I wrote called pango-backup to snapshot VMs and back them up using borg, plus borgmatic for file backups, all going to BorgBase. Along with quarterly restore tests, I use shift-mon to alert me when a backup service has failed, and BorgBase has a feature that will email you if no data is seen for a certain amount of time. Shift-mon also monitors ZFS and SMART stats. I'm the shift-mon maintainer, so sorry about the shameless plug
@ironicbadger
It sounds like the monitoring piece is the main shortcoming of your approach. I use the free Grafana plan and Prometheus-style alerts. I use sanoid and syncoid, so I submitted some code to make them spit out the right type of metrics. One lesson I've learnt is to have something record the last time each thing you rely on successfully completed or was confirmed OK, and alert if that is too long ago -- it finds issues you don't think to check for.
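That last lesson can be as simple as a stamp file plus a cron check. A sketch; the path and threshold are arbitrary, and it assumes GNU stat:

```shell
#!/bin/sh
# Sketch of the "record last success, alert when it's too old" pattern.
# mark_ok is called at the end of a successful backup run; check_stale
# runs from cron and alerts once the stamp passes a threshold.
mark_ok() {
    touch "$1"
}

check_stale() {   # $1 = stamp file, $2 = max age in seconds
    now=$(date +%s)
    last=$(stat -c %Y "$1" 2>/dev/null || echo 0)   # GNU stat; mtime as epoch
    if [ $(( now - last )) -gt "$2" ]; then
        echo "ALERT"
    else
        echo "OK"
    fi
}
```

A missing stamp file counts as age zero since epoch, so a backup that never ran alerts too, which is exactly the failure mode this thread started with.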