How do you effectively back up your high capacity (20+ TB) local NAS?

https://lemmy.world/post/43604046


I have a 56 TB local Unraid NAS that is parity protected against single drive failure, and while I think a single drive failing and being parity recovered covers data loss 95% of the time, I’m always concerned about two drives failing or a site-/system-wide disaster that takes out the whole NAS. For other larger local hosters who are smarter and more prepared, what do you do? Do you sync it off site? How do you deal with cost and bandwidth needs if so? What other backup strategies do you use? (Sorry if this standard scenario has been discussed - searching didn’t turn up anything.)

I don’t. Of my 120 TB, I only care about the 4 TB of personal data, and I push that to a cloud backup. The rest can just be downloaded again.
Do you have logs or software that keeps track of what you need to redownload? A big stress for me with that method is remembering or keeping track of what is lost when I and software can’t even see the filesystem anymore.

I don’t know of a pre-wrapped utility to do that, but assuming that this is a Linux system, here’s a simple bash script that’d do it.

#!/bin/bash

# Set this. Path to a new, not-yet-existing directory that will retain
# a copy of a list of your files. You probably don't actually want this
# in /tmp, or it'll be wiped on reboot.
file_list_location=/tmp/storage-history

# Set this. Path to location with files that you want to monitor.
path_to_monitor=path-to-monitor

# If the file list location doesn't yet exist, create it.
if [[ ! -d "$file_list_location" ]]; then
    mkdir "$file_list_location"
    git -C "$file_list_location" init -b master
fi

# In case someone's checked out things at a different time.
git -C "$file_list_location" checkout master

find "$path_to_monitor" | sort > "$file_list_location/files.txt"

git -C "$file_list_location" add "$file_list_location/files.txt"
git -C "$file_list_location" commit -m "Updated file list for $(date)"

That’ll drop a text file at /tmp/storage-history/files.txt with a list of the files at that location, and create a git repo at /tmp/storage-history that will contain a history of that file.

When your drive array kerplodes or something, your files.txt file will probably become empty if the mount goes away, but you’ll have a git repository containing a full history of your list of files, so you can go back to a list of the files there as they existed at any historical date.

Run that script nightly out of your crontab or something ($ crontab -e to edit your crontab).
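For instance, assuming you've saved the script to a made-up path like /usr/local/bin/update-file-list.sh and marked it executable, a crontab entry that runs it nightly at 3:00 AM would look like:

```shell
# m h dom mon dow  command
0 3 * * * /usr/local/bin/update-file-list.sh
```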

As the script says, you need to choose a file_list_location (not /tmp, since that’ll be wiped on reboot), and set path_to_monitor to wherever the tree of files is that you want to keep track of (like, /mnt/file_array or whatever).

You could save a bit of space by adding a line at the end to remove the current files.txt after generating the git commit, if you want. The next run will just regenerate files.txt anyway, and you can use git to regenerate a copy of the file for any historical day you want. If you’re not familiar with git: $ git log to find the hashref for a given day, then $ git checkout <hashref> to move to where things were on that day.
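To make that recovery flow concrete, here's a sketch run against a throwaway repo (everything below, including paths and commit messages, is invented for illustration; it assumes git 2.28+ for the init -b flag):

```shell
set -e

# Throwaway repo standing in for your file_list_location.
repo=$(mktemp -d)
git -C "$repo" init -q -b master
git -C "$repo" config user.email demo@example.com
git -C "$repo" config user.name demo

# Simulate two nightly snapshots of the file list.
printf '/mnt/array/movies/a.mkv\n/mnt/array/movies/b.mkv\n' > "$repo/files.txt"
git -C "$repo" add files.txt
git -C "$repo" commit -q -m "Updated file list for day 1"

printf '/mnt/array/movies/a.mkv\n' > "$repo/files.txt"   # b.mkv "lost"
git -C "$repo" add files.txt
git -C "$repo" commit -q -m "Updated file list for day 2"

# Find the commit for the day you want and check it out.
hashref=$(git -C "$repo" log --format='%H %s' | awk '/day 1/ {print $1}')
git -C "$repo" checkout -q "$hashref"

# files.txt now shows the full list as of day 1, including b.mkv.
cat "$repo/files.txt"
```

After the checkout you're in a detached-HEAD state, which is fine for read-only recovery; checking out master again returns you to the latest snapshot.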

EDIT: Moved the git checkout up.

That’s incredibly helpful and informative, a great read. Thanks so much!
Abefinder/NeoFinder is great for cataloging, but it costs money. If you do a limited backup, it’s good to know what you had. I use tape formatted to LTFS and run NeoFinder on both the source and the finished tape.