@mcc @nogweii @whitequark btrfs isn't a journalling FS -- it's copy-on-write, which is subtly different.
The problem with unexpected power-off is when the hardware lies. btrfs requires that when the disk says that data's hit permanent storage, it really has. In some cases of buggy firmware, disks can pass a write barrier while the data's still only in cache. With a power-fail, that can lead to metadata corruption, because the FS has updated the superblock, pointing to an incomplete transaction.
@darkling @nogweii @whitequark I see. But it seems like that would be no greater a problem for BTRFS than ext4.
"it's copy-on-write, which is subtly different"
Does it have different performance characteristics? Intuitively it seems like it must, but I can't really justify the idea that it does more so than modern journaling/autodefrag.
@mcc @nogweii @whitequark I don't know about performance. I can describe the algorithm.
Due to the way it works (the copy-on-write part), a lost write is going to effectively drop an entire page of metadata, rather than simply not updating an existing page. It *never* writes updated data in place, except for the superblocks, which have fixed locations. So the damage in the missed-write case is rather larger than with non-CoW FSes.
@mcc @whitequark I've worn out* I think three sets of HGST disks on btrfs over the years and have had few problems.
* from old age in SMART data
@mcc @whitequark I think this is a scenario where external influences are critical. I would use ext4. Mostly because its quirks are well known and if I had to recover it, there’s tons of resources all the way down to physical recovery companies.
BTRFS I think is missing all that infrastructure.
@mcc @petrillic @whitequark btrfs will allow you multiple copies of your data and metadata on a single disk.
It might protect against some disk issues, but probably not that many. SSDs will just stop working altogether on controller or dram failure and lose all of the disk at once.
I hope you are aware that SSDs are not recommended to be kept unpowered - the 10 year data retention relies on scrubbing that happens only when the power is on.
There are a couple things you can do here... One is BTRFS has checksums so it will *detect* when the data has rotted in the drawer, whereas ext4 doesn't.
Also, with BTRFS you can set the data storage mode to DUP and you'll get TWO copies of every data block (at the expense of being able to store about half the stuff); BTRFS can then do a scrub, detect corrupted blocks, and fix them from the good copy.
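A minimal sketch of that, assuming an existing single-disk filesystem mounted at a hypothetical /mnt/backup:

```shell
# Convert existing data chunks to DUP (metadata already defaults to
# DUP on a single spinning disk):
btrfs balance start -dconvert=dup /mnt/backup

# Scrub: re-read every block, verify checksums, and repair any bad
# block from its good copy:
btrfs scrub start /mnt/backup
btrfs scrub status /mnt/backup
```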
Finally, you can do compression, snapshots, and sends
snapshots are good for keeping history of things, and send is good for offsite backup.
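A rough sketch of the snapshot-plus-send workflow, with made-up paths (/mnt/data is the live filesystem, /mnt/external the backup drive):

```shell
# Read-only snapshot (send requires read-only):
btrfs subvolume snapshot -r /mnt/data /mnt/data/.snap-jan

# Full copy to another btrfs filesystem, e.g. the offsite drive:
btrfs send /mnt/data/.snap-jan | btrfs receive /mnt/external

# Next month, send only the delta relative to the previous snapshot:
btrfs subvolume snapshot -r /mnt/data /mnt/data/.snap-feb
btrfs send -p /mnt/data/.snap-jan /mnt/data/.snap-feb | btrfs receive /mnt/external
```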
Oh, and you can do deduplication, which might let you store more stuff?
I have NEVER lost a btrfs drive to anything but hardware failure, I've been using it since about 2012 or something.
I think this is a fair high-level view. Another thing about zfs is that the license and such make integrating it into a "normal" desktop system or whatever a pain in the ass. For example you can't just add a package in Debian.
I 100% suggest you format your single backup drive as btrfs, set DUP for data if you have a big enough drive, and mount it with compress=zstd unless you're storing highly compressed data already.
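For a fresh drive, that suggestion boils down to something like this (sdX is a placeholder device name, and mkfs wipes the drive):

```shell
# TWO copies of data and metadata, zstd compression on every write:
mkfs.btrfs -L backup -d dup -m dup /dev/sdX
mount -o compress=zstd /dev/sdX /mnt/backup
```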
oh it looks like it does now... DKMS (the dynamic kernel module support system) or something similar has been around for a long time, but zfs support is I think relatively new (say, the last 5 years?)
if you want to use DKMS stuff make sure you install the linux-headers for your linux-kernel package !
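On Debian/Ubuntu that looks roughly like this (package names as shipped there; the headers must match the running kernel or DKMS has nothing to build against):

```shell
apt install linux-headers-"$(uname -r)" zfs-dkms zfsutils-linux

# Confirm the module actually built and loads:
modprobe zfs && zfs version
```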
@petrillic @mcc @whitequark I don’t personally use Btrfs right now, so can’t comment on it directly. I know it’s similar to ZFS in this regard, but I don’t know the details of how it differs.
General performance of ZFS is really impressive, even on single drives, mostly due to the concept of async writes. It buffers a bunch of async writes in RAM as a “transaction group”, then flushes them all in a mostly-sequential write. The state of the filesystem is always consistent, though applications may lose a few seconds of data if the system is rebooted before a transaction group flushes.
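On OpenZFS for Linux the transaction-group flush interval is visible as a module parameter (default 5 seconds), and you can force a flush by hand; a sketch:

```shell
# Seconds between transaction-group flushes; data buffered in the
# current txg is what could be lost on sudden power-off:
cat /sys/module/zfs/parameters/zfs_txg_timeout

# Push everything buffered out to disk right now:
zpool sync
```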
@petrillic @mcc I am only n=1 but will mention I use ZFS on all my single-drive systems (mostly laptops) and have zero complaints with performance. I appreciate being able to back up entire filesystems to my NAS (also ZFS) with checksums, snapshots, encryption etc. intact.
My biggest frustration is the lack of rebalancing support, specifically on pools big enough I can't copy everything off. Having to install separate kernel modules is only a mild irritation for me though, YMMV
@mcc btrfs
my headmate, who is obsessive over data integrity, runs btrfs on her NAS with zero issues. it has nice things like snapshotting and such. the reputation btrfs has dates back to many years ago and i don't think the issues people distrust it for have mattered for quite a while
@glyph @whitequark @mcc Sort of...
Synology carry a *lot* of out-of-tree patches to btrfs. I believe they did *something* to integrate MD-RAID with btrfs, and Synology's "btrfs" isn't entirely compatible with mainline any more.
AIUI, ZFS really requires multiple drives to be effective.
You might gain a little value from extra checksums on file system blocks on a single drive, but if those checksums ever start failing on a hard drive there is a high likelihood that most of the drive is about to fail completely.
I had researched ZFS a fair bit as I planned to build my own FreeBSD NAS around 3-4 drives in ZFS, but eventually decided to buy an off-the-shelf ZFS NAS from the TrueNAS people.
@mcc in btrfs - every file can have different compression, if you're crazy enough. What I do is set compression on the root folder of a new fs, and let that be inherited everywhere.
btrfs property set . compression zstd:8 ; chattr +c .
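For the per-file case, the same property can be set on an individual file, overriding the inherited default (hypothetical filename):

```shell
btrfs property set ./archive.tar compression lzo
btrfs property get ./archive.tar compression
```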
I think those are both true in general though I don't know ext4 well enough to compare in depth.
1) One of the fundamental ideas of ZFS is Copy-On-Write. This makes it function similarly to a VCS, in that this makes snapshots nearly free. It sets a checkpoint where from now until you release the snapshot, your new present state of the file system stores only the changed blocks.
2) ZFS supports several compression algorithms all of which (including the default) work very well.
3) ZFS also has built-in "ZFS send" and "ZFS receive" functions for copying an entire ZFS filesystem to new media of similar or different drive layout, on the same system or over a network.
I've got limited experience with those, but it seems to me like they work well.
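Points 1 and 3 look roughly like this in practice ("tank/home", "backuphost", and "pool2/home" are placeholder names):

```shell
# 1) Snapshots: instant, and store only blocks that change afterwards.
zfs snapshot tank/home@before-upgrade
zfs rollback tank/home@before-upgrade    # discard everything since the snapshot

# 3) Replication: a full stream first, then incrementals.
zfs send tank/home@before-upgrade | ssh backuphost zfs receive -u pool2/home
zfs snapshot tank/home@weekly
zfs send -i @before-upgrade tank/home@weekly | ssh backuphost zfs receive -u pool2/home
```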
Oh, forgot to say about the compression:
2.A.) I always think of compressing and decompressing as slowing things down. The reverse seems to be true - ZFS benchmarks I've looked at say that having strong compression integrated in the FS actually *speeds up* the file system, because it saves more than enough disk writes/reads to make up for the CPU overhead.
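Turning it on, and checking what it's actually buying you, is a one-liner each (dataset name is a placeholder):

```shell
zfs set compression=zstd tank/data
# After writing some data, see how much was actually saved:
zfs get compression,compressratio tank/data
```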
It also can do auto deduplication if you like - more useful fall-out of the COW mechanism - but that's a bit too freaky for me.
The other thing about ZFS that's a bit hard to explain, and frankly I don't know well enough to know if I'm explaining it right, is that it seems to integrate much more detailed knowledge of physical drives than most file systems.
It talks to SCSI or SATA at a very low level, uses SMART data from HDDs, does slow background "scrubbing" of the drives over time, to force the drive to see & reallocate sectors starting to fail, etc.
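The scrubbing part at least is easy to show ("tank" is a placeholder pool name):

```shell
# Kick off a scrub; it re-reads and verifies every block in the
# background at low priority:
zpool scrub tank

# Progress, plus a report of anything it repaired or couldn't read:
zpool status -v tank
```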
I don't know all the details, but it seems like good stuff.
@CliftonR "It talks to SCSI or SATA at a very low level"
Imagine I plugged a SATA drive into a USB3 enclosure. Should I assume this will not happen the way ZFS hopes?
Ya, I was wondering that myself as I wrote it. It's another damn good question.
The answer is I really don't know how much it may affect that, or to what extent it can see "through" the USB3/SATA converter. If Google still worked properly it would be easier to find out.
I earlier mentioned Michael W. Lucas @mwl as a ZFS expert (which he is) and he seems like a nice guy, and you know, the good kind of tech weirdo.
So I am, with minor hesitation, tagging him in now to correct any misinformation I may be spreading about ZFS.
He might also find your base question interesting, what kind of file system is best to put on a single standalone drive being used as a system or data backup.
I've never seen that discussed much, though it's a great question to ask.
Single disk system? Set copies=2 for error correction.
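That's a per-dataset property; a sketch ("backup" / "sdX" are placeholders, and it only affects blocks written after it's set):

```shell
# On an existing dataset:
zfs set copies=2 backup

# Or at pool creation, so everything is doubled from the start:
zpool create -O copies=2 backup /dev/sdX
```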
ZFS snapshots are the most efficient of any filesystem thanks to copy-on-write.
ZFS is fine for backing up, but error correction applies to the data it gets. Send garbage, you'll have high-integrity garbage.
Compression trades CPU cycles for disk I/O. Most hosts today have more CPU than IOPS, so it's a fair trade.