"Due to potential legal incompatibilities between the CDDL and GPL, despite both being OSI-approved free software licenses which comply with DFSG, ZFS development is not supported by the Linux kernel"
@mcc been that way for decades now
@whitequark I have a new hard drive I intend to use primarily for backup and I am currently considering BTRFS or ZFS for the Linux part instead of ext4 (because I hear they can do something like storing extra error-checking data to protect against physical disk corruption). In your view, if I intend to use mainline Debian indefinitely, will BTRFS, ZFS, both, or neither give me the least pain getting things working?
@whitequark A few people are commenting on BTRFS reliability problems which is weird because I thought the whole point was to be "the more reliable fs". Debian's wiki links to this bewildering compatibility table that looks like a bunch of stuff I don't care about (the only features I care about are reliability, and some of zfs's auto-backup stuff sounded compelling) but the weird "mostly ok" line around defragmentation/autodefragmentation worries me a little https://btrfs.readthedocs.io/en/stable/Status.html

@mcc @whitequark I've been using btrfs for years without any problems, however I also never used the known-to-be-problematic RAID5/6 support.
@mcc @whitequark I've been running btrfs on servers for years now, no filesystem bugs so far. (One issue had arisen around power being cut leading to some data corruption but that wasn't btrfs' fault)
@nogweii @whitequark i thought the entire point of a journaling fs was that cutting power doesn't lead to data corruption (unless the corruption was at the app level I suppose)
@mcc @nogweii @whitequark it doesn't *if the hardware upholds its end of the bargain*. no fs can protect against hardware that does not fulfill the guarantees it's supposed to provide, and the only corruption I've had in btrfs was indeed due to faulty hardware. Btrfs has self-validation features, so when faulty hardware breaks things btrfs is noisier about it than many fses, and that leads to a perception that it is worse when it's just better at knowing what's broken.

@mcc @nogweii @whitequark btrfs isn't a journalling FS -- it's copy-on-write, which is subtly different.

The problem with unexpected power-off is when the hardware lies. btrfs requires that when the disk says that data's hit permanent storage, it really has. In some cases of buggy firmware, disks can pass a write barrier while the data's still only in cache. With a power-fail, that can lead to metadata corruption, because the FS has updated the superblock, pointing to an incomplete transaction.

@darkling @nogweii @whitequark I see. But it seems like that would be no greater a problem for BTRFS than ext4.

"it's copy-on-write, which is subtly different"

Does it have different performance characteristics? Intuitively it seems like it must, but I can't really justify the idea that it does, any more so than modern journaling/autodefrag

@mcc @nogweii @whitequark I don't know about performance. I can describe the algorithm.

Due to the way it works (the copy-on-write part), a lost write is going to effectively drop an entire page of metadata, rather than simply not updating an existing page. It *never* writes updated data in place, except for the superblocks, which have fixed locations. So the damage in the missed-write case is rather larger than with non-CoW FSes

@mcc @whitequark i have a friend who despises btrfs and has had so many problems with it but i've been using it for years and it's always worked pretty well so idk? i think it can be a bit janky and confusing sometimes but imo a lot of that is just because it's so different from traditional filesystems
the one thing i can think of is that i've recently been getting some qgroup related warnings on my server that i'm not sure the cause of, i ran the command the message recommended to address the issue last night so we will see if it makes the messages go away
@mcc @whitequark You don't want to defrag something that's been snapshotted (because it breaks the reflink copy, and you end up with ~twice the data usage). It's stable; it just has this unexpected side-effect that many people don't know about until they try it.

@mcc @whitequark I've worn out* I think three sets of HGST disks on btrfs over the years and have had few problems.

* from old age in SMART data

@mcc @whitequark No data loss, except once I ended up having a directory that would crash the OS if accessed. I had been migrating file systems with dd onto new disks since the first usable versions of btrfs. Eventually, on the next set of disks (maybe 8 years ago), I did a clean mkfs and never had any problems since.
@mcc So it's stable enough but it does have some issues with database files which don't do well with the btrfs data model, leading to massive fragmentation caused by the random writes.
This is not an issue if you're copying files whole like for a backup.
@mcc for a single backup disk you would probably be better off with xfs or ext4 simply because there's no need for the btrfs special features.
If it ever comes to data recovery then these two will be better known, and easier on the recovery process.
@mcc except if you're backing up a btrfs to btrfs. It then becomes possible to make snapshots on the master and then stream those to the backup, recreating snapshots there.
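A sketch of that snapshot-and-stream workflow (all paths and snapshot names here are illustrative, and /data must itself be a btrfs subvolume for the snapshot to work):

```shell
# Take a read-only snapshot of the live subvolume and stream it to a
# btrfs-formatted backup drive mounted at /mnt/backup.
btrfs subvolume snapshot -r /data /data/.snapshots/week1
btrfs send /data/.snapshots/week1 | btrfs receive /mnt/backup

# Later, send only the delta relative to the previous snapshot:
btrfs subvolume snapshot -r /data /data/.snapshots/week2
btrfs send -p /data/.snapshots/week1 /data/.snapshots/week2 \
    | btrfs receive /mnt/backup
```

These commands need root and a real btrfs volume on both ends; the incremental `-p` form is what makes repeated backups cheap.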
@whitequark @mcc Honestly the only times I have had problems with BTRFS have been when I have done deeply silly things, and even then it's always been recoverable. Been running some big storage pools of slow-spinning rust for years and no trouble. And I have *terrible* luck!
@mcc @whitequark I have been burned by btrfs multiple times, and ZFS has been fantastic. 🤷
@mcc @whitequark just my take but I consider ZFS aimed at arrays and such. Single drive I’m just not sure you’re going to get any benefit and it might actually be substantially worse.
@petrillic @whitequark do you think there is an advantage of BTRFS over ext4 for a single drive, single computer, non RAID, my sole/primary goal is "i want it to last as long in a room-temperature drawer as possible"?

@mcc @whitequark I think this is a scenario where external influences are critical. I would use ext4. Mostly because its quirks are well known and if I had to recover it, there’s tons of resources all the way down to physical recovery companies.

BTRFS I think is missing all that infrastructure.

@mcc @petrillic @whitequark btrfs will allow you multiple copies of your data and metadata on a single disk.

It might protect against some disk issues, but probably not that many. SSDs will just stop working altogether on controller or dram failure and lose all of the disk at once.

I hope you are aware that SSDs are not recommended to be kept unpowered - the 10 year data retention relies on scrubbing that happens only when the power is on.

@mcc @petrillic @whitequark

There are a couple things you can do here... One is BTRFS has checksums so it will *detect* when the data has rotted in the drawer, whereas ext4 doesn't.

Also, with BTRFS you can set the data storage mode to DUP and you'll get TWO copies of every data block (at the expense of being able to store about half the stuff); BTRFS can then do a scrub, detect corrupted blocks, and fix them from the good copy.

Finally, you can do compression, snapshots, and sends
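A minimal sketch of the DUP-plus-scrub setup described above (/dev/sdX and the mount point are placeholders; this needs root and destroys whatever is on the device):

```shell
# Format a single drive with duplicated data AND metadata profiles.
mkfs.btrfs -d dup -m dup /dev/sdX
mount /dev/sdX /mnt/backup

# Later: re-read everything, verify checksums, repair from the good copy.
btrfs scrub start -B /mnt/backup   # -B: run in foreground and print stats
```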

@mcc @petrillic @whitequark

snapshots are good for keeping history of things, and send is good for offsite backup.

Oh, and you can do deduplication, which might let you store more stuff?

I have NEVER lost a btrfs drive to anything but hardware failure, I've been using it since about 2012 or something.

@dlakelan @petrillic @whitequark it kinda seems like btrfs has all the same features of zfs, and people like zfs more, but i don't see a lot of reasons other than "vibes" or "super fancy code techniques that matter in high end situations i don't hit"

@mcc @petrillic @whitequark

I think this is a fair high level view. Another thing about zfs is that the license and such makes integrating it into a "normal" desktop system or whatever a pain in the ass. For example you can't just add a package in Debian.

I 100% suggest you format your single backup drive as btrfs, set DUP for data if you have a big enough drive, and mount it with compress=zstd unless you're storing highly compressed data already.
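The mount-option part of that suggestion might look like this (device and mount point are placeholders, and the volume is assumed to already be formatted btrfs):

```shell
# Enable zstd compression for everything written while mounted this way.
mount -o compress=zstd /dev/sdX /mnt/backup

# To make it permanent, an /etc/fstab line along these lines:
# /dev/sdX  /mnt/backup  btrfs  compress=zstd  0  0
```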

@dlakelan it sounds like debian has some kind of "spooky" system now where every time it updates the kernel it silently in the background spends a few minutes compiling a kernel so it can integrate the zfs module?

@mcc

oh it looks like it does now... DKMS, the dynamic kernel module support system, or something similar has been around for a long time, but zfs support is I think relatively new (say last 5 years?)

if you want to use DKMS stuff make sure you install the linux-headers for your linux-kernel package !
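On current Debian that might look like the following (package names assume an amd64 machine and that the `contrib` component is enabled in your apt sources):

```shell
# zfs-dkms pulls in the module source and build machinery; the headers
# metapackage tracks whichever kernel Debian installs, so DKMS can
# rebuild the module automatically on every kernel upgrade.
apt install linux-headers-amd64 zfs-dkms zfsutils-linux
```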

@petrillic @mcc @whitequark I don’t personally use Btrfs right now, so can’t comment on it directly. I know it’s similar to ZFS in this regard, but I don’t know the details of how it differs.

General performance of ZFS is really impressive, even on single drives, mostly due to the concept of async writes. It buffers a bunch of async writes in RAM as a “transaction group”, then flushes them all in a mostly-sequential write. The state of the filesystem is always consistent, though applications may lose a few seconds of data if the system is rebooted before a transaction group flushes.

@petrillic @mcc I am only n=1 but will mention I use ZFS on all my single-drive systems (mostly laptops) and have zero complaints with performance. I appreciate being able to back up entire filesystems to my NAS (also ZFS) with checksums, snapshots, encryption etc. intact.

My biggest frustration is the lack of rebalancing support, specifically on pools big enough I can't copy everything off. Having to install separate kernel modules is only a mild irritation for me though, YMMV

@mcc btrfs

my headmate, who is obsessive over data integrity, runs btrfs on her NAS with zero issues. it has nice things like snapshotting and such. the reputation btrfs has dates back to many years ago and i don't think the issues people distrust it for have mattered for quite a while

@whitequark @mcc Seconded. I've been using it since it landed in the kernel in 2.6.29, and I've had two broken filesystems in all that time -- one shortly after I started using it, and one as a result of my own cock-up replacing a failed disk.
@mcc none of what i said applies to btrfs raid 5/6 because we have zero experience with that in particular

@whitequark @mcc That's reliable, as long as you don't want it to actually handle a broken disk. (So, not actually useful for the precise case that you need it for). I'd recommend steering clear of parity RAID until those issues are fixed. But don't hold your breath.
@darkling @whitequark @mcc FWIW I have handled multiple broken disks on a synology with SHR, which I know is btrfs and I think is raid5 underneath.

@glyph @whitequark @mcc Sort of...

Synology carry a *lot* of out-of-tree patches to btrfs. I believe they did *something* to integrate MD-RAID with btrfs, and Synology's "btrfs" isn't entirely compatible with mainline any more.

@darkling @whitequark @mcc yeah, I was wrong about that. it’s btrfs *over* mdraid, which is a weird choice. in practice, it works very well, but I guess I am in big trouble if I ever want to migrate to a new storage solution
@glyph @darkling @whitequark mdraid sounds like if you tried to say "android" and "mermaid" at the same time. as you can see, i have nothing to add to this conversation
@glyph @whitequark @mcc I think it's not even one over the other in the expected layers. They're actually integrated together, somehow. The standard btrfs tools don't work properly on Synology devices.

@mcc @whitequark

AIUI, ZFS really requires multiple drives to be effective.

You might gain a little value from extra checksums on file system blocks on a single drive, but if those checksums ever start failing on a hard drive there is a high likelihood that most of the drive is about to fail completely.

I had researched ZFS a fair bit as I planned to build my own FreeBSD NAS around 3-4 drives in ZFS, but eventually decided to buy an off-the-shelf ZFS NAS from the TrueNAS people.

@CliftonR ok. is it accurate zfs can be snapshotted and restored more efficiently (in terms of on-disk cost) than ext4? somebody also said something about btrfs allowing zstd compression (for some of the disk? for all of the disk?)

@mcc in btrfs - every file can have different compression, if you're crazy enough. What I do is set compression on the root folder of a new fs, and let that be inherited everywhere.

btrfs property set . compression zstd:8 ; chattr +c .

@mcc

I think those are both true in general though I don't know ext4 well enough to compare in depth.

1) One of the fundamental ideas of ZFS is Copy-On-Write. This makes it function similarly to a VCS, in that snapshots are nearly free. A snapshot sets a checkpoint; from then until you destroy the snapshot, the live filesystem stores only the blocks that have changed since that point.

2) ZFS supports several compression algorithms all of which (including the default) work very well.
+

@mcc

3) ZFS also has built-in "ZFS send" and "ZFS receive" functions for copying an entire ZFS filesystem to new media of similar or different drive layout, on the same system or over a network.

I've got limited experience with those, but it seems to me like they work well.
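A sketch of that send/receive flow (the pool and dataset names here are invented, and "nas" is a placeholder ssh host):

```shell
# Replicate a dataset snapshot into another pool, locally or over a network.
zfs snapshot tank/home@monday
zfs send tank/home@monday | zfs receive backup/home

# Incremental follow-up: only the blocks changed since @monday travel.
zfs snapshot tank/home@tuesday
zfs send -i tank/home@monday tank/home@tuesday | ssh nas zfs receive backup/home
```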

@mcc

Oh, forgot to say about the compression:

2.A.) I always think of compressing and decompressing as slowing things down. The reverse seems to be true - ZFS benchmarks I've looked at say that having strong compression integrated in the FS actually *speeds up* the file system, because it saves more than enough disk writes/reads to make up for the CPU overhead.

It also can do auto deduplication if you like - more useful fall-out of the COW mechanism - but that's a bit too freaky for me.

@mcc

The other thing about ZFS that's a bit hard to explain, and frankly I don't know well enough to know if I'm explaining it right, is that it seems to integrate much more detailed knowledge of physical drives than most file systems.

It talks to SCSI or SATA at a very low level, uses SMART data from HDDs, does slow background "scrubbing" of the drives over time, to force the drive to see & reallocate sectors starting to fail, etc.

I don't know all the details, but it seems like good stuff.

@CliftonR "It talks to SCSI or SATA at a very low level"

Imagine I plugged a SATA drive into a USB3 enclosure. Should I assume this will not happen the way ZFS hopes?

@mcc

Ya, I was wondering that myself as I wrote it. It's another damn good question.

The answer is I really don't know how much it may affect that, or to what extent it can see "through" the USB3/SATA converter. If Google still worked properly it would be easier to find out.

@mcc

I earlier mentioned Michael Lewis @mwl as a ZFS expert (which he is) and he seems like a nice guy, and you know, the good kind of tech weirdo.

So I am, with minor hesitation, tagging him in now to correct any misinformation I may be spreading about ZFS.

He might also find your base question interesting, what kind of file system is best to put on a single standalone drive being used as a system or data backup.

I've never seen that discussed much, though it's a great question to ask.

@CliftonR @mcc

Single disk system? Set copies=2 for error correction.

ZFS snapshots are the most efficient of any filesystem thanks to copy-on-write.

ZFS is fine for backing up, but error correction applies to the data it gets. Send garbage, you'll have high-integrity garbage.

Compression trades CPU cycles for disk I/O. Most hosts today have more CPU than IOPS, so it's a fair trade.
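For reference, setting that property is a one-liner ("pool" is a placeholder name, and note that copies=2 only applies to data written after the property is set):

```shell
zfs set copies=2 pool   # store two copies of every block from now on
zfs get copies pool     # confirm the setting
```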

@mwl @CliftonR "Single disk system? Set copies=2 for error correction."

Is this an option with btrfs or is it a zfs-only concept?

@mwl @mcc

TIL "copies=2", thank you!

@CliftonR @mcc if I may - I'm not a ZFS dev but I have done the odd bit of debugging & patch contributing over the last decade and a bit -

ZFS only talks to devices at the usual block level, ie. read block/write block/discard block, so it will work fine with any fairly usual block device (SD card, HDD in USB enclosure, etc).

It verifies checksums every time a block is read, but a scrub (which reads every block) is only triggered if you explicitly request one (eg from a cron job)
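So on a single machine a periodic scrub has to be scheduled by hand; a hedged sketch ("tank" is a placeholder pool name, and the commands need root):

```shell
# Kick off a scrub, then check on it; running this from a monthly
# cron job or systemd timer is a common pattern.
zpool scrub tank
zpool status tank   # shows scrub progress and any checksum errors found
```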

@CliftonR @mcc For ZFS deduplication, $WORK is in the frankly uncommon territory where dedupe actually makes business sense for us, despite my repeated past attempts to figure out how to ditch it. My non-expert advice from this experience is "don't".

No data loss from it, I'm not worried about that, but you periodically hit edge cases (deadlocks, etc) that disappear entirely when dedup is not used.

The ZFS team fixes them as they find them, but it's time consuming, and in most use cases not worth it.

@fwaggle @mcc

Very good to know, TY!

I guess my gut feelings about tech choices continue to be well-trained, for the most part.

@CliftonR @mcc Yep! In my very limited experience, dedupe and L2ARC are ZFS' siren songs. They lure you in because on the surface they seem like they'll solve a bunch of problems for you, but they'll actually cause more problems than they solve for *most* people.