One benefit NFS & 9p have over sshfs though is that it's a lot easier to expose a read-only chroot with them.

Sure, one can do the same with sshfs, but the sheer amount of additional configuration needed gets a bit silly.
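For comparison, a rough sketch of both approaches (paths, the subnet, and the `roshare` account are made-up placeholders, not from this thread): a read-only NFS export is one line in exports(5), while the sshfs equivalent needs a dedicated account plus sshd_config plumbing on the server.

```shell
# NFS: /etc/exports — one line gives a read-only, squashed export:
#   /srv/readonly  192.168.1.0/24(ro,root_squash,all_squash)

# sshfs equivalent: a throwaway account restricted in sshd_config
# (sftp-server's -R flag puts it into read-only mode):
#   Match User roshare
#       ChrootDirectory /srv/readonly
#       ForceCommand internal-sftp -R
#       AllowTcpForwarding no

# Client side for the sshfs variant:
sshfs roshare@server.example:/ /mnt/ro -o ro
```

Note that ChrootDirectory also requires the chroot path to be root-owned and not group/world-writable, which is part of the "additional configuration" silliness.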

sshfs & 9p are both most likely more reliable than NFS though.

@lispi314
Tbh I really despise 9p. For whatever reason it just always fails to be set up properly. I've kinda given up on trying to fix it by now.

What are you using it with where it apparently just works, and that you'd consider reliable?

#9p

@agowa338 Not sure about reliability; I've used it a few times with diod.
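For reference, a minimal diod-based sketch (export path, port, hostname, and skipping authentication are all assumptions for illustration):

```shell
# Server: export a directory over 9p with diod, listening on TCP 564
# (--no-auth only for quick testing):
diod --export /srv/export --listen 0.0.0.0:564 --no-auth

# Client: mount it with the kernel's 9p client over TCP:
mount -t 9p -o trans=tcp,port=564,aname=/srv/export,version=9p2000.L \
      server.example /mnt/nine
```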

My post is more a testament to the sheer clusterfuck that is NFS' cache management with nfs-ganesha (which is supposed to be more reliable than the kernel's implementation (a pretty low bar); that's sort of its entire branding nonsense) and to its apparent cross-version incompatibility bullshit (upgrading a mostly Debian-based homelab, I saw some "interesting" behavior).

@agowa338 I mostly refuse to use networked filesystem & device servers that integrate into the kernel because the second something goes wrong it starts fucking up kernel memory and at best requires a full system restart to fix, which is unacceptable.
@agowa338 FUSE exists. io_uring & liburing exist. ublk exists. There is no reason to implement any of it in the kernel anymore.
FUSE Overview — The Linux Kernel documentation

@lispi314

Huh, never hit one of these bugs. But I keep hitting FUSE bugs where, if anything goes wrong while a syscall that the kernel delegated to a FUSE filesystem is stuck, the entire process that made the call becomes unkillable and you have to reboot the entire system...

(Like e.g. when an application tries to write a file to the FUSE-mounted filesystem and the backend fails in a weird way)

And because you can't kill that process, you can't remount the filesystem either...

@agowa338 Kernel NFS had a fun habit of doing the same thing, with extra problems sprinkled on top.

It's why I dropped it in the first place.

@lispi314

Oh and also more things should use the networked filesystem layer of the kernel instead of the regular blockdevice one. Esp. the FUSE things like SSHFS or rclone.

Why? Because nbd devices behave way more nicely with network packet loss and delays and such.

(Or it's just that qemu handles them better as that's basically the only thing I've been heavily using that uses nbd devices instead of regular block device for its mounts)

@agowa338

> Oh and also more things should use the networked filesystem layer of the kernel instead of the regular blockdevice one. Esp. the FUSE things like SSHFS or rclone.

Tell me more, I'm not sure I follow quite what you're referring to.

> Why? Because nbd devices behave way more nicely with network packet loss and delays and such.

I'm not sure what the default nbd-client uses besides the fact that it uses a bespoke kernel-integrated driver for it.

A bit like iSCSI.

In this case I think the niceness of your experience is more attributable to a good driver, if I do not misunderstand.

> (Or it's just that qemu handles them better as that's basically the only thing I've been heavily using that uses nbd devices instead of regular block device for its mounts)

I don't know what QEMU does with them at all, besides having heard in passing it does something.

@lispi314

Well, when you have something on the network mounted through /dev/nbd* instead of FUSE, it just behaves way more nicely. But as I said before, that may just be skewed by qemu being the main thing I've used it with...

(You can mount image files, e.g. img/vhd/vhdx/... files, through a command as a network block device. That way you can avoid having to specify partition offsets when mounting and such.)
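That matches qemu-nbd's usual workflow; a sketch of it (image name and partition number are assumptions for illustration):

```shell
# Load the nbd kernel driver with partition scanning enabled:
modprobe nbd max_part=8

# Attach the image read-only through qemu-nbd:
qemu-nbd --connect=/dev/nbd0 --read-only disk.qcow2

# Partitions show up as /dev/nbd0p1, /dev/nbd0p2, ... so no manual
# offset math with `mount -o offset=...` is needed:
mount -o ro /dev/nbd0p1 /mnt/image

# Detach when done:
umount /mnt/image
qemu-nbd --disconnect /dev/nbd0
```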

@lispi314

Also I've been told that /dev/nbd is really what the kernel developers want you to use when there's potential for high latency and network I/O involved. But I'm not sure how much of that is (still) true...

@agowa338 Hm, I'm not sure, but I've been moving everything off tgt & onto nbd as a consequence of limitations with tgt (and refusal to use the kernel driver server).

Given some of the new trends in enterprise NAS stuff, it's likely a lot of the iSCSI will become legacy maintenance too.
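For anyone following along, a hypothetical nbd-server setup for that kind of migration looks roughly like this (paths and the export name are made up):

```shell
# /etc/nbd-server/config — serve an image file as an NBD export:
#   [generic]
#       user = nbd
#       group = nbd
#
#   [homelab-disk]
#       exportname = /srv/images/homelab.img
#       readonly = false

# Client side, attaching the named export via the kernel nbd driver:
nbd-client server.example /dev/nbd0 -N homelab-disk
```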

@lispi314
If I recall correctly, it's in part because the kernel handles it differently: it expects syscalls towards it to take way longer than with FUSE, which is primarily designed for local-ish filesystems that should return more or less instantly...

Oh and all of the iSCSI stuff becoming legacy doesn't really surprise me after having experienced that all of the documentation for the in-kernel iSCSI initiator went offline (at least intermittently) a while ago...

@agowa338 Ah. There isn't really an alternative to FUSE for filesystems (as opposed to block devices) as far as this goes.

There were some people talking about asynchronous & zero-copy stuff, but that was different and went nowhere.

There is now stuff for FUSE-over-io-uring (lwn) which might have an impact as far as latency & asynchronism goes.

> Oh and all of the iSCSI stuff becoming legacy doesn't really surprise me after having experienced that all of the documentation for the in-kernel iSCSI initiator went offline (at least intermittently) a while ago...

Yikes. Not fun.

FUSE-over-io-uring design documentation — The Linux Kernel documentation

@lispi314

> There isn't really an alternative for filesystem (not block-device) to FUSE as far as this goes.

Yea, true. But I have to admit that I was primarily thinking of places where people use FUSE to expose a (kind of) block device.

Also I have to admit that all of the networked FUSE stuff has become significantly better over the past years too.

Oh and for iSCSI the domain was "linux-iscsi [.] org" and all of the other places just referenced it...

@agowa338 > Oh and for iSCSI the domain was "linux-iscsi [.] org" and all of the other places just referenced it...

Nothing spells long-term reliable like "we're basically completely separate but maintained in-tree"⸮
@agowa338 I think it's a better omen to just be completely separate and in user-space at that point.

Plus that way it doesn't depend on any given kernel.

@lispi314

Well, I'm kinda split on that. Even though having all of that in userspace is good for security reasons, it's also extremely bad for performance, as it adds a bunch of slow context switches and layers of indirection.

Esp. if you're running on low-performing hardware, having a kernelspace-only driver for something that does heavy I/O makes a significant difference...

@agowa338 That's where the io_uring zero-copy stuff would matter a lot, yeah.

Sure, you'd still get the context switching, but if it's still zero-copy?

@lispi314

Well, when you're on low-performing hardware, every single goto matters, as it kills the CPU instruction cache...

@lispi314 And even if it is zero-copy, it'll almost certainly still get purged out of L1 cache in most cases.

Which is also something that significantly hurts when you're on lower-end hardware.

@agowa338 So, when we say lower-end, we mean Pentium 4 or we mean 6502? Because yeah, with the latter you're kind of fucked with anything that isn't basically an end-to-end monolith.

(At least as far as static systems & conventional architectures go.)

@lispi314

It's relative. The more weight you give these performance tweaks, the more you can run on lower-end hardware.

Same for when you don't use k8s and CNI indirection layers with services and all, compared to just throwing it on there natively, the old-fashioned way.

@agowa338 At a certain point the best option is to either give up on static systems and massively downgrade expectations or to switch to dynamic systems capable of adjusting to user activities (or otherwise smarter design).

(A multithread-aware JIT with adequate code hints could prove the safety of optimizing operations as a block, for example.)

By smarter design: Consider that a capability-addressed system in a high-level language could do basically the same thing Qubes OS does with an order of magnitude less resources. (Even if it's still a static system.)

@lispi314

As I just wrote in the other post what I was trying to get at was that security, usability, and performance form an impossible trinity.

(Just look at e.g. VMware and how much you waste just on the management overhead of a VCF9 deployment. Or your average k8s cluster. It's insane. It doesn't matter at Google scale, but it for sure does at way smaller ones. Especially with the currently inflated hardware prices...)

Are security and reliability fundamentally incompatible? — Lobsters

@lispi314

Just for context, what I mean by relative is that when you want to self-host a lot of things, an i7-4770K can be low-end. But for other things a Raspberry Pi would be considered low-end. Or using an older FRITZ!Box (CPE router) as your web server with download/filesharing capabilities or similar.

> security, usability, and performance basically form an impossible trinity.

@lispi314

Well, I wish they'd have kept the documentation "in tree" (within the official kernel documentation) as well, instead of referencing their own domain that they just let expire at some point...