One benefit NFS & 9p have over sshfs though is that it's a lot easier to expose a read-only chroot with them.

Sure, one can do the same with sshfs, but the sheer amount of additional configuration needed gets a bit silly.
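For contrast, a minimal sketch (hostnames, users, and paths here are all illustrative): a read-only NFS export is one line, while the sshfs equivalent means first carving out a chrooted, read-only, SFTP-only account on the server.

```shell
# NFS: one line in /etc/exports on the server, then `exportfs -ra`:
#
#   /srv/ro  192.168.1.0/24(ro,root_squash,all_squash)
#
# sshfs: you first need a dedicated locked-down user in sshd_config,
# e.g. something like:
#
#   Match User roguest
#       ChrootDirectory /srv/ro
#       ForceCommand internal-sftp -R
#       AllowTcpForwarding no
#       X11Forwarding no
#
# (`internal-sftp -R` makes the SFTP session read-only; ChrootDirectory
# additionally requires the chroot path to be root-owned.)
# ...and only then mount it from the client:
sshfs -o ro roguest@server:/ /mnt/ro
```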

sshfs & 9p are both most likely more reliable than NFS though.

@lispi314
Tbh I really despise 9p. For whatever reason it just always fails to be set up properly. I've kinda given up on trying to fix it by now.

What setup are you using where it apparently just works and you'd consider it reliable?

#9p

@agowa338 Not sure about reliability, I've used it a few times with diod.
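For reference, mounting a diod export with the kernel's 9p client looks roughly like this (host and path are illustrative; diod speaks 9P2000.L on TCP port 564 by default):

```shell
# Mount a diod export via the in-kernel 9p client.
# aname is the path exported by diod on the server side.
sudo mount -t 9p -o trans=tcp,port=564,version=9p2000.L,aname=/srv/export \
    storagehost /mnt/9p
```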

My post is more a testament to the sheer clusterfuck that is NFS's cache management with nfs-ganesha (which is supposed to be more reliable than the kernel's implementation (a pretty low bar); that's sort of its entire branding nonsense), and apparently to cross-version incompatibility bullshit as well (upgrading my mostly Debian-based homelab, I saw some "interesting" behavior).

@agowa338 I mostly refuse to use networked filesystem & device servers that integrate into the kernel because the second something goes wrong it starts fucking up kernel memory and at best requires a full system restart to fix, which is unacceptable.

@lispi314

Oh and also more things should use the networked filesystem layer of the kernel instead of the regular blockdevice one. Esp. the FUSE things like SSHFS or rclone.

Why? Because nbd devices behave way more nicely with network packet loss and delays and such.

(Or it's just that qemu handles them better, as that's basically the only thing I've been heavily using that uses nbd devices instead of regular block devices for its mounts.)

@agowa338

> Oh and also more things should use the networked filesystem layer of the kernel instead of the regular blockdevice one. Esp. the FUSE things like SSHFS or rclone.

Tell me more, I'm not sure I follow quite what you're referring to.

> Why? Because nbd devices behave way more nicely with network packet loss and delays and such.

I'm not sure what the default nbd-client does differently, besides the fact that it uses a bespoke kernel-integrated driver.

A bit like iSCSI.

In this case I think the niceness of your experience is more attributable to a good driver, if I do not misunderstand.

> (Or it's just that qemu handles them better as that's basically the only thing I've been heavily using that uses nbd devices instead of regular block device for its mounts)

I don't know what QEMU does with them at all, besides having heard in passing it does something.

@lispi314

Well, when you have something on the network mounted through /dev/nbd* instead of FUSE, it just behaves way more nicely. But as I said before, that may just be skewed by qemu being the main thing I used it with...

(You can mount image files, e.g. img/vhd/vhdx/... files, through a command as a network block device. That way you avoid having to specify partition offsets when mounting and such.)
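That qemu-nbd route, sketched out (image name is illustrative; it understands raw, qcow2, vhd/vpc, vhdx, ...):

```shell
# Load the nbd driver and attach a disk image as /dev/nbd0, read-only.
sudo modprobe nbd max_part=8
sudo qemu-nbd --connect=/dev/nbd0 --read-only disk.vhdx

# Partitions show up as /dev/nbd0p1, /dev/nbd0p2, ... — no offset math needed.
sudo mount -o ro /dev/nbd0p1 /mnt

# Clean up when done.
sudo umount /mnt
sudo qemu-nbd --disconnect /dev/nbd0
```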

@lispi314

Also I've been told that /dev/nbd is really what the kernel developers want you to use when there is a potential for high latency and network I/O involved. But I'm not sure how much of that is (still) true...

@agowa338 Hm, I'm not sure, but I've been moving everything off tgt & onto nbd as a consequence of tgt's limitations (and a refusal to use the in-kernel target server).
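For the record, a minimal fully-userspace nbd-server setup looks roughly like this (export name and paths are illustrative):

```shell
# Server side, /etc/nbd-server/config — nbd-server runs entirely in
# userspace, only the client end touches a kernel driver:
#
#   [generic]
#       user = nbd
#       group = nbd
#   [vm0]
#       exportname = /srv/images/vm0.img
#       readonly = true
#
# Client side: attach the named export via the kernel's nbd driver,
# then mount it like any block device.
sudo modprobe nbd
sudo nbd-client storagehost -N vm0 /dev/nbd0
sudo mount -o ro /dev/nbd0 /mnt
```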

Given some of the new trends in enterprise NAS stuff, it's likely a lot of the iSCSI will become legacy maintenance too.

@lispi314
If I recall correctly, it is in part because the kernel handles it differently: it expects syscalls towards nbd to take way longer than those for FUSE, which is primarily designed for local-ish filesystems that should return more or less instantly...

Oh and all of the iSCSI stuff becoming legacy doesn't really surprise me after seeing that all of the documentation for the in-kernel iSCSI initiator went offline (at least intermittently) a while ago...

@agowa338 Ah. There isn't really an alternative to FUSE for filesystems (as opposed to block devices) as far as this goes.

There were some people talking about asynchronous & zero-copy stuff, but that was different and went nowhere.

There is now stuff for FUSE-over-io-uring (lwn) which might have an impact as far as latency & asynchrony go.

> Oh and all of the iSCSI stuff becoming legacy doesn't really surprise me after having experienced that all of the documentation for the in-kernel iSCSI initiator went offline (at least intermittently) a while ago...

Yikes. Not fun.

FUSE-over-io-uring design documentation — The Linux Kernel documentation

@lispi314

> There isn't really an alternative for filesystem (not block-device) to FUSE as far as this goes.

Yea, true. But I have to admit that I was primarily thinking of places where people used FUSE to expose a (kind of) block device.

Also I have to admit that all of the networked FUSE stuff has become significantly better over the past years too.

Oh and for iSCSI the domain was "linux-iscsi [.] org" and all of the other places just referenced it...

@agowa338 > Oh and for iSCSI the domain was "linux-iscsi [.] org" and all of the other places just referenced it...

Nothing spells long-term reliable like "we're basically completely separate but maintained in-tree"⸮

@agowa338 I think it's a better omen to just be completely separate and in user-space at that point.

Plus that way it doesn't depend on any given kernel.

@lispi314

Well, I'm kinda split on that. Even though having all of that in userspace is good for security reasons, it is also extremely bad for performance reasons, as it adds a bunch of slow context switches and layers of indirection.

Esp. if you're running low-performing hardware, having a kernelspace-only driver for something that does heavy IO makes a significant difference...

@agowa338 That's where the io_uring zero-copy stuff would matter a lot, yeah.

Sure you'd still get the context switching but if it's still zero-copy?

@lispi314

Well, when you're on low-performing hardware, every single goto matters, as it kills the CPU instruction cache...

@lispi314 and even if it is zero-copy it'll almost certainly still get purged out of L1 cache in most cases.

Which is also something that significantly hurts when you're on the lower end hardware.

@agowa338 So, when we say lower-end, do we mean a Pentium 4 or a 6502? Because yeah, with the latter you're kind of fucked with anything that isn't basically an end-to-end monolith.

(At least as far as static systems & conventional architectures go.)

@lispi314

It's relative. The more weight you give these performance tweaks, the more you can run on lower-end hardware.

Same as when you don't use k8s and CNI indirection layers with services and all, compared to just throwing it on there natively, "old-fashioned".

@agowa338 At a certain point the best option is to either give up on static systems and massively downgrade expectations or to switch to dynamic systems capable of adjusting to user activities (or otherwise smarter design).

(A multithread-aware JIT with adequate code hints could prove the safety of optimizing operations as a block, for example.)

By smarter design: Consider that a capability-addressed system in a high-level language could do basically the same thing Qubes OS does with an order of magnitude less resources. (Even if it's still a static system.)

@lispi314

As I just wrote in the other post what I was trying to get at was that security, usability, and performance form an impossible trinity.

(Just look at e.g. VMware how much you waste just for the management overhead of a VCF9 deployment. Or your average k8s cluster. It's insane. It doesn't matter for google scales but it for sure does for way smaller ones. Especially with the currently inflated prices for hardware...)

@agowa338 Eh, the trinity isn't that impossible if you rework the design.

As my own edit regarding Qubes OS mentions.

Security & performance are often in opposition because of bad design. It's only at the furthest extremes that one really has to choose.

k8s & vmware waste so much as a result of VMs (or a lot more than that, for k8s) which are themselves solely necessary in that capacity because of how broken the underlying computation model is.

@lispi314

Well, you can optimise your deployment, but you will never be able to achieve the objective maximum of security, performance, and usability all at once.

And the HOW you deal with these constraints and in what areas you optimise is exactly what we're talking about right now. You can either compromise on some performance capabilities and get more usability and/or security. Or you do the opposite and sacrifice security and/or usability for pure performance...

@lispi314

Also see this post for what I mean with wasting a lot of resources for management components in a vcf9 deployment: https://chaos.social/@agowa338/116130946238939340

@agowa338 The thing is the capability system I suggested increases all three.

That's kind of what I mean by "the tension really only exists at the absolute extremes".

@lispi314 Not really. You'd for sure have some usability impact, as it isn't the commonly known, easy-peasy design that an 8-year-old can just work with (like k8s once it is deployed, for better or worse).

So you'd still only be able to get two but not all three.

@agowa338 For the Qubes example?

You could actually make it a lot easier to use and a lot easier to understand, even for an 8-year-old.

Because an absurd amount of the current work & security overhead goes into compensating for other underlying mechanisms working at cross purposes.

The capability design allows one to just strip out the majority of the complexity.

The current common design actually is pretty abysmal in newbie-friendliness if you want to do anything more than use it in the absolute most common use-path. (First flaw you reach: You need to be familiar with GNU/Linux administration, systemd and VM administration on top of those two.)

The same applies to work/job scheduling with k8s & such.

Sending a reified thread/lambda with a message box to another computer without being able to interfere with anything it's not actively given a capability to really shouldn't be that hard.

@lispi314

Well, maybe "complexity" or "design" would be better wording for what I was trying to get at with "usability" then.

But I think we're kinda on the same page by now, along the lines of "it depends" and you have to weigh your options carefully.

As long as you can waste resources, you don't have to spend much thought on "adding yet another layer" and similar.

@agowa338 @lispi314 tradeoffs don't prevent you from being well behind the Pareto frontier

@hayley @lispi314

Nobody said that you need to throw everything completely into the gutter.

Just that you have to spend time on your design to work out which areas you want to optimise for, to get what you want and need out of it.

And if you're right at the limit of being able to run your workload, then what I said initially becomes a matter of concern.

Are security and reliability fundamentally incompatible? (Lobsters)

@hayley @agowa338

> Completely backwards. Distributed systems are much harder to secure than monoliths and arose for economic reasons despite the increased complexity.

That one depends a lot on who you're trying to secure things from, if we include p2p interaction in "distributed systems" (which is a philosophical debate, I suppose).

@lispi314 @hayley

Yea, threat modelling. So important.