PSA: So you want to be a good kid, and understand that UNIX file system paths are kind of wonky, and not stable references to inodes. So you drink the Linux Kool-Aid, and become a heavy O_PATH user: you pin all inodes via fds, validate them before you open them, use openat() heavily to get from one inode to its descendants and are extra careful everywhere. And you think you saw the light.

But then one day, you realize, you *actually* have been doing it all wrong.

Because here's the thing: if you go from one dir for which you have an O_PATH fd to another dir contained in it, via openat(fd, name, O_PATH|O_CLOEXEC), then this actually doesn't trigger autofs mounts. So if "name" actually is an autofs mount, you end up pinning an inode on the autofs, and not one on the file system it's supposed to be overmounted with. So if you then use openat(…, O_PATH|O_CLOEXEC) to go further down, it will *always* fail, because the autofs doesn't contain any further inodes this could possibly open for you.
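To make the failure mode concrete, here is a minimal sketch of how one could check whether an O_PATH fd ended up on the autofs instead of the file system mounted over it. The helper names `step_down_o_path()` and `fd_is_on_autofs()` are made up for illustration; the check relies on fstatfs() working on O_PATH fds (Linux 3.12 and newer) and on the autofs superblock magic from `<linux/magic.h>`.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Superblock magic of autofs, same value as in <linux/magic.h>. */
#ifndef AUTOFS_SUPER_MAGIC
#define AUTOFS_SUPER_MAGIC 0x0187
#endif

/* Hypothetical helper: the naive step from a pinned directory to an entry
 * inside it. This does NOT trigger a pending automount on "name". */
int step_down_o_path(int dir_fd, const char *name) {
    return openat(dir_fd, name, O_PATH | O_CLOEXEC);
}

/* Hypothetical helper: returns 1 if fd refers to an inode on an autofs
 * superblock, 0 if it refers to some other file system, -1 on error. */
int fd_is_on_autofs(int fd) {
    struct statfs sfs;

    if (fstatfs(fd, &sfs) < 0)
        return -1;

    return sfs.f_type == AUTOFS_SUPER_MAGIC;
}
```

If `fd_is_on_autofs()` returns 1 for the fd you got from `step_down_o_path()`, you most likely pinned the autofs inode of a not-yet-triggered automount, and further openat() calls relative to that fd will not find anything.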

Yikes.

But here I am, to help you, if you find yourself in this situation. As it turns out there actually *is* a system call you can use here that does what is needed, and (almost) no one knows about it: open_tree().

(See: https://github.com/brauner/man-pages-md/blob/main/open_tree.md)

If used without the OPEN_TREE_CLONE flag (and that part is crucial) it is equivalent to openat() with O_PATH, except in one regard: it will trigger automounts if you want, and it will thus get you an O_PATH fd to the overmounting fs, not the autofs one.
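A minimal sketch of what such a replacement for openat(fd, name, O_PATH|O_CLOEXEC) could look like. The helper name `open_path_automount()` is made up, and so is the fallback policy: falling back to plain openat() on ENOSYS (old kernel) or EPERM (assumed here to mean a seccomp filter blocks the call) is an assumption of this sketch, not something taken from the systemd PR. The raw syscall() is used so that this does not depend on a libc that already ships an open_tree() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel UAPI headers define OPEN_TREE_CLOEXEC as O_CLOEXEC. */
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif

/* Hypothetical helper: like openat(dir_fd, name, O_PATH|O_CLOEXEC), but
 * triggers an automount on "name" if there is one. */
int open_path_automount(int dir_fd, const char *name) {
#ifdef SYS_open_tree
    /* No OPEN_TREE_CLONE: no detached copy of the mount is created, we
     * only get an O_PATH-style reference. No AT_NO_AUTOMOUNT: automounts
     * are triggered. */
    int fd = (int) syscall(SYS_open_tree, dir_fd, name, OPEN_TREE_CLOEXEC);
    if (fd >= 0 || (errno != ENOSYS && errno != EPERM))
        return fd;
#endif
    /* Assumed fallback: old kernel or filtered syscall. This path does
     * not trigger automounts. */
    return openat(dir_fd, name, O_PATH | O_CLOEXEC);
}
```

Usage is the same as for the openat() call it replaces, e.g. `int fd = open_path_automount(parent_fd, "name");`. Like openat() with O_PATH and without O_NOFOLLOW, this follows a symlink in the last component; pass AT_SYMLINK_NOFOLLOW to open_tree() if you don't want that.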

Yay!

TLDR: there's a good chance many (most?) of the openat(O_PATH) calls in the wild are kinda wrong, and everyone should have used open_tree() instead...

Lesson learned:

https://github.com/systemd/systemd/pull/38048

(And of course, @brauner thanks for enlightening me about this fix)

And here's another lesson: it's one thing to realize that POSIX fs APIs are actually awful and an insecure mess, it's quite another thing to actually get it right even with the much better Linux fs APIs.
@pid_eins My lesson from all this is that people are apparently still using autofs
@muvlon @pid_eins systemd gives you automount units and they're pretty nifty

@muvlon yeah, systemd sets up some by default. for example /proc/sys/fs/binfmt_misc is one, and the ESPs are set up like this too by default, if you let systemd take care of that for you (which you *really* should, because it means the ESP with its brittle FAT file system remains unmounted most of the time, unless *actually* accessed).

Hence, yeah, it's a pretty commonly used thing.

@pid_eins this feels like a linux bug that is unfortunately on us to work around... I've never used autofs, is there any sensible reason why openat() shouldn't trigger autofs mounts?
@mildsunrise compat with the status quo seems like a good reason to me.
@pid_eins so openat() shouldn't trigger autofs mounts because if it did, that'd break "compatibility with the status quo"? do you have any examples in mind?
@mildsunrise my educated guess is that you might fuck up the shutdown process of systemd systems because we try to disassemble the mount tree properly, and do so in a fashion that doesn't trigger automounts, doesn't hang on dead nfs or fuse and so on, since we must be ready to deal with all that because network is dead already, or backing daemons are gone already. In all these cases you want to validate inodes only superficially, not look inside them, hence O_PATH is useful there.
@mildsunrise note that openat() with O_PATH doesn't follow autofs, but openat() without does, in case there's any confusion there.
@pid_eins indeed there was confusion there, thanks! in that case the behavior makes more sense and I can see why it's preferable
@pid_eins but still, does openat(mountpoint_fd, "subdir", O_PATH) trigger a mount? I'd personally expect that it would, since you're actually reading the mountpoint's contents at this point
@mildsunrise @pid_eins or it seems that it would at least make sense to have a flag that allows such mounts to occur...
@mildsunrise as I said, openat() *with* O_PATH does *not* honour autofs, it does not trigger the autofs. It returns you a reference to the inode belonging to the "autofs" superblock, not an inode belonging to the superblock of the file system that is supposed to overmount the autofs superblock.
@pid_eins ugh, right, the inode of the mountpoint changes once it's been mounted. i see..
@pid_eins @brauner
this could be an extremely wrong / nonsensical question because idk how much of this works internally, but does it also apply to using openat() *without* O_PATH?
@refi64 @brauner no, if you do not use O_PATH nothing of the above matters. But you kinda have to use O_PATH if you want to operate securely, because it allows you to pin an inode first, figure out its details and then act on it. If you open it directly you might end up talking to a driver because you opened a device node accidentally, or all kinds of other weird stuff.
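The "pin first, figure out the details, then act" pattern could look roughly like this. It is a sketch under a few assumptions: the helper name `open_regular_file_pinned()` is made up, only regular files are accepted, the errno for rejected inode types (EINVAL) is an arbitrary choice, and the final step reopens the pinned inode via /proc/self/fd/, so /proc needs to be mounted.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens "name" for reading, but only if it turns
 * out to be a regular file. Returns a readable fd, or -1 with errno set. */
int open_regular_file_pinned(int dir_fd, const char *name) {
    /* Step 1: pin the inode. O_PATH does not call into any driver, so a
     * device node or FIFO hiding behind "name" is harmless here. */
    int path_fd = openat(dir_fd, name, O_PATH | O_CLOEXEC | O_NOFOLLOW);
    if (path_fd < 0)
        return -1;

    /* Step 2: inspect the pinned inode via the fd, not via the path. */
    struct stat st;
    if (fstat(path_fd, &st) < 0) {
        int saved_errno = errno;
        close(path_fd);
        errno = saved_errno;
        return -1;
    }
    if (!S_ISREG(st.st_mode)) {
        close(path_fd);
        errno = EINVAL; /* arbitrary choice for "not a regular file" */
        return -1;
    }

    /* Step 3: act on exactly the inode we validated, by reopening the
     * O_PATH fd through /proc/self/fd/. */
    char proc_path[64];
    snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", path_fd);
    int fd = open(proc_path, O_RDONLY | O_CLOEXEC);
    int saved_errno = errno;
    close(path_fd);
    errno = saved_errno;
    return fd;
}
```

Because the check and the real open both go through the same pinned fd, swapping out the path between the two steps does not change which inode gets opened.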
@pid_eins 'ls' misbehaves on automounted NFS, probably because of this. 'ls /mnt/point' fails until you do 'ls /mnt/point/' once.
@alg0w

@tetrislife @alg0w hmm, if I strace coreutils' "ls" I see it doesn't use O_PATH or anything. It instead uses statx() and explicitly specifies AT_NO_AUTOMOUNT to avoid triggering automounts. Seems very much on purpose hence.

(And of course is super-racy, in classic POSIX style, because after the statx() by path it uses open() by path to actually open the thing.)
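For reference, a by-path stat that avoids triggering automounts can be sketched like this. This is not the coreutils code, just an illustration of the statx() + AT_NO_AUTOMOUNT combination seen in the strace; the helper name `stat_type_no_automount()` is made up, and it assumes a libc that provides the statx() wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical helper: queries the file type of "path" without
 * triggering an automount on its last component. Returns 0 and stores
 * the S_IFMT bits in *ret_type, or returns -1 with errno set. */
int stat_type_no_automount(const char *path, mode_t *ret_type) {
    struct statx stx;

    if (statx(AT_FDCWD, path, AT_NO_AUTOMOUNT, STATX_TYPE, &stx) < 0)
        return -1;

    /* The kernel reports which fields it actually filled in. */
    if (!(stx.stx_mask & STATX_TYPE)) {
        errno = ENODATA; /* arbitrary choice for "type not reported" */
        return -1;
    }

    *ret_type = stx.stx_mode & S_IFMT;
    return 0;
}
```

Note that the result only describes whatever the path pointed to at the time of the call. A later open() by the same path may hit a different inode, which is exactly the race mentioned above.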

TLDR: different situation. Seems on purpose in that case. What it does is not particularly enlightened towards races.