inotify file limits can be confusing/misleading

I've recently been watching a build for a package on my server fail inconsistently ~50% of the times I build updates, and I was confused as to why it was missing from the binary cache in the first place. Oh fun, how reproducible. I spent an hour or two yesterday reading the code and the failing tests and realized it was because the test was stress-testing inotify system calls:
Traceback (most recent call last):
File "/nix/store/zv1kaq7f1q20x62kbjv6pfjygw5jmwl6-python3-3.12.7/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
self.run()
File "/build/source/src/documents/tests/test_management_consumer.py", line 30, in run
self.cmd.handle(directory=settings.CONSUMPTION_DIR, oneshot=False, testing=True)
File "/build/source/src/documents/management/commands/document_consumer.py", line 251, in handle
self.handle_inotify(directory, recursive, options["testing"])
File "/build/source/src/documents/management/commands/document_consumer.py", line 294, in handle_inotify
inotify = INotify()
^^^^^^^^^
File "/nix/store/3ziqbc4xcs58hhh5srx7pfl2n9mwj22g-python3.12-inotifyrecursive-0.3.5/lib/python3.12/site-packages/inotifyrecursive/inotifyrecursive.py", line 31, in __init__
inotify_simple.INotify.__init__(self)
File "/nix/store/1qv923rjqijj7nbhhm9k1bz53jh9pb3a-python3.12-inotify-simple-1.3.5/lib/python3.12/site-packages/inotify_simple.py", line 91, in __init__
FileIO.__init__(self, _libc_call(_libc.inotify_init1, flags), mode='rb')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/nix/store/1qv923rjqijj7nbhhm9k1bz53jh9pb3a-python3.12-inotify-simple-1.3.5/lib/python3.12/site-packages/inotify_simple.py", line 39, in _libc_call
raise OSError(errno, os.strerror(errno))
OSError: [Errno 24] Too many open files
inotify is an API that allows your process to be notified when files are accessed, changed, deleted, etc. Because inotify queues can use a fair bit of memory, Linux implements specific interfaces to limit these calls:
The following interfaces can be used to limit the amount of kernel memory consumed by inotify:
/proc/sys/fs/inotify/max_queued_events
The value in this file is used when an application calls
inotify_init(2) to set an upper limit on the number of
events that can be queued to the corresponding inotify
instance. Events in excess of this limit are dropped, but
an IN_Q_OVERFLOW event is always generated.
/proc/sys/fs/inotify/max_user_instances
This specifies an upper limit on the number of inotify
instances that can be created per real user ID.
/proc/sys/fs/inotify/max_user_watches
This specifies an upper limit on the number of watches
that can be created per real user ID.
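Before going further, it's worth knowing that a machine's effective limits can just be read back out of those same /proc files (or via sysctl). A quick illustrative sketch in Python, purely as an example:

# Read the effective inotify limits from /proc; these are the same
# files described in the man page excerpt above.
from pathlib import Path

for name in ("max_queued_events", "max_user_instances", "max_user_watches"):
    value = Path(f"/proc/sys/fs/inotify/{name}").read_text().strip()
    print(f"fs.inotify.{name} = {value}")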
I found a Nixpkgs GitHub Issue about the build failure where folks saw the "too many open files" error and assumed it was a ulimit configuration issue on the build hosts, a theory that stumped even the NixOS Super Posters; but it's this other, more obscure limit that nevertheless raises the same errno when it's hit. Classic Linux! These limits are set to arbitrarily low values by default, and I remembered that the inotify watcher in The Arcology Project's FastAPI prototype bumped up against them when I deployed it way back when.
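It's easy to see why the two get conflated. Here's a rough, hypothetical sketch (not from the package's test suite) using the same inotify_simple library that appears in the traceback: opening inotify instances in a loop eventually raises the very same OSError that an exhausted file-descriptor ulimit would:

# Hypothetical demo: open INotify() instances until the kernel refuses.
# With the default fs.inotify.max_user_instances (typically 128), this
# fails with "[Errno 24] Too many open files" long before the usual
# 1024-descriptor ulimit is anywhere in sight.
from inotify_simple import INotify

instances = []
try:
    while True:
        instances.append(INotify())  # each INotify() is an inotify_init1() call
except OSError as err:
    print(f"gave up after {len(instances)} instances: {err}")
finally:
    for ino in instances:
        ino.close()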
There are two machines in /etc/nix/machines, my Framework 13 Laptop and My Homelab Build. Presumably the build works on one but not the other, but I no longer explicitly set these values, so something in NixOS itself must be setting them. This is easy enough to check:
pushd ~/arroyo-nix
~/arroyo-nix ~/org
grep -ri fs.inotify.max_user_watches
pushd ~/Code/nixpkgs
~/Code/nixpkgs ~/arroyo-nix ~/org
grep -ri fs.inotify.max_user_watches
nixos/modules/services/misc/graphical-desktop.nix: "fs.inotify.max_user_watches" = lib.mkDefault 524288;
nixos/modules/virtualisation/lxd.nix: "fs.inotify.max_user_watches" = 1048576;
nixos/modules/virtualisation/incus.nix: "fs.inotify.max_user_watches" = lib.mkOverride 1050 1048576; # override in case conflict nixos/modules/services/x11/xserver.nix
So it's set to a higher value by enabling LXD (which I believe Waydroid does), but also by enabling any graphical desktop. That explains why the package would build on my laptop but not on my server, or on any "stock" NixOS server... a 50-50 shot.
sudo sysctl -w fs.inotify.max_user_watches=524288
sudo sysctl -w fs.inotify.max_user_instances=524288
I set these temporarily on my server and the build ran reliably 5 times, though the tests are sooooooo freakin' slow... I hope the Hydra instance that publishes the binary cache gets around to something like this, rather than disabling the tests that validate that document consumption works properly in this document scanning/processing service....
They can be set "for good" with this NixOS configuration:
boot.kernel.sysctl = {
  "fs.inotify.max_user_instances" = 524288;
  "fs.inotify.max_user_watches" = 524288;
};
https://arcology.garden/grymoires/shell#20250116T105516.143422