software is amazing... my otherwise-idle 128 GB machine just started swapping and then invoked the OOM killer to nuke a Debug + code coverage LLVM build
@regehr how much parallelism?
@whitequark @regehr build system which gracefully handles memory pressure any day now!
@dotstdy @regehr is this even a build system's job? I can see both answers being appropriate, but if "yes" then you'll have to deal with a lot of pain making a cross-platform one
@dotstdy @regehr (personally I think I'd rather have my terminal emulator or login shell or something along those lines handle it)

@whitequark @dotstdy @regehr The build system still needs to be responsive to instructions from the system to scale up or down the amount of work it's doing.

(Isn't this kind of OS-level orchestration what Grand Central Dispatch is supposed to do on macOS?)

@whitequark @regehr oh yeah it's a super tricky problem, especially when your build invokes another build system. I kinda put all these problems into the same bucket, akin to https://pvk.ca/Blog/2019/02/25/the-unscalable-thread-pool/

And they're all similar in that managing oversubscription is the operator's concern.

@dotstdy @regehr ninja now uses/provides a make jobserver endpoint which makes this problem a little less bad, but yeah I agree in general the situation is extremely bad
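The make jobserver protocol mentioned above can be sketched roughly like this (a toy model of the POSIX pipe flavor, not ninja's actual implementation):

```python
import os

# Toy model of the GNU make jobserver protocol (POSIX pipe flavor).
# The top-level build tool preloads a pipe with N-1 tokens; every
# participant gets one implicit job slot and must read a token from
# the pipe for each *additional* concurrent job, writing it back
# when that job finishes.

def make_jobserver(slots):
    """Create a jobserver pipe preloaded with slots - 1 tokens."""
    r, w = os.pipe()
    os.write(w, b"+" * (slots - 1))
    return r, w

def acquire(r):
    """Block until a job token is available, then take it."""
    return os.read(r, 1)

def release(w, token):
    """Return a token so another job may start."""
    os.write(w, token)
```

Because every nested build can speak the same protocol, a ninja invoked from make (or vice versa) shares one global concurrency limit instead of multiplying them.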
@whitequark @dotstdy @regehr "I agree in general the situation is extremely bad," is something that I would say of the state of computers 😅
@whitequark @dotstdy @regehr ninja has a concept of "pools" for exactly this sort of thing: limiting the number of concurrent linker invocations: https://ninja-build.org/manual.html#ref_pool
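For reference, a pool in a build.ninja file looks roughly like this (a hypothetical fragment; the rule name and command are made up):

```ninja
# cap concurrent link steps at 2, regardless of the global -j level
pool link_pool
  depth = 2

rule link
  command = clang++ -o $out $in
  pool = link_pool
```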

@tedmielczarek @whitequark @regehr yeah but that's very much the "problem is the user's own" approach, since the value for every concurrency limit needs to change depending on the available hardware resources.
@dotstdy @tedmielczarek @regehr but we don't have a way to predict how much memory a linker invocation will consume, do we? in which case that's sort of what we're stuck with, since any solution must necessarily be reactive
@whitequark @tedmielczarek @regehr I think tuning it the other way (i.e. this invocation requires 2gb) and having the job server track a budget, delaying launch (without introducing deadlocks) would allow you to scale to arbitrary hardware so long as the annotations are somewhat accurate. But yeah it's fraught with complexity. See also some kind of back-off recovery for extreme pressure since the work is ideally possible to re-launch.
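A minimal sketch of that budget-tracking idea (an assumed design, not an existing tool): jobs declare an estimated peak memory, launch is delayed until the estimate fits, and one job is always admitted so the queue cannot deadlock.

```python
import threading

class MemoryBudget:
    """Admission control for build jobs with declared memory estimates."""

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.used = 0       # sum of estimates for running jobs
        self.running = 0
        self.cv = threading.Condition()

    def acquire(self, estimate):
        """Block until `estimate` bytes fit in the budget.

        A job is always admitted when nothing else is running, even if
        its estimate exceeds the whole budget -- otherwise an oversized
        job could never launch and the build would deadlock.
        """
        with self.cv:
            while self.running > 0 and self.used + estimate > self.total:
                self.cv.wait()
            self.used += estimate
            self.running += 1

    def release(self, estimate):
        with self.cv:
            self.used -= estimate
            self.running -= 1
            self.cv.notify_all()
```

The "always admit one job" clause is what keeps annotation errors merely inefficient rather than fatal.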
@dotstdy @tedmielczarek @regehr I don't think there's any way to make the annotations somewhat accurate

@dotstdy @tedmielczarek @regehr I think what would really help is if a build system had the knobs to suspend a process if it consumes too much memory. ld eats more than 2 GB? pause it, let other processes finish, then let it restart

I don't know if that's feasible
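On POSIX systems the pause/resume half of this is easy to sketch with signals; the hard part (deciding when) is left out, and note the stopped process still holds its memory until the kernel swaps it out:

```python
import signal
import subprocess

def pause(proc: subprocess.Popen):
    """Stop a build step; it keeps its memory but burns no CPU."""
    proc.send_signal(signal.SIGSTOP)

def resume(proc: subprocess.Popen):
    """Let a previously stopped build step continue where it left off."""
    proc.send_signal(signal.SIGCONT)
```

A build system could pause the fattest linker while other jobs drain, then resume it once memory frees up.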

@whitequark @tedmielczarek @regehr yeah, that's what I was getting at by backoff; suspending is hard because you're still holding the memory while you're suspended. Currently build scripts end up hard-coding parallel linker limits of 4 or whatever, which is trivially broken on low-memory systems and trivially too conservative on large ones. So if you said 2 GB in your own build script (you know the software, after all), that would avoid the majority of the issues open-source projects hit.
@whitequark @tedmielczarek @regehr but yeah, some kind of "I'm swapping and non-critical so I don't want to be re-scheduled until memory pressure improves" would be interesting. I just worry about the inherent deadlocks there, since the job server itself is the only thing which knows the dependencies between jobs, not the OS

@dotstdy @tedmielczarek @regehr

So if you said 2gb in your own build script (you know the software after all)

nope, not gonna work. you know the software, sure, but you don't know which crackhead build of the linker your users are gonna invoke it with, which custom options, which architecture... all of which can vary the resulting consumed memory by an order of magnitude at least. hell, even running a 32-bit linker instead of a 64-bit one is a big deal

@whitequark @tedmielczarek @regehr I mean that's true but if you're doing that I'm not really worried about your problems :') it's mostly just things like "I want to build llvm / rust / unreal engine / etc" on my PC with a Threadripper and 16 GB of RAM (or realistically, my laptop with 16 threads and 16 GB of RAM, and the OP's case) which I think is tractable to solve that way and still useful to fix in itself.
@dotstdy @tedmielczarek @regehr yeah I think that it is not only not tractable but also actively hostile to people for whom it fails (since now you've promised to unfuck the build system but it turned out to have been a lie). it's essentially guaranteed to breed resentment for your system
@dotstdy @tedmielczarek @regehr also now there's another arch/platform/commit sensitive parameter stored in version control that will get desynced from the "real" values unless actively maintained, and the incentives to keep it updated aren't exactly there
@whitequark @tedmielczarek @regehr yeah I'm just suggesting that the current version of it where you check in `max-threads=4` is somehow even worse since it has all the same problems plus it's static for all hardware. `max-threads=max(1, min(num-cpus, system-memory / 4GiB))` feels like a concrete improvement, even if it retains many of the same flaws. But I do broadly agree that a much better option would be to fix things in a way that also handles the dynamic nature of resource availability.
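That formula is a one-liner to implement; a sketch, where the 4 GiB per link job is the made-up constant from above and the sysconf names are POSIX:

```python
import os

GIB = 2**30

def default_link_jobs(mem_per_link=4 * GIB):
    """max(1, min(num-cpus, system-memory / 4GiB)) from the thread."""
    cpus = os.cpu_count() or 1
    mem = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return max(1, min(cpus, mem // mem_per_link))
```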
@whitequark @dotstdy @regehr cgroups might help on Linux but we're back to your original point about trying to do it in a portable way being a nightmare.
@whitequark @dotstdy @tedmielczarek it doesn't seem hard (deadlocks notwithstanding)
@regehr @dotstdy @tedmielczarek how would you do it? SIGSTOP?
@whitequark @dotstdy @tedmielczarek any convenient mechanism would be fine for pausing the process, and then I guess a new syscall to push the entire process out of RAM? I mean, most unixes have been able to do that, but I don't know that Linux currently can...
@regehr @dotstdy @tedmielczarek I think existing swap behavior will take care of that provided you do have enough swap in first place; I think maybe the build system could be designed to kill the paused process with the least amount of runtime if it thinks the system is low on memory?
@whitequark @dotstdy @tedmielczarek yeah killing might be totally reasonable since build system commands are usually idempotent
@whitequark @dotstdy @tedmielczarek that's maybe my favorite idea from here so far, seems like it could be implemented in e.g. ninja without touching any other code
@regehr @whitequark @tedmielczarek yeah and in a build system sense it mirrors the "kill and retry" approach for dependency resolution, which is kinda neat too. It sounds better than relying on swap to me since you don't have to burn CPU time writing all that transient (and potentially cold) data out to disk and loading it back in.
@regehr @whitequark @dotstdy @tedmielczarek Eventually the OOM killer will do its job; if there were a way to detect that this happened, move that task to the end, and run it by itself, that would already make my life slightly easier.
@crzwdjk @regehr @dotstdy @tedmielczarek the kernel OOM killer should be assumed to not exist because of the inherent constraints on its function (it can only kill a process after it exhausts all other options, by which point the end user has already rebooted the machine in frustration)
@whitequark @crzwdjk @regehr @dotstdy @tedmielczarek yeah, I have had it kill a leaky process eating close to 100 GB of memory only after about a day of thrashing, and after killing some random other process
@charlotte @crzwdjk @dotstdy @regehr @tedmielczarek @whitequark On Linux, I would assume a build system could act based on pressure stall information. Stuff like that is pretty much what PSI is for.
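PSI is exposed as plain text under /proc/pressure/; a sketch of what a build system would read and parse (the 10% threshold is an arbitrary example, not a recommendation):

```python
def parse_psi(text):
    """Parse /proc/pressure/memory contents into nested dicts."""
    out = {}
    for line in text.splitlines():
        kind, _, rest = line.partition(" ")   # "some" or "full"
        out[kind] = {k: float(v) for k, v in
                     (f.split("=") for f in rest.split())}
    return out

def under_pressure(text, threshold=10.0):
    """True if some task stalled on memory >threshold% of the last 10s."""
    return parse_psi(text)["some"]["avg10"] > threshold
```

A scheduler polling this could simply stop launching jobs while `under_pressure` holds.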
@regehr @whitequark @dotstdy @tedmielczarek now this description is starting to feel like using CRIU to checkpoint a gcc process to a file in userspace
@mokomull @regehr @dotstdy @tedmielczarek I did think of maybe using CRIU here, but it feels like it'll cause hard-to-debug issues in slightly unusual configurations (what if a compiler plugin uses a network to talk to a database? this exists), not to mention being completely non-portable
@whitequark @regehr @dotstdy @tedmielczarek I wrote off portability when working-around-the-oomkiller's-behavior was mentioned ... and when I've had this same frustration has been on shockingly-modern Linux kernels at $last_job.
@mokomull @regehr @dotstdy @tedmielczarek Windows and Darwin definitely have the APIs to account for a process' memory use and to kill it! seems like it could be portable to those platforms at least (and frankly almost any I can think of)

@regehr @whitequark @dotstdy @tedmielczarek this sounds a lot like memory.high in cgroups, where processes that exceed chosen usage get put under reclaim pressure and receive less CPU.

I'm not familiar with how it's actually done, but it feels like an OOMd equivalent that's managing a cgroup (containing subgroups for parallel build tasks) could both monitor memory use and throttle or suspend tasks if needed

@pmarheine @regehr @dotstdy @tedmielczarek I think cgroups may be the right tool to deal with this on Linux, but I have two concerns:

  • Linux is not the only operating system people run builds on
  • even on Linux, running 16 linkers in parallel can very quickly eat up a lot of memory in a way that merely adding reclaim pressure and reducing CPU quota will not solve
@whitequark @pmarheine @regehr @dotstdy @tedmielczarek You can definitely limit cgroup memory usage and force just a cgroup to OOM if it collectively runs out (eg, your 16 parallel linkers in a single cgroup). On a systemd based Linux this can be scripted, so you could run a build in an environment where it could only use a maximum of X amount of system RAM, although you'd have to decide the X in advance and if other bits used an unexpected amount of RAM you could still blow up.
@cks @pmarheine @regehr @dotstdy @tedmielczarek but I don't want them to collectively OOM, I want them to serialize themselves

@whitequark @pmarheine @regehr @dotstdy @tedmielczarek It looks like there's no good support for this in systemd/cgroup right now. Systemd has an API for getting memory pressure information¹, but I don't know of any command line programs you can run to watch this and take action for subordinate processes. Cgroup(v2) can freeze an entire cgroup² by writing to cgroup.freeze, but again one would have to write tooling around this.

¹ https://systemd.io/MEMORY_PRESSURE/
² https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
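For the record, the cgroup v2 plumbing discussed above fits in a few lines of shell (hypothetical paths, requires root, Linux only):

```shell
# create a cgroup for the build and apply a soft memory limit;
# above memory.high the kernel throttles and reclaims, it does not kill
mkdir /sys/fs/cgroup/build
echo 8G > /sys/fs/cgroup/build/memory.high
echo $$ > /sys/fs/cgroup/build/cgroup.procs   # move this shell (and children) in

# later, from a monitor that observes high memory pressure:
echo 1 > /sys/fs/cgroup/build/cgroup.freeze   # suspend every task in the group
echo 0 > /sys/fs/cgroup/build/cgroup.freeze   # thaw once pressure drops
```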

@whitequark @pmarheine @regehr @dotstdy @tedmielczarek I was hoping systemd would have a 'freeze this unit if there is memory pressure' or similar setting, but AFAIK there's nothing for that.

(You can make something not start if there's too much memory pressure, but that doesn't help for 'I started 16 linkers when the memory pressure was low and now they're all making the memory pressure high'.)

Oh well. Systemd doesn't do everything.

@cks @whitequark @pmarheine @regehr @dotstdy the classic GNU Make approach: "don't start any new jobs if the system load average is above this number": https://www.gnu.org/software/make/manual/html_node/Options-Summary.html#index-_002d_002dload_002daverage-1
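For instance (the load-average cutoff of 8 is an arbitrary example):

```shell
# run up to 16 jobs, but start no new ones while the load average exceeds 8
make -j16 --load-average=8
```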

@whitequark @dotstdy @tedmielczarek @regehr
Exactly. The build system has no clue how many resources each tool invocation is going to need.

And, say, the linker can only see how much memory is available on the host, not how much it would actually be okay for it to use. Perhaps we need a "--mem" option for tools, analogous to the -j option for threads.

@whitequark @dotstdy @regehr now I'm wondering if there are heuristics that would be workable… Number of input objects? Total size (in bytes) of all input objects? Things like LTO complicate matters, but maybe they could be accounted for?
@tedmielczarek @dotstdy @regehr I think it's probably fine to give up on thick LTO and only consider thin LTO, which scales more or less linearly in the size of input code; without specifically commenting on whether I think heuristics will work (as I think only testing will show)
@whitequark @dotstdy @regehr this reminds me that I always wondered how much of a pain it would be to make sccache able to cache linker invocations. (there's so much linker input that's not expressed on the commandline, it's ridiculous.)
@whitequark yeah of course that was the problem. this was whatever ninja wanted to do by default, but after it died a "ninja -j1" succeeded
@regehr Time for LLVM data centers.
@regehr In truth, that is pretty bad.
@gwozniak @regehr I just assumed John would be running a large server farm tbh, a sort of Cloud Lab, if you will
@norootcause @regehr la la la I don't hear you 🙉
@regehr Such a pain. Some builds want compilers == cores, others will swap to death if you try that.