software is amazing... my otherwise-idle 128 GB machine just started swapping and then invoked the OOM killer to nuke a Debug + code coverage LLVM build
@regehr how much parallelism?
@whitequark @regehr build system which gracefully handles memory pressure any day now!
@dotstdy @regehr is this even a build system's job? I can see both answers being appropriate, but if "yes" then you'll have to deal with a lot of pain making a cross-platform one
@whitequark @dotstdy @regehr ninja has a concept of "pools" for exactly this sort of thing: limiting the number of concurrent linker invocations: https://ninja-build.org/manual.html#ref_pool
@tedmielczarek @whitequark @regehr yeah but that's very much the "problem is the user's own" approach, since the value for every concurrency limit needs to change depending on the available hardware resources.
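One way to picture the "value depends on the hardware" problem: a configure step could derive the pool depth from installed RAM rather than hard-coding it. The sketch below is only an illustration. The per-link estimate, the reserve, and the pool name `link_pool` are assumptions, not values from the thread or from ninja; only the `pool` / `depth` declaration syntax comes from the ninja manual.

```python
import os

GIB = 1024 ** 3

def pool_depth(total_ram_bytes, per_link_bytes, reserve_bytes=0):
    """How many links fit in RAM at once; never fewer than one."""
    usable = max(total_ram_bytes - reserve_bytes, 0)
    return max(usable // per_link_bytes, 1)

def render_pool(name, depth):
    """Render a ninja pool declaration with the given depth."""
    return f"pool {name}\n  depth = {depth}\n"

def detect_total_ram():
    """Total physical RAM in bytes (works on Linux and most Unixes)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")

def link_pool_for_this_machine(per_link_bytes=8 * GIB, reserve_bytes=4 * GIB):
    # Assumed numbers: 8 GiB per link, 4 GiB kept free for everything else.
    depth = pool_depth(detect_total_ram(), per_link_bytes, reserve_bytes)
    return render_pool("link_pool", depth)
```

The guess at `per_link_bytes` is still the user's problem, which is the objection raised here; the sketch only removes the dependence on a specific machine.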
@dotstdy @tedmielczarek @regehr but we don't have a way to predict how much memory a linker invocation will consume, do we? in which case that's sort of what we're stuck with, since any solution must necessarily be reactive
@whitequark @tedmielczarek @regehr I think tuning it the other way (i.e. this invocation requires 2 GB) and having the job server track a budget, delaying launch (without introducing deadlocks) would allow you to scale to arbitrary hardware so long as the annotations are somewhat accurate. But yeah it's fraught with complexity. See also some kind of back-off recovery for extreme pressure since the work is ideally possible to re-launch.
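A rough sketch of what such a budget-tracking job server might look like, assuming each job carries a memory annotation in MB. All names here (`MemoryBudget`, `plan_waves`) are hypothetical, and the scheduling is simplified to "waves" of jobs that start together. The one deadlock-avoidance rule shown is that a job is always admitted when nothing else is running, even if its annotation exceeds the whole budget.

```python
from collections import deque

class MemoryBudget:
    """Tracks annotated memory of running jobs against a fixed budget."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.in_use_mb = 0
        self.running = 0

    def try_acquire(self, need_mb):
        # Admit unconditionally when idle, so an oversized job cannot
        # wait forever for a budget it will never fit into.
        if self.running == 0 or self.in_use_mb + need_mb <= self.total_mb:
            self.in_use_mb += need_mb
            self.running += 1
            return True
        return False

    def release(self, need_mb):
        self.in_use_mb -= need_mb
        self.running -= 1

def plan_waves(jobs, total_mb):
    """Group (name, need_mb) jobs into batches that fit the budget."""
    pending = deque(jobs)
    waves = []
    while pending:
        budget = MemoryBudget(total_mb)
        wave, deferred = [], deque()
        while pending:
            name, need_mb = pending.popleft()
            if budget.try_acquire(need_mb):
                wave.append(name)
            else:
                deferred.append((name, need_mb))  # delay the launch
        waves.append(wave)
        pending = deferred
    return waves
```

A real job server would release budget as individual jobs finish rather than waiting for a whole wave, and would still depend on the annotations being roughly right.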
@dotstdy @tedmielczarek @regehr I don't think there's any way to make the annotations somewhat accurate

@dotstdy @tedmielczarek @regehr I think what would really help is if a build system had the knobs to suspend a process if it consumes too much memory. ld eats more than 2 GB? pause it, let other processes finish, then let it restart

I don't know if that's feasible

@whitequark @dotstdy @tedmielczarek it doesn't seem hard (deadlocks notwithstanding)
@regehr @dotstdy @tedmielczarek how would you do it? SIGSTOP?
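For what the SIGSTOP route might look like on Linux, here is a minimal sketch: read the child's resident set size from `/proc/<pid>/status` and stop it once it crosses a limit. The function names and the limit are assumptions for illustration. It also only signals a single PID, whereas a real linker invocation may involve child processes.

```python
import os
import signal

def parse_vmrss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # e.g. kernel threads have no VmRSS line

def rss_kib(pid):
    """Current resident set size of a process in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())

def pause_if_over(pid, limit_kib, read_rss=rss_kib, send_signal=os.kill):
    """Send SIGSTOP if the process is above the limit; report if paused."""
    rss = read_rss(pid)
    if rss is not None and rss > limit_kib:
        send_signal(pid, signal.SIGSTOP)
        return True
    return False

def resume(pid, send_signal=os.kill):
    """Let a previously stopped process continue."""
    send_signal(pid, signal.SIGCONT)
```

Stopping a process frees no memory by itself; the pages stay resident until the kernel swaps them out or the process is killed.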
@whitequark @dotstdy @tedmielczarek any convenient mechanism would be fine for pausing the process, and then I guess a new syscall to push the entire process out of RAM? I mean, most unixes have been able to do that, but I don't know that Linux currently can...
@regehr @dotstdy @tedmielczarek I think existing swap behavior will take care of that provided you do have enough swap in the first place; I think maybe the build system could be designed to kill the paused process with the least amount of runtime if it thinks the system is low on memory?
@whitequark @dotstdy @tedmielczarek yeah killing might be totally reasonable since build system commands are usually idempotent
@whitequark @dotstdy @tedmielczarek that's maybe my favorite idea from here so far, seems like it could be implemented in e.g. ninja without touching any other code
@regehr @whitequark @tedmielczarek yeah and in a build system sense it mirrors the "kill and retry" approach for dependency resolution, which is kinda neat too. It sounds better than relying on swap to me since you don't have to burn CPU time writing all that transient (and potentially cold) data out to disk and loading it back in.
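The "kill the paused process with the least runtime, then retry" idea could be sketched roughly as follows. The `Job` record and the retry queue are hypothetical bookkeeping, not part of ninja or any other build system, and the sketch assumes commands are safe to re-run from scratch.

```python
import os
import signal
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    pid: int
    started_at: float   # timestamp when the command was launched
    paused: bool = False

def pick_victim(jobs, now):
    """Among paused jobs, choose the one that has run for the least time."""
    paused = [j for j in jobs if j.paused]
    if not paused:
        return None
    return min(paused, key=lambda j: now - j.started_at)

def evict(jobs, retry_queue, now, kill=os.kill):
    """Kill the cheapest paused job and queue its command for a re-run."""
    victim = pick_victim(jobs, now)
    if victim is None:
        return None
    kill(victim.pid, signal.SIGKILL)  # SIGKILL also works on stopped processes
    jobs.remove(victim)
    retry_queue.append(victim.name)
    return victim
```

Choosing the least runtime minimizes the work thrown away. A real implementation would also have to delete any partially written outputs before re-running the command.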
@regehr @whitequark @dotstdy @tedmielczarek Eventually the OOM killer will do its job, if there's a way to detect that that happened and move that task to the end and run it by itself that would already make my life slightly easier.
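A sketch of that "detect the kill, then re-run it alone at the end" behavior, under one loud assumption: a child that died from SIGKILL is treated as a probable OOM victim. The exit status alone cannot distinguish an OOM kill from any other SIGKILL. The function names are hypothetical.

```python
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor

def was_sigkilled(returncode):
    """subprocess reports death by signal N as a return code of -N."""
    return returncode == -signal.SIGKILL

def run_with_serial_retry(commands, jobs=4, run=None):
    """Run commands in parallel, then retry SIGKILLed ones one at a time."""
    if run is None:
        run = lambda cmd: subprocess.run(cmd).returncode
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        codes = list(pool.map(run, commands))
    results = {tuple(cmd): code for cmd, code in zip(commands, codes)}
    for cmd, code in zip(commands, codes):
        if was_sigkilled(code):
            # Nothing else is running now, so the retry has the
            # machine's memory to itself.
            results[tuple(cmd)] = run(cmd)
    return results
```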
@crzwdjk @regehr @dotstdy @tedmielczarek the kernel OOM killer should be assumed to not exist because of the inherent constraints on its function (it can only kill a process after it exhausts all other options, by which point the end user has already rebooted the machine in frustration)
@whitequark @crzwdjk @regehr @dotstdy @tedmielczarek yeah i have had it kill a leaky process eating close to 100gb of memory after about a day of thrashing and killing some random other process
@charlotte @crzwdjk @dotstdy @regehr @tedmielczarek @whitequark On Linux, I would assume a build system could act based on pressure stall information. Stuff like that is pretty much what PSI is for.
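As a minimal sketch of the PSI idea: on Linux kernels built with PSI support (4.20 and later), `/proc/pressure/memory` exposes `some` and `full` lines with `avg10`, `avg60`, `avg300`, and `total` fields. A build system could poll it and hold back new jobs when pressure rises. The threshold and the function names below are assumptions for illustration only.

```python
def parse_psi(text):
    """Parse PSI text into {'some': {...}, 'full': {...}} with float values."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result

def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the kernel's memory pressure file (Linux with PSI)."""
    with open(path) as f:
        return parse_psi(f.read())

def should_throttle(psi, some_avg10_threshold=10.0):
    """True if tasks stalled on memory for more than the given share
    of the last 10 seconds (threshold is an assumed value)."""
    return psi["some"]["avg10"] > some_avg10_threshold
```

A scheduler loop could call `should_throttle(read_memory_pressure())` before each launch and skip starting new work while it returns true.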