software is amazing... my otherwise-idle 128 GB machine just started swapping and then invoked the OOM killer to nuke a Debug + code coverage LLVM build
@regehr how much parallelism?
@whitequark @regehr build system which gracefully handles memory pressure any day now!
@dotstdy @regehr is this even a build system's job? I can see both answers being appropriate, but if "yes" then you'll have to deal with a lot of pain making a cross-platform one
@whitequark @dotstdy @regehr ninja has a concept of "pools" for exactly this sort of thing: limiting the number of concurrent linker invocations: https://ninja-build.org/manual.html#ref_pool
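For reference, a pool in a ninja file looks roughly like this (the depth of 2 and the `$ld` variable are arbitrary placeholders you'd pick for your own build):

```ninja
# a pool capping how many jobs assigned to it may run at once
pool link_pool
  depth = 2

rule link
  command = $ld -o $out $in
  pool = link_pool
```

Every build edge using the `link` rule then competes for the pool's two slots, regardless of the global `-j` level.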

@tedmielczarek @whitequark @regehr yeah, but that's very much the "problem is the user's own" approach, since the right value for every concurrency limit changes depending on the available hardware resources.
@dotstdy @tedmielczarek @regehr but we don't have a way to predict how much memory a linker invocation will consume, do we? in which case that's sort of what we're stuck with, since any solution must necessarily be reactive
@whitequark @tedmielczarek @regehr I think tuning it the other way (i.e. "this invocation requires 2 GB") and having the job server track a budget, delaying launches (without introducing deadlocks), would let you scale to arbitrary hardware so long as the annotations are somewhat accurate. But yeah, it's fraught with complexity. See also some kind of back-off recovery for extreme pressure, since the work is ideally possible to re-launch.
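A minimal sketch of that budget idea (all names and the deadlock-avoidance rule are my own assumptions, not any existing job server's API): jobs declare an estimated cost up front, launches wait until the budget has room, and to avoid deadlock a job is always admitted when nothing else is running, even if its estimate exceeds the entire budget.

```python
import threading

class MemoryBudget:
    """Track a shared memory budget for a job server.

    acquire() blocks until the declared cost fits, except that a job is
    always admitted when no other job is running -- otherwise a single
    over-budget job could never start and the build would deadlock.
    """

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.in_use = 0       # sum of declared costs of running jobs
        self.running = 0      # number of jobs currently admitted
        self.cv = threading.Condition()

    def acquire(self, cost):
        with self.cv:
            while not (self.in_use + cost <= self.total or self.running == 0):
                self.cv.wait()
            self.in_use += cost
            self.running += 1

    def release(self, cost):
        with self.cv:
            self.in_use -= cost
            self.running -= 1
            self.cv.notify_all()   # wake jobs waiting for budget
```

A worker would wrap each compile/link in `acquire(estimate)` / `release(estimate)`; the interesting policy questions (what to do when estimates are wrong, how to order waiters) are exactly the "fraught with complexity" part.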
@dotstdy @tedmielczarek @regehr I don't think there's any way to make the annotations somewhat accurate

@dotstdy @tedmielczarek @regehr I think what would really help is if a build system had the knobs to suspend a process if it consumes too much memory. ld eats more than 2 GB? pause it, let other processes finish, then let it resume

I don't know if that's feasible
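The mechanism itself exists on POSIX (SIGSTOP/SIGCONT); here's a rough Linux-only sketch of babysitting one job that way. The 2 GiB cap, the hysteresis rule, and all function names are made up for illustration, and reading `/proc/<pid>/status` is Linux-specific:

```python
import os
import signal
import time

PAUSE_THRESHOLD_KIB = 2 * 1024 * 1024  # hypothetical 2 GiB cap

def should_pause(rss_kib, paused, threshold=PAUSE_THRESHOLD_KIB):
    """Hysteresis: pause above the cap, resume once usage drops below half.

    The gap between the pause and resume thresholds avoids rapidly
    toggling a job that hovers right at the cap.
    """
    if not paused and rss_kib > threshold:
        return True
    if paused and rss_kib < threshold // 2:
        return False
    return paused

def read_rss_kib(pid):
    # Linux-specific: resident set size (VmRSS) from /proc/<pid>/status
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def babysit(pid, poll_seconds=1.0):
    paused = False
    while True:
        want_paused = should_pause(read_rss_kib(pid), paused)
        if want_paused and not paused:
            os.kill(pid, signal.SIGSTOP)   # suspend the hungry job
        elif paused and not want_paused:
            os.kill(pid, signal.SIGCONT)   # resume once pressure eases
        paused = want_paused
        time.sleep(poll_seconds)
```

Note the obvious gap this sketch shares with the idea itself: a stopped process still holds its memory, so pausing only helps if *other* jobs finishing is what relieves the pressure.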

@whitequark @tedmielczarek @regehr yeah, that's what I was getting at with backoff. Suspending is hard because you're still holding the memory while you're suspended. Currently build scripts end up hard-coding a parallel-linker limit of 4 or whatever, which is trivially broken on low-memory systems and trivially too conservative on large ones. So if you said 2 GB in your own build script (you know the software, after all) that would let you avoid the majority of issues open-source software hits.
@whitequark @tedmielczarek @regehr but yeah, some kind of "I'm swapping and non-critical, so I don't want to be re-scheduled until memory pressure improves" signal would be interesting. I just worry about the inherent deadlocks there, since the job server, not the OS, is the only thing which knows the dependencies between jobs

@dotstdy @tedmielczarek @regehr

> So if you said 2gb in your own build script (you know the software after all)

nope, not gonna work. you know the software, sure, but you don't know which crackhead build of the linker your users are gonna invoke it with, which custom options, which architecture... all of which can vary the resulting consumed memory by an order of magnitude at least. hell, even running a 32-bit linker instead of a 64-bit one is a big deal

@whitequark @tedmielczarek @regehr I mean, that's true, but if you're doing that I'm not really worried about your problems :') it's mostly just things like "I want to build llvm / rust / unreal engine / etc" on my PC with a Threadripper and 16 GB of RAM (or realistically, my laptop with 16 threads and 16 GB of RAM, and the OP's machine) which I think is tractable to solve that way and still useful to fix in itself.
@dotstdy @tedmielczarek @regehr yeah I think that it is not only not tractable but also actively hostile to people for whom it fails (since now you've promised to unfuck the build system but it turned out to have been a lie). it's essentially guaranteed to breed resentment for your system
@dotstdy @tedmielczarek @regehr also now there's another arch-, platform-, and commit-sensitive parameter stored in version control that will get desynced from the "real" values unless actively maintained, and the incentives to keep it updated aren't exactly there
@whitequark @tedmielczarek @regehr yeah I'm just suggesting that the current version of it where you check in `max-threads=4` is somehow even worse since it has all the same problems plus it's static for all hardware. `max-threads=max(1, min(num-cpus, system-memory / 4GiB))` feels like a concrete improvement, even if it retains many of the same flaws. But I do broadly agree that a much better option would be to fix things in a way that also handles the dynamic nature of resource availability.
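That heuristic is simple to compute at configure time; a sketch of what it looks like in practice (the function name and the 4 GiB-per-link figure are just the placeholder from the formula above, and the `sysconf` calls are POSIX-only, hence the fallback):

```python
import os

GIB = 1 << 30

def default_link_jobs(mem_per_link=4 * GIB):
    """Hardware-aware default instead of a hard-coded max-threads=4:
    max(1, min(num-cpus, system-memory / mem_per_link))."""
    ncpus = os.cpu_count() or 1
    try:
        # total physical RAM; POSIX-only sysconf names
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError):
        return ncpus  # can't measure memory: fall back to CPU count
    return max(1, min(ncpus, total // mem_per_link))
```

On the 16-thread / 16 GB laptop above this yields 4 link jobs; on a 128 GB workstation it scales up to the CPU count, and on a 4 GB box it degrades to 1, which is exactly the static-limit failure mode it avoids.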