software is amazing... my otherwise-idle 128 GB machine just started swapping and then invoked the OOM killer to nuke a Debug + code coverage LLVM build
@regehr how much parallelism?
@whitequark @regehr build system which gracefully handles memory pressure any day now!
@dotstdy @regehr is this even a build system's job? I can see both answers being appropriate, but if "yes" then you'll have to deal with a lot of pain making a cross-platform one
@whitequark @dotstdy @regehr ninja has a concept of "pools" for exactly this sort of thing: limiting the number of concurrent linker invocations: https://ninja-build.org/manual.html#ref_pool
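For reference, a pool in a ninja file looks roughly like this (the depth of 2 and the `$ld` variable are arbitrary placeholders you'd pick for your own build):

```ninja
# a pool capping how many jobs assigned to it may run at once
pool link_pool
  depth = 2

rule link
  command = $ld -o $out $in
  pool = link_pool
```

Every build edge using the `link` rule then competes for the pool's two slots, regardless of the global `-j` level.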

@tedmielczarek @whitequark @regehr yeah, but that's very much the "problem is the user's own" approach, since the right value for every concurrency limit changes depending on the available hardware resources.
@dotstdy @tedmielczarek @regehr but we don't have a way to predict how much memory a linker invocation will consume, do we? in which case that's sort of what we're stuck with, since any solution must necessarily be reactive
@whitequark @tedmielczarek @regehr I think tuning it the other way (i.e. "this invocation requires 2 GB") and having the job server track a budget, delaying launches (without introducing deadlocks), would let you scale to arbitrary hardware so long as the annotations are somewhat accurate. But yeah, it's fraught with complexity. See also some kind of back-off recovery for extreme pressure, since the work is ideally possible to re-launch.
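A minimal sketch of that budget idea (all names and the deadlock-avoidance rule are my own assumptions, not any existing job server's API): jobs declare an estimated cost up front, launches wait until the budget has room, and to avoid deadlock a job is always admitted when nothing else is running, even if its estimate exceeds the entire budget.

```python
import threading

class MemoryBudget:
    """Track a shared memory budget for a job server.

    acquire() blocks until the declared cost fits, except that a job is
    always admitted when no other job is running -- otherwise a single
    over-budget job could never start and the build would deadlock.
    """

    def __init__(self, total_bytes):
        self.total = total_bytes
        self.in_use = 0       # sum of declared costs of running jobs
        self.running = 0      # number of jobs currently admitted
        self.cv = threading.Condition()

    def acquire(self, cost):
        with self.cv:
            while not (self.in_use + cost <= self.total or self.running == 0):
                self.cv.wait()
            self.in_use += cost
            self.running += 1

    def release(self, cost):
        with self.cv:
            self.in_use -= cost
            self.running -= 1
            self.cv.notify_all()   # wake jobs waiting for budget
```

A worker would wrap each compile/link in `acquire(estimate)` / `release(estimate)`; the interesting policy questions (what to do when estimates are wrong, how to order waiters) are exactly the "fraught with complexity" part.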
@dotstdy @tedmielczarek @regehr I don't think there's any way to make the annotations somewhat accurate

@dotstdy @tedmielczarek @regehr I think what would really help is if a build system had the knobs to suspend a process if it consumes too much memory. ld eats more than 2 GB? pause it, let other processes finish, then let it resume

I don't know if that's feasible
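The mechanism itself exists on POSIX (SIGSTOP/SIGCONT); here's a rough Linux-only sketch of babysitting one job that way. The 2 GiB cap, the hysteresis rule, and all function names are made up for illustration, and reading `/proc/<pid>/status` is Linux-specific:

```python
import os
import signal
import time

PAUSE_THRESHOLD_KIB = 2 * 1024 * 1024  # hypothetical 2 GiB cap

def should_pause(rss_kib, paused, threshold=PAUSE_THRESHOLD_KIB):
    """Hysteresis: pause above the cap, resume once usage drops below half.

    The gap between the pause and resume thresholds avoids rapidly
    toggling a job that hovers right at the cap.
    """
    if not paused and rss_kib > threshold:
        return True
    if paused and rss_kib < threshold // 2:
        return False
    return paused

def read_rss_kib(pid):
    # Linux-specific: resident set size (VmRSS) from /proc/<pid>/status
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def babysit(pid, poll_seconds=1.0):
    paused = False
    while True:
        want_paused = should_pause(read_rss_kib(pid), paused)
        if want_paused and not paused:
            os.kill(pid, signal.SIGSTOP)   # suspend the hungry job
        elif paused and not want_paused:
            os.kill(pid, signal.SIGCONT)   # resume once pressure eases
        paused = want_paused
        time.sleep(poll_seconds)
```

Note the obvious gap this sketch shares with the idea itself: a stopped process still holds its memory, so pausing only helps if *other* jobs finishing is what relieves the pressure.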

@whitequark @tedmielczarek @regehr yeah, that's what I was getting at with backoff. Suspending is hard because you're still holding the memory while you're suspended. Currently build scripts end up hard-coding a parallel-linker limit of 4 or whatever, which is trivially broken on low-memory systems and trivially too conservative on large ones. So if you said 2 GB in your own build script (you know the software, after all) that would let you avoid the majority of issues open-source software hits.
@whitequark @tedmielczarek @regehr but yeah, some kind of "I'm swapping and non-critical, so I don't want to be re-scheduled until memory pressure improves" signal would be interesting. I just worry about the inherent deadlocks there, since the job server, not the OS, is the only thing which knows the dependencies between jobs

@dotstdy @tedmielczarek @regehr

> So if you said 2gb in your own build script (you know the software after all)

nope, not gonna work. you know the software, sure, but you don't know which crackhead build of the linker your users are gonna invoke it with, which custom options, which architecture... all of which can vary the resulting consumed memory by an order of magnitude at least. hell, even running a 32-bit linker instead of a 64-bit one is a big deal

@whitequark @tedmielczarek @regehr I mean, that's true, but if you're doing that I'm not really worried about your problems :') it's mostly just things like "I want to build llvm / rust / unreal engine / etc" on my PC with a Threadripper and 16 GB of RAM (or realistically, my laptop with 16 threads and 16 GB of RAM, and the OP's machine) which I think is tractable to solve that way and still useful to fix in itself.
@dotstdy @tedmielczarek @regehr yeah I think that it is not only not tractable but also actively hostile to people for whom it fails (since now you've promised to unfuck the build system but it turned out to have been a lie). it's essentially guaranteed to breed resentment for your system
@dotstdy @tedmielczarek @regehr also now there's another arch-, platform-, and commit-sensitive parameter stored in version control that will get desynced from the "real" values unless actively maintained, and the incentives to keep it updated aren't exactly there
@whitequark @tedmielczarek @regehr yeah I'm just suggesting that the current version of it where you check in `max-threads=4` is somehow even worse since it has all the same problems plus it's static for all hardware. `max-threads=max(1, min(num-cpus, system-memory / 4GiB))` feels like a concrete improvement, even if it retains many of the same flaws. But I do broadly agree that a much better option would be to fix things in a way that also handles the dynamic nature of resource availability.
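That heuristic is simple to compute at configure time; a sketch of what it looks like in practice (the function name and the 4 GiB-per-link figure are just the placeholder from the formula above, and the `sysconf` calls are POSIX-only, hence the fallback):

```python
import os

GIB = 1 << 30

def default_link_jobs(mem_per_link=4 * GIB):
    """Hardware-aware default instead of a hard-coded max-threads=4:
    max(1, min(num-cpus, system-memory / mem_per_link))."""
    ncpus = os.cpu_count() or 1
    try:
        # total physical RAM; POSIX-only sysconf names
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError):
        return ncpus  # can't measure memory: fall back to CPU count
    return max(1, min(ncpus, total // mem_per_link))
```

On the 16-thread / 16 GB laptop above this yields 4 link jobs; on a 128 GB workstation it scales up to the CPU count, and on a 4 GB box it degrades to 1, which is exactly the static-limit failure mode it avoids.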