software is amazing... my otherwise-idle 128 GB machine just started swapping and then invoked the OOM killer to nuke a Debug + code coverage LLVM build
@regehr how much parallelism?
@whitequark @regehr build system which gracefully handles memory pressure any day now!
@dotstdy @regehr is this even a build system's job? I can see both answers being appropriate, but if "yes" then you'll have to deal with a lot of pain making a cross-platform one
@whitequark @dotstdy @regehr ninja has a concept of "pools" for exactly this sort of thing: limiting the number of concurrent linker invocations: https://ninja-build.org/manual.html#ref_pool
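For reference, the mechanism from the manual looks like this (the depth value and rule are illustrative, not from any real build):

```ninja
# A pool named link_pool allows at most 2 of its jobs to run at once,
# independent of the global -j parallelism.
pool link_pool
  depth = 2

rule link
  command = c++ $in -o $out
  pool = link_pool
```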


@tedmielczarek @whitequark @regehr yeah, but that's very much the "problem is the user's own" approach, since the value for every concurrency limit needs to change depending on the available hardware resources.
@dotstdy @tedmielczarek @regehr but we don't have a way to predict how much memory a linker invocation will consume, do we? in which case that's sort of what we're stuck with, since any solution must necessarily be reactive
@whitequark @tedmielczarek @regehr I think tuning it the other way (i.e. "this invocation requires 2 GB") and having the job server track a budget, delaying launches (without introducing deadlocks), would let you scale to arbitrary hardware so long as the annotations are somewhat accurate. But yeah, it's fraught with complexity. See also: some kind of back-off recovery for extreme pressure, since the work should ideally be possible to re-launch.
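A minimal sketch of the budget-tracking idea, assuming jobs declare a (rough) memory estimate up front. All names here are mine, not any real build system's API; the deadlock-avoidance trick is simply capping any single estimate at the total budget so an oversized job waits for an idle system instead of waiting forever.

```python
import threading

class MemoryBudget:
    """Admit jobs only while their declared memory estimates fit in a
    global budget; callers block until enough budget is released."""

    def __init__(self, total_mb):
        self.total = total_mb
        self.used = 0
        self.cond = threading.Condition()

    def acquire(self, estimate_mb):
        # Cap oversized estimates at the whole budget, so a job bigger
        # than the machine still runs (alone) rather than deadlocking.
        need = min(estimate_mb, self.total)
        with self.cond:
            self.cond.wait_for(lambda: self.used + need <= self.total)
            self.used += need
            return need  # caller passes this back to release()

    def release(self, granted_mb):
        with self.cond:
            self.used -= granted_mb
            self.cond.notify_all()
```

A scheduler thread would call `acquire()` before spawning each compiler or linker and `release()` when it exits; the scheme is only as good as the annotations, which is exactly the objection raised below.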
@dotstdy @tedmielczarek @regehr I don't think there's any way to make the annotations somewhat accurate

@dotstdy @tedmielczarek @regehr I think what would really help is if a build system had the knobs to suspend a process if it consumes too much memory. ld eats more than 2 GB? pause it, let other processes finish, then let it restart

I don't know if that's feasible
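The pause half, at least, is easy to sketch; this is a hypothetical build-runner fragment, not any existing tool. SIGSTOP/SIGCONT work on any POSIX process without its cooperation (they cannot be caught or blocked), but note that stopping a process only takes it off the CPU; its pages stay resident unless the kernel reclaims them under pressure.

```python
import os
import signal
import subprocess
import sys

# Stand-in for a linker that got too big (just sleeps).
job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

os.kill(job.pid, signal.SIGSTOP)   # pause: the scheduler stops running it
# ... let other jobs finish, wait for memory pressure to drop ...
os.kill(job.pid, signal.SIGCONT)   # resume exactly where it stopped

job.terminate()
job.wait()
```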

@whitequark @dotstdy @tedmielczarek it doesn't seem hard (deadlocks notwithstanding)
@regehr @dotstdy @tedmielczarek how would you do it? SIGSTOP?
@whitequark @dotstdy @tedmielczarek any convenient mechanism would be fine for pausing the process, and then I guess a new syscall to push the entire process out of RAM? I mean, most Unixes have historically been able to do that, but I don't know that Linux currently can...

@regehr @whitequark @dotstdy @tedmielczarek this sounds a lot like memory.high in cgroups, where processes that exceed chosen usage get put under reclaim pressure and receive less CPU.

I'm not familiar with how it's actually done, but it feels like an oomd equivalent managing a cgroup (containing subgroups for the parallel build tasks) could both monitor memory use and throttle or suspend tasks as needed
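The knob in question is small enough to show. A hand-rolled sketch of the cgroup v2 interface (the path and the 2G figure are illustrative; writing here needs root or a delegated hierarchy):

```shell
mkdir /sys/fs/cgroup/build
echo 2G > /sys/fs/cgroup/build/memory.high    # soft limit: reclaim + CPU throttling above this
echo "$$" > /sys/fs/cgroup/build/cgroup.procs # move this shell (and future children) in
```

Unlike memory.max, exceeding memory.high doesn't OOM-kill anything; the group is just pushed into reclaim and slowed down.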

@pmarheine @regehr @dotstdy @tedmielczarek I think cgroups may be the right tool to deal with this on Linux, but I have two concerns:

  • Linux is not the only operating system people run builds on
  • even on Linux, running 16 linkers in parallel can very quickly eat up a lot of memory in a way that merely adding reclaim pressure and reducing CPU quota will not solve
@whitequark @pmarheine @regehr @dotstdy @tedmielczarek You can definitely limit a cgroup's memory usage and force just that cgroup to OOM if it collectively runs out (e.g. your 16 parallel linkers in a single cgroup). On a systemd-based Linux this can be scripted, so you could run a build in an environment where it could only use a maximum of X amount of system RAM, although you'd have to decide on X in advance, and if other bits used an unexpected amount of RAM you could still blow up.
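A sketch of that scripting, using systemd's transient-scope mechanism (the limits and -j value are illustrative):

```shell
# Run the whole build in a transient cgroup capped at 32 GiB; the cgroup
# OOM-kills the build's processes if they collectively exceed MemoryMax,
# and throttles/reclaims above MemoryHigh.
systemd-run --user --scope -p MemoryMax=32G -p MemoryHigh=28G make -j16
```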
@cks @pmarheine @regehr @dotstdy @tedmielczarek but I don't want them to collectively OOM, I want them to serialize themselves

@whitequark @pmarheine @regehr @dotstdy @tedmielczarek It looks like there's no good support for this in systemd/cgroup right now. Systemd has an API for getting memory pressure information¹, but I don't know of any command line programs you can run to watch this and take action for subordinate processes. Cgroup(v2) can freeze an entire cgroup² by writing to cgroup.freeze, but again one would have to write tooling around this.

¹ https://systemd.io/MEMORY_PRESSURE/
² https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
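The raw interface from ² is small enough that the missing tooling would mostly be deciding *when* to pull the lever; a hand-rolled sketch (path illustrative, requires a delegated hierarchy):

```shell
# Freeze every process in the subtree: they stop consuming CPU but stay
# in memory; cgroup.freeze reads back 1 once the group is fully frozen.
echo 1 > /sys/fs/cgroup/build/jobs/cgroup.freeze
# ... memory pressure subsides ...
echo 0 > /sys/fs/cgroup/build/jobs/cgroup.freeze   # thaw
```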

@whitequark @pmarheine @regehr @dotstdy @tedmielczarek I was hoping systemd would have a 'freeze this unit if there is memory pressure' or similar setting, but AFAIK there's nothing for that.

(You can make something not start if there's too much memory pressure, but that doesn't help for 'I started 16 linkers when the memory pressure was low and now they're all making the memory pressure high'.)

Oh well. Systemd doesn't do everything.

@cks @whitequark @pmarheine @regehr @dotstdy the classic GNU Make approach: "don't start any new jobs if the system load average is above this number": https://www.gnu.org/software/make/manual/html_node/Options-Summary.html#index-_002d_002dload_002daverage-1
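That flag in practice (the thresholds are whatever suits the machine):

```shell
# Start no new jobs while the 1-minute load average is 8.0 or higher,
# even if fewer than 16 jobs are currently running.
make -j16 --load-average=8
```

It's reactive and proxies memory via load, so it only mitigates the "16 linkers at once" problem rather than solving it, but it needs zero annotations.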