Mastodawn

lewurm Jul 30, 2024

I had a feeling that kernel compilation got slower recently and tried to find the slowest file across randconfig builds. It turned out to be arch/x86/xen/setup.c, which takes 15 seconds to preprocess on a reasonably fast Apple M1 Ultra.

This all comes from one line "extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), extra_pages, max_pages - max_pfn);" that expands to 47MB of preprocessor output after commits 80fcac55385c..867046cc70277.

Arnd Bergmann Jul 23, 2024

If anyone wonders what other files are bad, this is the list of all files with over five second compile times: https://pastebin.com/raw/fXJe7y9P.

The worst case for this particular file was 44 seconds, three times slower than in defconfig.

The median compile time across all files is 0.26s.

🐧sima🐧Jul 23, 2024

@arnd clicked through and pretty much got what I expected 😔

penguin42 Jul 23, 2024

@arnd I guess the AMD ones are all the huge register definitions they have?

Conor Dooley Jul 23, 2024

@arnd so you're telling me that, not only does amdgpu produce most of the build issues that I run into, it also is one of the worst things to enable when it comes to my build time? Hmmge

Arnd Bergmann Jul 23, 2024

@conor the other thing to realize about this driver is that it's about the same size as linux-2.6.12, 5.8M lines of code. I had a look at what makes it so slow to compile and as far as I can tell it's just the size of the individual files and how they contain functions with thousands of lines and dozens of arguments.

Emil Velikov Jul 24, 2024

@arnd can you share the script to get those numbers - or is there a make target for it these days?

jn (big endian energy)Jul 22, 2025

@arnd is this some kind of unchecked/unintentional recursion?

Arnd Bergmann Jul 24, 2025

@jn something like that, yes. IIRC one macro expands its arguments five times or more, and the user nests it several times, so the expression that gets passed in is expanded thousands of times in the end.

Geert Uytterhoeven Jul 23, 2024

@arnd Opening arch/x86/xen/setup.i and going to the end of the offending line looks like a good DoS-attach on vim.

Kees Cook (old account)Jul 23, 2024

@arnd 47M from a single composition of max()?? I am terrified to go look now ...

KeyJ Jul 23, 2024

@arnd But ... how? Even if it expanded to 47 kilobytes, that would be excessive, but 47 *megabytes*? 🤯

Arnd Bergmann Jul 23, 2024

@KeyJ it nests min() multiple levels deep with the use of min3(), and each one expands its argument 20 times times now (up from 6 back in linux-6.6). This gets 8000 expansions for each of the arguments, plus a lot of extra bits with each expansion. PFN_DOWN(MAXMEM) contributes a bit to the initial size as well.

See https://pastebin.com/MmfWH7TM for the first few pages of it.

unsigned long extra_pages = 0; extra_pages = __builtin_choose_expr((sizeof(in - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

Pastebin

Dirk Steins Jul 23, 2024

@arnd @KeyJ and this is effective and efficient code? Doesn’t look like it to me - but I’m not a kernel developer.

Mans R Jul 23, 2024

@dirksteins @arnd @KeyJ It's a huge constant expression that gets evaluated by the compiler. The final code is (probably) fine.

Dirk Steins Jul 23, 2024

@mansr @arnd @KeyJ probably. But I still ask myself if a 47MB expression is useful and if the compiler is able to process this correctly (it probably is, but is it efficient?).

Mans R Jul 23, 2024

@dirksteins @arnd @KeyJ There's no reason to doubt the correctness, but as has been noted, it is somewhat slow.

Now as shocking as this might be, it is quite tame compared to the boost C++ library.

d@nny disc@ mc² Jan 4, 2025

@mansr @dirksteins @arnd @KeyJ since it's never checked in or written to the filesystem, it's more of an intermediate expression or IR than anything else. in this framing, it may be compared directly to to the kinds of in-memory data structures produced for c++ template metaprogramming. i don't know the appropriate answer to this specifically but given that c source code as IR has had a lot more effort optimizing compilation perf and correctness than c++ template metaprogramming as IR, i suspect c preprocessor macros win here on memory, perf, simplicity, and correctness

d@nny disc@ mc² Jan 4, 2025

@mansr @dirksteins @arnd @KeyJ there could still be room for a hygienic macro system to improve over using the C preprocessor as an IR, but that's how i might try to get a more apples-to-apples comparison

aspragg Jul 23, 2024

@dirksteins @arnd @KeyJ

I'm guessing that once the 47MB of preprocessor output is fed to the actual compiler, a lot of it will turn out to be semantically null or trivially redundant, and some really basic optimisations like hoisting and common subexpression elimination will collapse the rest back down to a handful of assembly instructions.

Martin Uecker Jul 23, 2024

@arnd @KeyJ Guess that got a bit out of hand.

AstraLuma Jul 24, 2024

@arnd @KeyJ that reads like XML entity attack

ljs Jul 23, 2024

@arnd we going to revert that series then?

I feel there is pedantry at play...

Arnd Bergmann Jul 23, 2024

@ljs Unfortunately, it seems a lot of code started relying on the added features, so it's not as easy as reverting it, and the original version wasn't great either.

Not sure how to best fix it, maybe there is a chance to make a neat version that doesn't produce a constexpr and change all the callers that actually need a constexpr to use a simpler macro.

ljs Jul 23, 2024

@arnd worth posting to the linux-mm ML at least?

I mean this is unacceptable

Omar Sandoval Jul 23, 2024

@ljs @arnd eh I don't think it's fair to call this pedantry. This looks like a legitimate quality of life improvement for a common annoyance with min/max, but given the consequences, it's fair to say that it's not worth it.

ljs Jul 23, 2024

@osandov @arnd Regardless, this is an unacceptable build regression.

Chatted to @arnd on IRC who suggests he is tied up with stuff to dig into this more so I'll have a wee look :)

EDIT: Think that'll be more likely a report to list + leave others to solve as this seems quite involved and I have a ton of things to do :>)

FULL-FIFO DEVELOPER 🇺🇦🇨🇿Jul 23, 2024

@ljs @osandov @arnd bro just get a faster pdp-11

Oleksandr Natalenko, MSE

@lkundrak @ljs @osandov @arnd All the arguments against preprocessor are always won by preprocessor.

ljs Jul 23, 2024

@osandov @arnd For those following, posted https://lore.kernel.org/linux-mm/[email protected]/T/#u summarising situation + showing some results on my local box (ignore cringe hostname of said box)

Build performance regressions originating from min()/max() macros

Oblomov Jul 24, 2024

@ljs @osandov @arnd haven't seen the code, but would it possible to cut this down by using some static const intermediary values?

ljs Jul 24, 2024

@oblomov @osandov @arnd I'm no expert on the behaviour of the pre-processor or C const behaviour + don't really have time to dig deep but based on discussions on IRC - there's some really subtle complexity around const behaviour (esp. in scenarios like array indices where it truly has to be const) that I believe makes a relatively straight forward solution trickier.

There are bits of existing code that do this but again, seems super subtle.

Anyway I will leave the details of a solution to others :)

Arnd Bergmann Jul 24, 2024

@ljs I think @oblomov is suggesting an intermediate variable in the specific caller, which is a trivial fix and will of course work as it reduces the 47MB to a few KB. The thing we can't easily do is have a temporary variable inside of the min()/max() macros themselves because that prevents them from being used as a C constant expression, e.g. as an array length inside of a type definition.

ljs Jul 24, 2024

@arnd @oblomov ah right yeah.

The problem is we have tons of callers so actually doing that would be a giant pain.

I feel like a way forward may be to have a specific macro for the extremely strict case (type decl) and a looser one for everywhere else that eliminates this.

But question is - how often are we using min()/max() + friends in places like array lengths where we must strictly be const?

Arnd Bergmann Jul 24, 2024

@ljs @oblomov Not a lot. In an x86 allmodconfig build I had to fix up these 11 files with a simplified min()/max(). There are probably another dozen for other configs.

drivers/edac/sb_edac.c
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
drivers/gpu/drm/drm_color_mgmt.c
drivers/input/touchscreen/cyttsp4_core.c
drivers/md/dm-integrity.c
drivers/net/can/usb/etas_es58x/es58x_devlink.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
fs/btrfs/tree-checker.c
lib/vsprintf.c
net/ipv4/proc.c
net/ipv6/proc.c

ljs Jul 24, 2024

@arnd @oblomov well this solution seems viable then? maybe have a cmin()/cmax() for these scenarios or something like that.

Oblomov Jul 24, 2024

@arnd @ljs yeah, my idea was to do it in the caller, not in the macro. stuff like (types made up. I have no idea what they are):

const int pfn_down_max = PFN_DOWN(MAXMEM);
const int extra_ratio = EXTRA_MEM_RATIO*min(max_pfn, pfn_down_max);
const int leftover = max_pages - max_pfn;
extra_pages = min3(extra_ratio, extra_pages, leftover);

or whatever. Might even increase readability (esp. with comments) in addition to limiting macro expansion blowout.

ljs Jul 24, 2024

@oblomov @arnd Not criticsing this to be clear that's a fine suggestion [text can be shit for conveying this] :)

But in my tests I found that it's not just xen that's the problem, so you'd have to do this everywhere, which isn't really practical.

I mean it's worth whacking this egregious case but it's a band aid, we definitely need a general solution.

I am pretty certain people have also whacked other egregious cases individually too, got major de ja vous over this.

Oblomov Jul 24, 2024

@ljs @arnd ah, that makes perfect sense.

It's a bit frustrating because this would be “trivial” to fix with C++ thanks to templates and constexpr functions (would even work “everywhere” including array indexing).

I remember there used to be a discussion on LKML about having an automatic selector for constant vs non-constant expressions with some fancy tricks, and it was specifically to do this kind of selection, but, did it got anywhere?

ljs Jul 24, 2024

@oblomov @arnd yeah there's stuff like https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm_types.h#n507 which maybe that refers to

But not sure how useful that'd be here, unless you were thinking of something else?

mm_types.h « linux « include - kernel/git/torvalds/linux.git - Linux kernel source tree

Oblomov Jul 24, 2024

@ljs @arnd I found the discussion I was thinking of. Thread starts here:

https://lkml.org/lkml/2018/3/20/805

and this message specifically is what I was thinking of:

https://lkml.org/lkml/2018/3/21/223

LKML: "Uecker, Martin": detecting integer constant expressions in macros

Oblomov Jul 24, 2024

@ljs @arnd on second thoughts, it might not help in this case unless __builtin_choose_expr does some trick on macro expansion to avoid the multi-megabyte nested hell in the non-const case.

Wrong way, damn 8-(

ljs Jul 24, 2024

@oblomov @arnd welcome to hell, friend.

NetworkManager Jul 23, 2024

@arnd that is pretty impressive, in a way

stefanct Jul 23, 2024

@arnd did you measure it "manually" or is there some tooling for this?