I had a feeling that kernel compilation got slower recently and tried to find the slowest file across randconfig builds. It turned out to be arch/x86/xen/setup.c, which takes 15 seconds to preprocess on a reasonably fast Apple M1 Ultra.

This all comes from one line "extra_pages = min3(EXTRA_MEM_RATIO * min(max_pfn, PFN_DOWN(MAXMEM)), extra_pages, max_pages - max_pfn);" that expands to 47MB of preprocessor output after commits 80fcac55385c..867046cc70277.

If anyone wonders what other files are bad, this is the list of all files with over five second compile times: https://pastebin.com/raw/fXJe7y9P.

The worst case for this particular file was 44 seconds, three times slower than in defconfig.

The median compile time across all files is 0.26s.

@arnd clicked through and pretty much got what I expected 😔
@arnd I guess the AMD ones are all the huge register definitions they have?
@arnd so you're telling me that, not only does amdgpu produce most of the build issues that I run into, it also is one of the worst things to enable when it comes to my build time? Hmmge
@conor the other thing to realize about this driver is that it's about the same size as linux-2.6.12, 5.8M lines of code. I had a look at what makes it so slow to compile and as far as I can tell it's just the size of the individual files and how they contain functions with thousands of lines and dozens of arguments.
@arnd can you share the script to get those numbers - or is there a make target for it these days?
@arnd is this some kind of unchecked/unintentional recursion?
@jn something like that, yes. IIRC one macro expands its arguments five times or more, and the user nests it several times, so the expression that gets passed in is expanded thousands of times in the end.
@arnd Opening arch/x86/xen/setup.i and going to the end of the offending line looks like a good DoS-attach on vim.
@arnd 47M from a single composition of max()?? I am terrified to go look now ...
@arnd But ... how? Even if it expanded to 47 kilobytes, that would be excessive, but 47 *megabytes*? 🤯

@KeyJ it nests min() multiple levels deep with the use of min3(), and each one expands its argument 20 times times now (up from 6 back in linux-6.6). This gets 8000 expansions for each of the arguments, plus a lot of extra bits with each expansion. PFN_DOWN(MAXMEM) contributes a bit to the initial size as well.

See https://pastebin.com/MmfWH7TM for the first few pages of it.

unsigned long extra_pages = 0; extra_pages = __builtin_choose_expr((sizeof(in - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

Pastebin
@arnd @KeyJ and this is effective and efficient code? Doesn’t look like it to me - but I’m not a kernel developer.
@dirksteins @arnd @KeyJ It's a huge constant expression that gets evaluated by the compiler. The final code is (probably) fine.
@mansr @arnd @KeyJ probably. But I still ask myself if a 47MB expression is useful and if the compiler is able to process this correctly (it probably is, but is it efficient?).

@dirksteins @arnd @KeyJ There's no reason to doubt the correctness, but as has been noted, it is somewhat slow.

Now as shocking as this might be, it is quite tame compared to the boost C++ library.

@mansr @dirksteins @arnd @KeyJ since it's never checked in or written to the filesystem, it's more of an intermediate expression or IR than anything else. in this framing, it may be compared directly to to the kinds of in-memory data structures produced for c++ template metaprogramming. i don't know the appropriate answer to this specifically but given that c source code as IR has had a lot more effort optimizing compilation perf and correctness than c++ template metaprogramming as IR, i suspect c preprocessor macros win here on memory, perf, simplicity, and correctness
@mansr @dirksteins @arnd @KeyJ there could still be room for a hygienic macro system to improve over using the C preprocessor as an IR, but that's how i might try to get a more apples-to-apples comparison

@dirksteins @arnd @KeyJ

I'm guessing that once the 47MB of preprocessor output is fed to the actual compiler, a lot of it will turn out to be semantically null or trivially redundant, and some really basic optimisations like hoisting and common subexpression elimination will collapse the rest back down to a handful of assembly instructions.

@arnd @KeyJ Guess that got a bit out of hand.
@arnd @KeyJ that reads like XML entity attack
@arnd we going to revert that series then?

I feel there is pedantry at play...

@ljs Unfortunately, it seems a lot of code started relying on the added features, so it's not as easy as reverting it, and the original version wasn't great either.

Not sure how to best fix it, maybe there is a chance to make a neat version that doesn't produce a constexpr and change all the callers that actually need a constexpr to use a simpler macro.

@arnd worth posting to the linux-mm ML at least?

I mean this is unacceptable
@ljs @arnd eh I don't think it's fair to call this pedantry. This looks like a legitimate quality of life improvement for a common annoyance with min/max, but given the consequences, it's fair to say that it's not worth it.
@osandov @arnd Regardless, this is an unacceptable build regression.

Chatted to @arnd on IRC who suggests he is tied up with stuff to dig into this more so I'll have a wee look :)

EDIT: Think that'll be more likely a report to list + leave others to solve as this seems quite involved and I have a ton of things to do :>)
@ljs @osandov @arnd bro just get a faster pdp-11
@lkundrak @ljs @osandov @arnd All the arguments against preprocessor are always won by preprocessor.
@osandov @arnd For those following, posted https://lore.kernel.org/linux-mm/[email protected]/T/#u summarising situation + showing some results on my local box (ignore cringe hostname of said box)
Build performance regressions originating from min()/max() macros

@ljs @osandov @arnd haven't seen the code, but would it possible to cut this down by using some static const intermediary values?
@oblomov @osandov @arnd I'm no expert on the behaviour of the pre-processor or C const behaviour + don't really have time to dig deep but based on discussions on IRC - there's some really subtle complexity around const behaviour (esp. in scenarios like array indices where it truly has to be const) that I believe makes a relatively straight forward solution trickier.

There are bits of existing code that do this but again, seems super subtle.

Anyway I will leave the details of a solution to others :)
@ljs I think @oblomov is suggesting an intermediate variable in the specific caller, which is a trivial fix and will of course work as it reduces the 47MB to a few KB. The thing we can't easily do is have a temporary variable inside of the min()/max() macros themselves because that prevents them from being used as a C constant expression, e.g. as an array length inside of a type definition.
@arnd @oblomov ah right yeah.

The problem is we have tons of callers so actually doing that would be a giant pain.

I feel like a way forward may be to have a specific macro for the extremely strict case (type decl) and a looser one for everywhere else that eliminates this.

But question is - how often are we using min()/max() + friends in places like array lengths where we must strictly be const?

@ljs @oblomov Not a lot. In an x86 allmodconfig build I had to fix up these 11 files with a simplified min()/max(). There are probably another dozen for other configs.

drivers/edac/sb_edac.c
drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
drivers/gpu/drm/drm_color_mgmt.c
drivers/input/touchscreen/cyttsp4_core.c
drivers/md/dm-integrity.c
drivers/net/can/usb/etas_es58x/es58x_devlink.c
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
fs/btrfs/tree-checker.c
lib/vsprintf.c
net/ipv4/proc.c
net/ipv6/proc.c

@arnd @oblomov well this solution seems viable then? maybe have a cmin()/cmax() for these scenarios or something like that.

@arnd @ljs yeah, my idea was to do it in the caller, not in the macro. stuff like (types made up. I have no idea what they are):

const int pfn_down_max = PFN_DOWN(MAXMEM);
const int extra_ratio = EXTRA_MEM_RATIO*min(max_pfn, pfn_down_max);
const int leftover = max_pages - max_pfn;
extra_pages = min3(extra_ratio, extra_pages, leftover);

or whatever. Might even increase readability (esp. with comments) in addition to limiting macro expansion blowout.

@oblomov @arnd Not criticsing this to be clear that's a fine suggestion [text can be shit for conveying this] :)

But in my tests I found that it's not just xen that's the problem, so you'd have to do this everywhere, which isn't really practical.

I mean it's worth whacking this egregious case but it's a band aid, we definitely need a general solution.

I am pretty certain people have also whacked other egregious cases individually too, got major de ja vous over this.

@ljs @arnd ah, that makes perfect sense.

It's a bit frustrating because this would be “trivial” to fix with C++ thanks to templates and constexpr functions (would even work “everywhere” including array indexing).

I remember there used to be a discussion on LKML about having an automatic selector for constant vs non-constant expressions with some fancy tricks, and it was specifically to do this kind of selection, but, did it got anywhere?

@oblomov @arnd yeah there's stuff like https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mm_types.h#n507 which maybe that refers to

But not sure how useful that'd be here, unless you were thinking of something else?
mm_types.h « linux « include - kernel/git/torvalds/linux.git - Linux kernel source tree

@ljs @arnd I found the discussion I was thinking of. Thread starts here:

https://lkml.org/lkml/2018/3/20/805

and this message specifically is what I was thinking of:

https://lkml.org/lkml/2018/3/21/223

LKML: "Uecker, Martin": detecting integer constant expressions in macros

@ljs @arnd on second thoughts, it might not help in this case unless __builtin_choose_expr does some trick on macro expansion to avoid the multi-megabyte nested hell in the non-const case.

Wrong way, damn 8-(

@oblomov @arnd welcome to hell, friend.
@arnd that is pretty impressive, in a way
@arnd did you measure it "manually" or is there some tooling for this?