Fun kernel/compiler interaction that causes some Linux kernel code on x86-64 to contain a few superfluous instructions and use a bit more stack space than necessary:
Linux instructs the compiler to prefer 8-byte-aligned stack frames (instead of the standard 16 bytes). That also means the compiler has to assume that, at the start of each function, the stack is only aligned to 8 bytes. So if something needs a 16-byte-aligned stack allocation, the compiler has to emit instructions to save the old stack pointer (even if frame pointers are disabled) and realign the stack.
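To make the "16-byte-aligned allocation" case concrete, here is a minimal sketch of a local that explicitly asks for 16-byte alignment. The names is_16_aligned and aligned_local_ok are made up for illustration and are not from the kernel. The compiler has to honor the request whatever alignment it assumes on function entry, which is what forces the realignment when only 8 bytes are guaranteed.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is 16-byte aligned, 0 otherwise. */
static int is_16_aligned(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}

/* The local explicitly requests 16-byte alignment, so the compiler must
 * place it on a 16-byte boundary no matter what it assumes about the
 * incoming stack pointer. */
int aligned_local_ok(void) {
    alignas(16) unsigned char buf[16];
    buf[0] = 0;
    return is_16_aligned(buf);
}
```

With optimization enabled the compiler will likely fold the check away, so to see the realignment prologue in the generated assembly the address would have to escape, for example by passing buf to a function defined in another translation unit.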
And apparently, especially in GCC, any nontrivial stack allocation whose address escapes the compiler's analysis gets aligned to 16 bytes, even if the object actually requires less alignment:
int foo(void *);
struct s1 { unsigned long a; };
struct s2 { unsigned long a; unsigned long b; };
int bar1() {
    struct s1 s;
    return foo(&s);
}
int bar2() {
    struct s2 s;
    return foo(&s);
}
compiles to this with GCC trunk and the flags -O3 -mpreferred-stack-boundary=3:
bar1:
    subq $8, %rsp
    movq %rsp, %rdi
    call foo
    addq $8, %rsp
    ret
bar2:
    pushq %rbp
    movq %rsp, %rbp
    andq $-16, %rsp
    subq $16, %rsp
    movq %rsp, %rdi
    call foo
    leave
    ret
Note that bar1 doesn't do alignment (probably because struct s1 is simple enough to hit some special case?), while bar2 adds instructions to align the object (even though s1 and s2 have the same alignment requirements).
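The claim that s1 and s2 have the same alignment requirements can be checked at compile time. This is a small sketch, assuming a C11 compiler. The helper names s1_align and s2_align are hypothetical and exist only so the values can be inspected at run time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignof */
#include <stddef.h>   /* size_t */

struct s1 { unsigned long a; };
struct s2 { unsigned long a; unsigned long b; };

/* Both structs contain only unsigned long members, so each one needs
 * exactly the alignment of unsigned long and nothing stricter. */
static_assert(alignof(struct s1) == alignof(unsigned long),
              "s1 needs only unsigned long alignment");
static_assert(alignof(struct s2) == alignof(unsigned long),
              "s2 needs only unsigned long alignment");

/* Hypothetical accessors returning the required alignment in bytes. */
size_t s1_align(void) { return alignof(struct s1); }
size_t s2_align(void) { return alignof(struct s2); }
```

On x86-64 Linux both values should be 8, so nothing in the types themselves asks for the 16-byte alignment seen in bar2.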