Day 1 of Advent of Compiler Optimisations!

Why do compilers love `xor eax, eax` for zeroing registers? It's brilliant: saves bytes compared to `mov eax, 0`, AND x86 CPUs recognise this "zeroing idiom" early in the pipeline—breaking register dependencies and removing it from execution entirely. Even better: writing to `eax` zeroes the top 32 bits of `rax` for free, handling 64-bit longs in one instruction.
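
To make the "saves bytes" point concrete, here is a minimal sketch that compares the standard machine-code encodings of the two instructions. The constant and function names are made up for illustration; only the byte values come from the x86 encoding.

```python
# Standard x86 encodings for two ways of zeroing eax.
XOR_EAX_EAX = bytes([0x31, 0xC0])                  # xor eax, eax: opcode + ModRM
MOV_EAX_0 = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # mov eax, 0: opcode + imm32


def bytes_saved() -> int:
    """How many bytes the xor form saves over the mov form."""
    return len(MOV_EAX_0) - len(XOR_EAX_EAX)


print(len(XOR_EAX_EAX), len(MOV_EAX_0), bytes_saved())  # 2 5 3
```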

#AoCO2025

Why xor eax, eax? — Matt Godbolt’s blog

Why do compilers love xor-ing registers so much?

HP van Braam Dec 1

@mattgodbolt wait xor eax, eax clears the top of rax?

I never knew that, and also that seems pretty bad. Does xor ax, ax also clear ALL of eax and rax?

matt godbolt Dec 1

@hp any write to a 32-bit register clears the top 32 bits. That's not true of 8-bit or 16-bit writes. When AMD expanded the registers to 64b, they defined this zeroing behaviour for 32-bit writes only. So xor ax, ax only clears the bottom 16b, leaving the other 48 alone.
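
The write rules described above can be modelled in a few lines. This is a toy sketch of the architectural behaviour, not an emulator; the helper names `write32`, `write16` and `write8_low` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def write32(old_rax: int, value: int) -> int:
    """Write to eax: the result is zero-extended, so the old upper half is lost."""
    return value & 0xFFFF_FFFF


def write16(old_rax: int, value: int) -> int:
    """Write to ax: only bits 0..15 change, the upper 48 bits are preserved."""
    return (old_rax & ~0xFFFF & MASK64) | (value & 0xFFFF)


def write8_low(old_rax: int, value: int) -> int:
    """Write to al: only bits 0..7 change."""
    return (old_rax & ~0xFF & MASK64) | (value & 0xFF)


rax = 0x1122_3344_5566_7788
print(hex(write32(rax, 0)))     # xor eax, eax -> 0x0
print(hex(write16(rax, 0)))     # xor ax, ax   -> 0x1122334455660000
print(hex(write8_low(rax, 0)))  # xor al, al   -> 0x1122334455667700
```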

lj·rk Dec 1

@mattgodbolt @hp is this also true with r8-r15? That is, does writing to r8d clear the whole upper half too? It's been a while since I last dealt with (or had access to) an x86 machine, but I remember something fuzzy about the behaviour of those registers differing from the classic ones

matt godbolt

@ljrk @hp I'm 99.99% sure it's the same for all registers, including the new ones. It would be a pain for compilers' register allocators to have to treat registers differently depending on whether they were "new" or not.

Fabian Giesen Dec 2

@mattgodbolt @ljrk @hp It's the same for all registers.

There's two funky bits involving r8..r15:

1. rsp/rbp have extra constraints on their addressing modes (due to how ModR/M encoding works) and r12/r13 inherit them. The main consequence being that sometimes using them produces longer encodings.
2. the 8-bit high registers (ah, bh, ch, dh) are only encodable without REX prefix, so you can't have a reference to any of those 4 and r8..r15 in the same instruction.
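
Point 2 can be sketched as a simple check. This is a simplified model that only looks at register names: it ignores instructions that need a REX prefix for other reasons (such as 64-bit operand size). The inclusion of spl/bpl/sil/dil is an addition not mentioned above; those byte registers are also only reachable with a REX prefix. The function name is hypothetical.

```python
# Legacy high-byte registers: only encodable when no REX prefix is present.
HIGH_BYTE = {"ah", "bh", "ch", "dh"}

# Registers that can only be named when a REX prefix is present.
NEEDS_REX = {
    f"r{n}{suffix}" for n in range(8, 16) for suffix in ("", "d", "w", "b")
} | {"spl", "bpl", "sil", "dil"}


def encodable_together(*regs: str) -> bool:
    """True unless the operands mix a high-byte register with a REX-only one."""
    names = {r.lower() for r in regs}
    return not (names & HIGH_BYTE and names & NEEDS_REX)


print(encodable_together("ah", "bl"))   # True
print(encodable_together("ah", "r8b"))  # False
```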

Fabian Giesen Dec 2

@mattgodbolt @ljrk @hp for 1. see for example https://sandpile.org/x86/opc_rm.htm

- there is no straight [rsp] or [rsp+simm8/simm32] encoding; when you write an insn using rsp as base, it forces an encoding with SIB (Scale-Index-Base byte), and rsp cannot be the index (the part that gets shifted by 0..3). The forced SIB also applies to r12 as a base.
- there is no [rbp] either (and thus no [r13]); these always get turned into [rbp+0] with simm8 adding an extra 0 byte.

2. is more annoying; mostly x86-64 compilers just avoid [abcd]h.

sandpile.org -- x86 architecture -- mod R/M byte
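
The two special cases from point 1 show up directly in instruction length. Below is a minimal sketch that encodes `mov rax, [base]` (REX.W + 8B /r) by hand for a base register with no index and no requested displacement. The function name `encode_load_rax` is hypothetical, and the encoder handles only this one instruction form.

```python
# Register names in hardware encoding order (index == register number).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]


def encode_load_rax(base: str) -> bytes:
    """Encode `mov rax, [base]` with the shortest standard encoding."""
    num = REGS.index(base)
    rex = 0x48 | (num >> 3)  # REX.W, plus REX.B when the base is r8..r15
    rm = num & 7             # low three bits go into ModRM.rm (or SIB.base)
    out = [rex, 0x8B]
    if rm == 4:
        # rsp / r12: rm=100 means "SIB byte follows".
        out += [0x04, 0x24]  # ModRM mod=00 rm=100; SIB: no index, base=100
    elif rm == 5:
        # rbp / r13: mod=00 rm=101 means RIP-relative, so use mod=01 + disp8 of 0.
        out += [0x45, 0x00]
    else:
        out += [rm]          # ModRM mod=00, reg=rax (000), rm=base
    return bytes(out)


for reg in ("rcx", "rsp", "rbp", "r8", "r12", "r13"):
    print(f"{reg:>3}: {encode_load_rax(reg).hex(' ')}")
```

With rcx or r8 as the base the instruction is 3 bytes; with rsp, rbp, r12 or r13 it is 4.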

Tom Forsyth Dec 2

@rygorous @mattgodbolt @ljrk @hp r12/r13 being strange is one of the really fun things about x86 encoding.

The other is that you can use "r0" instead of "rax" if you want, but be very careful with r1, r2 and r3 because they are not (as you would suppose) rbx, rcx, rdx. That would be far too easy.

Fabian Giesen Dec 2

@TomF @mattgodbolt @ljrk @hp well rax:rdx is the usual pair, so of course r1 is actually rdx.

haha jk. it's rax, rcx, rdx, rbx.
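
As a quick reference, the hardware numbering of the eight classic registers can be written as a lookup table. The `numbered_alias` helper is hypothetical and only illustrates the "rN" naming mentioned in the thread; whether an assembler accepts such names depends on the assembler.

```python
# The eight classic 64-bit registers in encoding order (index == register number).
ENCODING_ORDER = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def numbered_alias(name: str) -> str:
    """Map a numbered name such as 'r1' to the classic register it denotes."""
    number = int(name[1:])
    return ENCODING_ORDER[number]


for n in range(4):
    print(f"r{n} -> {numbered_alias(f'r{n}')}")
```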

Tom Forsyth Dec 2

@rygorous @mattgodbolt @ljrk @hp I usually blame the Z80, but I'm not actually sure if that's chronologically true. Intel are perfectly capable of making terrible decisions all by themselves.

Fabian Giesen Dec 2

@TomF not Z80, 8080. But yes, this one came about due to the desire to have asm-level compatibility with 8080 code early on.

It's not really an architecture thing at all. More a misfeature of the assembly language.

Jeslas Dec 3

@TomF that is one of the many reasons I prefer debugging Arm code to x86 🙂

Tom Forsyth Dec 3

@jeslas Yes, I also love to debug Thumb code, it is so much fun. Oh wait, you didn't mean Thumb, did you.

Arm is only 15 years younger, and it has easily as much cruft as x86.

Jeslas Dec 3

@TomF my career is not long enough to understand what you mean here 😅