It's beautiful! Several overdue improvements to keep x86 competitive with ARM. I love competition.
Intel's new PUSH2/POP2 are similar to ARM's LDP/STP. I think these are extremely underrated instructions. Loads and stores are quite expensive, but these processors already support 128-bit loads and stores for vector instructions, so moving two 64-bit registers in one access mostly reuses hardware that's already there.
Zen 3 and the Apple M1 can both do 3 loads per cycle, but with LDP, the M1 can fill twice as many scalar registers per cycle – kinda crazy. It's a shame compilers aren't better at using these instructions, and that the Intel paired load/store is restricted to stack push/pop.
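A rough model of what a paired load buys you, using the numbers above: one 16-byte access fills two 64-bit registers, so 3 load slots per cycle become up to 6 scalar registers per cycle. This is a Python sketch, not an ISA reference; `ldp` and `scalar_regs_per_cycle` are hypothetical names, it assumes little-endian memory, and it ignores alignment, page crossings and every other pipeline limit.

```python
import struct


def ldp(memory: bytes, addr: int) -> tuple[int, int]:
    # Model of an LDP-style paired load: a single 16-byte access
    # is split into two 64-bit register values (little-endian assumed).
    lo, hi = struct.unpack_from("<QQ", memory, addr)
    return lo, hi


def scalar_regs_per_cycle(loads_per_cycle: int, regs_per_load: int) -> int:
    # Back-of-envelope upper bound: every load slot issues a load,
    # and each load fills `regs_per_load` scalar registers.
    return loads_per_cycle * regs_per_load


mem = struct.pack("<QQ", 0x1111, 0x2222)
x0, x1 = ldp(mem, 0)
print(hex(x0), hex(x1))            # 0x1111 0x2222
print(scalar_regs_per_cycle(3, 1))  # plain 64-bit loads: 3
print(scalar_regs_per_cycle(3, 2))  # paired loads: 6
```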
Conditional loads and stores are the biggest surprise to me so far. But they kind of make sense – you already have predicated loads and stores happening on the vector side, so it's nice to see that as an option in scalar code too.
That should also allow a conditional trap via a NULL-pointer dereference (or via a store to RIP+0, if you have W^X and want to save a byte?)
Adding that to my ARM wish-list.
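A toy model of why predicated scalar loads give you a conditional trap for free: when the condition is false the access is suppressed, so even a load from address 0 can't fault. Everything here (`cond_load`, `cond_trap`, `Fault`, memory as a dict) is hypothetical, and keeping the old register value on the false path is an assumption of the sketch, not a statement about how Intel defines its conditional loads.

```python
class Fault(Exception):
    """Stand-in for a hardware memory fault (e.g. a page fault on NULL)."""


def load(memory: dict[int, int], addr: int) -> int:
    # Unmapped addresses, including 0 (NULL), fault as they would under an MMU.
    if addr not in memory:
        raise Fault(f"fault at {addr:#x}")
    return memory[addr]


def cond_load(cond: bool, memory: dict[int, int], addr: int, old: int) -> int:
    # Predicated load: on the false path no access happens, so no fault.
    # Returning `old` there is an assumption of this model.
    if not cond:
        return old
    return load(memory, addr)


def cond_trap(cond: bool, memory: dict[int, int]) -> None:
    # Conditional trap: a predicated load from NULL faults only if cond holds.
    cond_load(cond, memory, 0, 0)


mem = {0x1000: 42}
print(cond_load(True, mem, 0x1000, 0))  # 42
print(cond_load(False, mem, 0, 7))      # 7 (access suppressed, no fault)
cond_trap(False, mem)                   # no fault
try:
    cond_trap(True, mem)
except Fault as e:
    print("trapped:", e)                # trapped: fault at 0x0
```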
@dougall *puts on tinfoil hat* CCMP lets you turn whatever condition you want into OF=1 (just do a CCMP on !cond of a register with itself, with the default flags setting OF to 1 on the path where !cond doesn't hold), and then you can just use INTO!
Ok sure, *technically* not allowed in x86-64 ever, but still!!11
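The CCMP/INTO trick as a small flag model: when !cond holds, CCMP compares a register with itself, which never overflows, so OF=0; otherwise OF is forced to the default-flags value, here 1, so INTO traps exactly when cond is true. Only OF is modelled; `ccmp_of`, `into`, `trap_if` and the `dfv_of` parameter are hypothetical names, and the sketch ignores that real INTO raises #UD in 64-bit mode.

```python
class OverflowTrap(Exception):
    """Stand-in for the #OF exception that INTO raises when OF=1."""


def sub_overflows(a: int, b: int, bits: int = 64) -> bool:
    # Signed overflow of a - b in two's complement: a and b have different
    # signs and the result's sign differs from a's.
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    r = (a - b) & mask
    return bool((a ^ b) & (a ^ r) & sign)


def ccmp_of(cc: bool, a: int, b: int, dfv_of: bool) -> bool:
    # If cc holds, OF comes from the compare a - b; otherwise OF is taken
    # from the default-flags value (only OF is modelled here).
    return sub_overflows(a, b) if cc else dfv_of


def into(of: bool) -> None:
    # INTO traps iff OF is set.
    if of:
        raise OverflowTrap("#OF")


def trap_if(cond: bool, reg: int) -> None:
    # CCMP on !cond of reg with itself: reg - reg = 0 never overflows (OF=0);
    # when !cond fails, i.e. cond is true, OF is forced to dfv = 1.
    into(ccmp_of(not cond, reg, reg, dfv_of=True))


trap_if(False, 123)  # no trap
try:
    trap_if(True, 123)
except OverflowTrap:
    print("trapped")  # prints "trapped"
```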