It's beautiful! Several overdue improvements to keep x86 competitive with ARM. I love competition.
It's beautiful! Several overdue improvements to keep x86 competitive with ARM. I love competition.
Intel's new PUSH2/POP2 are similar to ARM's LDP/STP. I think these are extremely underrated instructions. Loads and stores are quite expensive, but these processors already support 128-bit loads and stores for vector instructions.
Zen 3 and the Apple M1 can both do 3 loads per cycle, but with LDP, the M1 can load 2x the scalar registers per cycle – kinda crazy. It's a shame compilers aren't better at using these instructions, and that the Intel paired load/store is restricted to stack push/pop.
Conditional loads and stores are the biggest surprise to me so far. But they kind of make sense – you already have predicated loads and stores happening on the vector side, so it's nice to see that as an option in scalar code too.
Should also allow for a conditional trap by NULL-pointer-deref (or by writing to RIP+0 if you have W^X and want to save a byte?)
Adding that to my ARM wish-list.
Conditional compare is great – hopefully x86 people will improve compiler support – it's a surprisingly tricky problem with only one flags register. And ARM doesn't have conditional test, so that's exciting!
64-bit absolute jump... Nice to have, but why? You can't use it in position independent code. You can't use it as a call-out-of-JIT. Is it just a jump-out-of-JIT? Wasn't "mov rax, target ; jmp rax" fine? Maybe for dynamic linkers, JITs, or dynamic instrumentation?
@jasonmolenda @pervognsen Nice! Yeah, that came to mind, but "mov + jmp" is 12 bytes, and I'm pretty sure JMPABS is 11 bytes? Saving a byte is nice, but I can't imagine it'll make a huge difference?
(It's given as "REX2,NO66,NO67,NOREP MAP0 W0 A1 target64" – REX2 would be the new 2-byte prefix. I guess "MAP0 W0" are the bits in the REX2 payload? "A1" would be a 1-byte opcode, then 8-bytes for "target64".)
@pervognsen @jasonmolenda I don't really – @comex's substitute comes to mind, but is unmaintained (and was maybe only intended to target function prologues, rather than arbitrary locations?) I haven't kept up with what people are using instead.
(IIRC it just refuses to inject if the hook would overlap a possible jump target (as determined by scanning forwards), then rewrites the replaced instructions to fix any RIP-relative operands.)