"Add" instructions: SSE (x86) vs SVE (ARM).

(x86 info from https://www.officedaytime.com/simd512e/ – may not be complete)


@dougall If we extend our reach to AVX-512, the Intel side gets quite a bit fuller =)

@dougall devil's partially in how you count, too, of course. Is PMADDUBSW "an add"? Probably not, but it gets used as one all the time.

(Love to multiply by 1 just to add some numbers.)
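
(For anyone who hasn't met the trick: PMADDUBSW multiplies unsigned bytes from one operand by signed bytes from the other, then adds adjacent pairs into 16-bit lanes with signed saturation. Feed it a vector of 1s and what is left is a pairwise widening add. Below is a rough scalar model in C rather than real intrinsics, so it runs anywhere; the names pmaddubsw_model and pairwise_add_u8 are made up for illustration.)

```c
#include <stdint.h>

/* Saturate a 32-bit value to the signed 16-bit range. */
static int16_t sat16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of one 128-bit PMADDUBSW (hypothetical helper name):
 * a holds unsigned bytes, b holds signed bytes; each adjacent pair of
 * products is summed into one saturated signed 16-bit lane. */
static void pmaddubsw_model(const uint8_t a[16], const int8_t b[16],
                            int16_t out[8]) {
    for (int i = 0; i < 8; i++) {
        int32_t lo = (int32_t)a[2 * i] * b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = sat16(lo + hi);
    }
}

/* "Multiply by 1 just to add": with b = all ones, the multiply is a
 * no-op and only the pairwise widening add remains. */
static void pairwise_add_u8(const uint8_t a[16], int16_t out[8]) {
    int8_t ones[16];
    for (int i = 0; i < 16; i++) ones[i] = 1;
    pmaddubsw_model(a, ones, out);
}
```

With all-ones as the signed operand the largest possible lane value is 255 + 255 = 510, so the saturation never triggers in the "add" use.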

@dougall Are the AVX-512 instructions with rounding control / masking / broadcast all "one instruction", or are they many? ARM would tend to call them different instructions, x86 would tend to say they're all ADDPS.
@dougall (Intel _does_ provide different intrinsics for a bunch of them, just to confuse things).
@steve @dougall I will never understand how Intel decided that some instructions with semantically distinct load forms (e.g. PMOV[SZ]X variants) don't get a memory operand form and you have to use the intrinsics with a load that's the wrong size (?!), but then AVX512 features that work uniformly for large swaths of the ISA (like masking) get 3 forms for every single instruction
@steve @dougall I'm OK with there being distinct _mask/maskz variants in the few cases where the mask form actually is special (e.g. compress/expand), but for ALU ops? Come on. It should have just been separate intrinsics for mask_epi{8,16,32,64} and mask_p[sd], and then they get fused during code gen. Less noisy, and more convenient both to use and for a compiler to deal with.
@steve @dougall Same with the FMA 132/213/231 forms: these are distinct instructions, but they should never have gotten their own intrinsics. For both users and compilers it's more convenient to have a 4-operand FMA and only lower that to the 132/213/231 variants very late (after register allocation).
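
(A sketch of what the three FMA forms look like, assuming this refers to the VFMADD132/213/231 encodings. The digits say which operands are multiplied and which is added, and operand 1 is always the destination that gets overwritten. This is a scalar C model with made-up function names; it ignores fused single rounding, which doesn't matter for the small exact values used here.)

```c
/* Scalar models of the three x86 FMA operand orders.
 * op1 is both a source and the destination, as in the real encodings.
 * Function names are hypothetical; rounding is not modelled as fused. */

/* VFMADD132: op1 = op1 * op3 + op2 */
static void fmadd132(float *op1, float op2, float op3) {
    *op1 = *op1 * op3 + op2;
}

/* VFMADD213: op1 = op2 * op1 + op3 */
static void fmadd213(float *op1, float op2, float op3) {
    *op1 = op2 * *op1 + op3;
}

/* VFMADD231: op1 = op2 * op3 + op1 */
static void fmadd231(float *op1, float op2, float op3) {
    *op1 = op2 * op3 + *op1;
}

/* A 4-operand view, d = a * b + c, lowered three ways. Which form is
 * used only depends on which input is allowed to be clobbered. */
static float fma_clobber_a_132(float a, float b, float c) {
    float r = a;            /* destination starts out holding a */
    fmadd132(&r, c, b);     /* r = r * b + c */
    return r;
}

static float fma_clobber_a_213(float a, float b, float c) {
    float r = a;            /* destination starts out holding a */
    fmadd213(&r, b, c);     /* r = b * r + c */
    return r;
}

static float fma_clobber_c_231(float a, float b, float c) {
    float r = c;            /* destination starts out holding c */
    fmadd231(&r, a, b);     /* r = a * b + r */
    return r;
}
```

All three wrappers compute the same a * b + c; the choice between them is purely a register-allocation question, which is why it can be deferred until late in code generation.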
@steve Good point! On the other side, I’ve also carefully dodged the fact that ARM uses the same mnemonic for all lane sizes.
@dougall Yet another reason why the lane size should have gone on the mnemonic rather than the register, as in Apple's assembly dialect... 😂
@steve true 😂 I’ve been meaning to ask – does Apple syntax double up on suffixes now that they’re not uniquely identified by the destination size? (eg FCVT.D.S or UDOT.S.H)
@steve Sorry – I probably shouldn't ask – but for anyone else wondering, Apple tools (clang/otool) only use standard syntax for SVE as of Xcode 15 beta 5.
@steve @dougall On the other hand, it makes the GCC asm constraints less useful, so it's impossible to tell if it's good or bad.
@steve Yeah, I’ve been thinking a lot about the coastline paradox lately (https://fgiesen.wordpress.com/2016/08/25/how-many-x86-instructions-are-there/) - but that adds another layer. As long as I also get to add [su]dot (4-way and 2-way) and [su]mmla to the SVE side, I don’t think it changes the ratio too much.
@dougall this is missing some of the most important adds, namely, SAD :)
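
(Why SAD counts as an add: PSADBW sums the absolute differences of each group of 8 unsigned bytes, so running it against an all-zero vector gives a horizontal byte sum. Here is a rough scalar model in C of one 8-byte group; psadbw_model and hsum_u8x8 are illustrative names, not real intrinsics.)

```c
#include <stdint.h>

/* Scalar model of PSADBW on one 8-byte group (hypothetical helper name):
 * sum of absolute differences of unsigned bytes, as a 16-bit result. */
static uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8]) {
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int d = (int)a[i] - (int)b[i];
        sum += (uint16_t)(d < 0 ? -d : d);
    }
    return sum;
}

/* SAD against zero: |a[i] - 0| is just a[i], so the result is the
 * horizontal sum of the 8 bytes. */
static uint16_t hsum_u8x8(const uint8_t a[8]) {
    const uint8_t zero[8] = {0};
    return psadbw_model(a, zero);
}
```

The largest possible result is 8 * 255 = 2040, so the 16-bit sum cannot overflow.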
@dougall you vs the isa she tells you not to worry about