"Add" instructions: SSE (x86) vs SVE (ARM).

(x86 info from https://www.officedaytime.com/simd512e/ – may not be complete)


@dougall If we extend our reach to AVX-512, the Intel side gets quite a bit fuller =)

@dougall devil's partially in how you count, too, of course. Is PMADDUBSW "an add"? Probably not, but it gets used as one all the time.

(Love to multiply by 1 just to add some numbers.)
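For anyone who hasn't run into the trick: PMADDUBSW multiplies unsigned bytes by signed bytes and adds adjacent products into signed 16-bit lanes with saturation, so feeding it a vector of 1s turns it into a pairwise widening add. A minimal scalar sketch of that behaviour in portable C follows; the function names are made up for illustration and none of this is the real intrinsic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit intermediate to the int16_t range (signed saturation). */
int16_t saturate_i16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of PMADDUBSW over n byte lanes (n must be even):
   a is treated as unsigned bytes, b as signed bytes, and each adjacent pair
   of products is summed into one saturated 16-bit result. */
void pmaddubsw_model(const uint8_t *a, const int8_t *b, int16_t *out, size_t n) {
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        int32_t lo = (int32_t)a[2 * i] * b[2 * i];
        int32_t hi = (int32_t)a[2 * i + 1] * b[2 * i + 1];
        out[i] = saturate_i16(lo + hi);
    }
}

/* "Multiply by 1 just to add": with b = all ones, the result is simply
   a[2i] + a[2i+1], which is at most 510 and so never saturates. */
void pairwise_add_u8(const uint8_t *a, int16_t *out, size_t n) {
    int8_t ones[64];
    assert(n <= sizeof ones);
    for (size_t i = 0; i < n; i++) ones[i] = 1;
    pmaddubsw_model(a, ones, out, n);
}
```

Calling `pairwise_add_u8` on 16 bytes yields 8 sums, which is why it tends to get counted as "an add" in practice even though the mnemonic says multiply.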

@dougall Are the AVX-512 instructions with rounding control / masking / broadcast all "one instruction", or are they many? ARM would tend to call them different instructions, x86 would tend to say they're all ADDPS.
@dougall (Intel _does_ provide different intrinsics for a bunch of them, just to confuse things).
@steve @dougall I will never understand how Intel decided that some instructions with semantically distinct load forms (e.g. the PMOV[SZ]X variants) don't get a memory operand form, so you have to use the intrinsics with a load that's the wrong size (?!), but then AVX-512 features that work uniformly for large swaths of the ISA (like masking) get 3 forms for every single instruction.
@steve @dougall I'm OK with there being distinct _mask/maskz variants in the few cases where the mask form actually is special (e.g. compress/expand), but for ALU ops? Come on. It should have just been separate intrinsics for mask_epi{8,16,32,64} and mask_p[sd], and then they get fused during code gen. That's less noisy, and more convenient both to use and for a compiler to deal with.
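To make the "fuse it during code gen" idea concrete, here is a rough scalar sketch in portable C. It assumes a masked ALU op is just the unmasked op followed by a per-lane select, and it ignores floating-point exception flags and fault suppression, where the real masked forms can differ. The `vec16f` type and all function names are hypothetical stand-ins, not Intel's intrinsics.

```c
#include <stdint.h>

enum { LANES = 16 };                            /* 512 bits of 32-bit floats */
typedef struct { float lane[LANES]; } vec16f;   /* hypothetical stand-in for __m512 */

/* Plain, unmasked lane-wise add. */
vec16f add_ps(vec16f a, vec16f b) {
    vec16f r;
    for (int i = 0; i < LANES; i++) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Merge-masking: lanes whose mask bit is clear keep the value from src. */
vec16f mask_select(vec16f src, uint16_t k, vec16f v) {
    vec16f r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = ((k >> i) & 1) ? v.lane[i] : src.lane[i];
    return r;
}

/* Zero-masking: lanes whose mask bit is clear become 0. */
vec16f maskz_select(uint16_t k, vec16f v) {
    vec16f r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = ((k >> i) & 1) ? v.lane[i] : 0.0f;
    return r;
}

/* What the per-instruction _mask/_maskz intrinsics would reduce to under
   this assumption: one generic select wrapped around an ordinary op, which
   a compiler could fold into a single masked instruction. */
vec16f mask_add_ps(vec16f src, uint16_t k, vec16f a, vec16f b) {
    return mask_select(src, k, add_ps(a, b));
}

vec16f maskz_add_ps(uint16_t k, vec16f a, vec16f b) {
    return maskz_select(k, add_ps(a, b));
}
```

With this shape, only the two select helpers would need to exist per element type, rather than three spellings of every ALU op.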
@steve @dougall Same with the fma123 forms: these are instructions, but they should never have gotten their own intrinsics. For both users and compilers it's more convenient to have a 4-operand FMA and only lower that to the 123 variants very late (after reg allocation).
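As a rough illustration of that late lowering, here is a scalar sketch in portable C. The three x86 forms (132, 213, 231) differ only in which operands are multiplied and which one is overwritten, so a 4-operand `d = x*y + z` can pick a form once it is known which input register doubles as the destination. The `lower_fma` function and the `fma_dest` enum are hypothetical, and the models use plain `a*b + c`, so they do not reproduce the single rounding of a real fused multiply-add.

```c
/* Scalar models of the three FMA3 operand orders. Each returns the new
   value of op1, the operand the instruction overwrites. */
double vfmadd132(double op1, double op2, double op3) { return op1 * op3 + op2; }
double vfmadd213(double op1, double op2, double op3) { return op2 * op1 + op3; }
double vfmadd231(double op1, double op2, double op3) { return op2 * op3 + op1; }

/* Which input of x*y + z lives in the register that gets overwritten. */
typedef enum { DEST_IS_X, DEST_IS_Y, DEST_IS_Z } fma_dest;

/* Hypothetical late lowering of a 4-operand FMA: choose the 3-operand form
   that computes x*y + z given the register picked as the destination.
   The 132 form is not needed in this register-only sketch; on real
   hardware it matters because only the third operand can come from memory. */
double lower_fma(double x, double y, double z, fma_dest dest) {
    switch (dest) {
    case DEST_IS_X: return vfmadd213(x, y, z);  /* y*x + z */
    case DEST_IS_Y: return vfmadd213(y, x, z);  /* x*y + z */
    case DEST_IS_Z: return vfmadd231(z, x, y);  /* x*y + z */
    }
    return 0.0;  /* unreachable for valid enum values */
}
```

All three `dest` choices give the same value for the same inputs, which is the point: the choice of form is a register-allocation detail rather than something a user should have to spell out.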