@dougall
Learned some things down the rabbit hole.

Despite what LLVM says, M4 is somewhere between ARMv9.2-A and 9.4-A compliant. (SVE not required.)

SME/SME2 *DOES* bring matrix math HARDWARE enhancements to ARM CPUs. Apple’s now using some SME or even SME2, and a FEW matrix ops actually require a *touch* of SVE(!). Apple may eliminate its AMX coprocessor altogether in future SoCs and go all-in on ARM SME/SME2! (Stick with Accelerate.)

Apple will still have custom GPU & NPU designs in its “moat.”