re: “The weird bit, AMX is still present on the M4, along with SME”
My understanding is that Armv9 SME/SME2 are architecturally defined as extensions of the »CPU«, so these “matrix” instructions execute ONLY on the CPU!
Apple could have additional proprietary matrix instructions that execute instead on the AMX blocks.
This suggests Apple SHOULDN’T encourage devs to use SME/SME2 directly, because those instructions can only run on the CPU (not even the NPU!), BUT through Accelerate, matrix operations will be executed on Apple’s AMX block(s).
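
For anyone wondering what "going through Accelerate" looks like from the developer side, here is a minimal sketch in C. `matmul_f32` is a hypothetical helper name of mine, not anything from Apple. On Apple platforms it calls the standard BLAS routine `cblas_sgemm` from Accelerate; the plain-loop fallback is my own addition so the snippet still compiles elsewhere. Note the code says nothing about which hardware does the work: whether Accelerate dispatches to AMX, SME, or plain NEON is Apple's internal decision, and that is exactly the point being argued above.

```c
#include <stddef.h>

#ifdef __APPLE__
#include <Accelerate/Accelerate.h> /* build with: -framework Accelerate */
#endif

/* Hypothetical helper: C = A * B for row-major float matrices.
 * A is m x k, B is k x n, C is m x n. */
void matmul_f32(const float *a, const float *b, float *c,
                int m, int n, int k)
{
#ifdef __APPLE__
    /* Standard BLAS single-precision GEMM: C = 1.0 * A * B + 0.0 * C.
     * For row-major, non-transposed inputs the leading dimensions are
     * the row lengths: lda = k, ldb = n, ldc = n. */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k,
                1.0f, a, k,
                b, n,
                0.0f, c, n);
#else
    /* Assumption: portable reference fallback for non-Apple builds. */
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            }
            c[(size_t)i * n + j] = acc;
        }
    }
#endif
}
```

The caller only ever names the operation (a GEMM), never an instruction set, so the same binary would pick up whatever backend Accelerate chooses on a given chip.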