thx, @dougall
1/2
Afaik, everything about “discrete matrix multiplication units” remains murky, whether it’s Apple’s proprietary AMX block or anyone else’s.
There’s no stock “ARM ‘Cortex’ Matrix Unit” reference design; every licensee bakes its own. ARM doesn’t offer an off-the-shelf one the way it does with its stock CPU and GPU designs.
Armv9.2-A’s SME (Scalable Matrix Extension) is only that: a SOFTWARE instruction set. And SME instructions are executed on ARM CPUs only.
Apple AMX instructions, otoh, have a straight execution path to the dedicated AMX block, no?