thx, @dougall
1/2
Everything about a “discrete matrix multiplication unit” remains murky, whether it's Apple’s proprietary AMX block or anyone else’s for that matter, afaik.

There’s no stock “ARM ‘Cortex’ Matrix Unit” reference design; every licensee bakes its own. ARM doesn’t offer an off-the-shelf one the way it offers stock CPU and GPU cores.

Armv9.2-A’s SME is only that: a SOFTWARE instruction set. And Scalable Matrix Extension instructions are executed on ARM CPUs only.

Apple AMX instructions, otoh, have a straight execution path, no?

@dougall
I would like to amend my post above to say it’s stupid and should be ignored.

There *is no* “straight execution path” to AMX hardware because, despite its apparent physical separation from the CPU cluster in die shots, Apple considers AMX *part* of its CPUs, like an FPU or NEON hardware.

AMX is a “slave” to the CPU because it *is* the CPU for all intents and purposes.

And if Apple fully embraces SVE/SVE2, AMX will be superannuated and may stick around purely for legacy compatibility.