@dougall So, since your famous (infamous) reverse engineering and analysis of Apple’s AMX for the M1 & A14, have you discovered any hardware improvements to this unit up through the M4 & A18?
“Multicore” AMX would be too wasteful/expensive in terms of die area, power, thermals, and complexity, but how about notable architectural improvements to its internal logic units: wider paths, more lanes, on-chip SRAM or DMA, i.e. things that cannot be accounted for by die process shrink and higher clock speeds alone?


