Hello XDNA!
We're documenting how to program AMD's NPUs in Ryzen AI chips.
Our website covers the ISA, register files, operation latencies, and hand-optimized assembly kernels for tensor contractions.
Measured single-compute-tile throughput:
• XDNA1 (Ryzen 7 8700G): 398 BF16 GFLOPS (86% of peak)
• XDNA2 (Ryzen AI Max PRO 390): 1760 BFP16 GFLOPS (95% of peak)
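
Each figure above is a measured throughput together with the fraction of peak it represents, so the peak per compute tile can be backed out by division. The following is a minimal sketch of that arithmetic, not an official specification. The names `measurements` and `implied_peak` are illustrative, and because the percentages are rounded, the results are only approximate.

```python
# Measured single-compute-tile throughput (GFLOPS) and the stated fraction of peak.
# Values are copied from the list above; the percentages are rounded, so the
# implied peaks below are approximate.
measurements = {
    "XDNA1 (Ryzen 7 8700G), BF16": (398.0, 0.86),
    "XDNA2 (Ryzen AI Max PRO 390), BFP16": (1760.0, 0.95),
}


def implied_peak(measured_gflops: float, fraction_of_peak: float) -> float:
    """Return the peak throughput implied by a measurement and its efficiency."""
    return measured_gflops / fraction_of_peak


for name, (gflops, fraction) in measurements.items():
    peak = implied_peak(gflops, fraction)
    print(f"{name}: measured {gflops:.0f} GFLOPS, implied peak ~{peak:.0f} GFLOPS")
```

Note that the two rows use different data types (BF16 on XDNA1, BFP16 on XDNA2), so the implied peaks are not a like-for-like comparison between the two generations.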





