#LLM performance on #AMD Radeon AI PRO R9700 (32GB VRAM) using #ROCm 7.2: input/output tokens per second and VRAM usage at a 128k context length with Q8 KV cache:
Qwen 3.6 27B ~ 168(in)/25(out) t/s, 21GB
Qwen 3.6 27B MTP ~ 127(in)/28(out) t/s, 17GB
Qwen 3.6 35B A3B ~ 246(in)/71(out) t/s, 26GB
Qwen 3.6 35B A3B MTP ~ 195(in)/74(out) t/s, 22GB
Gemma 4 31B ~ 251(in)/22 (out) t/s, 20GB
Gemma 4 26B A4B ~ 418(in)/72(out) t/s, 18GB
Gemma 4 12 B ~ 493(in)/47(out) t/s, 9GB
#Ollama, Q4_K_M quantization.




