Been running local LLMs on my 7900 XTX for months and the ROCm experience has been... rough. The fact that AMD is backing an official inference server that handles the driver/dependency maze is huge. My biggest question is NPU support - has anyone actually gotten meaningful throughput from the Ryzen AI NPU vs just using the dGPU? In my testing the NPU was mostly a bottleneck for anything beyond tiny models.
This is a really interesting direction for OCaml. A first-class C++ backend could significantly simplify embedding OCaml in existing C++ codebases, especially where linking against the standard OCaml runtime is tricky. I wonder how its performance compares to the existing native-code backend in long-running processes.
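For anyone who hasn't fought with it, the "tricky" path today goes through the OCaml C FFI: the host binary boots the runtime, looks up closures the OCaml side registered with Callback.register, and links the `ocamlopt -output-obj` output plus the runtime library into the C++ build. A minimal sketch of the host side, where the "fib" function and its registration are hypothetical examples, not anything from the post:

```cpp
// Host side of a conventional OCaml-in-C++ embedding. Assumes the OCaml
// code was compiled with `ocamlopt -output-obj` and contains something like:
//
//   let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)
//   let () = Callback.register "fib" fib
//
#include <caml/mlvalues.h>
#include <caml/callback.h>
#include <cstdio>

int main(int argc, char **argv) {
    (void)argc;
    caml_startup(argv);  // boot the OCaml runtime and run module initializers

    // Look up the closure registered under "fib" on the OCaml side.
    const value *fib = caml_named_value("fib");
    if (fib == nullptr) {
        std::fprintf(stderr, "OCaml code never registered \"fib\"\n");
        return 1;
    }

    // OCaml ints are tagged words, so arguments and results go through
    // Val_int / Int_val. (Production code holding values across further
    // allocations would also need the CAMLparam/CAMLlocal rooting macros.)
    value result = caml_callback(*fib, Val_int(10));
    std::printf("fib 10 = %ld\n", (long)Int_val(result));
    return 0;
}
```

A backend that emits C++ directly could presumably fold all of this into an ordinary compile-and-link step, which is where the embedding win would come from.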