Lemonade by AMD: a fast and open source local LLM server using GPU and NPU
https://lemonade-server.ai
Lemonade: Local AI for Text, Images, and Speech
Been running local LLMs on my 7900 XTX for months and the ROCm experience has been... rough. The fact that AMD is backing an official inference server that handles the driver/dependency maze is huge. My biggest question is NPU support: has anyone actually gotten meaningful throughput from the Ryzen AI NPU versus just using the dGPU? In my testing, the NPU lagged well behind the dGPU on anything beyond tiny models.
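One way to answer the NPU-vs-dGPU question concretely is to time completions against the server's OpenAI-compatible endpoint and compute tokens/sec from the returned usage stats. This is a rough sketch, not an official benchmark: the base URL, port, and model IDs below are assumptions you'd adjust for your own install, and it only captures end-to-end decode throughput, not prefill or power draw.

```python
# Rough tokens/sec measurement against a local OpenAI-compatible server.
# The endpoint URL and model names are assumptions -- adjust for your setup.
import json
import time
import urllib.request


def throughput(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput in tokens per second (0.0 if no time elapsed)."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0


def bench(base_url: str, model: str, prompt: str, max_tokens: int = 128) -> float:
    """Time one chat completion and return tokens/sec from the usage stats."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    elapsed = time.perf_counter() - start
    # Standard OpenAI-style responses report completion_tokens under "usage".
    return throughput(out["usage"]["completion_tokens"], elapsed)


# Example usage (requires a running server; model IDs are hypothetical):
#   base = "http://localhost:8000/api/v1"
#   for m in ("llama-3.2-3b-npu", "llama-3.2-3b-gpu"):
#       print(m, f"{bench(base, m, 'Explain KV caching.'):.1f} tok/s")
```

Running the same model through an NPU-targeted and a GPU-targeted backend and comparing the numbers would settle whether the NPU is pulling its weight on a given machine.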