Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

https://lemonade-server.ai

Lemonade: Local AI for Text, Images, and Speech

Been running local LLMs on my 7900 XTX for months and the ROCm experience has been... rough. The fact that AMD is backing an official inference server that handles the driver/dependency maze is huge. My biggest question is NPU support - has anyone actually gotten meaningful throughput from the Ryzen AI NPU vs just using the dGPU? In my testing the NPU was mostly a bottleneck for anything beyond tiny models.
The NPU is more for power efficiency when you're on battery, not raw throughput. I don't think it's meant as a replacement for the GPU.
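If anyone wants to compare NPU vs GPU throughput themselves, here's a rough sketch that times a request against the server's OpenAI-compatible chat completions endpoint and reports tokens/sec. Assumptions: the base URL `http://localhost:8000/api/v1` and the model name are placeholders you'd swap for your own setup; check your install for the actual port and endpoint path.

```python
# Rough sketch: measure generation throughput from a local
# OpenAI-compatible server. Base URL and model name below are
# placeholders, not verified Lemonade defaults.
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput in generated tokens per second (0.0 if no time elapsed)."""
    return completion_tokens / elapsed_s if elapsed_s > 0 else 0.0


def bench(base_url: str = "http://localhost:8000/api/v1",
          model: str = "your-model-name") -> float:
    """Send one chat completion and return measured tokens/sec."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "Explain NPUs in one paragraph."}],
        "max_tokens": 128,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    t0 = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - t0
    # usage.completion_tokens is standard in OpenAI-style responses
    return tokens_per_second(body["usage"]["completion_tokens"], elapsed)


if __name__ == "__main__":
    # Offline sanity check of the math; call bench() against a live server.
    print(tokens_per_second(128, 2.0))
```

Run it once with the model loaded on the dGPU and once on the NPU (however your install selects the backend) and compare the numbers; wall-clock timing like this includes prompt processing, so use a longer `max_tokens` if you want steady-state generation speed.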