Lemonade by AMD: a fast and open source local LLM server using GPU and NPU
https://lemonade-server.ai
Lemonade: Local AI for Text, Images, and Speech
Been running local LLMs on my 7900 XTX for months and the ROCm experience has been... rough. The fact that AMD is backing an official inference server that handles the driver/dependency maze is huge. My biggest question is NPU support: has anyone actually gotten meaningful throughput from the Ryzen AI NPU versus just using the dGPU? In my testing, the NPU lagged well behind the dGPU on anything beyond tiny models.
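One way to answer the NPU-vs-dGPU question concretely is to time completions against the server's OpenAI-compatible endpoint and compute tokens/sec from the returned usage stats. This is a rough sketch, not an official benchmark: the base URL, port, and model IDs below are assumptions you'd adjust for your own install, and it only captures end-to-end decode throughput, not prefill or power draw.

```python
# Rough tokens/sec measurement against a local OpenAI-compatible server.
# The endpoint URL and model names are assumptions -- adjust for your setup.
import json
import time
import urllib.request


def throughput(n_tokens: int, elapsed_s: float) -> float:
    """Decode throughput in tokens per second (0.0 if no time elapsed)."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0


def bench(base_url: str, model: str, prompt: str, max_tokens: int = 128) -> float:
    """Time one chat completion and return tokens/sec from the usage stats."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        out = json.load(resp)
    elapsed = time.perf_counter() - start
    # Standard OpenAI-style responses report completion_tokens under "usage".
    return throughput(out["usage"]["completion_tokens"], elapsed)


# Example usage (requires a running server; model IDs are hypothetical):
#   base = "http://localhost:8000/api/v1"
#   for m in ("llama-3.2-3b-npu", "llama-3.2-3b-gpu"):
#       print(m, f"{bench(base, m, 'Explain KV caching.'):.1f} tok/s")
```

Running the same model through an NPU-targeted and a GPU-targeted backend and comparing the numbers would settle whether the NPU is pulling its weight on a given machine.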