Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

https://lemonade-server.ai

Lemonade: Local AI for Text, Images, and Speech

Note that the NPU models/kernels this uses are proprietary and not available as open source. It would be nice to develop more open support for this hardware.
Are they? The docs say "You can also register any Hugging Face model into your Lemonade Server with the advanced pull command options"
That won't give you NPU support, which relies on https://github.com/FastFlowLM/FastFlowLM . And that says "NPU-accelerated kernels are proprietary binaries", not open source.
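For context on the server side of the discussion: Lemonade Server advertises an OpenAI-compatible HTTP API, so registering a Hugging Face model is only useful insofar as a standard client can then talk to it. A minimal sketch of such a request follows; the base URL, port, and model name are assumptions for illustration, not taken from the thread or verified against Lemonade's docs:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /chat/completions POST for a local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Assumed defaults -- check your Lemonade Server config for the real
# host, port, and model identifier.
req = build_chat_request(
    "http://localhost:8000/api/v1",
    "Llama-3.2-1B-Instruct-Hybrid",
    "Hello",
)
# To actually send it (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Whether the tokens come from GPU or NPU kernels is decided server-side; the client request looks the same either way, which is why the proprietary-kernel question above matters independently of the API.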