Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

https://lemonade-server.ai

Lemonade: Local AI for Text, Images, and Speech

Note that the NPU models/kernels this uses are proprietary and not available as open source. It would be nice to develop more open support for this hardware.
Are they? The docs say "You can also register any Hugging Face model into your Lemonade Server with the advanced pull command options"
That won't give you NPU support, which relies on https://github.com/FastFlowLM/FastFlowLM . And that says "NPU-accelerated kernels are proprietary binaries", not open source.
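For context, the "advanced pull" the docs describe registers an arbitrary Hugging Face checkpoint under a `user.` prefix. A sketch of what that looks like — the flag names and recipe value here are my reading of the Lemonade Server docs and may differ by version:

```shell
# Register a Hugging Face GGUF checkpoint with a local Lemonade Server
# (assumed flags: --checkpoint names the HF repo, --recipe picks the backend)
lemonade-server pull user.Qwen2.5-0.5B-Instruct-GGUF \
  --checkpoint Qwen/Qwen2.5-0.5B-Instruct-GGUF \
  --recipe llamacpp
```

Note that a model registered this way runs through the llama.cpp CPU/GPU path; the NPU path still depends on FastFlowLM's prebuilt kernels.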
GitHub - FastFlowLM/FastFlowLM: Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs.

I bought one of their machines to play around with, under the expectation that I might never be able to use the NPU for models. But it still makes me angry to read this.
AMD/Xilinx's software support for the NPU is fully open; it's only FFLM's kernels that are proprietary. See https://github.com/amd/iron https://github.com/Xilinx/mlir-aie https://github.com/amd/RyzenAI-SW/ . It would be nice to explore whether one could develop kernels for these NPUs using Vulkan Compute and drive them that way; that would provide the closest unification with the existing cross-platform support for GPUs.