Lemonade by AMD: a fast and open source local LLM server using GPU and NPU

https://lemonade-server.ai

Lemonade: Local AI for Text, Images, and Speech

Note that the NPU models/kernels this uses are proprietary and not available as open source. It would be nice to develop more open support for this hardware.
Are they? The docs say "You can also register any Hugging Face model into your Lemonade Server with the advanced pull command options"
That won't give you NPU support, which relies on https://github.com/FastFlowLM/FastFlowLM . And that says "NPU-accelerated kernels are proprietary binaries", not open source.
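For context, the "advanced pull" the docs describe registers an arbitrary Hugging Face checkpoint under a `user.` prefix. A sketch of what that looks like — the flag names and recipe value here are my reading of the Lemonade Server docs and may differ by version:

```shell
# Register a Hugging Face GGUF checkpoint with a local Lemonade Server
# (assumed flags: --checkpoint names the HF repo, --recipe picks the backend)
lemonade-server pull user.Qwen2.5-0.5B-Instruct-GGUF \
  --checkpoint Qwen/Qwen2.5-0.5B-Instruct-GGUF \
  --recipe llamacpp
```

Note that a model registered this way runs through the llama.cpp CPU/GPU path; the NPU path still depends on FastFlowLM's prebuilt kernels.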
GitHub - FastFlowLM/FastFlowLM: Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for the AMD NPUs.

I bought one of their machines to play around with, under the expectation that I might never be able to use the NPU for models. But it still makes me angry to read this.
AMD/Xilinx's software support for the NPU is fully open; it's only FFLM's kernels that are proprietary. See https://github.com/amd/iron https://github.com/Xilinx/mlir-aie https://github.com/amd/RyzenAI-SW/ . It would be nice to explore whether one could develop kernels for these NPUs using Vulkan Compute and drive them that way; that would provide the closest unification with the existing cross-platform support for GPUs.