TurboQuant KV Compression and SSD Expert Streaming for M5 Pro and IOS

https://github.com/SharpAI/SwiftLM

GitHub - SharpAI/SwiftLM: ⚡ Native Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, + iOS iPhone app.

⚡ Native Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, + iOS iPhone app. - SharpAI/SwiftLM

GitHub

Although I'm interested in both topics (KV compression and attempts to stream MoE models from storage) this is at least the 10th vibecoded project on this topic I've seen today alone across HN, Twitter, and some subreddits I visit.

At least this one gave credit to the upstream projects which it used as a reference.

The llama.cpp project is also getting a wave of vibecoded PRs that are very clearly being produced by pointing claude at the repo and the original paper and having it produce something.

Almost none of these attempts contain information that really matters, like actual benchmark tests with differen KV quantization levels (not just perplexity or KLD).

"vibe coded" is NOT the bad thing you think it is.

Going from paper to implementation from scratch in half an hour or so is great.

Sure, but the problem is when you take that half hour of work and share it with other people without making clear how much effort has gone into it.

Software is valuable if it has been tested and exercised properly by other people. I don't care if you vide coded it provided you then put the real work in to verify that it actually works correctly - and then include the proof that you've done that when you start widely sharing it with the world.

Right now it's impossible to tell which of these projects implementing the paper are worth spending time with.