TurboQuant KV Compression and SSD Expert Streaming for M5 Pro and iOS

https://github.com/SharpAI/SwiftLM

GitHub - SharpAI/SwiftLM: ⚡ Native Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, + iOS iPhone app.


Although I'm interested in both topics (KV compression and attempts to stream MoE models from storage) this is at least the 10th vibecoded project on this topic I've seen today alone across HN, Twitter, and some subreddits I visit.

At least this one gave credit to the upstream projects which it used as a reference.

The llama.cpp project is also getting a wave of vibecoded PRs that are very clearly being produced by pointing Claude at the repo and the original paper and having it produce something.

Almost none of these attempts contain the information that really matters, like actual benchmark tests at different KV quantization levels (not just perplexity or KLD).

"vibe coded" is NOT the bad thing you think it is.

Going from paper to implementation from scratch in half an hour or so is great.

That’s a starting spot, but how about some testing and benchmarks?

Where’s the value added if the person just tells Claude to do it and then submits a PR?

The maintainers may as well vibe code it themselves if that’s all the work the would-be contributor is going to put into it.

if it works it works

we live in a wholly unoptimized world because the available resources have been so high, while the benefits of optimizing have been so low. that has flipped now and there are tons of low hanging fruit to optimize.

I agree that benchmarks would be great, but that's only relevant to this one topic, not to the overall concept of agent-coded pull requests

It's relevant in that it's an example that people are doing the easy part - the coding - and skipping the hard part - the benchmarking and proving it works and provides value.

A PR without evidence that it works, and without any expectation of the benefits the new feature would bring, is kind of worthless.

> if it works it works

If it works in one case, that doesn't mean it works consistently or well in the general case.

I've made lots of things with Claude Code that just work... until I do things in a slightly different order and the whole thing explodes