RT @KyleHessling1: BREAKING! Qwopus 3.6 27B is LIVE! Thank you for your patience on this one, but I believe you'll find the wait was worth it! We've benchmarked this thing up and down and verified that it holds at least 75.25% (152/202) on the initial 202 SWE-bench solves. That is not a full run of 500, but it shows the agentic coding quality of the original 27B is retained while adding all of the additional Qwopus benefits across many domains. As always, Jackrong is absolutely cooking here! CoT quality has improved significantly through the inversion techniques from our Negentropy proof of concept. It also went through thorough curriculum training. You can check out the MMLU-Pro benchmarks on the model card, but it improved a whopping 10 points over the base model in physics, with meaningful jumps in chemistry, business, and computer science. However, the best part is that I was able to build an entire survival shooter game using only this local model. I was genuinely blown away by the results, which you can play right now on my HF space (link in comments below). "Qwopus Commander" was completed in 9 turns of Qwopus 3.6! To test the new long-context training, I made it re-output the entire 3000+ line program each turn, and it would make the fixes and add the features I requested in large prompts while perfectly replicating the rest of the game from context. What's more, I did it all at Q8 KV cache quantization and never had an issue over the entire 303k token run! IMPORTANT: Run it at --temp 0.75 to 1. Mess with it in that range for your use case. Higher temp actually…

More on Arint.info

#GGUF #huggingface #make #rest #science #SWE #Swe #arint_info

https://x.com/KyleHessling1/status/2057853098585108979#m
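
The post quotes 152 solves out of 202 attempted and notes that this is not the full 500-task run. As a rough illustration of how much uncertainty a partial run carries, the sketch below recomputes the pass rate and a Wilson score interval. It assumes the tasks are independent and that the first 202 are representative of the rest, neither of which the post states.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


solved, attempted = 152, 202  # figures quoted in the post
rate = solved / attempted
low, high = wilson_interval(solved, attempted)

print(f"pass rate: {rate:.2%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

The interval only describes sampling noise under those assumptions; it says nothing about how the remaining tasks would actually turn out.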

Arint - SEO+KI (@Arint@arint.info)

Mastodon Glitch Edition

RT @danielhanchen: Qwen3.6 MTP Unsloth GGUFs now run 1.8x faster, up from 1.4x just two days ago!

More on Arint.info

#GGUF #llamacpp #MTP #Qwen3 #SpeculativeDecoding #Unsloth #arint_info

https://x.com/danielhanchen/status/2055274688025378854#m

left curve dev (@leftcurvedev_)

Test results show the Qwen3.6 27B MTP model running about 30% faster than the previous setup. The test used llama.cpp's MTP PR branch and Unsloth's new GGUF, and demonstrated the inference speedup with the draft-mtp option.

https://x.com/leftcurvedev_/status/2054861291924213881

#qwen #llamacpp #unsloth #gguf #mtp

left curve dev (@leftcurvedev_) on X

Here are my results for Qwen3.6 27B MTP model vs base setup: ~30% extra speed 🔥
Used the specific MTP PR branch and downloaded the new GGUF from @UnslothAI
git clone -b mtp-clean https://t.co/anD61S6gjm
--spec-type draft-mtp --spec-draft-n-max 2
https://t.co/dW8ziUcrAo
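
To give a feel for why a draft length of 2 can still help, here is a minimal sketch of the usual idealized model of speculative decoding: each drafted token is accepted independently with some probability, and every verification pass emits the accepted tokens plus one more. This is a textbook approximation, not a description of how the llama.cpp MTP branch works, and the acceptance rates are made-up placeholders.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_rate, and that the pass always yields one extra token.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


# draft_len=2 mirrors --spec-draft-n-max 2 from the post; the rates are hypothetical.
for accept_rate in (0.5, 0.7, 0.9):
    tokens = expected_tokens_per_step(accept_rate, draft_len=2)
    print(f"accept rate {accept_rate:.1f}: {tokens:.2f} tokens per pass")
```

Tokens per pass is an upper bound on the speedup, since drafting and verifying the extra tokens also cost time, so a measured gain is expected to be smaller than these figures.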

X (formerly Twitter)

Behold, the riveting #exposé on #GGUF, the file format so revolutionary, it’s practically a single piece of digital art 🎨🤯. Witness as the author attempts to weave an epic tale out of a glorified zip file, while simultaneously acknowledging the sheer absence of anything remotely interesting. 🥱✨

https://nobodywho.ooo/posts/whats-in-a-gguf/ #digitalart #storytelling #technews #zips #HackerNews #ngated

What's in a GGUF, besides the weights - and what's still missing? - NobodyWho

What extra stuff is needed to properly run a language model? Besides the weights of a language model, what is the gguf metadata that we need to parse and use?

NobodyWho

What's in a GGUF, besides the weights – and what's still missing?

https://nobodywho.ooo/posts/whats-in-a-gguf/

#HackerNews #GGUF #AIweights #missingfeatures #technews

RT @stableAPY: Unsloth has released the MTP (Multi-Token Prediction) versions of Qwen 3.6 27B and 35B A3B. This gives a pretty good boost on the decode side but hurts prefill somewhat. I think this will still be my default, to gain a bit of decode speed; the prefill penalty is acceptable to me. For llama.cpp you need this specific branch: https://github.com/ggml-org/llama.cpp/pull/22673. The models are available at: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF and https://huggingface.co/unsloth/Qwen3.6-35B-A3B-MTP-GGUF.

More on Arint.info

#GGUF #llamacpp #MachineLearning #MTP #Qwen3 #Unsloth #arint_info

https://x.com/stableAPY/status/2054136118648434941#m

llama + spec: MTP Support by am17an · Pull Request #22673 · ggml-org/llama.cpp

Overview This PR adds support for MTP (Multi Token Prediction) heads. I tested this on Qwen3.6 27B and Qwen3.6 35BA3B but in principle it should work for any MTP model. I've posted the detaile...

GitHub
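
The trade-off described in the post above (faster decode, slower prefill) can be sketched with a two-term latency model: time to process the prompt plus time to generate the output. Every throughput figure below is a made-up placeholder rather than a measurement of any model, chosen only to show how the balance shifts with prompt length.

```python
def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Total latency as prefill time plus decode time (tokens / tokens-per-second)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical throughputs: MTP trades some prefill speed for faster decode.
baseline = {"prefill_tps": 1000.0, "decode_tps": 30.0}
with_mtp = {"prefill_tps": 900.0, "decode_tps": 39.0}

workloads = {
    "short prompt, long answer": (500, 1000),
    "long prompt, short answer": (50_000, 200),
}

for name, (prompt_tokens, output_tokens) in workloads.items():
    base = request_seconds(prompt_tokens, output_tokens, **baseline)
    mtp = request_seconds(prompt_tokens, output_tokens, **with_mtp)
    print(f"{name}: baseline {base:.1f}s, MTP {mtp:.1f}s")
```

Under these placeholder numbers the MTP setup wins when generation dominates and loses when a very long prompt dominates, which is the kind of workload-dependent judgment the post is making.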

merve (@mervenoyann)

Hugging Face announced that it has added Hermes Agent to local apps and supports running it locally with compatible GGUF/MLX models. Native trace support for Hermes Agent has also been added, so traces can be visualized directly on the Hub.

https://x.com/mervenoyann/status/2053857347429151163

#huggingface #hermesagent #localai #gguf #mlx

merve (@mervenoyann) on X

🆕 Hugging Face 🤝 Hermes Agent 🔥 > we added Hermes Agent to local apps: run it locally with any compatible GGUF/MLX model > shipped native traces support for Hermes Agent: visualize your Hermes traces directly on the Hub Very soon most agents will run locally and we want to

Victor M (@victormustar)

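A useful feature for running Hermes Agent locally has been introduced. From /models you can filter the 60,000+ models compatible with Hermes, and each model page tells you instantly whether the model will run on your local hardware.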

https://x.com/victormustar/status/2053863013040439517

#huggingface #hermesagent #localai #gguf #mlx

Victor M (@victormustar) on X

This feature is quite cool to run Hermes Agent locally because: - You can filter on the +60k models compatible with Hermes directly from /models - You can instantly know if it will run on your local hardware from the model page

Show HN: ChonkLM – Tiny language models running offline in the browser

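ChonkLM is a project that runs tiny language models, with fewer than 500 million parameters, offline in the browser. It uses WebGPU to load model weights from a static website hosted on Cloudflare and run inference, and it addresses the performance and compatibility problems the author had with ONNX. The author built an inference runtime that executes GGUF-format models in WGSL, and models of up to 250 MB are stored in the browser cache so they can also be used offline. It is a practical tool for AI developers who want to test and use lightweight LLMs directly in a web environment.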

https://chonklm.com

#llm #webgpu #inference #browser #gguf

chonklm — tiny language models, running offline in your browser

Tiny language models, running offline in your browser. On-device inference in 2 minutes.
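
The two limits in the summary above (under 500 million parameters, up to 250 MB cached) can be related with a back-of-the-envelope size estimate. The sketch below assumes weights dominate the file size, reads "250 MB" as decimal megabytes, and uses nominal bit widths; real quantized GGUF files also store per-block scales and metadata, so they come out somewhat larger.

```python
def weight_bytes(param_count: int, bits_per_weight: float) -> float:
    """Approximate weight storage in bytes, ignoring metadata and block scales."""
    return param_count * bits_per_weight / 8


CACHE_LIMIT_BYTES = 250 * 10**6  # "up to 250 MB", assumed to mean decimal megabytes
PARAMS = 500_000_000             # upper bound on model size quoted in the summary

for bits in (16, 8, 4):
    size = weight_bytes(PARAMS, bits)
    verdict = "fits" if size <= CACHE_LIMIT_BYTES else "too large"
    print(f"{bits}-bit weights: {size / 10**6:.0f} MB ({verdict})")
```

On this estimate a model at the 500M-parameter ceiling only reaches the 250 MB budget at around 4 bits per weight, which suggests the cached models are either quantized or well below the ceiling.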