RT @bnjmn_marie: Gemma 4 12B is hard to quantize. I couldn't find a GGUF under 6.4 GB that works well. MoQ 4.25 is still pretty good and saves 0.9 GB compared to UD-Q4_K_XL. Below that, I wouldn't recommend any of the other GGUFs I evaluated.

more at Arint.info

#Gemma4 #GGUF #LLM #MoQ #Quantisierung #UDQ4_K_XL #arint_info

https://x.com/bnjmn_marie/status/2064097932459069716#m
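The size figures in the post above can be sanity-checked with simple arithmetic. This is a rough sketch, not the author's method: it assumes "MoQ 4.25" means an average of 4.25 bits per weight and that "12B" is exactly 12 billion weights. Real GGUF files mix tensor types and carry metadata, so actual sizes will differ somewhat.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in decimal GB at a given average bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9


def bpw_from_size(n_params: float, size_gb: float) -> float:
    """Inverse: average bits per weight implied by a file size in decimal GB."""
    return size_gb * 1e9 * 8 / n_params


N_PARAMS = 12e9  # assumption: "12B" read as exactly 12 billion weights
MOQ_BPW = 4.25   # assumption: "MoQ 4.25" read as 4.25 bits per weight
SAVING_GB = 0.9  # saving quoted in the post

moq_size = gguf_size_gb(N_PARAMS, MOQ_BPW)
larger_size = moq_size + SAVING_GB  # implied size of the larger quant
larger_bpw = bpw_from_size(N_PARAMS, larger_size)

print(f"MoQ estimate:        {moq_size:.2f} GB")
print(f"Larger quant size:   {larger_size:.2f} GB")
print(f"Larger quant bpw:    {larger_bpw:.2f}")
```

Under these assumptions the 4.25-bit estimate lands close to the 6.4 GB threshold the post mentions, which suggests the threshold and the MoQ file are the same point on the size curve.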

Arint - SEO+KI (@[email protected])

Mastodon Glitch Edition

Running openai/gpt-oss-20b MXFP4 GGUF locally on a laptop without a discrete GPU: a practical test with 32 GB RAM

I ran openai/gpt-oss-20b in its MXFP4 GGUF variant on an ordinary laptop without a discrete GPU: just the CPU, the integrated Radeon 780M, and shared system memory. The test ran on an ASUS Vivobook S 16 M3607HA: Ryzen 7 260, Radeon 780M, 32 GB DDR5-5600, Windows 11, and LM Studio 0.4.16-1 x64. I name the laptop model not to tie the article to a specific device but for reproducibility. In tests like this it is not only CPU and RAM that matter, but also cooling, power limits, and the shared memory of the integrated graphics. The main question was practical: can you really use a local 20B model on a laptop with 32 GB RAM when there is no separate graphics card? A caveat up front: this is not a scientific benchmark but a user case study on specific hardware. I checked speed, RAM/CPU/GPU usage, behavior under different context limits, and answer quality on technical prompt scenarios. 20B model, 32 GB RAM, and integrated graphics

https://habr.com/ru/articles/1044950/

#локальные_LLM #openai_gptoss20b #GGUF #MXFP4 #LM_Studio #Radeon_780M #Ryzen #ноутбук_без_дискретной_видеокарты #Windows_11 #benchmark

Habr

Local Gemma 4 on a MacBook reads charts and tables, and lies more elegantly than it tells the truth

MacBook M3, 16 GB, no cloud. I installed the fresh Gemma 4 and wrote a tool: drop in an image of a chart or table, get a CSV back. Three cases out of seven came out perfect. On the rest the model started lying, and more neatly than it told the truth: instead of ragged real numbers it slipped in smooth invented ones. I walk through it step by step (the Mac setup, the pitfalls with llama.cpp, the tool itself) and put together a map of where local vision can be trusted and where it quietly hallucinates.

https://habr.com/ru/articles/1044400/

#Gemma_4 #llamacpp #локальные_LLM #мультимодальные_модели #OCR #извлечение_данных_из_графиков #visionмодели #MacBook_M3 #GGUF #визуализация_данных

RT @Michaelzsguo: When the creator of Redis starts thinking about KV cache, pay attention.

antirez is Salvatore Sanfilippo, the Sicilian programmer best known for creating Redis. But “creator of Redis” is almost too small a label. Before Redis, he was already an old-school systems hacker. He built hping, worked in network security, and invented the idle scan technique. This was the packet-level, C-programming, Unix-hacker world.

Then Redis happened. The origin was not glamorous. He was building LLOOGG, a real-time web analytics service, and needed something faster and simpler than the tools he had. So he created Redis.

That is very antirez. Start with a real bottleneck. Avoid unnecessary abstraction. Expose the right primitive. Make it fast enough that people rethink the category.

Redis did not win because it looked like a traditional database. It won because it gave developers direct access to useful data structures: strings, lists, hashes, sets, sorted sets, streams, pub/sub. It made memory programmable.

That is why his return to local AI is so interesting. With ds4, or DwarfStar 4, antirez is not just building “another local inference engine.” He is asking a very Redis-like question: What is the real primitive here?

For LLMs, one answer is obvious: KV cache. Most people treat KV cache as an implementation detail. It lives in RAM or HBM, grows with context, and quietly becomes the bottleneck. antirez looks at DeepSeek V4 Flash, compressed KV cache, modern MacBook SSDs, and says: maybe KV cache should not only live in RAM. His phrase is perfect: “The KV cache is actually a…

#agent #API #DeepSeek #GGUF #GPT5 #Make #Redis #arint_info

https://x.com/Michaelzsguo/status/2061557147453038750#m
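The claim that the KV cache "grows with context" can be made concrete with the usual size estimate for standard attention: two tensors (K and V) per layer, each holding one vector per KV head per token. A minimal sketch follows. The layer and head counts are hypothetical placeholders, not the shape of any model named in the post, and compressed KV schemes like the one mentioned above will not follow this formula.

```python
from dataclasses import dataclass


@dataclass
class AttentionShape:
    """Hypothetical model shape; replace with values from a real model card."""
    n_layers: int
    n_kv_heads: int
    head_dim: int


def kv_cache_bytes(shape: AttentionShape, n_ctx: int, bytes_per_element: float) -> float:
    """Uncompressed KV cache size: K and V, per layer, per KV head, per token."""
    elements = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * n_ctx
    return elements * bytes_per_element


F16_BYTES = 2.0
# ggml's Q8_0 block stores 32 int8 values plus one 16-bit scale: 34 bytes per 32 elements.
Q8_0_BYTES = 34 / 32

shape = AttentionShape(n_layers=32, n_kv_heads=8, head_dim=128)  # illustrative only

for n_ctx in (8_192, 32_768, 131_072):
    f16 = kv_cache_bytes(shape, n_ctx, F16_BYTES) / 2**30
    q8 = kv_cache_bytes(shape, n_ctx, Q8_0_BYTES) / 2**30
    print(f"n_ctx={n_ctx:>7}: f16 {f16:6.2f} GiB, q8_0 {q8:6.2f} GiB")
```

The size is linear in context length, which is why long-context runs hit memory limits before compute limits on machines with modest RAM.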

RT @KyleHessling1: BREAKING! Qwopus 3.6 27B is LIVE! Thank you for your patience on this one, but I believe you'll find the wait was worth it!

We've benchmarked this thing up and down, verified that it holds at least a 75.25% (152/202) in the initial 202 SWE-bench solves. Not a full run of 500, but it shows the agentic coding quality from the original 27B is retained while adding all of the additional Qwopus benefits across many domains.

As always, Jackrong is absolutely cooking here! CoT quality has improved significantly through the inversion techniques from our Negentropy proof of concept. It also went through thorough curriculum training. You can check out the MMLU-Pro benchmarks on the model card, but it improved a whopping 10 points over the base model in physics, as well as meaningful jumps in chemistry, business, and computer science.

However, the best part is that I was able to build an entire survival shooter game using this local model entirely. I genuinely was blown away by the results, which you can play right now on my HF space (link in comments below). "Qwopus Commander" was completed in 9 turns of Qwopus 3.6! To test the new long context training, I made it re-output the entire 3000+ line program each turn, and it would make fixes and add features that I requested in large prompts, while perfectly replicating the entire rest of the game from context. What's more is that I did it all at Q8 KV cache quantization, and never had an issue over the entire 303k token run!

IMPORTANT: Run it at --temp 0.75 to 1. Mess with it in that range for your use case. Higher temp actually…

#GGUF #huggingface #make #rest #science #SWE #Swe #arint_info

https://x.com/KyleHessling1/status/2057853098585108979#m

RT @danielhanchen: Qwen3.6 MTP Unsloth GGUFs now run 1.8x faster, up from 1.4x just two days ago!

#GGUF #llamacpp #MTP #Qwen3 #SpeculativeDecoding #Unsloth #arint_info

https://x.com/danielhanchen/status/2055274688025378854#m

Behold, the riveting #exposé on #GGUF, the file format so revolutionary, it’s practically a single piece of digital art 🎨🤯. Witness as the author attempts to weave an epic tale out of a glorified zip file, while simultaneously acknowledging the sheer absence of anything remotely interesting. 🥱✨
https://nobodywho.ooo/posts/whats-in-a-gguf/ #digitalart #storytelling #technews #zips #HackerNews #ngated
What's in a GGUF, besides the weights - and what's still missing? - NobodyWho

What extra stuff is needed to properly run a language model? Besides the weights of a language model, what is the gguf metadata that we need to parse and use?

NobodyWho
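To give a feel for what "parsing the GGUF metadata" starts with, here is a minimal sketch that reads only the fixed-size file header. It assumes the little-endian layout used by GGUF version 2 and later (4-byte magic, 32-bit version, 64-bit tensor count, 64-bit metadata key/value count). The names `GGUFHeader` and `read_gguf_header` are made up for this example, and the bytes in the demo are synthetic, not taken from a real model file.

```python
import io
import struct
from typing import BinaryIO, NamedTuple


class GGUFHeader(NamedTuple):
    version: int
    tensor_count: int
    metadata_kv_count: int


HEADER_FORMAT = "<4sIQQ"  # magic, uint32 version, uint64 tensor count, uint64 KV count
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes, no padding with "<"


def read_gguf_header(f: BinaryIO) -> GGUFHeader:
    """Read the fixed header; the metadata key/value pairs follow it in the file."""
    raw = f.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE:
        raise ValueError("file too short to be a GGUF file")
    magic, version, tensor_count, kv_count = struct.unpack(HEADER_FORMAT, raw)
    if magic != b"GGUF":
        raise ValueError(f"bad magic: {magic!r}")
    if version < 2:
        # Assumption of this sketch: only the 64-bit-count layout is handled.
        raise ValueError(f"unsupported GGUF version: {version}")
    return GGUFHeader(version, tensor_count, kv_count)


# Demo on a synthetic header (values are arbitrary).
fake_file = io.BytesIO(struct.pack(HEADER_FORMAT, b"GGUF", 3, 291, 24))
print(read_gguf_header(fake_file))
```

Everything a runtime needs beyond the weights (tokenizer data, chat template, architecture parameters) lives in the key/value section that follows these 24 bytes, so a real parser would continue by reading `metadata_kv_count` typed entries.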

What's in a GGUF, besides the weights – and what's still missing?

https://nobodywho.ooo/posts/whats-in-a-gguf/

#HackerNews #GGUF #AIweights #missingfeatures #technews
