Gemma 4 QAT is here - now I’m waiting for Ollama TurboQuant so the full stack is ready: QAT, MoE, sparse-active models, smarter attention, and MTP speculative decoding. #Gemma4 #Ollama #TurboQuant #QAT #MoE #MTP #LocalAI
🧠 #Google has released #Gemma 4 12B, which introduces support for Multi-Token Prediction (#MTP) and brings advanced multimodal capabilities to consumer hardware.

👉 Learn more: https://www.linkedin.com/posts/alessiopomaro_google-mtp-gemma-ugcPost-7468543000747601920-Lp1r/

___
✉️ If you want to stay up to date on these topics, subscribe to my newsletter: https://bit.ly/newsletter-alessiopomaro

#AI #GenAI #GenerativeAI #IntelligenzaArtificiale #LLM 

Google DeepMind has introduced the new Gemma 4 12B model. Its biggest strength is that it offers performance close to that of the larger 26B model in a significantly smaller memory footprint, so it can be run locally on an ordinary laptop with 16 GB of RAM or unified memory.

What makes Gemma 4 12B interesting?

The model comes with a unique "encoder-free" architecture, instead of […]

https://zdrojak.cz/zpravicky/google-predstavil-gemma-4-12b-vykonny-ai-model-ktery-pobezi-i-na-vasem-laptopu/

New week, beautiful new slides: Run LLMs Locally

Now with Mellum2 from JetBrains!
A very fast coding model that requires only 10 GB of RAM.

I also added LFM 2.5 from LiquidAI, updated the translations with HY-MT2 from Tencent, added wllama examples for re-ranking and structured output, and added thinking_budget_tokens to the curl examples.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2026_ThomasBley.pdf

#ai #llm #llamacpp #wllama #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode #mtp #webassembly #jetbrains #mellum2

Little windows like these sit at the bottom of the MTP advertising screens at transit stops in Poznań. What is hidden behind them? Is there a camera in there? We have sent an inquiry to MTP and will post here what they reply 💪

#poznań #aktywizm #reklamoza #mtp #prywatność #informacjaPubliczna

A 10 year old Xeon is all you need (for 26B-A4B MTP Drafters without GPU)

https://point.free/blog/gemma-4-on-a-2016-xeon/

#HackerNews #Xeon #MTP #Drafters #10YearsOld #TechForAll #GPUAlternative

Or running Gemma 4 on a 2016 Xeon with no GPU, 25 flags, 128 GB of DDR3, and a 25B-parameter MoE.

I HAD TO SWITCH OFF #MeetTheRepublicans #MtP.

First guest fmr VP #FrostyTheNoMan #Pence over & over again refers to "the radical left" while defending THE GUY WHO TRIED TO HAVE HIM KILLED BY SENDING AN ANGRY MOB TO ATTACK THE CAPITOL AND OVERTURN AN ELECTION. 🤬

Maybe if you say "Radical Left" a few dozen more times, everyone will forget all that, Mike.

You sure did. 🙄

New week, more slides: Run LLMs Locally

Now including wllama to run GGUF models inside your browser!

wllama uses llama.cpp, WebAssembly and WebGPU, bringing a completely new experience of LLMs into the web.
It has no 4 GB limitation and is faster than Transformers.js.

I also added translations using the HY-MT model from Tencent.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2026_ThomasBley.pdf

#ai #llm #llamacpp #wllama #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode #mtp #webassembly

RT @TeksEdge: 🚀 New MTP support for Strix Halo released!

More at Arint.info: https://arint.info/@Arint/116634638485149145

#AI #AMD #MTP #Qwen #ROCm #StrixHalo #arint_info

https://x.com/TeksEdge/status/2058728175388262761#m
