Had an unused Ryzen 9 / 32 GB / RTX 3080 box, so I stuck Ubuntu on it and I'm testing a few local LLMs.

Gemma4:12b is pretty impressive, anything else I should check out?

#localllm

Generative AI can run completely offline. By downloading a model's neural weights and using local hardware, you skip the cloud entirely for maximum privacy. #AI #TechEducation #MachineLearning #LocalLLM #EdgeComputing
https://blazetrends.com/unplugged-ai-how-large-language-models-generate-text-entirely-offline/?fsp_sid=37791
Unplugged AI: How Large Language Models Generate Text Entirely Offline

Learn how modern AI models generate text entirely offline by using downloaded neural weights and local hardware, skipping the cloud entirely.

Blaze Trends

Google should have named the Gemma 4 QAT series Gemma 4.1. Most people use quantized models (for good reason!), and QAT, as verified by WebBrain’s benchmarks, is significantly superior to the original model, just like Qwen 3.6 is superior to Qwen 3.5.

https://www.webbrain.one/blog/gemma-4-31b-qat-planner-benchmark

#LocalLLM
#AI #BrowserAutomation
#WebBrain #OpenSource #LLM #Gemma4 #Gemma #Qwen

Gemma 4 31B QAT becomes the best local Gemma planner we have tested

The QAT w4a16 Gemma 4 31B run improves over the older Gemma 31B int4 result and narrowly beats Qwen 3.6 27B on strict first-action quality.

RE: https://social.wildeboer.net/@jwildeboer/116775461671762518

I've known about this since early 2024, and it was pretty awesome the first time I tried it, albeit with a really small model, since I only had a 1050 Ti, a Ryzen 3 1300X and 16 GB of RAM back then. I was getting something like < 5 tok/s.

I mean, that's to be expected, but it's better than nothing when we don't have internet, which happens sometimes.

This stuff's been out there for long enough that I'm a little surprised people are only now catching up with local models.

#ai #localllm #llm

🔥 We just published our Q4 local planner benchmark comparing local AI models for browser automation:

• DiffusionGemma-26B-A4B-it: 0.35s median, 84% accuracy — fastest!
• Gemma 4 12B Coder: 0.40s median, 84% accuracy
• Cohere North-Mini-Code 1.0: 0.38s median, 84% accuracy

All three tied on accuracy but DiffusionGemma was the fastest.

Full benchmark: https://www.webbrain.one/blog/local-planner-q4-june-2026

#LocalLLM #AI #BrowserAutomation #WebBrain #OpenSource #LLM

DiffusionGemma hits 0.35s median in the WebBrain local planner bench

Gemma 4 12B Coder, North Mini Code, and DiffusionGemma completed WebBrain's frozen local planner run; DiffusionGemma is fast but not yet reliable enough for WebBrain, and VibeThinker is not a tool-calling agent model.

2am. Triggered two model pulls, a 70B load, a cluster of cloud API agents, and seven daemons. All at once. 96GB unified memory.

Kernel panic.

Not 'do the models fit in RAM?' — fragmentation, in-flight buffers, filesystem cache, kernel allocations. All sharing the same pool. All spiking together.

Two queues. Local-heavy: serial. Cloud API: bounded parallel. Never cross-mix.

#LocalLLM #LLMOps #AppleSilicon #MLOps
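
The two-queue rule above can be sketched roughly like this. It is a minimal sketch, not the poster's actual setup: it assumes every job is an async callable, and the names `run_local_serial`, `run_cloud_bounded` and the default `limit=4` are made up for illustration.

```python
import asyncio


async def run_local_serial(jobs):
    """Local-heavy queue: run jobs strictly one at a time, in order."""
    results = []
    for job in jobs:
        # Await each job before starting the next, so only one model
        # load or pull is competing for memory at any moment.
        results.append(await job())
    return results


async def run_cloud_bounded(jobs, limit=4):
    """Cloud API queue: run jobs concurrently, at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:
            return await job()

    # gather() preserves the order of `jobs` in the returned list.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Keeping the queues separate means a memory-heavy job can never end up in the parallel lane by accident; whether the two queues should also be allowed to run at the same time depends on how much headroom the machine has.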

I think AI is great, but from a privacy standpoint it's tricky. That's why I'm currently testing what works locally on my own hardware. If I ever want to use this with clients in the social and healthcare sectors, local models are mandatory; even EU clouds are too borderline for me there.
Without the big memory and money hogs: I don't have the AI do the computing, I have it write code. The model writes the scripts, and deterministic code does the work.
It should run decently on an RTX 5060 Ti 16 GB. Looking forward to the discussion and hearing about your experiences. #LocalLLM #Datenschutz #Selfhosting

The question is whether to build a NAS next or a local LLM for my Home Assistant.

#localLLM #NAS #homeassistant

New blog post on local AI model benchmarks: I benchmarked local LLMs on a mid-tier gaming PC. Qwen3.5:9B is still a very strong model, even compared to Gemma4:26B, and a tiny (in size, not parameters) ternary model punched above its weight. #AI #LLM #LocalLLM 🤖💻 www.strakul.com/blog/posts/d...

Data Science: Local AI Model Benchmarks

A look at the performance of local AI models on a mid-tier gaming PC.

Strakul’s Thoughts

LLM Fit Check — will the model run on your box?

Check if any LLM fits in your GPU VRAM, Apple unified memory or RAM. Live Hugging Face data, exact GGUF quant sizes and KV-cache math.

LLM Fit Check
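
For a rough idea of the kind of fit check and KV-cache math mentioned above, here is a back-of-the-envelope sketch. It is not the tool's actual method: the function names, the flat 1 GiB overhead and the example model shape are all assumptions, and real GGUF files and runtimes will differ somewhat.

```python
GIB = 1024 ** 3


def weights_bytes(n_params, bits_per_weight):
    """Approximate size of the model weights at a given average bit width."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """KV cache size: one K and one V tensor per layer (hence the factor 2)."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(n_params, bits_per_weight, n_layers, n_kv_heads, head_dim,
         context_len, memory_bytes, overhead_bytes=1 * GIB):
    """True if weights + KV cache + an assumed fixed overhead fit in memory."""
    needed = (weights_bytes(n_params, bits_per_weight)
              + kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len)
              + overhead_bytes)
    return needed <= memory_bytes


# Hypothetical 8B model: ~4.5 bits/weight, 32 layers, 8 KV heads,
# head_dim 128, 8192-token context, fp16 cache, on a 10 GiB card.
print(fits(8e9, 4.5, 32, 8, 128, 8192, memory_bytes=10 * GIB))
```

The KV cache grows linearly with context length, so a model that fits at 8K context may not fit at 32K even though the weights have not changed.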