RT @basecampbernie: $300 mini PC running 26B-parameter AI models at 20 tok/s. Minisforum UM790 Pro ($351) + AMD Radeon 780M iGPU + 48GB DDR5-5600 + 1TB NVMe.

The secret: the 780M has no dedicated VRAM; it shares your DDR5 via unified memory. The BIOS says "4GB VRAM," but Vulkan sees the full pool, so I'm allocating 21+ GB of model weights on a GPU with "4GB VRAM."

The iGPU reads weights directly from system RAM at DDR5 bandwidth (~75 GB/s). MoE activates only ~4B params per token = 2-4 GB of reads. That's why 20 tok/s works.

What it runs:
- Gemma 4 26B MoE: 19.5 tok/s, 110 tok/s prefill, 196K context
- Gemma 4 E4B: 21.7 tok/s, faster than some RTX setups
- Qwen3.5-35B-A3B: 20.8 tok/s
- Nemotron Cascade 2: 24.8 tok/s

Dense 31B? 4 tok/s: it reads all 18GB per token and hits the bandwidth wall. An MoE of comparable quality? 20 tok/s.

Full agentic workflows run via the @NousResearch Hermes agent: terminal, file ops, web, 40+ tools, all against local models. No API keys. Just a box on your desk.

RAM is the pain point right now; DDR5 prices are 3-4x what they were a year ago. But the compute is free forever after you buy it.

@Hi_MINISFORUM @ggerganov llama.cpp + Vulkan + @UnslothAI GGUFs + @AMDRadeon RDNA 3. Fits in your hand.

#LocalLLM #Gemma4 #llama_cpp #AMD #Radeon780M #MoE #LocalAI #AI #OpenSource #GGUF #HermesAgent #NousResearch #DDR5 #MiniPC #EdgeAI #UnifiedMemory #Vulkan #iGPU #RunItLocal #AIonDevice
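
The claimed speeds fall straight out of bandwidth arithmetic: decode is bandwidth-bound, so tok/s ≈ memory bandwidth ÷ bytes of weights read per token. A back-of-the-envelope sketch (Python; the ~4.5 bits-per-weight average for a 4-bit-class GGUF quant is my assumption, not from the post):

```python
# Bandwidth-bound decode ceiling: tokens/s = bandwidth / bytes read per token.
BANDWIDTH_GBS = 75.0       # DDR5-5600 dual channel, figure from the post
BITS_PER_WEIGHT = 4.5      # assumed average for a 4-bit-class GGUF quant

def decode_ceiling_tok_s(active_params_billions: float) -> float:
    """Best-case tok/s if every active weight is read once per token."""
    bytes_per_token = active_params_billions * 1e9 * BITS_PER_WEIGHT / 8
    return BANDWIDTH_GBS * 1e9 / bytes_per_token

print(f"MoE (4B active): {decode_ceiling_tok_s(4.0):.1f} tok/s ceiling")   # ~33
print(f"Dense 31B:       {decode_ceiling_tok_s(31.0):.1f} tok/s ceiling")  # ~4.3
```

The measured 19.5-20.8 tok/s is roughly 60% of the MoE ceiling, plausible once KV-cache reads, activations, and kernel overhead are counted; the dense model sits right at its ~4 tok/s ceiling, exactly the bandwidth wall the post describes.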

More at Arint.info

#agent #API #GGUF #llama #LocalAI #OpenSource #Qwen3535 #arint_info

https://x.com/basecampbernie/status/2040326984446935059#m


TurboQuant model weight compression support added to llama.cpp

https://github.com/TheTom/llama-cpp-turboquant/pull/45

#github #llama

feat: TQ4_1S weight compression (Metal only, needs CUDA port) by TheTom · Pull Request #45 · TheTom/llama-cpp-turboquant

Summary: TQ3_1S (3-bit, 4.0 BPW) and TQ4_1S (4-bit, 5.0 BPW) weight quantization using WHT rotation + Lloyd-Max centroids. V2.1 fused Metal kernel: zero threadgroup memory, cooperative SIMD rotation...

GitHub
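
For intuition on the "Lloyd-Max centroids" half of that pipeline: Lloyd-Max fits an optimal scalar codebook by alternating nearest-centroid assignment with centroid re-estimation (1-D k-means); the WHT rotation beforehand makes the weight distribution more Gaussian so a single codebook fits well. A toy sketch of a 3-bit (8-level) fit, assuming nothing about the PR's actual kernels:

```python
import numpy as np

def lloyd_max(x: np.ndarray, bits: int = 3, iters: int = 30):
    """Fit 2**bits scalar centroids by alternating assignment and re-estimation."""
    levels = 2 ** bits
    centroids = np.quantile(x, (np.arange(levels) + 0.5) / levels)  # quantile init
    for _ in range(iters):
        idx = np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)  # nearest centroid
        for k in range(levels):
            if np.any(idx == k):
                centroids[k] = x[idx == k].mean()  # centroid = mean of its cell
    return centroids, idx

w = np.random.randn(65536)            # stand-in for a (rotated) weight tensor
centroids, idx = lloyd_max(w)
mse = np.mean((w - centroids[idx]) ** 2)
print(f"3-bit Lloyd-Max MSE on N(0,1): {mse:.4f}")  # ~0.035, the known optimum
```

The 4.0 BPW figure for a 3-bit format presumably reflects per-block metadata (scales and centroid tables) costing about one extra bit per weight, though the PR summary doesn't break that down.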

This is hilarious. There is a site that does a full exposé on how #ClaudeCode works.

https://ccunpacked.dev/

They should have called it CUCK: Claude Unpacked Code Knowledge.

Because that's what Anthropic is going to feel over the coming weeks.

#Programming #Programmers #Coding #Code #SoftwareDevelopment #WebDevelopment #WebDev #AppDevelopment #CLI #Linux #FOSS #OSS #OpenClaw #Claude #Codex #Llama #Ollama #LlamaCCP #LLM #LargeLanguageModel #AI #LMStudio

Claude Code Unpacked

What actually happens when you type a message into Claude Code? The agent loop, 40+ tools, multi-agent orchestration, and unreleased features, mapped from source.

Claude Code Unpacked
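
The architecture the exposé describes is, at its core, a simple loop: send the conversation to the model, execute whatever tool call comes back, append the result, repeat until the model stops asking for tools. A minimal sketch with a stubbed model and two hypothetical tools (none of this is Anthropic's actual code):

```python
import json, pathlib, subprocess

# Hypothetical tool registry; the real Claude Code reportedly ships 40+ tools.
TOOLS = {
    "read_file": lambda args: pathlib.Path(args["path"]).read_text(),
    "run_shell": lambda args: subprocess.run(
        args["cmd"], shell=True, capture_output=True, text=True).stdout,
}

def call_model(messages):
    """Stub standing in for the LLM API call: returns a tool call, then an answer."""
    if len(messages) == 1:
        return {"tool": "run_shell", "args": {"cmd": "ls"}}
    return {"answer": "Here is what I found in the project directory."}

def agent_loop(user_message: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):                  # step cap so the loop always ends
        reply = call_model(messages)
        if "tool" not in reply:
            return reply["answer"]              # no tool requested: model is done
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": json.dumps(
            {"tool": reply["tool"], "result": result})})
    return "step budget exhausted"

print(agent_loop("What files are in this project?"))
```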

News from the MBS #Xojo Plugins Version 26.1

Let's check what is new in our plugins:

#Llama, JSON to TOON, OCR to PDF, Phidgets, DynaPDF, GraphicsMagick, LibXL, Quality of Service for threads, Arrays, Vision and dialog improvements.

https://www.mbsplugins.de/archive/2026-04-02/News_from_the_MBS_Xojo_Plugins/monkeybreadsoftware_blog_xojo

Get working on your April Fools Eiffel Tower

Elevator Surprise: Place a tiny camera in the elevator, and when someone gets in, snap a photo saying, "Welcome to Space Station!" Or build a miniature model of the Eiffel Tower next to it for a dramatic effect. Tower of Pancakes: Create a giant stack of pancakes and attach it

AI Weirdness

Meta’s natural gas binge could power South Dakota | TechCrunch

Meta's upcoming Hyperion AI data center will be powered by 10 new natural gas plants.

TechCrunch

Let's say I have a large codebase written in #TypeScript that I wish to transform into another language, like #Rust.

Which tool would be good for that, ideally one that runs locally and, hopefully, automatically?

Asking for a friend.

#Programming #Programmers #Coding #Code #SoftwareDevelopment #WebDevelopment #WebDev #AppDevelopment #CLI #Linux #FOSS #OSS #OpenClaw #Claude #Codex #Llama #Ollama #LlamaCCP #LLM #LargeLanguageModel #AI #LMStudio

TurboQuant arrives with revolutionary KV cache quantization: 3.8x to 5.1x compression thanks to Hadamard rotation.

Key results:

• Qwen3.5 35B: 10.7 tok/s with q8_0 vs 85.5 baseline
• GPT-oss 120B: 5x compression, near-perfect PPL
• Command-R+ 104B: native 128K context

Recommendation: K as q8_0/turbo3, V as turbo3/4. An asymmetric split is recommended.

Compatibility: CPU, Apple Silicon, CUDA.

Implementation: https://github.com/TheTom/turboquant_plus
Benchmark: https://github.com/scos-lab/turboquant
Paper: https://arxiv.org/abs/2504.19874

Time for powerful LLMs on your own machine.

#LLM #AI #MachineLearning #OpenSource #llama

GitHub - TheTom/turboquant_plus

Contribute to TheTom/turboquant_plus development by creating an account on GitHub.

GitHub
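
The Hadamard trick in one picture: a rotation smears per-channel outliers (common in K caches) across all channels, so a coarse quantizer loses far less. A toy sketch with per-tensor int8 (Python/NumPy; none of this is TurboQuant's actual code):

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Orthonormal Hadamard matrix via Sylvester construction (n a power of 2)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def quant_dequant_int8(x: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 round trip."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).clip(-127, 127) * scale

d = 128
keys = np.random.randn(4096, d)      # fake K cache: 4096 positions, 128-dim heads
keys[:, 7] *= 40.0                   # one outlier channel dominates the scale

H = hadamard(d)
direct  = quant_dequant_int8(keys)
rotated = quant_dequant_int8(keys @ H) @ H.T  # quantize in rotated basis, rotate back

print("MSE, direct int8 :", np.mean((keys - direct) ** 2))
print("MSE, rotated int8:", np.mean((keys - rotated) ** 2))  # orders of magnitude lower
```

For what it's worth, mainline llama.cpp already exposes per-cache quant types via `--cache-type-k` / `--cache-type-v`; I'd assume TurboQuant's turbo3/turbo4 types slot in there, but that's a guess.
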
I don't have a tongue to share today, but as promised, here's yesterday's #llama for #Toothday instead. 😁🦙 I'm laughing too hard to come up with a funny caption, sorry!


#Lama #Lamas #llamas #LlamasOfMastodon #animalPhotography

MBS #FileMaker Plugin 16.1 News

Let us show you what is new in our plugin:

#Llama, #JSON, Phidgets, Files, OCR, Insert and Update in Databases, Threads, LibXL, GraphicsMagick, Translation, Dialog and Goodies.

https://www.mbsplugins.de/archive/2026-03-31/MBS_FileMaker_Plugin_161_News/monkeybreadsoftware_blog_filemaker