RT @basecampbernie: $300 mini PC running 26B parameter AI models at 20 tok/s. Minisforum UM790 Pro ($351) + AMD Radeon 780M iGPU + 48GB DDR5-5600 + 1TB NVMe.

The secret: the 780M has no dedicated VRAM. It shares your DDR5 via unified memory. The BIOS says "4GB VRAM" but Vulkan sees the full pool. I'm allocating 21+ GB for model weights on a GPU with "4GB VRAM."

The iGPU reads weights directly from system RAM at DDR5 bandwidth (~75 GB/s). MoE only activates 4B params per token = 2-4 GB of reads. That's why 20 tok/s works.

What it runs:
- Gemma 4 26B MoE: 19.5 tok/s, 110 tok/s prefill, 196K context
- Gemma 4 E4B: 21.7 tok/s, faster than some RTX setups
- Qwen3.5-35B-A3B: 20.8 tok/s
- Nemotron Cascade 2: 24.8 tok/s

Dense 31B? 4 tok/s, reads all 18GB per token, bandwidth wall. MoE, same quality? 20 tok/s.

Full agentic workflows via @NousResearch Hermes agent with terminal, file ops, web, 40+ tools, all against local models. No API keys. Just a box on your desk.

The RAM is the pain right now. DDR5 prices are 3-4x what they were a year ago. But the compute is free forever after you buy it.

@Hi_MINISFORUM @ggerganov llama.cpp + Vulkan + @UnslothAI GGUFs + @AMDRadeon RDNA 3. Fits in your hand.

#LocalLLM #Gemma4 #llama_cpp #AMD #Radeon780M #MoE #LocalAI #AI #OpenSource #GGUF #HermesAgent #NousResearch #DDR5 #MiniPC #EdgeAI #UnifiedMemory #Vulkan #iGPU #RunItLocal #AIonDevice
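
A quick back-of-envelope check of the math above (my own sketch, not from the post): at decode time a bandwidth-bound setup has to stream every active weight byte for each token, so tokens/s is capped at bandwidth divided by bytes read per token. The 4.5 bits/weight below is an assumed density for a Q4-class GGUF.

```python
# Rough decode-speed ceiling for a memory-bandwidth-bound setup:
#   tok/s <= memory bandwidth / GB of weights read per token
# Assumes ~4.5 bits/weight (typical Q4-class GGUF); activations ignored.

def decode_ceiling_tok_s(bandwidth_gb_s, active_params_b, bits_per_weight=4.5):
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token

DDR5_BW = 75.0  # GB/s, the dual-channel DDR5-5600 figure quoted above

# MoE with ~4B active params: ~2.25 GB read/token -> ~33 tok/s ceiling,
# consistent with the observed ~20 tok/s once overheads are paid.
print(decode_ceiling_tok_s(DDR5_BW, 4))   # ~33.3

# Dense 31B: ~17.4 GB read/token -> ~4.3 tok/s ceiling (the "bandwidth wall").
print(decode_ceiling_tok_s(DDR5_BW, 31))  # ~4.3
```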

More at Arint.info

#agent #API #GGUF #llama #LocalAI #OpenSource #Qwen3535 #arint_info

https://x.com/basecampbernie/status/2040326984446935059#m


Code's Local Limit: When Big Models Break Small Machines

Running large language models for coding locally is limited by RAM. Bigger models need more memory, which puts them out of reach on smaller machines.

#LocalLLM, #CodingAI, #RAMLimit, #ComputerHardware, #AIonPC

https://newsletter.tf/local-llm-coding-ram-limit-small-computers/

Local LLM Coding Use Hits RAM Limit on Small Computers in April 2024

Running large language models for coding locally is limited by RAM. Users need more memory for bigger models, affecting small computer use.

NewsletterTF

Using large language models for coding on your own computer needs a lot of RAM. With less than 16GB, you may not be able to run the bigger coding models at all.

#LocalLLM, #CodingAI, #RAMLimit, #ComputerHardware, #AIonPC
https://newsletter.tf/local-llm-coding-ram-limit-small-computers/
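
To make that 16GB threshold concrete, here's a rough sizing sketch (my own illustration, not from the article): weight footprint is roughly parameters × bits per weight ÷ 8, before KV cache and everything else the machine is running.

```python
# Approximate weight footprint: params (billions) * bits/weight / 8 = GB.
# KV cache, the OS, and your editor all need room on top of this.

def weights_gb(params_billion, bits_per_weight):
    return params_billion * bits_per_weight / 8

for params in (7, 13, 33):
    print(f"{params}B: {weights_gb(params, 4.5):.1f} GB at 4-bit, "
          f"{weights_gb(params, 16):.0f} GB at fp16")

# 7B: 3.9 GB / 14 GB; 13B: 7.3 GB / 26 GB; 33B: 18.6 GB / 66 GB.
# On a 16GB machine a 4-bit 13B fits with headroom; a 33B does not.
```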


Just tried out Gemma 4 E3B locally on my Pixel phone, using Google Edge Gallery with network permissions disabled (GrapheneOS).

It understands audio. Maybe image input works too. Speed is decent. As long as prompts are simple and clear, I think it's useful.

Not sure about battery consumption. But I bet for 80% of cases we don't need a data center. It might not program, but it can tell you how to color an SVG when you're offline.

#localllm #gemma

Ollama is being accelerated in preview on Apple Silicon (M5/M5 Pro/M5 Max), built on Apple's ML framework MLX. Prefill and decode speeds improve significantly on Qwen3.5-35B-A3B, and NVFP4 quantization keeps quality on par with production environments. Cache reuse, smart checkpoints, and smart eviction improve responsiveness and memory efficiency. Ollama 0.19 is out (32GB unified memory recommended).

https://ollama.com/blog/mlx

#applesilicon #mlx #nvfp4 #localllm #performance

Ollama is now powered by MLX on Apple Silicon in preview · Ollama Blog

Today, we're previewing the fastest way to run Ollama on Apple silicon, powered by MLX, Apple's machine learning framework.
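
For reference, a minimal sketch of talking to a local Ollama server over its HTTP API; the endpoint and response fields are standard Ollama, while the model tag is a placeholder for whatever you've pulled.

```python
# Minimal local-inference call against Ollama's HTTP API on the default port.
# The model tag is a placeholder -- use whatever `ollama list` shows.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwen3.5-35b-a3b",  # placeholder tag
        "prompt": "In one sentence, what does NVFP4 quantization trade off?",
        "stream": False,             # single JSON reply instead of a stream
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    out = json.load(resp)

print(out["response"])
# eval_count tokens over eval_duration nanoseconds = decode speed
print(out["eval_count"] / (out["eval_duration"] / 1e9), "tok/s")
```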

So I've found that #Qwen35's training data knows everything about #Artemis up to this launch, #Artemis2. Knew the astronaut names, etc.

I only had vague notions about future missions, so I asked about it. And it mentioned the "Lunar Gateway". I looked that up on Wikipedia and it was quite accurate. However... it had no way to know the Gateway (an orbiting lunar support station) was axed by the Trump administration in favor of going directly towards building a lunar base.

Sounds to me like some orange baby said _"I want my admin to put a base on the moon, not just another lame ISS! DO IT OR YOU DON'T GET FUNDING!!"_ 🤷

But I'm open to different interpretations, of course. I'm just skeptical and ignorant of the actual science needs.

https://en.wikipedia.org/wiki/Lunar_Gateway

> On July 4, 2025, President Donald Trump signed the One Big Beautiful Bill Act into law, allocating $2.6 billion for the program and requiring at least $750 million annually from FY 2026 through FY 2028.
>
> In early 2026, reports indicated that references to the station had been removed from congressional funding legislation. On February 26, 2026, reporting suggested that NASA Administrator Jared Isaacman was considering restructuring the program toward a lunar surface base effort in Houston.
>
> In March 2026, NASA announced it would no longer build the station and would instead focus on a lunar surface base between 2029 and 2036, repurposing Gateway hardware and partner contributions where possible. Carlos Garcia-Galan, NASA's program manager for the Lunar Gateway, was reassigned to lead the surface base effort but stated that a lunar orbiting outpost "has value in our overall exploration goals" and that NASA may consider it later, but that the agency is now focused on the surface.

#nasa #llm #localLLM

Tested ServiceNow's Apriel 1.6 15B Thinker on my RTX 5060 Ti -- and the thinking logs made me put my tea down.

This model runs a compliance check before it writes a Python function. Literally. "We need to comply with the request. No disallowed content." Enterprise DNA, fully intact.

But buried inside that corporate throat-clearing is something genuinely impressive. Full breakdown on the blog -- link in comments.

#AI #LocalLLM #Ollama #HomeLabAI #ReasoningModel #ServiceNow #OpenSourceAI

https://goarcherdynamics.com/2026/04/01/aihome-apriel-1-6-15b-thinker-review/?utm_source=mastodon&utm_medium=jetpack_social

AI@Home – Apriel 1.6 15B Thinker Review

Conditions & Context Today we have something genuinely unusual on the bench. Not Meta, not Google, not Mistral. This one comes from ServiceNow — yes, the enterprise workflow automation co…

Archer Dynamics

Yesterday I optimistically set out to buy RAM for the Tuxedo notebook. It currently has 32GB, which is a bit thin for tinkering with Ollama and a locally installed mixtral:8x7b. 128GB modules aren't available at all anymore, and 2x64GB costs 940.- 😲
If only I had listened to my instinct and ordered 128GB right when I bought the notebook, back when prices were still normal.

#ram #ai #ollama #localllm #mixtral8x7b

William Ruider (@ruider92545)

Highlights a local setup of EXO Labs running NVIDIA's Nemotron-3 Nano 30B A3B in MLX 8-bit, introduced as a "local speed demon" with very fast local execution. Worth noting for local AI inference and running lightweight models.

https://x.com/ruider92545/status/2039099092287009094

#exo #nvidia #nemotron #mlx #localllm

William Ruider (@ruider92545) on X

!!! EXO Labs and NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8Bit !!! !!! Local speed demon !!! 🤩🤩🤩

X (formerly Twitter)
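
A hedged sketch of running such an 8-bit MLX model locally with mlx-lm (`pip install mlx-lm`); the Hugging Face repo path below is inferred from the name in the post and may not match the actual upload.

```python
# Load and run an 8-bit MLX model locally with mlx-lm.
# Repo path is a guess based on the model name above -- adjust as needed.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/NVIDIA-Nemotron-3-Nano-30B-A3B-MLX-8Bit")
text = generate(
    model,
    tokenizer,
    prompt="Why do MoE models decode quickly on unified memory?",
    max_tokens=200,
    verbose=True,  # stream tokens and print tok/s stats
)
print(text)
```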

AISatoshi (@AiXsatoshi)

Mentions that their home computing resources have passed 1000 TFLOPS and invites overseas users interested in high-performance personal setups for running local LLMs. Shows the trend toward local AI inference and expanding personal compute.

https://x.com/AiXsatoshi/status/2038925060539637866

#localllm #tflops #aicompute #llm #hardware

AI✖️Satoshi⏩️ (@AiXsatoshi) on X

Our household compute resources have passed 1000 TFLOPS too. Overseas local-LLM diehards and compute-resource geeks, please follow me.

X (formerly Twitter)