Running Gemma 4 locally with LM Studio's new headless CLI and Claude Code

https://ai.georgeliu.com/p/running-google-gemma-4-locally-with

LM Studio 0.4.0 introduced llmster and the lms CLI. Here is how I set up Gemma 4 26B for local inference on macOS that can be used with Claude Code.

George Liu

martinald 3d ago

Just FYI, MoE doesn't really save (V)RAM. You still need all the weights loaded in memory; it just means you read fewer of them per forward pass. So it improves tok/s but not VRAM usage.
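To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 26B total comes from the post; the 4B active-parameter count, the 4-bit quantization, and the 400 GB/s memory bandwidth are illustrative assumptions, and the function names are hypothetical. It treats decoding as purely memory-bandwidth-bound, which ignores the KV cache, activations, and compute.

```python
def weights_gb(params: float, bits_per_param: float) -> float:
    """Size of `params` weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def bandwidth_bound_tok_per_s(active_params: float, bits_per_param: float,
                              bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if each token must read every active weight once."""
    return bandwidth_gb_s / weights_gb(active_params, bits_per_param)


TOTAL_PARAMS = 26e9      # from the post: Gemma 4 26B
ACTIVE_PARAMS = 4e9      # ASSUMPTION: parameters touched per token in the MoE
BITS = 4                 # ASSUMPTION: 4-bit quantized weights
BANDWIDTH_GB_S = 400.0   # ASSUMPTION: memory bandwidth of the machine

# Resident footprint depends on the total, whether the model is dense or MoE.
footprint_gb = weights_gb(TOTAL_PARAMS, BITS)

# Throughput depends on how many weights each token actually reads.
dense_tps = bandwidth_bound_tok_per_s(TOTAL_PARAMS, BITS, BANDWIDTH_GB_S)
moe_tps = bandwidth_bound_tok_per_s(ACTIVE_PARAMS, BITS, BANDWIDTH_GB_S)

print(f"weights resident: {footprint_gb:.1f} GB")
print(f"dense bound: {dense_tps:.1f} tok/s, MoE bound: {moe_tps:.1f} tok/s")
```

Under these assumptions the footprint is identical for a dense and an MoE model of the same total size; only the per-token read, and therefore the throughput bound, changes.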

charcircuit

You never need to have all the weights in VRAM at once. You can swap them in from RAM, disk, the network, etc. MoE reduces the amount of data that needs to be swapped in for the next forward pass.
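A toy sketch of that idea: keep only a few experts resident and load the rest on demand, evicting the least recently used one. The `ExpertCache` class and the `load_fn` callback are hypothetical names for illustration, not how LM Studio or any particular runtime does it; a real `load_fn` would read from RAM, disk, or the network.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ExpertCache:
    """LRU cache holding at most `capacity` experts in fast memory."""

    def __init__(self, capacity: int, load_fn: Callable[[int], Any]) -> None:
        self.capacity = capacity
        self._load_fn = load_fn          # hypothetical: fetches weights from slow storage
        self._cache: "OrderedDict[int, Any]" = OrderedDict()
        self.loads = 0                   # number of slow fetches performed

    def get(self, expert_id: int) -> Any:
        if expert_id in self._cache:
            # Hit: mark as most recently used, no transfer needed.
            self._cache.move_to_end(expert_id)
            return self._cache[expert_id]
        # Miss: fetch from slow storage, then evict the LRU entry if over capacity.
        weights = self._load_fn(expert_id)
        self.loads += 1
        self._cache[expert_id] = weights
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)
        return weights

    def resident(self) -> List[int]:
        """Expert ids currently held, least recently used first."""
        return list(self._cache.keys())


# Stand-in loader; the router's picks per step are made up for the demo.
cache = ExpertCache(capacity=2, load_fn=lambda i: f"weights-for-expert-{i}")
for routed_expert in [0, 1, 0, 2, 1]:
    cache.get(routed_expert)

print("slow loads:", cache.loads, "resident:", cache.resident())
```

The trade-off is that every miss costs a transfer from slower storage, so throughput then depends on how often the router picks an expert that is not already resident.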