New week, beautiful new slides: Run LLMs Locally
Now with Mellum2 from JetBrains!
A very fast coding model that requires only 10 GB of RAM.
I also added LFM 2.5 from LiquidAI, updated the translations with HY-MT2 from Tencent, added wllama examples for re-ranking and structured output,
and added thinking_budget_tokens to the curl examples.
https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2026_ThomasBley.pdf
#ai #llm #llamacpp #wllama #stablediffusion #qwen3 #glm #localai #gemma4 #webgpu #opencode #mtp #webassembly #jetbrains #mellum2





