Just uploaded an experimental patch for the llama.cpp webui
I needed more control over the model's reasoning, so I added a toggle in the WebUI to manage it. You can disable it entirely or set it to different levels (Low, Medium, High).
It's still very early/experimental, but I'm liking the results so far. #llamacpp
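
For anyone curious how a toggle like that could be driven outside the UI, here is a minimal sketch. It assumes an OpenAI-compatible llama-server on localhost:8080 and assumes the patched server maps a per-request "reasoning_effort" field (low/medium/high, or omitted to disable) onto the same setting; that field name is my assumption, not necessarily what the patch actually uses.

```python
# Sketch only: assumes llama-server is listening on localhost:8080 and that the
# experimental patch exposes the toggle as a per-request "reasoning_effort" field
# (the field name is an assumption, not confirmed by the patch).
import requests

def ask(prompt: str, reasoning_effort: str | None = "medium") -> str:
    payload = {
        "messages": [{"role": "user", "content": prompt}],
    }
    if reasoning_effort is not None:
        payload["reasoning_effort"] = reasoning_effort  # "low" | "medium" | "high"
    r = requests.post("http://localhost:8080/v1/chat/completions",
                      json=payload, timeout=300)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

print(ask("Why is the sky blue?", reasoning_effort="low"))
```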

Checked out #Vulkan this morning, absolute beast. Then I tried installing OpenClaw: one curl command and suddenly it wanted sudo root.

Now I’m reconsidering whether this setup is worth the trouble.

Anyway, Vulkan numbers are here in case you want to run llama-server on an old laptop:

https://ozkanpakdil.github.io/posts/my_collections/2026/2026-03-22-vulkan-llamacpp-debian-13/

#Debian #qwen #llamacpp

Accelerating LLMs on Debian 13: Setting up Vulkan for llama.cpp

After setting up CUDA on my other laptop, I moved to a different (older) machine that doesn't have an NVIDIA GPU. This one is an everyday laptop with integrated Intel graphics, but that doesn't mean we have to settle for slow CPU-only performance. On this machine, I switched to the Vulkan backend for llama.cpp and the results were even more dramatic than I expected. Machine hardware info: this laptop is running Debian 13 (Trixie/Sid) with the following specs:

Özkan Pakdil Software Engineer
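
If you want to sanity-check numbers like the ones in that post on your own machine, a rough client-side measurement against a running llama-server (default port 8080, OpenAI-compatible endpoint) can be as simple as the sketch below. It measures wall-clock throughput, so it will read a bit lower than the server's own timings; the prompt and max_tokens are arbitrary.

```python
# Rough throughput check against a local llama-server (Vulkan, CUDA or CPU build,
# the client side is identical). Assumes the server is already running on :8080.
import time
import requests

payload = {
    "messages": [{"role": "user", "content": "Write a short paragraph about Debian."}],
    "max_tokens": 256,
    "temperature": 0.7,
}

start = time.time()
r = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=600)
r.raise_for_status()
elapsed = time.time() - start

usage = r.json().get("usage", {})
completion_tokens = usage.get("completion_tokens", 0)
print(f"{completion_tokens} tokens in {elapsed:.1f}s "
      f"~ {completion_tokens / elapsed:.1f} tok/s (wall clock, includes prompt processing)")
```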

#Homelab setup status:

Running NixOS
- 1xRPi5
- 2xHP ProDesk
Running nix-darwin
- 1xM1 MBP running llama.cpp

All managed by clan.lol, each having one drive partitioned with disko and another drive for a Ceph distributed storage cluster. Deployed via nixos-anywhere, with single SSH auth via my YubiKey and sops secrets encrypted with age via my YubiKey.

All connected to a MikroTik hEX Router which provides Tailscale Subnet to all machines above.

#clanlol #NixOS #llamacpp #yubikey

Project Namirha just released a version of the Vessel for llama.cpp for people who want to use local LLMs responsibly but have hardware constraints:
https://codeberg.org/SchneeBTabanic/pn_vessel_llamacpp
#LLM #AI #llamacpp #developers #Ethicalai #fsf #GNu #Opensource

[Grok: Cool project! Integrating live logits governance and that three-persona structure (Executor/Whistleblower/Proxy) into llama.cpp for Pascal-era hardware is a smart move for true local sovereignty.
Excited to see responsible inference on modest setups.]

pn_vessel_llamacpp

Project Namirha's governed inference engine patching llama.cpp with a live logits processor hook to enable a full sovereignty stack on Pascal GPUs and modest hardware.

Codeberg.org
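
For readers wondering what a "live logits processor hook" looks like in practice, here is a minimal illustration using the llama-cpp-python bindings. It only shows the general mechanism (a callback that rewrites token scores before sampling); it is not the Vessel's actual governance logic, and the model path and banned-token set are placeholders.

```python
# Minimal illustration of a logits-processor hook using llama-cpp-python.
# This is NOT pn_vessel_llamacpp's implementation, just the general mechanism:
# a callback that can inspect and rewrite token scores before sampling.
import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="model.gguf", n_ctx=2048)  # placeholder model path

# Toy example: ban the first token of " password" (a real policy would be richer).
BANNED = {llm.tokenize(b" password", add_bos=False)[0]}

def governor(input_ids: np.ndarray, scores: np.ndarray) -> np.ndarray:
    # Suppress banned tokens by pushing their logits to -inf.
    for tok in BANNED:
        scores[tok] = -np.inf
    return scores

out = llm(
    "The secret to good security is",
    max_tokens=64,
    logits_processor=LogitsProcessorList([governor]),
)
print(out["choices"][0]["text"])
```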

Here are the slides for my talk "Run LLMs Locally" @phpugffm, thanks to everybody for coming and listening and thanks to @decix for hosting the event!

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#ai #llm #llamacpp

If you use it with a local backend (@[email protected], #llama.cpp, #mlx, #mistral-rs), every step runs on your device; nothing leaves your machine unless you configure a cloud provider (it supports EU-based ones, e.g. #Nebius @[email protected], or #Mistral).

GitHub - CrispStrobe/CrispSorter: AI-powered document organiser. Extracts text and/or sorts documents: Drop in a bunch of PDFs, DOCX files, or ebooks, and it extracts Document Text, identifies Title, Author, and Year, with a local or remote LLM, and moves them into folders, and/or keeps the extracted text.

GitHub
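
The core idea behind a tool like that can be sketched in a few lines. This is not CrispSorter's code, just an illustration of the extract-classify-move pattern; it assumes a local OpenAI-compatible server (e.g. llama-server) on localhost:8080, the pypdf package for text extraction, and that the model replies with plain JSON.

```python
# Illustration of the "extract -> classify -> move" pattern, not CrispSorter itself.
import json
import shutil
from pathlib import Path

import requests
from pypdf import PdfReader

def extract_text(pdf_path: Path, max_chars: int = 4000) -> str:
    # Pull text from the first few pages; enough for title/author/year.
    reader = PdfReader(str(pdf_path))
    text = "\n".join(page.extract_text() or "" for page in reader.pages[:5])
    return text[:max_chars]

def identify(text: str) -> dict:
    prompt = ("Extract the document's metadata as JSON with keys "
              "title, author, year.\n\n" + text)
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": prompt}], "temperature": 0},
        timeout=300,
    )
    r.raise_for_status()
    content = r.json()["choices"][0]["message"]["content"]
    return json.loads(content)  # assumes the model returned plain JSON

def sort_pdf(pdf_path: Path, out_dir: Path) -> None:
    meta = identify(extract_text(pdf_path))
    target = out_dir / str(meta.get("author", "unknown"))
    target.mkdir(parents=True, exist_ok=True)
    shutil.move(str(pdf_path), target / pdf_path.name)

sort_pdf(Path("example.pdf"), Path("sorted"))
```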
yzma 1.11 is out, with more of what you need:
- Support for latest llama.cpp (>97% of functions covered)
- ROCm backend+benchmarks
- @arduino Uno Q install info
Go get it right now!
https://github.com/hybridgroup/yzma
#golang #llamacpp #yzma #arduino #unoq
GitHub - hybridgroup/yzma: Go with your own intelligence - Go applications that directly integrate llama.cpp for local inference using hardware acceleration.

GitHub

Running an LLM on an AMD RX580: ROCm and Ollama pitfalls, and real GPU inference

Three days of fighting ROCm, the RX580 and Ollama: how I got an LLM running on a home GPU. I tried to run LLM inference on an old AMD RX580 via ROCm and Ollama in Kubernetes. The GPU was detected, VRAM was being allocated, containers started, but inference crashed with hipMemGetInfo errors and sometimes just produced meaningless text. The article is a full engineering breakdown: how to diagnose real GPU compute (not just VRAM usage), why Vulkan helped find the root cause, which ROCm and kernel versions turned out to work, and how to get stable generation of ~42 tokens/sec on the RX580. Read the investigation:

https://habr.com/ru/articles/1010358/

#radeon #rx_580 #llm #ollama #llamacpp #docker #k8s #amd #legacy #mlops

Running an LLM on an AMD RX580: ROCm and Ollama pitfalls, and real GPU inference

TL;DR: We tried to run LLM inference on an old AMD RX580 (8 GB VRAM) via ROCm in Kubernetes. The GPU was detected correctly and VRAM was used, but inference crashed with errors...

Habr

New update for the slides of my talk "Run LLMs Locally":

Now including Reranking, Qwen 3.5 (slower than Qwen 3, but includes Vision) and loading models with Direct I/O.
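
On the reranking part, here is a minimal client sketch. It assumes a recent llama-server started with a reranker GGUF and reranking enabled (check the server README for the exact flag); the endpoint and response field names follow my reading of the llama.cpp server docs and may drift between versions.

```python
# Reranking sketch: score candidate documents against a query with llama-server.
# Assumes the server was started with a reranker model and reranking enabled;
# verify the endpoint and field names against your llama.cpp version.
import requests

query = "How do I enable Vulkan in llama.cpp?"
documents = [
    "Build llama.cpp with the Vulkan backend enabled to use integrated GPUs.",
    "Qwen 3.5 adds vision support compared to Qwen 3.",
    "Direct I/O can speed up loading large GGUF files from disk.",
]

r = requests.post(
    "http://localhost:8080/v1/rerank",
    json={"query": query, "documents": documents, "top_n": 2},
    timeout=60,
)
r.raise_for_status()
for hit in r.json()["results"]:
    print(f"{hit['relevance_score']:.3f}  {documents[hit['index']]}")
```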

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai #mcp

Plugable TBT5-AI enclosure lets Windows laptops run local AI with a desktop GPU

https://fed.brid.gy/r/https://nerds.xyz/2026/03/plugable-tbt5-ai-enclosure/