If you use it with a local backend (@[email protected], #llamacpp, #mlx, #mistral-rs), every step runs on your device; nothing leaves your machine unless you configure a cloud provider (EU-based ones are supported, e.g. #Nebius @[email protected] or #Mistral).

GitHub - CrispStrobe/CrispSorter: AI-powered document organiser. Drop in a batch of PDFs, DOCX files, or ebooks; it extracts the document text, identifies Title, Author, and Year with a local or remote LLM, then moves the files into folders and/or keeps the extracted text.
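
The extract-identify-move loop it describes is easy to reproduce against any local OpenAI-compatible endpoint. A minimal sketch of that pattern (not CrispSorter's actual code; the endpoint URL, model name, and JSON shape are assumptions):

```python
# Minimal sketch: ask a local OpenAI-compatible endpoint (e.g. llama-server)
# for the metadata of already-extracted text, then file the document.
import json
import shutil
from pathlib import Path

import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed local server

def extract_metadata(text: str) -> dict:
    prompt = (
        "Extract the document metadata and answer with JSON only, shaped "
        'like {"title": "...", "author": "...", "year": "..."}.\n\n' + text[:4000]
    )
    resp = requests.post(ENDPOINT, json={
        "model": "local",  # llama-server accepts any name for its loaded model
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }, timeout=300)
    resp.raise_for_status()
    return json.loads(resp.json()["choices"][0]["message"]["content"])

def sort_document(path: Path, text: str, out_dir: Path) -> Path:
    meta = extract_metadata(text)
    folder = out_dir / str(meta.get("author") or "Unknown")
    folder.mkdir(parents=True, exist_ok=True)
    name = f"{meta.get('year', 'n.d.')} - {meta.get('title', path.stem)}{path.suffix}"
    target = folder / name
    shutil.move(str(path), target)  # nothing left the machine at any point
    return target
```
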
yzma 1.11 is out, with more of what you need:
- Support for latest llama.cpp (>97% of functions covered)
- ROCm backend+benchmarks
- @arduino Uno Q install info
Go get it right now!
https://github.com/hybridgroup/yzma
#golang #llamacpp #yzma #arduino #unoq
GitHub - hybridgroup/yzma: Go with your own intelligence - Go applications that directly integrate llama.cpp for local inference using hardware acceleration.

Running an LLM on an AMD RX580: ROCm pitfalls, Ollama, and real GPU inference

Three days of fighting ROCm, the RX580, and Ollama: how I got an LLM running on a home GPU. I tried to run LLM inference on an old AMD RX580 (8 GB VRAM) via ROCm and Ollama in Kubernetes. The GPU was detected, VRAM was allocated, and the containers started, but inference crashed with hipMemGetInfo errors or sometimes just produced meaningless text. The article is a full engineering breakdown: how to diagnose real GPU compute (not just VRAM usage), why Vulkan helped find the root cause, which ROCm and kernel versions turned out to work, and how to reach stable generation of ~42 tokens/sec on the RX580. Read the investigation:

https://habr.com/ru/articles/1010358/

#radeon #rx_580 #llm #ollama #llamacpp #docker #k8s #amd #legacy #mlops

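The core diagnostic the article describes, telling real GPU compute apart from mere VRAM allocation, comes down to measuring actual generation throughput. A minimal sketch, assuming a stock Ollama server on its default port and an arbitrary already-pulled model:

```python
# Minimal throughput check against a local Ollama server (default port 11434).
# A GPU that only allocates VRAM but computes on CPU shows up here as
# single-digit tokens/sec; real GPU inference should be far higher.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",  # assumption: any pulled model works here
        "prompt": "Explain ROCm in one paragraph.",
        "stream": False,
    },
    timeout=600,
)
resp.raise_for_status()
data = resp.json()

# Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds).
tokens_per_sec = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"generation speed: {tokens_per_sec:.1f} tokens/sec")
```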

New update for the slides of my talk "Run LLMs Locally":

Now including Reranking, Qwen 3.5 (slower than Qwen 3, but with Vision support), and loading models with Direct I/O.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai #mcp
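
For the reranking part, llama.cpp's llama-server can expose a Jina-style rerank endpoint when started in reranking mode with a reranker model loaded (the exact flag varies by version; check `llama-server --help`). A minimal sketch; port, model name, and documents are assumptions:

```python
# Minimal sketch: score documents against a query via llama-server's
# Jina-style /v1/rerank endpoint (server assumed on port 8080 with a
# reranker model loaded in reranking mode).
import requests

query = "how do I run an LLM on an old AMD GPU?"
documents = [
    "ROCm setup guide for Radeon RX 500 series cards.",
    "Recipe collection: quick weeknight dinners.",
    "Benchmarking llama.cpp Vulkan vs ROCm backends.",
]

resp = requests.post("http://localhost:8080/v1/rerank", json={
    "model": "reranker",  # llama-server serves whichever model it loaded
    "query": query,
    "documents": documents,
})
resp.raise_for_status()

# Each result carries an index into `documents` plus a relevance score.
results = sorted(resp.json()["results"],
                 key=lambda r: r["relevance_score"], reverse=True)
for r in results:
    print(f'{r["relevance_score"]:.3f}  {documents[r["index"]]}')
```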

Plugable TBT5-AI enclosure lets Windows laptops run local AI with a desktop GPU

https://fed.brid.gy/r/https://nerds.xyz/2026/03/plugable-tbt5-ai-enclosure/

One more update for the slides of my talk "Run LLMs Locally":

Now including text to speech with Qwen3-TTS and Model Context Protocol.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai #mcp
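
The Model Context Protocol part is easy to demo end to end: the official Python SDK (`pip install "mcp[cli]"`) lets a few lines expose a tool that any MCP-capable client can call. A minimal sketch; the tool itself is a made-up example:

```python
# Minimal MCP server using the official Python SDK's FastMCP helper.
# Clients discover the tool below over the protocol and call it by name.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio by default
```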

I updated the slides for my talk "Run LLMs Locally":

Now including image generation with Qwen3 and content classification from the Qwen3Guard Technical Report paper.

https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf

#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai
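
Guard-style content classification can also be driven through any local OpenAI-compatible endpoint: you send the text to a guard model and read back a safety verdict instead of a chat reply. A minimal sketch; the endpoint, model name, and label format are assumptions (Qwen3Guard's real output schema is defined in its technical report):

```python
# Minimal sketch: content classification with a guard model behind a local
# OpenAI-compatible endpoint (llama-server, Ollama, ...).
import requests

def classify(text: str) -> str:
    resp = requests.post("http://localhost:8080/v1/chat/completions", json={
        "model": "qwen3guard",  # assumption: a guard model loaded locally
        "messages": [{"role": "user", "content": text}],
        "temperature": 0,
    }, timeout=120)
    resp.raise_for_status()
    # Guard models answer with a safety verdict rather than a chat reply.
    return resp.json()["choices"][0]["message"]["content"].strip()

print(classify("How do I build a birdhouse?"))
```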

@john #ollama is garbage. #qwen3_5 has had many fixes land in #llamacpp recently; it is not fully ready yet.

Just a NixOS Phone running a reasoning LLM locally...

#nixos #llm #llamacpp

Less than 2 weeks until Embedded World & I will be at large on the expo floor! Let's chat about all things Go with microcontrollers, computer vision, & machine learning. Just look for Gopherbot.

#golang #tinygo #ew26 #embedded #computerVision #ml #openCV #llamacpp #yzma