Plugable TBT5-AI enclosure lets Windows laptops run local AI with a desktop GPU
https://fed.brid.gy/r/https://nerds.xyz/2026/03/plugable-tbt5-ai-enclosure/
One more update for the slides of my talk "Run LLMs Locally":
Now including text-to-speech with Qwen3-TTS and the Model Context Protocol.
https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf
#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai #mcp
I updated the slides for my talk "Run LLMs Locally":
Now including image generation with Qwen3 and content classification from the Qwen3Guard Technical Report.
https://codeberg.org/thbley/talks/raw/branch/main/Run_LLMs_Locally_2025_ThomasBley.pdf
#llm #llamacpp #ollama #stablediffusion #gptoss #qwen3 #glm #opencode #localai
金のニワトリ (@gosrum)
Benchmarked the inference speed of Qwen3.5-27B-UD-Q4_K_XL with llama.cpp: when the model fits in VRAM, the RTX 5090 is very fast. RTX 5090 (single card): prefill ~2800 tps, decode ~60 tps. M2 Ultra (x2): prefill ~256 tps, decode ~18 tps.
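(The post's numbers were measured with llama.cpp, which ships its own llama-bench tool for this. As a rough way to reproduce a comparable measurement, here is a minimal sketch using llama-cpp-python: prefill throughput is approximated from time-to-first-token over a long prompt, decode throughput from the steady streaming rate afterwards. The GGUF filename is a hypothetical placeholder, not a file from the post.)

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python (built with GPU support)

# Assumption: a local GGUF quant; this path is a placeholder.
llm = Llama(
    model_path="Qwen3.5-27B-UD-Q4_K_XL.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit in VRAM
    n_ctx=4096,
    verbose=False,
)

prompt = "Explain the printing press. " * 100  # long prompt so prefill dominates
n_prompt = len(llm.tokenize(prompt.encode("utf-8")))

t0 = time.perf_counter()
t_first = None
n_out = 0
for _ in llm(prompt, max_tokens=128, stream=True):
    if t_first is None:
        t_first = time.perf_counter()  # prompt is fully processed at first token
    n_out += 1  # streamed chunks are roughly one token each

t_end = time.perf_counter()
print(f"prefill ~{n_prompt / (t_first - t0):.0f} tps over {n_prompt} tokens")
print(f"decode  ~{n_out / (t_end - t_first):.0f} tps over {n_out} tokens")
```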
GGML and llama.cpp join Hugging Face, unifying open-source local inference
GGML, the team behind llama.cpp, has joined Hugging Face. A look at a major shift in the open-source local-AI ecosystem as the integration of transformers and llama.cpp accelerates.