Mastodawn

How much overhead does an LLM launcher add?

Matched-flags benchmarks across AMD APU (Strix Halo), Apple Silicon, and NVIDIA. Wrapper overhead: every cell within 1% of raw llama-server. Proxy hop adds 0.45 ms median TTFT.

Where it gets interesting: Ollama is 41-72% slower decode on AMD APU. LM Studio's Vulkan wins decode on small/mid models but pays a 1-1.5 s TTFT tax.

Per-cell JSONs checked in. Reproducible with one make target.

https://deepu.tech/benchmarking-llamastash/

Show thread

Deepu K Sasidharan 1d ago

Built for my offline AI workstation. Maybe it fits yours too.

Repo: https://github.com/llamastash/llamastash
Release blog: https://deepu.tech/introducing-llamastash
Benchmarks: https://deepu.tech/benchmarking-llamastash

⭐ welcome on the repo. Bugs welcome on the issue tracker. Feedback welcome here.

#Rust #LlamaCpp #LocalAI #LocalLLM #OpenSource #FOSS

GitHub - llamastash/llamastash: A fast terminal native app (TUI) and CLI with init wizard for launching local LLMs via llama.cpp with zero overhead

A fast terminal native app (TUI) and CLI with init wizard for launching local LLMs via llama.cpp with zero overhead - llamastash/llamastash

GitHub

Show thread

Deepu K Sasidharan 1d ago

Install LlamaStash:

curl -fsSL https://llamastash.dev/install.sh | sh

irm https://llamastash.dev/install.ps1 | iex

brew install llamastash/llamastash/llamastash

yay -S llamastash

cargo install llamastash

Then run `llamastash init` and you're chatting with a local model in a few minutes. Linux, macOS, and Windows 11 on day one.

Show thread

Deepu K Sasidharan 1d ago

On benchmarks: LlamaStash spawns llama-server unmodified, so the wrapper better not add overhead. I measured it across AMD APU, Apple Silicon, and NVIDIA.

LlamaStash ≡ raw llama-server within ≤1% on every cell, across 4 model sizes and 2 metrics.

Ollama 0.24 is 41–72% slower decode on AMD APU. RAG prefill is catastrophic (4 min cold prefill on a 31B model). Numbers + methodology in the next post.

Show thread

Deepu K Sasidharan 1d ago

The architecture is the punchline.

One binary, three personas: TUI, CLI, daemon. Invoked differently, same code.

The daemon spawns the *unmodified* upstream llama-server. Nothing patched. Nothing forked. Bearer-token loopback HTTP between TUI/CLI and daemon, same transport on Linux, macOS, and Windows.

Same primitives in your shell as in the UI. Anything a person can do, an agent can do via --json.

Show thread

Deepu K Sasidharan 1d ago

What you get out of the box:

• llamastash init — first-run wizard that detects your hardware, installs llama-server, picks a GGUF that fits your VRAM, downloads it, and smoke-launches.
• Full TUI with vim-style nav, chat/embed/rerank tabs, in-TUI HuggingFace browser.
• CLI with --json as a stable agent contract and documented exit codes per failure class.
• Multi-model concurrency, daemon-on-demand.

Show thread

Deepu K Sasidharan 1d ago

Local LLMs sit in an awkward gap. Raw llama-server is fast but tedious. Ollama and LM Studio wrap it in friendlier shells but hide too much and pay a real performance cost.

I wanted a launcher that stays out of llama.cpp's way and treats agents as first-class users. That's LlamaStash.

Deepu K Sasidharan 1d ago

Today I'm releasing LlamaStash 0.0.2: a zero-overhead, terminal-native launcher for llama.cpp.

One Rust binary that's a TUI, a CLI, a daemon, and an OpenAI-compatible proxy.

Demo below 🧵

#Rust #LlamaCpp #LocalAI #LocalLLM #OpenSource #FOSS

Deepu K Sasidharan Apr 9

KDash 1.0.0 is out 🎉
A big milestone for the terminal UI dashboard for Kubernetes.

- direct shell into containers
- A Troubleshoot tab
- inline filter across views
- aggregate logs for workloads
- custom themes

Release notes: https://github.com/kdash-rs/kdash/releases/tag/v1.0.0

#Kubernetes #DevOps #Rust

Deepu K Sasidharan Nov 15, 2025

JAVAPRO Nov 11, 2025

Passwords cost time, money, and trust. #Passkeys solve the problem with public-key cryptography. Stronger security and a better experience—no shared secrets. Learn how to use them in #Java apps with @deepu105.

Discover the future: https://javapro.io/2025/05/30/a-passwordless-future-passkeys-for-developers/

#WebAuthn #Passwordless

A Passwordless Future: Passkeys for Developers - JAVAPRO International

Passwords have been around for thousands of years and we were all happily sharing our Netflix passwords. They…

JAVAPRO International

Website & Blog	https://deepu.tech
Linkedin	https://www.linkedin.com/in/deepu05
GitHub	https://github.com/deepu105
Twitter	https://twitter.com/deepu105