Deepu K Sasidharan

@deepu105

Serial copy paster.
#JHipster co-lead.
Creator of #KDash.
Java Champion.
Staff Developer 🥑 #Okta.
Polyglot dev/Speaker/Author.
Follow for #Java, #Rust, #JS, #Go, #Kubernetes, and #DevOps content.
he/him

Website & Blog: https://deepu.tech
LinkedIn: https://www.linkedin.com/in/deepu05
GitHub: https://github.com/deepu105
Twitter: https://twitter.com/deepu105

How much overhead does an LLM launcher add?

Matched-flags benchmarks across AMD APU (Strix Halo), Apple Silicon, and NVIDIA. Wrapper overhead: every cell within 1% of raw llama-server. Proxy hop adds 0.45 ms median TTFT.

Where it gets interesting: Ollama is 41-72% slower decode on AMD APU. LM Studio's Vulkan wins decode on small/mid models but pays a 1-1.5 s TTFT tax.

Per-cell JSONs checked in. Reproducible with one make target.

https://deepu.tech/benchmarking-llamastash/
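
For readers who want to sanity-check figures like "within 1%" or "41-72% slower", here is a minimal sketch of the arithmetic. It assumes "slower decode" means lower tokens-per-second throughput relative to raw llama-server; the function names and the sample numbers are made up for illustration and are not taken from the benchmark repository.

```rust
/// Relative slowdown of a candidate versus a baseline, in percent.
/// Both inputs are throughputs (tokens per second); a positive result
/// means the candidate is slower than the baseline.
fn slowdown_pct(baseline_tps: f64, candidate_tps: f64) -> f64 {
    (baseline_tps - candidate_tps) / baseline_tps * 100.0
}

/// Median of a set of samples (for example, per-request TTFT in ms).
/// Returns None for an empty slice.
fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// True when the candidate is within `tol_pct` percent of the baseline,
/// in either direction.
fn within_tolerance(baseline_tps: f64, candidate_tps: f64, tol_pct: f64) -> bool {
    slowdown_pct(baseline_tps, candidate_tps).abs() <= tol_pct
}

fn main() {
    // Illustrative numbers only, not measured results.
    let raw_runs = [50.0, 50.4, 49.8];
    let wrapped_runs = [49.9, 50.1, 49.7];

    let raw = median(&raw_runs).expect("non-empty");
    let wrapped = median(&wrapped_runs).expect("non-empty");

    println!("slowdown: {:.2}%", slowdown_pct(raw, wrapped));
    println!("within 1%: {}", within_tolerance(raw, wrapped, 1.0));
}
```

Comparing medians rather than means keeps a single slow outlier run from dominating the result.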

On benchmarks: LlamaStash spawns llama-server unmodified, so the wrapper better not add overhead. I measured it across AMD APU, Apple Silicon, and NVIDIA.

LlamaStash ≡ raw llama-server within ≤1% on every cell, across 4 model sizes and 2 metrics.

Ollama 0.24 is 41–72% slower decode on AMD APU. RAG prefill is catastrophic (4 min cold prefill on a 31B model). Numbers + methodology in the next post.

What you get out of the box:

• llamastash init: first-run wizard that detects your hardware, installs llama-server, picks a GGUF that fits your VRAM, downloads it, and smoke-launches.
• Full TUI with vim-style nav, chat/embed/rerank tabs, in-TUI HuggingFace browser.
• CLI with --json as a stable agent contract and documented exit codes per failure class.
• Multi-model concurrency, daemon-on-demand.
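
To illustrate what "--json plus documented exit codes" buys an agent, here is a rough sketch of a caller that runs a CLI, captures stdout, and branches on the exit code. It is generic on purpose: the program and its arguments come from the command line, and no LlamaStash subcommands or exit-code values are assumed, since the post does not list them. `run_cli` and `CliResult` are hypothetical names.

```rust
use std::process::Command;

/// Outcome of one CLI invocation (hypothetical helper type).
struct CliResult {
    /// Exit code, or None if the process was terminated by a signal.
    code: Option<i32>,
    /// Captured stdout; with a --json style flag this would hold the JSON document.
    stdout: String,
}

/// Run `program` with `args`, wait for it to finish, and capture its stdout.
fn run_cli(program: &str, args: &[&str]) -> std::io::Result<CliResult> {
    let output = Command::new(program).args(args).output()?;
    Ok(CliResult {
        code: output.status.code(),
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
    })
}

fn main() {
    // Usage: <this-binary> <program> [args...]
    let argv: Vec<String> = std::env::args().skip(1).collect();
    let Some((program, rest)) = argv.split_first() else {
        eprintln!("usage: <program> [args...]");
        return;
    };
    let rest: Vec<&str> = rest.iter().map(String::as_str).collect();

    match run_cli(program, &rest) {
        // Success: stdout can be handed to a JSON parser.
        Ok(r) if r.code == Some(0) => println!("{}", r.stdout),
        // Failure: the exit code identifies the failure class per the tool's docs.
        Ok(r) => eprintln!("command failed with exit code {:?}", r.code),
        // The process could not be started at all (for example, binary not found).
        Err(e) => eprintln!("could not start {program}: {e}"),
    }
}
```

The point of the design is that the caller never has to scrape human-readable text: the exit code says what class of failure occurred, and stdout stays machine-parseable.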

Today I'm releasing LlamaStash 0.0.2: a zero-overhead, terminal-native launcher for llama.cpp.

One Rust binary that's a TUI, a CLI, a daemon, and an OpenAI-compatible proxy.

Demo below 🧵

#Rust #LlamaCpp #LocalAI #LocalLLM #OpenSource #FOSS

KDash 1.0.0 is out 🎉
A big milestone for the terminal UI dashboard for Kubernetes.

- direct shell into containers
- A Troubleshoot tab
- inline filter across views
- aggregate logs for workloads
- custom themes

Release notes: https://github.com/kdash-rs/kdash/releases/tag/v1.0.0

#Kubernetes #DevOps #Rust

Thanks to everyone who attended my #passkeys talk at #springio24.
As promised, here are the results from the usability poll.

Hello #DevoxxUK, join me tomorrow at 9AM at the coding cafe for
- A crash course on #oauth and #OIDC
- Spring Boot app & API security
- Spring Boot microservices security
- attempt at humor 😆
Bring your laptops to code

I'll be talking about #passkeys and the amazing technology behind them at the #devworld conference on 29th February.

Join me for an illustrated journey ✍️

Introducing JWT-UI 🌟

A Terminal UI for decoding & encoding JSON Web Tokens written in #Rust.
Brought to you by @auth0 and @oktadev

👉 https://github.com/jwt-rs/jwt-ui

Feedback and reshares appreciated 🙏
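
For anyone curious what "decoding" a JSON Web Token involves, here is a small std-only sketch. It is not how JWT-UI is implemented; it only shows the general idea that a compact JWT is three base64url segments separated by dots, and that the first two decode to JSON. The function names are hypothetical, and the signature is deliberately not verified.

```rust
/// Decode base64url (RFC 4648, section 5) into bytes.
/// Padding is optional; returns None on any character outside the alphabet.
fn base64url_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut buf: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `buf`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            b'=' => break, // tolerate trailing padding
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Split a compact JWT and return (header JSON, payload JSON) as strings.
/// This does NOT verify the signature, so the result must not be trusted.
fn decode_jwt_unverified(token: &str) -> Option<(String, String)> {
    let mut parts = token.split('.');
    let header = parts.next()?;
    let payload = parts.next()?;
    let _signature = parts.next()?;
    if parts.next().is_some() {
        return None; // more than three segments
    }
    let header = String::from_utf8(base64url_decode(header)?).ok()?;
    let payload = String::from_utf8(base64url_decode(payload)?).ok()?;
    Some((header, payload))
}

fn main() {
    // Pass a token as the first argument to inspect it.
    match std::env::args().nth(1).as_deref().map(decode_jwt_unverified) {
        Some(Some((header, payload))) => {
            println!("header:  {header}");
            println!("payload: {payload}");
        }
        Some(None) => eprintln!("not a well-formed compact JWT"),
        None => eprintln!("usage: <token>"),
    }
}
```

Decoding without verification is fine for inspection and debugging, but any security decision needs the signature checked against the right key first.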

Thank you J-Fall, this means a lot. It was a brand new topic for me and I'm happy to see the efforts paying off 🕺

#passkeys #jfall #auth0