Compare llama.cpp speeds on a 16 GB GPU for dense and MoE models at 19K, 32K, and 64K context. Tables list VRAM, GPU load, and tokens per second.

#Self-Hosting #LLM #AI #Hardware #NVidia

https://www.glukhov.org/llm-performance/benchmarks/best-llm-on-16gb-vram-gpu/

16 GB VRAM LLM benchmarks with llama.cpp (speed and context)

Rost Glukhov | Personal site and technical blog