Compare llama.cpp speeds on a 16 GB GPU for dense and MoE models at 19K, 32K, and 64K context. Tables list VRAM, GPU load, and tokens per second.
#Self-Hosting #LLM #AI #Hardware #NVIDIA
https://www.glukhov.org/llm-performance/benchmarks/best-llm-on-16gb-vram-gpu/

16 GB VRAM LLM benchmarks with llama.cpp (speed and context)
Rost Glukhov | Personal site and technical blog