This is now my go-to library for serving an LLM.

PagedAttention, distributed serving, a *very* nice Python interface… this project has it all!

https://github.com/vllm-project/vllm

#ai #vllm #opensource
