This is now my go-to library when it comes to serving an LLM.
PagedAttention, distributed serving, a *very* nice Python interface… this project has it all!