Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU transfers, bypassing the CPU

https://github.com/xaskasdf/ntransformer

#HackerNews #Llama3.1 #RTX3090 #NVMe #GPU #bypass #CPU #AItechnology

GitHub - xaskasdf/ntransformer: High-efficiency LLM inference engine in C++/CUDA. Run Llama 70B on RTX 3090.
