Meta's 7B #LLaMA large language model running at ~2 tokens/s as a quantized 4-bit version on #OrangePi 5 8GB RAM (#ARM RK3588S SoC) via llama.cpp by Georgi Gerganov 🚀😲🤩
Results of the 7B model can be quite funny 🤣🧙‍♂️🤷‍♂️ e.g. "The chancellor of Germany is ...". Btw the effects of the 4-bit quantization are unknown so far.
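For anyone wanting to try this themselves, a rough sketch of the llama.cpp workflow (exact paths, model filenames, and tool names vary between llama.cpp versions; treat this as an outline, not exact commands):

```shell
# Build llama.cpp on the Orange Pi (plain make; ARM NEON is picked up automatically)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Convert the LLaMA 7B weights (obtained separately from Meta) and
# quantize them to 4-bit; filenames here are illustrative
./quantize models/7B/ggml-model-f16.bin models/7B/ggml-model-q4_0.bin 2

# Run inference with a prompt
./main -m models/7B/ggml-model-q4_0.bin -p "The chancellor of Germany is" -n 64
```

The 4-bit quantization is what makes the 7B model fit into 8GB RAM at all: ~13GB of fp16 weights shrink to roughly 4GB.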