Llama 3.1 AI Models Have Officially Released
At long context, Nemo is way better than Llama 8B in my testing.
Turns out they are both very sensitive to quantization though.
Ah, that’s a wonderful use case. One of my favourite models has a storytelling lora applied to it, maybe that would be useful to you too?
At any rate, if you’d end up publishing your model, I’d love to hear about it.
Oh, my friend, you have to switch to this: huggingface.co/BeaverAI/mistral-doryV2-12b
It's so much smarter than Llama 13B, and it goes all the way out to 128K context!
A 3090.
But it should be fine on a 3060.
Dump Ollama for long context. Grab a 6bpw exl2 quantization and load it with a Q4 or Q6 cache, depending on how much context you want. I personally use EXUI, but text-generation-webui and TabbyAPI (paired with a separate frontend) will also load them.
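To see why the cache setting matters so much at long context, here's a rough back-of-the-envelope VRAM estimate. It's a sketch under assumptions, not exact numbers: the architecture figures (about 12.2B parameters, 40 layers, 8 KV heads, head dim 128) are what I believe Mistral Nemo 12B uses, the function names are made up for illustration, and it ignores quantization scale overhead, activations, and framework overhead.

```python
# Rough VRAM estimate: exl2-style weights at a given bpw plus a quantized KV cache.
# ASSUMED architecture numbers (approximately Mistral Nemo 12B); check the model's
# config.json before relying on them. Ignores cache scale overhead and activations.

GIB = 1024 ** 3

def weights_bytes(n_params: float, bpw: float) -> float:
    """Weight memory at an average bits-per-weight."""
    return n_params * bpw / 8

def kv_cache_bytes(context: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, cache_bits: float) -> float:
    """K and V tensors for every layer, for every token of context."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * cache_bits / 8
    return context * per_token

def estimate_gib(context: int, cache_bits: float, bpw: float = 6.0) -> float:
    # Hypothetical defaults: ~12.2B params, 40 layers, 8 KV heads, head_dim 128.
    w = weights_bytes(12.2e9, bpw)
    kv = kv_cache_bytes(context, n_layers=40, n_kv_heads=8,
                        head_dim=128, cache_bits=cache_bits)
    return (w + kv) / GIB

if __name__ == "__main__":
    for bits in (4, 6):
        for ctx in (32_768, 131_072):
            print(f"Q{bits} cache, {ctx:>7} tokens: ~{estimate_gib(ctx, bits):.1f} GiB")
```

By this rough estimate the 6bpw weights alone take about 8.5 GiB, and a full 128K Q4 cache adds about 5 GiB (about 7.5 GiB at Q6), so the cache level and context length decide how much headroom is left on a given card.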