Llama 3.1 AI Models Have Officially Released
I haven’t given it very thorough testing, and I’m by no means an expert, but from the few prompts I’ve run so far, I’d have to hand it to Nemo on quality.
Using openrouter.ai, I’ve also given llama3.1 405B a shot, and that seems to be at least on par with (if not better than) Claude 3.5 Sonnet, whilst being a bit cheaper as well.
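In case anyone wants to try it the same way: OpenRouter exposes an OpenAI-compatible endpoint, so something along these lines should work. A minimal sketch, assuming the usual `openai` Python client; the model slug and the env var name are my assumptions, so check openrouter.ai/models for the current slug:

```python
# Minimal sketch: querying Llama 3.1 405B through OpenRouter's
# OpenAI-compatible API. The model slug is an assumption -- check
# openrouter.ai/models for the current one.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # hypothetical env var name
)

resp = client.chat.completions.create(
    model="meta-llama/llama-3.1-405b-instruct",
    messages=[{"role": "user", "content": "Summarize the Llama 3.1 release in two sentences."}],
)
print(resp.choices[0].message.content)
```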
At long context, Nemo is way better than llama 8B in my testing.
Turns out they are both very sensitive to quantization though.
Ah, that’s a wonderful use case. One of my favourite models has a storytelling lora applied to it, maybe that would be useful to you too?
At any rate, if you’d end up publishing your model, I’d love to hear about it.
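If it helps, stacking a LoRA onto a base model is only a few lines with `peft`. A rough sketch, not the exact setup I use: the adapter repo name below is a made-up placeholder, and I’m assuming the Mistral Nemo instruct weights as the base:

```python
# Rough sketch of applying a storytelling LoRA to a base model with
# transformers + peft. The adapter repo id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Nemo-Instruct-2407"
lora_id = "your-name/storytelling-lora"  # hypothetical adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, lora_id)  # adapter weights stay separate
# model = model.merge_and_unload()  # optionally bake the LoRA into the base
```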
Oh, my friend, you have to switch to this: huggingface.co/BeaverAI/mistral-doryV2-12b
It’s so much smarter than llama 13B. And it goes all the way out to 128K!
A 3090.
But it should be fine on a 3060.
Dump ollama for long context. Grab a 6bpw exl2 quantization and load it with Q4 or Q6 cache, depending on how much context you want. I personally use EXUI, but text-generation-webui and tabbyapi (with some other frontend) will also load them.
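If you’d rather script it than use a frontend, loading an exl2 quant with a quantized KV cache via the exllamav2 Python API looks roughly like this. A sketch under my assumptions: the model path and `max_seq_len` are placeholders, and class names can shift between library versions:

```python
# Rough sketch: loading a 6bpw exl2 quant with a Q4 KV cache using
# exllamav2. Model path and max_seq_len are placeholders.
from exllamav2 import (
    ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/models/Mistral-Nemo-12B-exl2-6bpw")
model = ExLlamaV2(config)

# A Q4 cache roughly halves KV memory vs. Q8 (there's also a Q6 variant),
# which is what lets very long contexts fit on a 24 GB card.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=65536, lazy=True)
model.load_autosplit(cache)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

print(generator.generate(prompt="Once upon a time", max_new_tokens=128))
```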
I dunno, with image models specifically the perception seems to be that they’re the devil because of the datasets they’re trained on and because they’re killing artists, and… that’s that. LLMs get the same treatment to a lesser extent.
I think most people don’t realize how much of an inflection point running models locally vs. corporate hosting could be, which is especially ironic on Lemmy.