You can get the resulting PPL, but that’s only going to give you a sanity check at best. An ideal world would have something like lmsys’ chat arena where you could compare unquantized vs quantized, but that doesn’t exist yet
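For what that sanity check is worth: perplexity is just the exponent of the average negative log-likelihood per token, so you can compare a quantized model’s PPL against the original’s on the same text. A minimal sketch with made-up per-token probabilities (not tied to any particular inference framework):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities for the same text,
# from an fp16 model and its quantized version:
fp16_probs = [0.42, 0.31, 0.55, 0.28]
quant_probs = [0.40, 0.29, 0.52, 0.25]

print(perplexity(fp16_probs))   # lower is better
print(perplexity(quant_probs))  # a small increase is expected after quantization
```

A quant that only nudges PPL up slightly is probably fine; a big jump is a red flag, but as the comment says, matching PPL still doesn’t tell you much about real-world chat quality.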
My personal collection of interesting models I've quantized from the past week (yes, just week)
https://sh.itjust.works/post/15432590
So you don’t have to click the link, here’s the full text including links:

> Some of my favourite @huggingface models I’ve quantized in the last week (as always, original models are linked in my repo so you can check out any recent changes or documentation!):
>
> @shishirpatil_ gave us gorilla’s openfunctions-v2, a great followup to their initial models: https://huggingface.co/bartowski/gorilla-openfunctions-v2-exl2
>
> @fanqiwan released FuseChat-7B-VaRM, a fusion of 3 architectures and scales: https://huggingface.co/bartowski/FuseChat-7B-VaRM-exl2
>
> @IBM used a new method called LAB (Large-scale Alignment for chatBots) for our first interesting 13B tune in a while: https://huggingface.co/bartowski/labradorite-13b-exl2
>
> @NeuralNovel released several, but I’m a sucker for DPO models, and this one uses their Neural-DPO dataset: https://huggingface.co/bartowski/Senzu-7B-v0.1-DPO-exl2
>
> Locutusque, who has been making the Hercules dataset, released a preview of “Hyperion”: https://huggingface.co/bartowski/hyperion-medium-preview-exl2
>
> @AjinkyaBawase gave an update to his coding models with code-290k based on deepseek 6.7: https://huggingface.co/bartowski/Code-290k-6.7B-Instruct-exl2
>
> @Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: https://huggingface.co/bartowski/Einstein-v4-7B-exl2
>
> @WenhuChen with TIGER lab released StructLM in 3 sizes for structured knowledge grounding tasks: https://huggingface.co/bartowski/StructLM-7B-exl2
>
> and that’s just the highlights from this past week! If you’d like to see your model quantized and I haven’t noticed it somehow, feel free to reach out :)
Interesting, hadn’t heard of it before today, but guess I don’t look at European car brands that often anyways
Ah I mean fair enough :) I don’t keep up much with car brands and ownerships, but still TIL haha
Huh, didn’t realize Volvo was primarily owned by a Chinese company; you got me there lol. I genuinely always thought they were standalone and therefore a Swedish company
If you’re using text-generation-webui, there’s a bug where, if your max new tokens is equal to your prompt truncation length, it will remove all input and therefore just generate nonsense, since there’s no prompt left.

Reduce your max new tokens and your prompt should actually get passed to the backend. This is more noticeable in models with only 4k context (since a lot of people default max new tokens to 4k)
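The underlying arithmetic is easy to sketch: the space left for the prompt is roughly the truncation length minus max new tokens, so when the two are equal, zero prompt tokens survive. A simplified model of the behavior (not text-generation-webui’s actual code):

```python
def truncate_prompt(prompt_tokens, truncation_length, max_new_tokens):
    """Keep only the most recent prompt tokens that still fit in the
    context window after reserving room for the tokens to be generated."""
    budget = truncation_length - max_new_tokens
    if budget <= 0:
        # The whole prompt is dropped: the model generates from nothing.
        return []
    return prompt_tokens[-budget:]

prompt = list(range(5000))  # a hypothetical 5000-token prompt

# max_new_tokens == truncation_length: nothing is left of the prompt.
print(len(truncate_prompt(prompt, 4096, 4096)))  # 0

# Reducing max_new_tokens leaves room for the tail of the prompt.
print(len(truncate_prompt(prompt, 4096, 512)))   # 3584
```

So any max-new-tokens value safely below the context length restores a nonzero prompt budget, which is why lowering it fixes the “nonsense output” symptom.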
I don’t understand the title, twitch isn’t mentioned anywhere in the article is it??
Colour me intrigued. I want more manufacturers that go against the norm. If they put out a generic slab with normal specs at an expected price, I won’t be very interested, but if they do something cool I’m all for it
Stop making me want to buy more graphics cards…
Seriously though this is an impressive result, “beating” gpt3.5 is a huge milestone and I love that we’re continuing the trend. Will need to try out a quant of this to see how it does in real world usage. Hope it gets added to the lmsys arena!
itsme2417/PolyMind: A multimodal, function calling powered LLM webui.
https://sh.itjust.works/post/14191764
> PolyMind is a multimodal, function calling powered LLM webui. It’s designed to be used with Mixtral 8x7B + TabbyAPI and offers a wide range of features including:
>
> - Internet searching with DuckDuckGo and web scraping capabilities.
> - Image generation using comfyui.
> - Image input with sharegpt4v (over llama.cpp’s server)/moondream on CPU, OCR, and Yolo.
> - Port scanning with nmap.
> - Wolfram Alpha integration.
> - A Python interpreter.
> - RAG with semantic search for PDF and miscellaneous text files.
> - Plugin system to easily add extra functions that are able to be called by the model.
>
> 90% of the web parts (HTML, JS, CSS, and Flask) are written entirely by Mixtral.
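To make the plugin idea concrete, function-calling setups like this generally boil down to a registry of named functions plus a dispatcher that executes whatever call the model emits as JSON. This is a hypothetical sketch of that pattern, not PolyMind’s actual plugin API (the names `plugin` and `dispatch` are invented for illustration):

```python
import json

# Hypothetical plugin registry: function name -> callable.
PLUGINS = {}

def plugin(fn):
    """Register a function so the model can call it by name."""
    PLUGINS[fn.__name__] = fn
    return fn

@plugin
def add(a, b):
    return a + b

@plugin
def echo(text):
    return text.upper()

def dispatch(model_output):
    """Parse a model-emitted call like {"name": ..., "arguments": {...}}
    and run the matching registered plugin."""
    call = json.loads(model_output)
    fn = PLUGINS.get(call["name"])
    if fn is None:
        return f"unknown function: {call['name']}"
    return fn(**call.get("arguments", {}))

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

Registering plugins via a decorator keeps adding a new model-callable function to a one-line change, which is presumably the kind of extensibility the feature list is describing.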