You can get the resulting PPL, but that's only going to give you a sanity check at best. An ideal world would have something like lmsys' chat arena that could compare unquantized vs quantized, but that doesn't exist yet
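To make the sanity check concrete: perplexity is just the exponential of the average negative log-probability the model assigns to each token, so comparing it between the original and quantized model over the same text shows how much the quantization hurt. A minimal sketch of the metric itself (the log-prob values here are made up for illustration, not from a real model):

```python
import math

def perplexity(token_logprobs):
    """PPL = exp of the negative mean log-probability per token.

    Lower is better; the gap between an unquantized and a quantized
    run over the same text is the usual quantization sanity check.
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs from two runs over the same text:
fp16_lp = [-2.1, -1.8, -2.4, -1.9]  # original weights
q4_lp   = [-2.2, -1.9, -2.6, -2.0]  # quantized weights, slightly worse

# The quantized model's PPL is a bit higher -- that's the quality cost.
print(perplexity(fp16_lp))
print(perplexity(q4_lp))
```

A close PPL tells you the quantization didn't break the model, but as the comment says, it can't tell you which one people would actually prefer to chat with.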

My personal collection of interesting models I've quantized from the past week (yes, just week)

https://sh.itjust.works/post/15432590


So you don’t have to click the link, here’s the full text including links:

> Some of my favourite @huggingface models I’ve quantized in the last week (as always, original models are linked in my repo so you can check out any recent changes or documentation!):
>
> @shishirpatil_ gave us gorilla’s openfunctions-v2, a great followup to their initial models: https://huggingface.co/bartowski/gorilla-openfunctions-v2-exl2
>
> @fanqiwan released FuseLLM-VaRM, a fusion of 3 architectures and scales: https://huggingface.co/bartowski/FuseChat-7B-VaRM-exl2
>
> @IBM used a new method called LAB (Large-scale Alignment for chatBots) for our first interesting 13B tune in a while: https://huggingface.co/bartowski/labradorite-13b-exl2
>
> @NeuralNovel released several, but I’m a sucker for DPO models, and this one uses their Neural-DPO dataset: https://huggingface.co/bartowski/Senzu-7B-v0.1-DPO-exl2
>
> Locutusque, who has been making the Hercules dataset, released a preview of “Hyperion”: https://huggingface.co/bartowski/hyperion-medium-preview-exl2
>
> @AjinkyaBawase gave an update to his coding models with code-290k based on deepseek 6.7: https://huggingface.co/bartowski/Code-290k-6.7B-Instruct-exl2
>
> @Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: https://huggingface.co/bartowski/Einstein-v4-7B-exl2
>
> @WenhuChen with TIGER lab released StructLM in 3 sizes for structured knowledge grounding tasks: https://huggingface.co/bartowski/StructLM-7B-exl2
>
> and that’s just the highlights from this past week! If you’d like to see your model quantized and I haven’t noticed it somehow, feel free to reach out :)

Interesting, hadn’t heard of it before today, but guess I don’t look at European car brands that often anyways
Ah I mean fair enough :) I don’t keep up much with car brands and ownerships, but still TIL haha
Huh, didn’t realize Volvo was primarily owned by a Chinese company, you got me there lol, genuinely always thought they were standalone and therefore a Swedish company

If you’re using text-generation-webui there’s a bug where, if your max new tokens is equal to your prompt truncation length, it will remove all input and therefore just generate nonsense, since there’s no prompt left.

Reduce your max new tokens and your prompt should actually get passed to the backend. This is more noticeable in models with only 4k context (since a lot of people default max new tokens to 4k)
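The arithmetic behind the bug is easy to see: the prompt gets cut down so that prompt plus generated tokens fit inside the truncation length, so when max new tokens equals the truncation length the prompt's budget is zero. A simplified sketch of that budgeting logic (not webui's actual code):

```python
def truncate_prompt(prompt_tokens, truncation_length, max_new_tokens):
    """Keep only the most recent prompt tokens that fit in the context
    budget left over after reserving room for generation."""
    budget = truncation_length - max_new_tokens
    # If nothing is left for the prompt, the whole input is dropped --
    # this is the failure mode described above.
    return prompt_tokens[-budget:] if budget > 0 else []

prompt = list(range(3000))  # stand-in for a 3000-token prompt

# Healthy case: 4k context, 512 new tokens -> 3584-token budget, prompt fits.
print(len(truncate_prompt(prompt, 4096, 512)))

# Bug trigger: max new tokens == truncation length -> zero budget, empty prompt.
print(len(truncate_prompt(prompt, 4096, 4096)))
```

With a zero budget the backend receives an empty prompt, which is why the output turns into nonsense rather than erroring out.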

I don’t understand the title, Twitch isn’t mentioned anywhere in the article, is it?
Colour me intrigued. I want more manufacturers that go against the norm. If they put out a generic slab with normal specs at an expected price, I won’t be very interested, but if they do something cool I’m all for it

Stop making me want to buy more graphics cards…

Seriously though, this is an impressive result. “Beating” GPT-3.5 is a huge milestone and I love that we’re continuing the trend. Will need to try out a quant of this to see how it does in real world usage. Hope it gets added to the lmsys arena!

itsme2417/PolyMind: A multimodal, function calling powered LLM webui.

https://sh.itjust.works/post/14191764


> PolyMind is a multimodal, function calling powered LLM webui. It’s designed to be used with Mixtral 8x7B + TabbyAPI and offers a wide range of features including:
>
> - Internet searching with DuckDuckGo and web scraping capabilities.
> - Image generation using comfyui.
> - Image input with sharegpt4v (over llama.cpp’s server)/moondream on CPU, OCR, and Yolo.
> - Port scanning with nmap.
> - Wolfram Alpha integration.
> - A Python interpreter.
> - RAG with semantic search for PDF and miscellaneous text files.
> - Plugin system to easily add extra functions that are able to be called by the model.
>
> 90% of the web parts (HTML, JS, CSS, and Flask) are written entirely by Mixtral.
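For anyone unfamiliar with the pattern, "function calling" here means the model emits a structured request (typically JSON) naming a registered function, and the app parses and executes it. A generic minimal sketch of that loop; the registry, decorator, and `wolfram_alpha` stub are illustrative assumptions, not PolyMind's actual plugin API:

```python
import json

# Registry of callable "plugins" the model is allowed to invoke.
PLUGINS = {}

def plugin(fn):
    """Register a function under its own name so the model can call it."""
    PLUGINS[fn.__name__] = fn
    return fn

@plugin
def wolfram_alpha(query: str) -> str:
    # Stand-in for a real Wolfram Alpha API call.
    return f"result for {query!r}"

def dispatch(model_output: str) -> str:
    """Parse a model-emitted JSON function call and run the named plugin."""
    call = json.loads(model_output)
    return PLUGINS[call["name"]](**call["arguments"])

# The model's output is just text that happens to be a JSON function call:
print(dispatch('{"name": "wolfram_alpha", "arguments": {"query": "2+2"}}'))
```

The plugin system PolyMind describes presumably works along these lines: new functions register themselves, and the model's structured output is routed to them by name.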