My personal collection of interesting models I've quantized from the past week (yes, just a week)
https://sh.itjust.works/post/15432590
So you don’t have to click the link, here’s the full text including links:

> Some of my favourite @huggingface models I’ve quantized in the last week (as always, original models are linked in my repo so you can check out any recent changes or documentation!):
>
> @shishirpatil_ gave us gorilla’s openfunctions-v2, a great followup to their initial models: https://huggingface.co/bartowski/gorilla-openfunctions-v2-exl2
>
> @fanqiwan released FuseLLM-VaRM, a fusion of 3 architectures and scales: https://huggingface.co/bartowski/FuseChat-7B-VaRM-exl2
>
> @IBM used a new method called LAB (Large-scale Alignment for chatBots) for our first interesting 13B tune in awhile: https://huggingface.co/bartowski/labradorite-13b-exl2
>
> @NeuralNovel released several, but I’m a sucker for DPO models, and this one uses their Neural-DPO dataset: https://huggingface.co/bartowski/Senzu-7B-v0.1-DPO-exl2
>
> Locutusque, who has been making the Hercules dataset, released a preview of “Hyperion”: https://huggingface.co/bartowski/hyperion-medium-preview-exl2
>
> @AjinkyaBawase gave an update to his coding models with code-290k based on deepseek 6.7: https://huggingface.co/bartowski/Code-290k-6.7B-Instruct-exl2
>
> @Weyaxi followed up on the success of Einstein v3 with, you guessed it, v4: https://huggingface.co/bartowski/Einstein-v4-7B-exl2
>
> @WenhuChen with TIGER lab released StructLM in 3 sizes for structured knowledge grounding tasks: https://huggingface.co/bartowski/StructLM-7B-exl2
>
> and that’s just the highlights from this past week! If you’d like to see your model quantized and I haven’t noticed it somehow, feel free to reach out :)
itsme2417/PolyMind: A multimodal, function calling powered LLM webui.
https://sh.itjust.works/post/14191764
> PolyMind is a multimodal, function calling powered LLM webui. It’s designed to be used with Mixtral 8x7B + TabbyAPI and offers a wide range of features including:
>
> - Internet searching with DuckDuckGo and web scraping capabilities.
> - Image generation using comfyui.
> - Image input with sharegpt4v (Over llama.cpp’s server)/moondream on CPU, OCR, and Yolo.
> - Port scanning with nmap.
> - Wolfram Alpha integration.
> - A Python interpreter.
> - RAG with semantic search for PDF and miscellaneous text files.
> - Plugin system to easily add extra functions that are able to be called by the model.
>
> 90% of the web parts (HTML, JS, CSS, and Flask) are written entirely by Mixtral.
Introducing Nomic Embed: A Truly Open Embedding Model
https://sh.itjust.works/post/14187813
Open source, open data, open training code, fully reproducible and auditable.
Pretty interesting stuff for embeddings. I’m going to try it for my RAG pipeline
when I get a chance; I’ve not had as much success as I was hoping, so maybe this
English-focused one will help.
InternLM2 models llama-fied
Thanks to Charles (https://huggingface.co/chargoddard) for the conversion
scripts, I’ve converted several of the new InternLM2 models into Llama format.
I’ve also made ExLlamaV2 quants of them while I was at it. You can find them here:
https://huggingface.co/bartowski?search_models=internlm2

Note: the chat models seem to do something odd. They output [UNUSED_TOKEN_145]
in a way that seems equivalent to <|im_end|>. I’m not sure why, but they work
fine despite outputting that at the end.
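
If the trailing token is a nuisance in your own front end, one option is to treat it as an extra stop string during post-processing. This is only a minimal sketch: the helper name `truncate_at_stop` and the `STOP_STRINGS` tuple are hypothetical, and it assumes you have the generated text as a plain Python string rather than relying on any particular backend's stop-token setting.

```python
# Hypothetical post-processing helper: cut generated text at the first
# stop string. The two markers come from the note above; everything else
# (names, structure) is an assumption for illustration.
STOP_STRINGS = ("<|im_end|>", "[UNUSED_TOKEN_145]")


def truncate_at_stop(text: str, stops=STOP_STRINGS) -> str:
    """Return text up to the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    # Drop trailing whitespace left in front of the removed marker.
    return text[:cut].rstrip()


print(truncate_at_stop("Hello there![UNUSED_TOKEN_145]"))
```

Text without either marker passes through unchanged apart from trailing whitespace.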
First few quants are up: huggingface.co/…/WizardCoder-33B-V1.1-exl2
The 4.25 bpw quant should fit nicely into 24 GB of VRAM (3090, 4090).
Smaller sizes are still being created: 3.5, 3.0, and 2.4.
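
As a rough sanity check on the 24 GB figure, the weight footprint can be estimated from parameter count times bits per weight. This is a back-of-the-envelope sketch, not a measurement: it assumes the sizes listed are bits per weight, takes "33B" as a nominal 33e9 parameters, and ignores the KV cache, activations, and framework overhead, all of which need VRAM on top of the weights. The function name `quant_weight_gb` is made up for this example.

```python
def quant_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: a "33B" model has roughly 33e9 parameters.
N_PARAMS = 33e9

for bpw in (4.25, 3.5, 3.0, 2.4):
    size = quant_weight_gb(N_PARAMS, bpw)
    print(f"{bpw} bpw: ~{size:.1f} GB of weights")
```

Under these assumptions the 4.25 quant comes to about 17.5 GB of weights, which leaves some room on a 24 GB card for context.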

bartowski/WizardCoder-33B-V1.1-exl2 · Hugging Face
WizardLM/WizardCoder-33B-V1.1 released!
https://sh.itjust.works/post/12150488

Based on DeepSeek Coder, the current SOTA 33B model, it allegedly reaches GPT-3.5
levels of performance. I’m excited to test it once I’ve made ExLlamaV2 quants,
and I’ll try to update with my findings on using it as a copilot model.
Microsoft announces WaveCoder
Paper abstract:

> Recent work demonstrates that, after being fine-tuned on a
high-quality instruction dataset, the resulting model can obtain impressive
capabilities to address a wide range of tasks. However, existing methods for
instruction data generation often produce duplicate data and are not
controllable enough on data quality. In this paper, we extend the generalization
of instruction tuning by classifying the instruction data to 4 code-related
tasks and propose a LLM-based Generator-Discriminator data process framework to
generate diverse, high-quality instruction data from open source code. Hence, we
introduce CodeOcean, a dataset comprising 20,000 instruction instances across 4
universal code-related tasks, which is aimed at augmenting the effectiveness of
instruction tuning and improving the generalization ability of fine-tuned model.
Subsequently, we present WaveCoder, a fine-tuned Code LLM with Widespread And
Versatile Enhanced instruction tuning. This model is specifically designed for
enhancing instruction tuning of Code Language Models (LLMs). Our experiments
demonstrate that Wavecoder models outperform other open-source models in terms
of generalization ability across different code-related tasks at the same level
of fine-tuning scale. Moreover, Wavecoder exhibits high efficiency in previous
code generation tasks. This paper thus offers a significant contribution to the
field of instruction data generation and fine-tuning models, providing new
insights and tools for enhancing performance in code-related tasks.
Mixture of Experts Explained (Huggingface blog)
https://sh.itjust.works/post/10861455

Mistral releases version 0.2 of their 7B model
https://sh.itjust.works/post/10861335

Available in instruct only currently:
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
Mistral drops a new magnet download
Early speculation is that it’s an MoE (mixture of experts) of 8 7B models, so
maybe not as earth-shattering as their last release, but highly intriguing. I’ll
update with more info as it comes out.