QAD from NVIDIA: figuring out why 4-bit quantization stopped breaking everything

NVIDIA has released a report on QAD, a method that lets you quantize LLMs to 4 bits without losing quality on hard tasks (math, code). We look at why conventional QAT "breaks" models after RLHF, how distillation via KL divergence solves this problem, and why the method works even on random data. Personal experience of trying to fit a 49B model onto the available hardware, plus an analysis of the new approach.

https://habr.com/ru/articles/991586/

#LLM #Quantization #NVIDIA #QAD #QAT #FP4 #Blackwell #Machine_Learning #Llama #Distillation

QAD from NVIDIA: figuring out why 4-bit quantization stopped breaking everything

Last week NVIDIA published a report on QAD, and I ignored it. Because every month someone "solves quantization", and every time it turns out less rosy in practice. But then a colleague sent me...

Habr

@itsecnews The question arises whether other #Intel #QuickAssistTechnology / #QAT / #QuickAssist products are also affected.

Gemma 3 QAT Models: Bringing state-of-the-art AI to consumer GPUs - Google Developers Blog

Explore Gemma 3 models now offering state-of-the-art AI performance on consumer GPUs with new int4 quantized versions optimized with Quantization Aware Training (QAT).

Thoughts on #qat / #khat?
Tried it and loved it: 0%
Tried it and didn't love it: 0%
Want to try it: 50%
Not interested: 50%

Who knew a 'sin tax' on #Khat could lead to #MentalHealth revolution in #Somaliland? Country's innovative approach is #Funding treatment for addicts, but let's not forget: #InternationalDonors, you're not off the hook yet! #MentalHealthMatters #Qat

https://saxafimedia.com/somaliland-sin-tax-mental-health/

In Somaliland, A Sin Tax For Mental Health Relief | Saxafi Media

The article “In Somaliland, a Sin Tax for Mental Health Relief” discusses an innovative approach in Somaliland to address mental health issues through the taxation of Khat, a locally consumed stimulant plant.

SaxafiMedia

Answering myself: LKCF seems to use the in-tree QAT driver by default for a bunch of algorithms. #QAT #linux #kernel

Is anyone using the upstreamed Intel #QAT kernel module for anything? Is it useful at all? All the Intel instructions seem to start with installing their stuff instead. #linux #kernel

Anyone else gone down the rabbit hole of #Intel #QAT support on #FreeBSD?

The performance boost is crazy. ServeTheHome has a great write-up about it on the Xeon D CPU:
https://www.servethehome.com/welcome-to-the-intel-ice-lake-d-era-with-the-xeon-d-2700-and-d-1700-series/

It accelerates #OpenSSL
https://github.com/intel/QAT_Engine/tree/master

It also accelerates gzip with a QATzip module.

For #Nginx there is a great guide:
https://www.intel.com/content/www/us/en/developer/articles/guide/nginx-https-with-qat-tuning-guide.html

It looks like a lot of work to configure and test, but hopefully later this summer I can give it a go on a system with a CPU that supports QAT.

#SysAdmin

Welcome to the Intel Ice Lake D Era with the Xeon D-2700 and D-1700 series

We get hands-on with Intel Xeon D-2700 and D-1700 platforms for the Ice Lake-D launch and share initial performance and power figures

ServeTheHome