🧠 #NVIDIA researchers are advancing small language models (SLMs) through structured weight pruning and knowledge distillation, shrinking model size while preserving performance.
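
The two ingredients are easy to illustrate. Below is a minimal PyTorch sketch, not NVIDIA's actual pipeline: a toy structured-pruning step that drops whole output neurons of a linear layer ranked by weight norm, plus the standard temperature-scaled distillation loss. The function names and the norm-based importance score are illustrative assumptions.

```python
# Illustrative sketch of (1) structured pruning: remove whole neurons ranked
# by an importance score, and (2) knowledge distillation: train the pruned
# "student" to match the original "teacher" logits. Not NVIDIA's pipeline.
import torch
import torch.nn.functional as F

def prune_linear_rows(layer: torch.nn.Linear, keep: int) -> torch.nn.Linear:
    """Structured pruning: keep the `keep` output neurons (weight rows)
    with the largest L2 norm, a simple stand-in importance score."""
    importance = layer.weight.norm(dim=1)              # one score per output neuron
    idx = importance.topk(keep).indices.sort().values  # keep rows in original order
    pruned = torch.nn.Linear(layer.in_features, keep, bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[idx].clone()
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[idx].clone()
    return pruned

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """Standard KD objective: KL divergence between temperature-softened
    teacher and student distributions, scaled by T^2."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
```

Per the linked post, the real recipe estimates importance from activations on a small calibration set rather than raw weight norms, and then retrains the pruned student against the teacher's logits; the sketch only shows the shape of both steps.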

For instance, the #Minitron 8B and 4B models, derived from Nemotron-4 15B, outperform many models trained from scratch. Despite their smaller size, they compete with top-tier models like #Gemma2 and #Phi2.

src: https://developer.nvidia.com/blog/how-to-prune-and-distill-llama-3-1-8b-to-an-nvidia-llama-3-1-minitron-4b-model/

#AI #MachineLearning #NLP #LLM
