#NVIDIA introduces #NVLM 1.0, a family of open-source #multimodal #LLMs:
🏆 Achieves state-of-the-art results on vision-language tasks, competing with #GPT4 and #Llama3V
📊 72B model delivers top scores on the #OCRBench and #VQAv2 benchmarks
📈 Shows improved accuracy on text-only tasks after multimodal training
💻 Excels in #math, #coding, and #reasoning across modalities
🧠 Novel architecture enhances training efficiency and multimodal reasoning
🖼️ Introduces 1-D tile-tagging for improved performance on high-resolution images
🔬 Emphasizes dataset quality and task diversity over scale in training
🔗 Open-sourcing model weights and training code in Megatron-Core
Learn more: https://research.nvidia.com/labs/adlr/NVLM-1/
NVLM: Open Frontier-Class Multimodal LLMs
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training.
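The 1-D tile tagging mentioned above can be illustrated with a minimal sketch: a high-resolution image is split into non-overlapping tiles in row-major (1-D) order, and each tile is paired with a text tag marking its position in that sequence. The tag format (`<tile_1>`, `<tile_2>`, ...) and helper names here are illustrative assumptions, not NVLM's actual implementation.

```python
from typing import List, Tuple

def tile_image_1d(image: List[List[int]], tile_h: int, tile_w: int) -> List[Tuple[str, List[List[int]]]]:
    """Split an image (an H x W grid of pixel values) into non-overlapping
    tiles, scanned row-major, and pair each tile with a 1-D positional text
    tag such as "<tile_1>" (tag format is an assumption for illustration)."""
    h, w = len(image), len(image[0])
    tagged_tiles = []
    idx = 1
    for top in range(0, h, tile_h):
        for left in range(0, w, tile_w):
            # Slice out one tile_h x tile_w block of the image.
            tile = [row[left:left + tile_w] for row in image[top:top + tile_h]]
            tagged_tiles.append((f"<tile_{idx}>", tile))
            idx += 1
    return tagged_tiles

# Usage: a 4x4 "image" split into 2x2 tiles yields four tagged tiles.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = tile_image_1d(img, 2, 2)
print([tag for tag, _ in tiles])  # -> ['<tile_1>', '<tile_2>', '<tile_3>', '<tile_4>']
```

In the model, each tag would be injected as text tokens ahead of the corresponding tile's visual features, giving the LLM an explicit 1-D positional cue for reassembling the high-resolution layout.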