#NVIDIA introduces #NVLM 1.0, a family of open-source #multimodal #LLMs:
📊 Achieves state-of-the-art results on vision-language tasks, rivaling leading proprietary and open models such as #GPT4 and #Llama3V
📊 The 72B model leads on the #OCRBench and #VQAv2 benchmarks
📊 Shows improved accuracy on text-only tasks after multimodal training
💻 Excels in #math, #coding, and #reasoning across modalities
🧠 Novel architecture designs enhance both training efficiency and multimodal reasoning
🖼️ Introduces 1-D tile-tagging for improved performance on high-resolution images
🔬 Emphasizes dataset quality and task diversity over scale in training
📂 Open-sourcing model weights and training code in Megatron-Core
Learn more: https://research.nvidia.com/labs/adlr/NVLM-1/
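
The 1-D tile-tagging idea can be sketched roughly like this: a high-resolution image is split into a grid of tiles, and each tile's visual tokens are preceded by a plain-text tag marking its position in a flat (1-D) order, so the LLM can tell tiles apart. The function names, tag strings, and tile size below are illustrative assumptions, not NVLM's actual implementation.

```python
# Hedged sketch of 1-D tile tagging for high-resolution image input.
# All names and the <tile_k> tag format are illustrative assumptions.

def tile_grid(width, height, tile=448):
    """Split an image of the given size into a raster-order grid of
    tile bounding boxes (left, top, right, bottom)."""
    cols = max(1, (width + tile - 1) // tile)   # ceil division
    rows = max(1, (height + tile - 1) // tile)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((c * tile, r * tile,
                          min((c + 1) * tile, width),
                          min((r + 1) * tile, height)))
    return boxes

def tag_sequence(boxes):
    """Interleave 1-D text tags with per-tile token placeholders.
    A downscaled thumbnail tile comes first, then <tile_1> ... <tile_k>
    in raster order; real visual-token embeddings would replace the
    bracketed placeholders."""
    parts = ["<tile_global_thumbnail>", "[thumbnail tokens]"]
    for i in range(1, len(boxes) + 1):
        parts.append(f"<tile_{i}>")
        parts.append(f"[tokens for tile {i}]")
    return " ".join(parts)

# Example: a 896x448 image yields a 2x1 grid of 448px tiles.
boxes = tile_grid(896, 448)
print(len(boxes))          # 2 tiles
print(tag_sequence(boxes))
```

Because the tags are ordinary text tokens in a single flat sequence, the decoder needs no extra 2-D positional machinery to know which tile is which.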