Mastodawn

Alibaba’s new open‑source model Qwen3‑VL can scan two‑hour videos and scores 96.5% on DocVQA and 875 on OCRBench. The multimodal vision‑language system is said to rival the rumored GPT‑5 in document understanding. Dive into the results and see why the community is buzzing. #Qwen3VL #Alibaba #DocVQA #OCRBench

🔗 https://aidailypost.com/news/qwen3vl-scans-twohour-videos-hits-965-docvqa-875-ocrbench

michabbb Sep 1, 2024

#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️

After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀

Key Features of Qwen2-VL:

1. 🖼️ Image Understanding:

Qwen2-VL reports state-of-the-art performance on #VisualUnderstanding benchmarks, including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.

2. 🎬 Video Analysis:

Qwen2-VL can analyze videos more than 20 minutes long. Its online streaming capabilities make this possible and support video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis

3. 🤖 Device Integration:

The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱

4. 🌍 Multilingual Capabilities:

Qwen2-VL understands text in images across multiple languages. Besides English and Chinese, it supports most European languages, Japanese, Korean, Arabic, and Vietnamese, among others. #MultilingualAI

This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.

https://github.com/QwenLM/Qwen2-VL

GitHub - QwenLM/Qwen2-VL: Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.