Z.ai just unveiled GLM‑4.6V, a single‑pass model that can ingest 150 PDFs, 200 slide decks, or a full hour‑long video and still deliver coherent inference, visual understanding and logical reasoning. Perfect for startups needing fast, open‑source AI on massive docs. Curious how it works? Dive into the details. #GLM46V #ZAI #AIInference #VisualUnderstanding

🔗 https://aidailypost.com/news/zais-glm-46v-handles-150-docs-200-slides-1hour-video-one-pass

#TechNews: #Qwen Releases New #VisionLanguage #LLM Qwen2-VL 🖥️👁️

After a year of development, #Qwen has released Qwen2-VL, its latest #AI system for interpreting visual and textual information. 🚀

Key Features of Qwen2-VL:

1. 🖼️ Image Understanding:

Qwen2-VL shows performance on #VisualUnderstanding benchmarks including #MathVista, #DocVQA, #RealWorldQA, and #MTVQA.

2. 🎬 Video Analysis:

Qwen2-VL can analyze videos over 20 minutes in length. This is achieved through online streaming capabilities, allowing for video-based #QuestionAnswering, #Dialog, and #ContentCreation. #VideoAnalysis

3. 🤖 Device Integration:

The #AI can be integrated with #mobile phones, #robots, and other devices. It uses reasoning and decision-making abilities to interpret visual environments and text instructions for device control. #AIAssistants 📱

4. 🌍 Multilingual Capabilities:

Qwen2-VL understands text in images across multiple languages. It supports most European languages, Japanese, Korean, Arabic, Vietnamese, among others, in addition to English and Chinese. #MultilingualAI

This release represents an advancement in #ArtificialIntelligence, combining visual perception and language understanding. 🧠 Potential applications include #education, #healthcare, #robotics, and #contentmoderation.

https://github.com/QwenLM/Qwen2-VL

GitHub - QwenLM/Qwen2-VL: Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. - QwenLM/Qwen2-VL

GitHub