🌖 YOLO-World: Real-Time Open-Vocabulary Object Detection
➤ Breaking the vocabulary limits of object detection to enable broader applications
https://arxiv.org/abs/2401.17270
This paper introduces YOLO-World, a new approach that equips YOLO detectors with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. YOLO-World uses a novel RepVL-PAN network architecture and a region-text contrastive loss to promote the interaction between visual and linguistic information, allowing it to detect a wide range of objects effectively in a zero-shot setting. Experiments show that YOLO-World performs strongly on the LVIS dataset, outperforming existing methods in both speed and accuracy, and that it delivers excellent results on downstream tasks such as object detection and open-vocabulary instance segmentation.
+ Sounds very promising: detecting new object categories without retraining matters a lot for real-world applications.
+ Both speed and accuracy are improved, and the code and models are released, which is a big help for researchers! (See the usage sketch after the hashtags.)
#ComputerVision #ArtificialIntelligence #ObjectDetection
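A minimal sketch of what zero-shot, prompt-defined detection looks like in practice, assuming the Ultralytics YOLO-World integration; the weight file name, class prompts, and image path are illustrative, not taken from the paper:

```python
# Zero-shot open-vocabulary detection sketch (assumes the Ultralytics
# YOLO-World wrapper; weight name and image path are illustrative).
from ultralytics import YOLOWorld

model = YOLOWorld("yolov8s-world.pt")          # pretrained open-vocabulary weights
model.set_classes(["red backpack", "helmet"])  # define the vocabulary at inference time
results = model.predict("street.jpg")          # no retraining or fine-tuning needed
results[0].show()                              # visualize the detections
```

The key point the post highlights is that the vocabulary is supplied as text prompts at inference time, so new categories need no additional training.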
YOLO-World: Real-Time Open-Vocabulary Object Detection

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
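To make the region-text contrastive loss mentioned above concrete, here is a rough PyTorch sketch of the general idea: region embeddings are aligned with the text embeddings of their matching phrases via a softmax over region-text similarities. The embedding dimension, temperature, and one-hot matching are simplifying assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_emb, text_emb, labels, tau=0.05):
    """Simplified region-text contrastive loss (illustrative sketch,
    not the paper's exact implementation).

    region_emb: (N, D) embeddings of N candidate regions
    text_emb:   (C, D) embeddings of C vocabulary phrases
    labels:     (N,)   index of the matching phrase for each region
    tau:        temperature scaling the similarity logits (assumed value)
    """
    region_emb = F.normalize(region_emb, dim=-1)   # unit-norm region features
    text_emb = F.normalize(text_emb, dim=-1)       # unit-norm text features
    logits = region_emb @ text_emb.t() / tau       # (N, C) region-text similarities
    # Pull each region toward its matched phrase, push it from the rest.
    return F.cross_entropy(logits, labels)

# Toy usage with random embeddings
regions = torch.randn(8, 256)
texts = torch.randn(20, 256)
labels = torch.randint(0, 20, (8,))
loss = region_text_contrastive_loss(regions, texts, labels)
```

Because the classifier is just a similarity against text embeddings rather than a fixed weight matrix, the detector's vocabulary can be swapped at inference time, which is what enables the zero-shot behavior described in the abstract.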
