YOLO-World: Real-Time Open-Vocabulary Object Detection
https://arxiv.org/abs/2401.17270
#HackerNews #YOLO #World #RealTime #ObjectDetection #OpenVocabulary #AIResearch
The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO-World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation.
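For anyone wondering what a "region-text contrastive loss" might look like in practice, here is a rough sketch. It is not taken from the paper's code. It assumes cosine similarity between region and text embeddings, a softmax over the candidate texts, and cross-entropy against the assigned text index. The function names, the `temperature` value, and the toy embeddings are all made up for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def region_text_contrastive_loss(region_embs, text_embs, targets, temperature=0.07):
    """Mean cross-entropy between region-text similarities and assigned texts.

    region_embs: list of region embedding vectors (hypothetical inputs)
    text_embs:   list of text embedding vectors, one per vocabulary entry
    targets:     for each region, the index of its matching text
    temperature: assumed scaling factor, not a value from the paper
    """
    regions = [l2_normalize(r) for r in region_embs]
    texts = [l2_normalize(t) for t in text_embs]
    total = 0.0
    for region, target in zip(regions, targets):
        # Cosine similarity of this region to every text, scaled by temperature.
        logits = [sum(a * b for a, b in zip(region, t)) / temperature for t in texts]
        # Numerically stable log-sum-exp for the softmax denominator.
        top = max(logits)
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        # Cross-entropy for this region: -log softmax(logits)[target].
        total += log_sum - logits[target]
    return total / len(regions)


# Toy example: two regions, two vocabulary entries.
regions = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
matched = region_text_contrastive_loss(regions, texts, [0, 1])
mismatched = region_text_contrastive_loss(regions, texts, [1, 0])
```

With these toy inputs the loss is smaller when each region is paired with the text it actually resembles, which is the behaviour the training objective rewards.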
@kdenlive I've been looking into this a little more. Running the app from the terminal, I found that the #kdenlive #ObjectDetection plugin is not installing correctly on #macos. You can see in my previous screenshot that the Models folder is 64 bytes, which means it's empty. When you select a model to download, the terminal shows an error starting:
kf.kio.core: couldn't create worker: "Can not find 'kioworker' executable at '/Applications/kdenlive.app/Contents/MacOS,
1/2
@kdenlive @kde Can anyone help me use this #SAM2 #ObjectDetection in #kdenlive? I got stuck at the first hurdle!
The plugin setting says it's installed, but the Effect Stack still just shows a "Configure" button? This is on macOS... what am I missing?