WebVid-10M is a large-scale dataset of short videos with textual descriptions sourced from stock footage sites. The videos are diverse and rich in their content.

- 10.7M video-caption pairs.
- 52K total video hours.

https://maxbain.com/webvid-dataset

Train split:
http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_train.csv

Validation split:
http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv
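
Since only the CSV splits are linked so far, a quick way to see what they contain is to stream the first few rows rather than download the whole file. The sketch below is a minimal example, not an official loader: it assumes the validation URL above is still reachable and that the file is a UTF-8 CSV with a header row. The function names `preview_csv` and `preview_remote_csv` are hypothetical, and no column names are assumed; they are read from the header.

```python
import csv
import io
import itertools
import urllib.request

# URL copied from the post above; availability is an assumption.
VAL_CSV_URL = "http://www.robots.ox.ac.uk/~maxbain/webvid/results_10M_val.csv"


def preview_csv(lines, limit=5):
    """Return (column_names, rows) for the first `limit` data rows.

    `lines` is any iterable of CSV text lines whose first line is a header.
    Each row is a dict keyed by the header's column names.
    """
    reader = csv.DictReader(lines)
    rows = list(itertools.islice(reader, limit))
    return reader.fieldnames, rows


def preview_remote_csv(url=VAL_CSV_URL, limit=5):
    """Stream a remote CSV and preview it without reading the full file."""
    with urllib.request.urlopen(url) as response:
        # Wrap the byte stream so the csv module receives decoded text lines.
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return preview_csv(text, limit)
```

Calling `preview_remote_csv()` returns the header's column names and the first five rows; passing the train URL instead previews the train split the same way.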

Video data coming soon.

#openscience #dataset #ai

Large Scale Text-Video Dataset

Containing 10M text-video pairs crawled from the web.

TempoFunk/webvid-10M · Datasets at Hugging Face
