Most curious that HomoFaciens videos got scraped into The Pile, the dataset used to train generative AI models.
https://www.proofnews.org/youtube-ai-search/
Many larger maker channels seem to have been excluded, false negatives notwithstanding. Mark Rober is there but no Colin Furze.
I'm guessing HomoFaciens was included because of the effort he puts in to add accurate subtitles. He also publishes English- and German-language versions of the same video, which makes for high-quality training data.

