Fairly mind-blowing deep dive into a big and widely used input data set in the AI/ML space, LAION-5B. Beautifully presented. Strongly recommended.
https://knowingmachines.org/models-all-the-way
Models All The Way Down

LAION-5B is an open-source foundation dataset. It contains 5.8 billion image and text pairs—a size too large to make sense of. We follow the construction of the dataset to better understand its contents, implications and entanglements.

@timbray No feed?! Damn. That’s dumb.
@timbray Moral: we should make websites less accessible so that data crawlers can't meaningfully use our data. (:joke:)
@timbray If the models really are worthwhile, we need an international WPA-project to index and annotate all human knowledge. Make a clean dataset. If the models are worthwhile.