Fairly mind-blowing deep dive into a big and widely used input data set in the AI/ML space, LAION-5B. Beautifully presented. Strongly recommended. https://knowingmachines.org/models-all-the-way
Models All The Way Down
LAION-5B is an open-source foundation dataset. It contains 5.8 billion image and text pairs—a size too large to make sense of. We follow the construction of the dataset to better understand its contents, implications and entanglements.
@timbray If the models really are worthwhile, we need an international WPA-project to index and annotate all human knowledge. Make a clean dataset. If the models are worthwhile.