TIL about the Pile, an 886 GB (not TB, oops) dataset of text created in 2020, which can be used for various purposes including LLM training
#Programming #Pile #program #OpenSource #LLM #slop #AI #technology #dataset
https://en.wikipedia.org/wiki/The_Pile_%28dataset%29?wprov=sfla1

🍵 