An Open Training Set For AI Goes Global
https://fed.brid.gy/r/https://www.techdirt.com/2026/03/24/an-open-training-set-for-ai-goes-global/
As many of the AI stories on Walled Culture attest, one of the most contentious areas in the latest stage of AI development concerns the sourcing of training data. To create high-quality large language models (LLMs), massive quantities of training data are required. In the current genAI stampede, many companies are simply scraping everything they can off the Internet. Quite how that will work […]
#aiAlliance #commonCorpus #curation #euAiAct #financeCommons #france #gdpr #github #legalCommons #llms #multilingual #openCulture #openGovernment #openScience #openSource #openWeb #pdf #permissiveLicensing #pleias #publicDomain #scraping #tokens #toxicity #wikimedia #youtube https://walledculture.org/common-corpus-an-open-training-set-for-ai-goes-global-and-so-should-support-for-it/
How Do AIs Feed on Pirated Books?
https://web.brid.gy/r/https://korben.info/ia-entrainement-donnees-piratees-books3-common-cor.html