Some more information about the 21 March event at Jussieu, including the line-up of the round table, in this press release. https://www.sorbonne-universite.fr/actualites/seulement-02-de-donnees-francophones-dans-lia-wikimedia-france-et-sorbonne-universite

#Wikipedia #Pleias #TeamESR

Common Corpus, an open training set for AI, goes global – and so should support for it

As many of the AI stories on Walled Culture attest, one of the most contentious areas in the latest stage of AI development concerns the sourcing of training data. To create high-quality large language models (LLMs), massive quantities of training data are required. In the current genAI stampede, many companies are simply scraping everything they can off the Internet. Quite how that will work […]

#aiAlliance #commonCorpus #curation #euAiAct #financeCommons #france #gdpr #github #legalCommons #llms #multilingual #openCulture #openGovernment #openScience #openSource #openWeb #pdf #permissiveLicensing #pleias #publicDomain #scraping #tokens #toxicity #wikimedia #youtube https://walledculture.org/common-corpus-an-open-training-set-for-ai-goes-global-and-so-should-support-for-it/

> Today, we are announcing #Amazon, #Meta, #Microsoft, #mistralai , and #Perplexity for the first time as they join our roster of partners, which includes #Google, #Ecosia, #Nomic, #Pleias, #ProRata, and #ReefMedia. All these organizations utilize #WikimediaEnterprise to integrate human-governed knowledge into their platforms at scale. By doing so, they help ensure that the work of our global volunteer community reaches billions of people with the accuracy and transparency that Wikipedia represents.

And that's good news for me.

#wikimedia #wikipedia #ai

https://enterprise.wikimedia.com/blog/wikipedia-25-enterprise-partners/

New Wikimedia Enterprise Partners: Wikipedia’s 25th Birthday

Amazon, Meta, Microsoft, Mistral AI, and Perplexity have officially joined the Wikimedia Enterprise ecosystem as we celebrate 25 years of Wikipedia. Discover how we provide the dedicated infrastructure to deliver human-governed knowledge to the world’s most influential platforms.

Wikimedia Enterprise
Instead of scraping: AI companies sign contracts with Wikipedia for data access

For a long time, AI companies drew on Wikipedia content for AI training, driving up the load on its servers. Now more and more of them are using an alternative.

heise online
Really happy to see a new #copyleft -based #LLM , and this one seems to be more general-purpose than former attempts such as #PleIAs. The #Comma model is trained with #CommonPile, a new training pile with 8 TB of public domain and copyleft data. huggingface.co/papers/2506.052…

Paper page - The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Join the discussion on this paper page

Good news: since a company's training requires one, I finally found a locally-hosted #LLM, #PleIAs, trained solely with freely redistributable data. Bad news: it's so new, it hasn't been integrated with #LocalAI yet and I'm still tweaking YAML files.
Bluesky

Ah, and I was about to download #PleIAs myself to test it. I don't mind the AGPL share-alike restriction; the problem is that the non-commercially licensed data would taint the license of the output. Any plans to filter the #CommonCorpus even further to prevent these issues? @dorialexander.bsky.social

RE: https://bsky.app/profile/did:plc:627gjfohrkofk73ict4hmb6p/post/3lcfu67ppds2n
Bluesky