Common Corpus, an open training set for AI, goes global – and so should support for it

As many of the AI stories on Walled Culture attest, one of the most contentious areas in the latest stage of AI development concerns the sourcing of training data. To create high-quality large language models (LLMs), massive quantities of training data are required. In the current genAI stampede, many companies are simply scraping everything they can off the Internet. Quite how that will work […]

#aiAlliance #commonCorpus #curation #euAiAct #financeCommons #france #gdpr #github #legalCommons #llms #multilingual #openCulture #openGovernment #openScience #openSource #openWeb #pdf #permissiveLicensing #pleias #publicDomain #scraping #tokens #toxicity #wikimedia #youtube https://walledculture.org/common-corpus-an-open-training-set-for-ai-goes-global-and-so-should-support-for-it/
@aliceamour yes, because #PublicDomain doesn't exist in places like #Germany, only #PermissiveLicensing and #LapsedCopyright, and copyright lasts for the author's lifetime + 70 years instead of the more reasonable 28 years it was originally designed for!
Copyright Forever Less One Day (REUPLOAD)

YouTube
What is BSD? Come to a conference to find out!

@arturo182 @kicad That's nice, but why didn't they choose CC-BY if they don't want to have share-alike provisions?

  • Maybe they just don't want others to paywall their files?

Either way it's a good thing:

  • OFC they don't do it out of the goodness of their hearts, but because it'll result in #tinkerers who use #KiCad, #LibreCad, etc. using and buying said parts, as the files can and most likely will be shipped with said programs.

Same applies even more for #PCB #assembly services:

  • They're gonna buy said parts by the reel, if not several reels, simply because customer demand will appear...

And that's how #OpenAccess and #PermissiveLicensing are even beneficial in #commercial settings!

@standingpad personally, I feel similar:

#GPLv3 is great when you want to commit #AssetDenial and ensure your project can't be "tivolized".

  • If you want to be real "anti-capitalist", go with #AGPLv3 so no one will use it in any commercial setting!

Tho I do think that #PermissiveLicensing like #0BSD works better long-term because #enforcement of the #GPL only backfires harder...

  • But don't take my word for it: @landley has publicly admitted that this harmed #BusyBox to the point of trashing the project's reputation.

I chose 0BSD for #OS1337 because I want to make a #GNU-free, minimalist #Linux.

  • OFC that only applies to the project and its code: Linux is still #GPLv2-only, and that's because GPLv3 is incompatible with how #licensing and #patents work, and manufacturers of hardware can't bend reality to fit RMS's wishful thinking.