I've contributed a fair bit to free content (CC licenses) & open source projects over the years.

Personally, I want "my" stuff to be used to make AI models better. I use open licenses precisely _because_ I want people to come up with interesting & hopefully beneficial new uses.

I understand why lots of folks feel differently, of course.

However, it's not a clear-cut legal situation, either. Training != inference; only model output that violates licenses is unambiguously infringing.

Creative Commons itself takes the (IMO very reasonable) view that training AI models on copyrighted works constitutes fair use:
https://creativecommons.org/2023/08/18/understanding-cc-licenses-and-generative-ai/

The folks who are calling such training "theft" might regret what they seem to be implicitly asking for, i.e. much stricter copyright. Copyright law won't prevent Microsoft, Google, OpenAI or Adobe from making shady licensing deals, but it will prevent the free/open community from keeping up.

Some of the anti-#AI backlash seems to go hand in hand with an explicit or implicit defense of and support for copyright -- a questionable institution that aggregates power with the Disneys & Apples of this world.

I am very skeptical that a just world is one that still makes heavy use of intellectual monopoly rights to secure individual incomes.

Copyright should, IMO, at best be regarded as a necessary evil, one which we have failed to rid ourselves of along with capitalism.

@eloquence I was told by a commons-specialist lawyer that it would be a breach of a CC-BY-NC license if I invited tips for such licensed films even if I gave 100% to the filmmakers (and obvs had their written permission).

It doesn’t seem reasonable, then, that CC restrains quite everyday ‘commercial’ use while allowing LLM megacorps, with toolsets far beyond the everyday user’s, to use any artist’s work to copy their style/subjects and compete with them. There needs to be a CC-No-LLM license.

@eloquence I also can’t see how any LLM wholesale copying and repurposing comes under fair use. In the four factors of fair use - purpose, nature, amount, effect (https://fairuse.stanford.edu/overview/fair-use/four-factors/) - on three of these, LLMs go far beyond the current norms of Fair Use.
Purpose - to run a commercial service imitating styles similar to the one copied.
Amount - everything!
Effect - to directly compete with commercial writers/musicians/animators/filmmakers/coders etc.
IMHO it’s just awaiting the class actions.

@nicol

That question ultimately will be settled in the courts, but there are certainly plenty of experts and civil society groups who wholly or partially agree with CC's interpretation. This submission by Pamela Samuelson (Professor @ Berkeley Center for Law & Technology and a widely recognized expert on fair use) et al. to the Copyright Office is worth a read:

https://www.regulations.gov/comment/COLC-2023-0006-8854

@nicol

In that blog post they're not commenting on CC license interactions with LLMs; they're making the general comment that _regardless of license_, model training is likely fair use. If that is true, even if you made a CC anti-LLM license, it wouldn't matter.

@eloquence I think that when it comes to artists, I personally can understand viewing it as theft. The most popular AI tools charge people to use them, and in that way they are saying that their work entitles them to money but the work of artists is fair to exploit.

Like yes, fuck companies and their copyrights, but profiting off of others' work at the scale of AI is a level of exploitation that these companies couldn’t achieve before. Regardless of how much we hate the system we operate within, if large swaths of work were necessary to improve the quality of a model, there should be explicit consent or compensation.

@cederbs

I disagree with the term "theft" in connection with information, but I do certainly view this as a cycle of exploitation that is emblematic of capitalism.

Google "organizing the world's information", slapping ads next to excerpts, and surveilling the shit out of everyone who uses their products, to me, is a version of the same thing. It's perfectly legal (which training AI models may also prove to be, depending on pending court cases), but also exploitative.

@cederbs

But the anti-AI lawsuits, if won, would not lead to benign or positive long term outcomes. Tightening copyright in this manner would benefit those who can exercise the most leverage using law.

In this way, the lawsuits are, in my view, potentially even _part_ of the cycle of exploitation (the exploited unwittingly aiding the exploiters), leading to new forms of profit extraction.

@cederbs

Moreover, aside from hope for a one-time settlement for litigants, it's not clear that it would do _anything_ to address the underlying issues. Models like Adobe Firefly are based on licensed work and available now, similarly for-profit. Such models are labor-displacing, potentially exploitative (depending on licensing deals), but almost certainly lawsuit-proof.

And successful lawsuits will just lead to more Fireflies, at best yielding pennies in licensing fees.

@cederbs

To your earlier point, permissively licensed models _are_ widely used today, increasingly so, for both text and image generation. To me, the way to escape cycles of exploitation is to build alternatives, and to figure out together how they will change the way we make & think.

But it's exactly those alternatives that will be stifled by successful lawsuits, while proprietary models will consolidate under few owners.

@eloquence yup, rather than attacking AI as a copyright issue, artists and writers should discuss what kinds of digital business models would benefit them. Everything that pirates said about corporate digitalization 15 years ago became true, but still most haven't caught up with the issues raised then.
@glynmoody
@audunmb @eloquence @glynmoody The purpose of these LLM business models is to redirect artists' and writers' income to the models’ proprietors. Asking, “What digital business models would benefit them,” is a nonsense question.
@krans that business model thrives in a centralized distribution model, like YouTube or Spotify. To challenge it you need to break up these semi-monopolies. If business models centered creators (in the way envisioned 15 years ago) instead of points of distribution, the LLM model wouldn't work.
@eloquence @glynmoody
@eloquence The discussion I've seen is that "fair use" is an American concept and cannot be relied upon in an international setting.

@troed Sure, but provisions that offer comparable exemptions exist in other jurisdictions, e.g., in the EU:

https://felixreda.eu/2021/07/github-copilot-is-not-infringing-your-copyright/

@eloquence Thank you - I was not up to date on that decision in the EU.