After my repeated posts / boosts arguing that in OSS we’ve overemphasized licenses and underemphasized community, governance, and sustainability… I actually have a license question:

What’s the current thinking on licenses that lay the legal groundwork for action against people using OSS source code for LLM training without seeking permission or offering compensation?

1/2

The obvious answer is copyleft-type licenses.

(1) Has anybody done legal analysis on that beyond the obvious? I don’t think LLM training on copyleft code has been tested in court yet…? (Even LLM training on more restrictively licensed works seems to be surviving court challenges….)

(2) Are there copyleft licenses (i.e. “derived works must be similarly licensed”) out there that don’t have the Stink of Stallman on them? Or is GPL v3 still just the way to go despite the smell?

2/2

@inthehands If training LLMs is "fair use", then it doesn't matter what kind of license the code is under. Copyright doesn't apply.

@Azuaron We definitely seem to be heading for the point where •training• is fair use. I’m less sure where the legal status of •output• will end up when that output is used in ways that would be infringing plagiarism if a human had produced it.

@inthehands The argument I've seen AI companies make in court is that the output is fully the responsibility of the user, not the LLM, in the same way Adobe isn't responsible when people make things in Photoshop.

However, this doesn't make logical sense with the Copyright Office's stance that AI-generated material is public domain because it has no human authorship. If it has no human authorship, how can the user be responsible for the output?

I think the real test won't happen until someone naively makes something infringing, probably with Sora. "Naively" in that they didn't type "pikachu driving a racecar" and maybe don't even know about Pokemon, but the model still ends up generating pikachu driving a racecar. It's hard to say the user is the infringing party when they have no knowledge of what they're infringing, but it will also be basically impossible to say infringement did not occur.