After my repeated posts / boosts arguing that in OSS we’ve overemphasized licenses and underemphasized community, governance, and sustainability…I actually have a license question:

What’s the current thinking on licenses that lay the legal groundwork for action against people using OSS source code for LLM training without seeking permission or offering compensation?

1/2

The obvious answer is copyleft-type licenses.

(1) Has anybody done legal analysis on that beyond the obvious? I don’t think LLM training on copyleft code has been tested in court yet…? (Even LLM training on more restrictively licensed works seems to be surviving court challenge….)

(2) Are there copyleft licenses (i.e. “derived works must be similarly licensed”) out there that don’t have the Stink of Stallman on them? Or is GPL v3 still just the way to go despite the smell?

2/2

@inthehands Copyleft licenses that aren't GPL: The Mozilla Public License and the EUPL. There may be others, but those are the ones I know about.

If I ever were to start making software for the fun of it again, I would possibly use EUPL.

@datarama @inthehands

looking up what led to the EUPL could be interesting; but i think yous are making a big mistake:

there's a ton of escape hatches around copyright around the world: e.g. someone lobbied Japan so well that they codified a universal, non-revocable exception to copyright for training LLMs -- at least in Europe the training entity has to respect an opt-out

so you can't rely on copyright triggering at training time

what you described is instead contract law

@lbruno @datarama @inthehands contract law... so like Agents.md that says "by analyzing, or training on, data from this repository you agree to ..."

Edit: NVM, they make this point below

@wronglang @datarama @inthehands

yeah, specifically about this: you can't force terms on an agent and then try and bind the person running that agent to those terms, it's not really a thing

@lbruno @wronglang @datarama The possibility here — and to be clear, not saying courts would buy this, but! — the possibility here is that even if •training• is allowed use, the law could quite sensibly end up being that:

1. users of an LLM are responsible for how they use its output,

2. infringing reproduction of copyrighted work is infringement regardless of the technologies used for reproduction and transmission

2a. including LLMs,

3. licenses apply whenever the result would otherwise constitute infringement (this much is established; it’s how copyleft and CC licensing work, for example),

4. a software license can thus limit use of LLM-generated code whenever that code substantively reproduces code from the original project (which it often does), and thus

5. users of LLM output are legally exposed to licenses from the training material.

If training does become fair use in the eyes of the legal system, it is thus still conceivable that a license could explicitly say “you can’t reproduce this code using an LLM;” that failing, it is certainly no great stretch to imagine that a copyleft license could extend to a project that uses LLM-generated code in the case where the LLM substantively reproduced its training material.

Again, not clear that this would make it through the gauntlet of billionaire regulatory capture — but I don’t think any of the points above are legally far-fetched at all.

@inthehands @wronglang @datarama

i deff agree with 99% of everything you said; explicitly, your numbered points are exactly how i think things work right now

this bit though: “thus still conceivable that a license could explicitly say [...]” is where i went “wait a minute” because you're trying to set terms using copyright that can only be set using contract law

in other words, there's a ton more leeway in what you can do in contracts

edit: this is an argument i first saw on kemitchell.com

@inthehands @wronglang @datarama

https://writing.kemitchell.com/2020/12/27/War-on-License-Notices

that's probably the most succinct version of his argument; it doesn't actually mention EULAs, but it does mention unilateral contracts
