After my repeated posts / boosts arguing that in OSS we’ve overemphasized licenses and underemphasized community, governance, and sustainability…I actually have a license question:

What’s the current thinking on licenses that lay the legal groundwork for action against people using OSS source code for LLM training without seeking permission or offering compensation?

1/2

The obvious answer is copyleft-type licenses.

(1) Has anybody done legal analysis on that beyond the obvious? I don’t think LLM training on copyleft code has been tested in court yet…? (Even LLM training on more restrictively licensed works seems to be surviving court challenge….)

(2) Are there copyleft licenses (i.e. “derived works must be similarly licensed”) out there that don’t have the Stink of Stallman on them? Or is GPL v3 still just the way to go despite the smell?

2/2

@inthehands Copyleft licenses that aren't GPL: The Mozilla Public License and the EUPL. There may be others, but those are the ones I know about.

If I ever were to start making software for the fun of it again, I would possibly use EUPL.

@datarama @inthehands

looking up what led to the EUPL could be interesting; but i think yous are making a big mistake:

there's a ton of escape hatches around copyright around the world: e.g. someone lobbied Japan so well that they codified a universal, non-revocable exception to copyright for training LLMs -- at least in Europe the training entity has to respect an opt-out

so you can't rely on copyright triggering at training time

what you described is instead contract law

@datarama @inthehands

similar in fact to an EULA -- which, since you're going against freedom #0 anyway, might as well forget the OSS idea entirely, and define a unilateral contract w/ the prohibitions you want and the penalties you wish

re: inference time, i believe only Microsoft offers indemnity to its users -- the idea being that “i used a tool to infringe on someone's copyright!” is a confession, not exculpatory. copyright might help you against 3rd parties infringing on your stuff

@datarama @inthehands

personally i'd use private sharing under an EULA, and give up on Stallman's copyright hack entirely

[edit: your litany at the end re: community and governance is actually the same argument, now that i re-read it]

@lbruno @inthehands I've thought for a while that generative AI effectively kills FOSS for exactly this reason (among a few others).

@datarama @lbruno @inthehands that's a bit of a stretch since AI can't generate the human community involved in FOSS so it's not replacing the critical piece. So they can steal code from mastodon but they'll only ever use it to produce an x.com rather than anything interesting.

@wronglang @datarama @inthehands

my view is that only corporate OSS still has any benefit when AI can freely train on any source-available codebase

someone may want to publish some code/library, but the LLMs will extrude copies of it, and remove any incentive to publish; you won't get users of your codebase

couple that w/ the desire to forbid/discriminate against some users like e.g. Palantir, i'd outright abandon copyleft in favour of a proprietary licence and a unilateral contract/EULA

@wronglang @datarama @inthehands

problem is that making that also source-available will also make the codebase data-minable by the LLM training orgs, so you'd need a contractual gate prior to showing the source-code; can't just host on github w/ a proprietary licence