After my repeated posts / boosts arguing that in OSS we’ve overemphasized licenses and underemphasized community, governance, and sustainability…I actually have a license question:

What’s the current thinking on licenses that lay the legal groundwork for action against people using OSS source code for LLM training without seeking permission or offering compensation?

1/2

The obvious answer is copyleft-type licenses.

(1) Has anybody done legal analysis on that beyond the obvious? I don’t think LLM training on copyleft code has been tested in court yet…? (Even LLM training on more restrictively licensed works seems to be surviving court challenge….)

(2) Are there copyleft licenses (i.e. “derived works must be similarly licensed”) out there that don’t have the Stink of Stallman on them? Or is GPL v3 still just the way to go despite the smell?

2/2

OK, so apparently I shouldn’t have said “beyond the obvious,” and the obvious needs stating:

(1) Copyright licenses very clearly •do• allow the copyright holder to determine who may use a work and for what purposes, at least when such use would be otherwise prohibited without a license. That is how the law works. Rightly or wrongly, empires are built on this: “Streaming service XYZ may offer this song for streaming but not for download until this date.” Copyleft is one example of this principle in action.

(1a) The thing that prevents discriminatory licensing (such as in Daniel’s strawmen) is anti-discrimination law, not copyright law.

(2) The reason copyleft specifically might prevent LLM usage is that •if• LLM output can be considered a derived work of the training material, then the output must also be licensed in the same way. That seems to me a thin reed: courts so far haven’t been willing to treat LLM output as a derived work, even when the output includes things that would surely be considered plagiarism and grossly illegal if done by a human. But I don’t see another path to protection, and courts are still sorting this out…so.

https://mastodon.sdf.org/@dlakelan/116267990581623218

Daniel Lakeland (@dlakelan@mastodon.sdf.org)

@[email protected] Copyright gives you legal rights to determine who may make and distribute copies, and create derived works. it doesn't and shouldn't give you the right to determine who may use the work or for what purposes. imagine the consequences of giving copyright holders that right... "Jews and black people may not read or write book reviews of this novel" or "People who work for the Democratic party may not read the project 2025 document" or etc.


I see somebody else is on this topic today! And yes, billionaires will use regulatory capture to the maximum extent they can get away with — so yes, I fully expect the AI lobby to advocate a tangled legal regime where LLM output is copyrighted but copying data to train an LLM is not a copyright violation.

https://social.coop/@cwebber/116266757533136607

Christine Lemmer-Webber (@cwebber@social.coop)

Also, and I want to say more on this soon, but if you think that the big AI players are hoping for *anything but* them being able to put a legislative moat around themselves where output *is* copyrighted and training materials *are* restricted but they're the *only ones* able to play, you're being a fool. Their key goal is to capture rent on all intellectual pursuits.
