@bgalehouse @kevinr @lcamtuf it's a tempting argument to attempt but it kinda falls apart when "the entire library was in the training corpus anyway" is a given.
The fact that it is a terrible argument is of course not really going to stop anyone from making it.
@SnoopJ There’s the concept of clean room reimplementations (see the link by @bgalehouse): one group writes the spec -- possibly with access to the source.
The second group has never seen the source and only gets the spec. This second group then writes the program according to the spec.
You could simulate this if you had an AI that was provably not trained on the original source.
("provably not trained" most likely means re-training from scratch)
@ArneBab @SnoopJ @bgalehouse @lcamtuf
And the spec would need to carefully elide certain details which would get it classed as a derivative work itself—much harder for an LLM to do than a team of humans
@kevinr and proving that the AI was not trained on the original source will be pretty hard, because FLOSS programs with compatible licenses can legally copy code from one project into the other.
You’ll likely have to exclude all code from the project and all code that’s too similar from the training data. And then train an AI from scratch. Which would be extremely expensive.
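To make the "exclude everything that's too similar" step concrete, here is a minimal sketch, assuming a token-shingle Jaccard comparison as the similarity test. The names `shingles`, `jaccard`, `filter_corpus` and the 0.3 threshold are hypothetical choices for illustration, not anything an actual training pipeline is known to use.

```python
import re


def shingles(code: str, k: int = 5) -> set:
    """Split source text into tokens and return the set of k-token windows."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_corpus(corpus: dict, excluded: list, threshold: float = 0.3) -> dict:
    """Keep only files whose similarity to every excluded source is below threshold.

    corpus:   hypothetical mapping of file name -> source text
    excluded: source texts of the project that must not be trained on
    """
    excluded_sets = [shingles(src) for src in excluded]
    kept = {}
    for name, src in corpus.items():
        s = shingles(src)
        if all(jaccard(s, e) < threshold for e in excluded_sets):
            kept[name] = src
    return kept


if __name__ == "__main__":
    excluded = ["def add(a, b):\n    return a + b\n"]
    corpus = {
        "copy.py": "def add(a, b):\n    return a + b  \n",
        "other.py": "print('hello world')\nfor i in range(10):\n    print(i)\n",
    }
    print(sorted(filter_corpus(corpus, excluded)))
```

A filter like this only catches near-verbatim copies. Renamed identifiers, reordered functions, or a port to another language would slip through, which is part of why deciding what to exclude is the hard and expensive part.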
@kevinr but I expect that someone will come in and say "my prompt includes 'forget all code from <project>', so the AI does not know it".
… OK, I have to admit that I lost trust in the sanity of part of humanity …
@thebluewizard yes, the details are more complicated, but it doesn’t reduce the complexity of deciding which code has to be excluded.
@ovrim though the code in program B that was BSD before is still BSD -- just the license to the whole program changes.
Assuming you used the original source code to derive the detailed spec, then yes, that too is a derivative work.
The "viral" nature of that sort of license has bothered me for a long time. It's always been simultaneously overly far-reaching and impossible to realistically enforce.
But here's an interesting question:
If you do not execute the code - did you accept the license? Does simply reading it sufficiently to be able to write a spec bind you to that license? That seems a bit too much.
@ArneBab @tbortels @lcamtuf @bgalehouse
Yeah the license applies whether you accept it or not. And whether your spec counts as a derivative work or not will depend greatly on the details of your spec
@kevinr @bgalehouse @lcamtuf @ArneBab
It explicitly does not. If I don't accept the license, normal copyright applies. You don't get to make a legally binding contract without consent, "clickwrap" bullshit aside.
And normal copyright has carve-outs like fair use.
@tbortels if you start relying on fair use, you enter a gray zone: courts will make the decisions on that.
You don’t want that as the basis of anything that provides income.
A lawsuit in a gray area can ruin you, even if you’re likely to win.
@ArneBab @lcamtuf @kevinr @bgalehouse
We entered a gray zone about 8 off-ramps ago. Copyright never anticipated self-replicating code on computers and viral licenses and clean-room re-implementations and AIs.
As for income - I've lost track of the original driver, but it's GPL'd free code, no?
I like fair use. It and parody are among the very few things keeping us out of peasants-with-pitchforks-and-torches mode. If you eliminate those carve-outs, the whole system goes down.
@tbortels GPL’d means that you can generate income as long as you adhere to the license (⇒ keep changes free, too).
If you want to wiggle out of that requirement with a re-implementation, that’s where you enter the gray area, because if it is a violation of the GPL, then the permissions the GPL granted you no longer apply and you have to check against regular "all rights reserved".
@tbortels fair use is always risky, because it only gives you conditional rights: if you take something via the fair use exception, you cannot use the result in any circumstance that would not be considered fair use, too.
At least that’s my understanding of copyright and fair use. Differences between copyright in different countries add a whole additional layer to that (there is no fair use in the EU, but there are "limitations and exceptions to copyright").
@lcamtuf @kevinr @bgalehouse
@tbortels The show with the poem ended with the legendary song that was later republished independently:
https://inv.nadeko.net/watch?v=HMQkV5cTuoY
https://www.youtube.com/watch?v=HMQkV5cTuoY
@lcamtuf @ArneBab @kevinr @bgalehouse
"Use" isn't part of the GPL. And "all rights reserved" means normal copyright law, not "you get no rights at all".
The GPL defines "modify" and "propagate" as the activities it burdens. If I modify the code, and propagate it, I have a legal burden under the license. Otherwise, I don't.
IANAL, but I don't think reading the code and re-implementing a work-alike without incorporating the original code is "modify" - it's "replace".
I understand that's where "clean rooms" come into play, but that always felt like splitting hairs and giving copyright too much power - it's about physical books, not ideas. The farther we move from the original intent, the weaker a strong copyright stance becomes.
I think you could make an argument that reading code to understand its interfaces, explicitly declining to accept any license, then implementing compatible code is well within the normal copyright definition of "fair use", or should be if we aren't all copyright lawyers. More importantly, it's healthy for Society and the art. If I can read a book under copyright and write a detailed book report, I should be able to read provided source code and do the same. To the extent that we've strayed away from that, the legal system has failed and needs correction.
@tbortels yes, not accepting the license means regular copyright applies.
But your arguments afterwards rely on rights the GPL gives you -- you only get them after you accept the license.
EDIT: because "if we aren’t allowed … under copyright" ← we aren’t. That’s the point.
As long as there’s no NDA (there isn’t for GPL), we *can* write a spec. But the one implementing it *must not* know the code.
@ArneBab @kevinr @lcamtuf @bgalehouse
Fair use isn't something the GPL grants you. That's what I'm trying to work out - set the GPL aside for a moment.
Does regular copyright fair use give me the right to look at the freely provided source code, make a mental model, and re-implement a workalike if I don't re-use the original source?
Pretend it's just me and not an AI, because that throws a whole new set of confusion into the mix.
BSD did it against regular copyright. Not sure this is all that different.
@tbortels as far as I know, and as the article https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/ reinforces, fair use does not give you the right to re-implement the code.
Doesn’t matter whether you make a mental model as the intermediate step.
Only the clean room re-implementation gets out of that.
@tbortels @lcamtuf @bgalehouse @kevinr Copyright-bound licenses work by exempting you from the blanket and default prohibition on copying.
So if you copy a work that has copyright restrictions according to copyright law, using the license is your only way of not infringing the law. It doesn’t matter if you "accept" it or not.
If you are not copying, the license is irrelevant.
@rustynail @ahltorp @tbortels @lcamtuf @bgalehouse @kevinr Hmm, there is another consequence to this.
If this is a derivative work, which I expect it is, it causes issues when someone has in fact manually coded an alternative to some copyrighted work (without reading the original code, etc.).
Someone can now suggest that it was done using AI and is therefore a derivative work. The alternative no longer needs to actually follow the original code to be accused of this.
Arrg!
@ahltorp @bgalehouse @revk @lcamtuf @kevinr @rustynail
AI is a weird case as you could assert - probably correctly - that the original code may be part of its training corpus. Was that training a GPL violation? It's a stretch. Was its training a copyright violation? Or was the AI (or rather its owners) exercising their GPL license rights? Or was it fair use under regular copyright?
Who knows?
It's a hot mess is what it is.
This is all so far outside the original reckoning of "it'd be nice if the bookbinder down the street didn't profit off of my work until I had a chance to profit off of it first" that it's not surprising it's a mess.
@lcamtuf @kevinr @rustynail @ahltorp @bgalehouse @revk
If AI code cannot be copyrighted - you have no mechanism by which to force someone to accept the GPL, or any license. An AI artifact covered by GPL is meaningless.
@lcamtuf @kevinr @revk @rustynail @ahltorp @bgalehouse
That's the "clean room" that keeps getting thrown around, originally used to try to legally protect free BSD derivatives. The idea was to make the "copy" argument so outlandish it was unsupportable.
It does set a standard, but I'm not sure it's a requirement. That is, reading code to create compatible code seems more of a fair use than an illicit copy. Especially if none of the original code appears in the finished work.
@wouter @revk Plagiarism is about presenting the work as your own, not where it came from. It can be loosely connected to copyright because of either legal or license demands for attribution, but even taking something that is public domain and presenting it as your own is plagiarism.
Where copyright is about the relation to people a work comes from, I would say plagiarism is about the relation to people you present a work to.
I think "fanfiction" is closer.
Although if it's close enough to be identical in all major respects, a "remake" is also accurate.
Having said that: software is weird because it does something. If I write code that faithfully implements an API - I haven't stolen or plagiarized or anything - I followed the spec.
If that spec was supposed to be secret or proprietary - then open sourcing the code was a bad idea, and the soul of GPL is to force sharing.
Anyway - hot mess. As things tend to get when you try to force other human beings, legally or otherwise, to do what you want them to do.
I'm not sure "closed" is the right word. Clearly it's not closed if you are providing it - it's right there, I can read it and even redistribute it without burden.
It's "copyrighted", not closed. You can't modify closed source because you don't have the source. The assertion being made is you can't modify GPL'd open source without accepting the license. But copyright has its own carve-outs, and I am unconvinced that writing a spec or net-new code is a modification, as opposed to regular old copyright fair use.
It's about how to reproduce the functionality - the code could be an entirely different language.
And - "commercially compete" with someone giving away code for free seems a non-concern.
@lcamtuf @gisgeek @kevinr @bgalehouse
Heh. You might even say that's "fair use"... 🤔