If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license? In philosophy, this problem is known as the Slop of Theseus.

@lcamtuf actual answer: of course you do, it’s prima facie a derivative work, same as if you had rewritten the program by hand.

@kevinr @lcamtuf And if you ask it to write a detailed spec based on its implementation, and then separately to write an implementation of that spec?

https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/

Tales from 80s Tech: How Compaq’s Clone Computers Skirted IBM’s IP and Gave Rise to EISA

In the 1980s, Compaq was the first company to produce a portable IBM-compatible machine legally, but they flirted with breaking copyright law in the process.

All About Circuits

@bgalehouse @kevinr @lcamtuf it's a tempting argument to attempt but it kinda falls apart when "the entire library was in the training corpus anyway" is a given.

The fact that it is a terrible argument is of course not really going to stop anyone from making it.

@SnoopJ There’s the concept of clean room reimplementations (see the link by @bgalehouse): one group writes the spec -- possibly with access to the source.

The second group has never seen the source and only gets the spec. This second group then writes the program according to the spec.

You could simulate this if you had an AI that was provably not trained on the original source.

("provably not trained" most likely means re-training from scratch)

@bgalehouse @kevinr @lcamtuf

@ArneBab @SnoopJ @bgalehouse @lcamtuf

And the spec would need to carefully elide certain details which would get it classed as a derivative work itself—much harder for an LLM to do than a team of humans

@kevinr and proving that the AI was not trained on the original source will be pretty hard, because FLOSS programs with compatible licenses can legally copy code from one project into the other.

You’ll likely have to exclude all code from the project and all code that’s too similar from the training data. And then train an AI from scratch. Which would be extremely expensive.
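
To make the "too similar" filtering step concrete, here is a minimal sketch of what it could look like. It is not how any real training pipeline works: the function names (`shingles`, `jaccard`, `filter_corpus`), the 5-token window and the 0.5 threshold are all assumptions made up for illustration.

```python
import re


def shingles(source: str, k: int = 5) -> set[tuple[str, ...]]:
    """Split source into crude tokens and return the set of k-token windows."""
    # Identifiers, numbers, or any single non-space character; whitespace is ignored.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_corpus(corpus: dict[str, str], project_files: list[str],
                  threshold: float = 0.5) -> dict[str, str]:
    """Keep only corpus entries that stay below the (arbitrary) similarity
    threshold against every file of the project to be excluded."""
    project_shingles = [shingles(text) for text in project_files]
    kept = {}
    for name, text in corpus.items():
        candidate = shingles(text)
        if all(jaccard(candidate, p) < threshold for p in project_shingles):
            kept[name] = text
    return kept


if __name__ == "__main__":
    project = ["def add(a, b):\n    return a + b\n"]
    corpus = {
        "copy.py": "def add(a,b): return a+b",   # same code, reformatted
        "other.py": "print('hello world')",      # unrelated
    }
    print(list(filter_corpus(corpus, project)))
```

A token-overlap check like this only catches near-verbatim copies. Renamed identifiers, ports to another language, or paraphrased logic pass straight through, which is part of why the exclusion is so hard to prove.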

@SnoopJ @bgalehouse @lcamtuf

@kevinr but I expect that someone will come in and say "my prompt includes 'forget all code from <project>', so the AI does not know it".

… OK, I have to admit that I lost trust in the sanity of part of humanity …

@SnoopJ @bgalehouse @lcamtuf

@ArneBab @kevinr @SnoopJ @bgalehouse @lcamtuf I think it's more complicated. Consider program A licensed under GPL and program B licensed under BSD license. Code from program B can be copied into program A, but code from program A cannot be copied to program B without applying GPL to program B (changing the license). At least that's how it works as I understand it.

@thebluewizard yes, the details are more complicated, but it doesn’t reduce the complexity of deciding which code has to be excluded.

@kevinr @SnoopJ @bgalehouse @lcamtuf

@bgalehouse @lcamtuf @kevinr

Assuming you used the original source code to derive the detailed spec, then yes, that too is a derivative work.

The "viral" nature of that sort of license has bothered me for a long time. It's always been simultaneously overly far reaching and impossible to realistically enforce.

@lcamtuf @bgalehouse @kevinr

But here's an interesting question:

If you do not execute the code - did you accept the license? Does simply reading it sufficiently to be able to write a spec bind you to that license? That seems a bit too much.

@tbortels if you do not accept the license, you do not have any right to use the code. It’s "all rights reserved" then. @lcamtuf @bgalehouse @kevinr

@ArneBab @tbortels @lcamtuf @bgalehouse

Yeah the license applies whether you accept it or not. And whether your spec counts as a derivative work or not will depend greatly on the details of your spec

@kevinr @bgalehouse @lcamtuf @ArneBab

It explicitly does not. If I don't accept the license, normal copyright applies. You don't get to make a legally binding contract without consent, "clickwrap" bullshit aside.

And normal copyright has carve-outs like fair use.

@tbortels if you start relying on fair use, you enter a gray zone: courts will make the decisions on that.

You don’t want that as the basis of anything that provides income.

A lawsuit in a gray area can ruin you, even if you’re likely to win.

@kevinr @bgalehouse @lcamtuf

@ArneBab @lcamtuf @kevinr @bgalehouse

We entered a gray zone about 8 off-ramps ago. Copyright never anticipated self-replicating code on computers and viral licenses and clean-room re-implementations and AIs.

As for income - I've lost track of the original driver, but it's GPL'd free code, no?

I like fair use. It and parody are among the very few things keeping us out of peasants-with-pitchforks-and-torches mode. If you eliminate those carve-outs, the whole system goes down.

@tbortels GPL’d means that you can generate income as long as you adhere to the license (⇒ keep changes free, too).

If you want to wiggle out of that requirement with a re-implementation, that’s where you enter the gray area, because if it is a violation of the GPL, then the permissions the GPL granted you no longer apply and you have to check against regular "all rights reserved".

@lcamtuf @kevinr @bgalehouse

@tbortels fair use is always risky, because it only gives you conditional rights: if you take something via the fair use exception, you cannot use the result in any circumstance that would not be considered fair use, too.

At least that’s my understanding of copyright and fair use. Differences between copyright in different countries add a whole additional layer to that (there is no fair use in the EU, but there are "limitations and exceptions to copyright").
@lcamtuf @kevinr @bgalehouse

@tbortels for parody there was the famous lawsuit of Erdogan vs. Böhmermann about the goat fucker poem. Böhmermann won (because of context, and maybe also because Erdogan’s lawsuit provided the context which made the poem legal), but it is illegal to publish that poem outside of the context of the show (which explained which kinds of works actually are illegal and used the poem as an example), and the show cannot be published again, because the context has changed now.
@lcamtuf @kevinr @bgalehouse

@tbortels The show with the poem ended with the legendary song that was later republished independently:

https://inv.nadeko.net/watch?v=HMQkV5cTuoY
https://www.youtube.com/watch?v=HMQkV5cTuoY

@lcamtuf @kevinr @bgalehouse

@lcamtuf @ArneBab @kevinr @bgalehouse

"Use" isn't part of the GPL. And "all rights reserved" means normal copyright law, not "you get no rights at all".

The GPL defines "modify" and "propagate" as the activities it burdens. If I modify the code, and propagate it, I have a legal burden under the license. Otherwise, I don't.

IANAL, but I don't think reading the code and re-implementing a work-alike without incorporating the original code is "modify" - it's "replace".

I understand that's where "clean rooms" come into play, but that always felt like splitting hairs and giving copyright too much power - it's about physical books, not ideas. The farther we move from the original intent, the weaker a strong copyright stance becomes.

I think you could make an argument that reading code to understand its interfaces, explicitly rejecting any license, then implementing compatible code is well within the normal copyright definition of "fair use", or should be if we aren't all copyright lawyers. More importantly, it's healthy for society and the art. If I can read a book under copyright and write a detailed book report, I should be able to read provided source code and do the same. To the extent that we've strayed away from that, the legal system has failed and needs correction.

@tbortels yes, not accepting the license means regular copyrights.

But your arguments afterwards rely on rights the GPL gives you -- you only get them after you accept the license.

EDIT: because "if we aren’t allowed … under copyright" ← we aren’t. That’s the point.

As long as there’s no NDA (there isn’t for GPL), we *can* write a spec. But the one implementing it *must not* know the code.

@lcamtuf @kevinr @bgalehouse

@ArneBab @kevinr @lcamtuf @bgalehouse

Fair use isn't something the GPL grants you. That's what I'm trying to work out - set the GPL aside for a moment.

Does regular copyright fair use give me the right to look at the freely provided source code, make a mental model, and re-implement a workalike if I don't re-use the original source?

Pretend it's just me and not an AI, because that throws a whole new set of confusion into the mix.

BSD did it against regular copyright. Not sure this is all that different.

@tbortels as far as I know, and as the article https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/ reinforces, fair use does not give you the right to re-implement the code.

Doesn’t matter whether you make a mental model as the intermediate step.

Only the clean room re-implementation gets out of that.

@kevinr @lcamtuf @bgalehouse

@tbortels @lcamtuf @ArneBab @kevinr @bgalehouse "clean room" is for actual humans, not for algorithmic transforms

@tbortels @lcamtuf @bgalehouse @kevinr if a thing has a licence then that covers its use, so using it as a wallpaper image or software component or training data could be argued.

@tbortels @lcamtuf @bgalehouse @kevinr Copyright bound licenses work by exempting you from the blanket and default prohibition on copying.

So if you copy a work that has copyright restrictions according to copyright law, using the license is your only way of not infringing the law. It doesn’t matter if you ”accept” it or not.

If you are not copying, the license is irrelevant.

@ahltorp @tbortels @lcamtuf @bgalehouse @kevinr and indeed there are arguments that simply “reading” is not copying, same as reading a book, even if via a web site. But getting your AI to “read” it is probably a different matter.

@revk @ahltorp @tbortels @lcamtuf @bgalehouse @kevinr idk about AI but I've heard more than once that when people are actually implementing something as free software that is originally non free but was either leaked or is source available, they completely restrict themselves from even looking at the thing and only use what any user would know and do some reverse engineering, so I assumed it's actually legally unsafe to taint yourself with original code and let it potentially influence you

@rustynail @ahltorp @tbortels @lcamtuf @bgalehouse @kevinr Hmm, there is another consequence to this.

If this is a derivative work, which I expect it is, it causes issues when someone has in fact manually coded an alternative to some copyrighted work (without reading the original code, etc.), because someone can suggest that it was done using AI as a derivative work. The new code no longer needs to actually follow the original code to be accused of this.

Arrg!

@ahltorp @bgalehouse @revk @lcamtuf @kevinr @rustynail

AI is a weird case as you could assert - probably correctly - that the original code may be part of its training corpus. Was that training a GPL violation? It's a stretch. Was its training a copyright violation? Or was the AI (or rather its owners) exercising their GPL license rights? Or was it fair use under regular copyright?

Who knows?

It's a hot mess is what it is.

This is all so far outside the original reckoning of "it'd be nice if the bookbinder down the street didn't profit off of my work until I had a chance to profit off of it first" that it's not surprising it's a mess.

@tbortels @ahltorp @bgalehouse @revk @lcamtuf @kevinr it seems GPL is supposed to be viral and restrictive enough that AI trained on GPL code can only produce GPL code. There is no other case where you can magically use GPL code for a non-GPL project and there probably shouldn't be

@lcamtuf @kevinr @rustynail @ahltorp @bgalehouse @revk

If AI code cannot be copyrighted - you have no mechanism on which to force someone to accept the GPL, or any license. An AI artifact covered by GPL is meaningless.

@tbortels @lcamtuf @kevinr @rustynail @bgalehouse @revk You can’t take GPL code, put it through the identity function, and then end up with non-GPL code, just because you claim it was produced by a machine. Even when it’s a more advanced process, like a compiler, the resulting code is not less bound by copyright than the source.

@lcamtuf @kevinr @revk @rustynail @ahltorp @bgalehouse

That's the "clean room" that keeps getting thrown around, originally used to try to legally protect free bsd derivatives. The idea was to make the "copy" argument so outlandish it was unsupportable.

It does set a standard, but I'm not sure it's a requirement. That is, reading code to create compatible code seems more of a fair use than an illicit copy. Especially if none of the original code appears in the finished work.

@revk
Reading is, indeed, not copying, and you are allowed to do that within copyright (hence the name; it's not 'readingright')

But reading and then writing something similar, while not exactly copying, is close enough that it's usually considered 'plagiarism'.
@ahltorp @tbortels @lcamtuf @bgalehouse @kevinr

@wouter @revk Plagiarism is about presenting the work as your own, not where it came from. It can be loosely connected to copyright because of either legal or license demands for attribution, but even taking something that is public domain and presenting it as your own is plagiarism.

Where copyright is about the relation to people a work comes from, I would say plagiarism is about the relation to people you present a work to.

@wouter @ahltorp @revk

I think "fanfiction" is closer.

Although if it's close enough to be identical in all major respects, a "remake" is also accurate.

Having said that: software is weird because it does something. If I write code that faithfully implements an API - I haven't stolen or plagiarized or anything - I followed the spec.

If that spec was supposed to be secret or proprietary - then open sourcing the code was a bad idea, and the soul of GPL is to force sharing.

Anyway - hot mess. As things tend to get when you try to force other human beings, legally or otherwise, to do what you want them to do.

@tbortels why would execution be needed to agree? You as a third party don't need to agree to the license, but if it's an open license to have the privilege to edit/reuse the code you have to agree to do it. By default the code is closed, the license opens it up for you, if you somehow don't agree to it you can't use the code at all because it's closed by default

(completely unrelated to the AI thing. fuck AI)

@marta

I'm not sure "closed" is the right word. Clearly it's not closed if you are providing it - it's right there, I can read it and even redistribute it without burden.

It's "copyrighted", not closed. You can't modify closed source because you don't have the source. The assertion being made is you can't modify GPL'd open source without accepting the license. But copyright has its own carve-outs, and I am unconvinced that writing a spec or net-new code is a modification, as opposed to regular old copyright fair use.

@tbortels you cannot redistribute copyrighted material(?)

A spec of copyrighted code is effectively a set of instructions on how to reproduce the code, and it can be used to compete commercially with the owners of the code, so I doubt it could qualify as fair use.

@marta

It's about how to reproduce the functionality - the code could be an entirely different language.

And - "commercially compete" with someone giving away code for free seems a non-concern.

@tbortels competition is one of the factors that go into what qualifies as fair use, so no, it is not a non-concern. And no, someone publishing their code with open access does not give it away for free wtf

@bgalehouse @kevinr @tbortels @lcamtuf the licence allows you to do things you’re not allowed to do without it, and makes that permission conditional. So, no way around that.

@tbortels @bgalehouse @lcamtuf @kevinr Well, yes but no. The point about the spec is the level of detail taken from the original work. If you write an original novel about a wild, big monkey found in a jungle, brought to New York, who escapes and so on, the King Kong author cannot claim any rights to that, sorry. If it were different, many narratives and movies would not exist today. That is inspiration, not derivation. Of course it is fair to declare the inspiration, but call it by its right name.

@tbortels @bgalehouse @lcamtuf @kevinr Just to compare, all free/libre office suites today are inspired by MS Office, often with a very similar UX, but they are clearly separate works with their own licenses and MS cannot claim any rights over them. Do not confuse software with industrial artifacts and patents. Outside the US, software is in many places not even patentable...

@lcamtuf @gisgeek @kevinr @bgalehouse

Heh. You might even say that's "fair use"... 🤔

@tbortels @lcamtuf @kevinr @bgalehouse
Where is the edge between inspiration and infringement? Are today's office suites infringing MS rights? Copyright says no, patents (a totally different beast) may say yes in some countries and no in others. So be careful what you wish for FOSS, because it could come true in many ways, including some very destructive ones.

@tbortels @bgalehouse @lcamtuf @kevinr the GPL is not problematic unless you want to use other people's work in a more restrictive way.. have you read proprietary software licenses?

@tbortels @bgalehouse @lcamtuf @kevinr that isn’t the case if a human generates the detailed spec though, and derivative works don’t have to be automatically derived to be derivative, so I don’t see how it would be subject to copyright

@bgalehouse @kevinr @lcamtuf the ai machine was already trained on all the open source data ever, so it's not a clean room interpretation anyway

@bgalehouse @kevinr @lcamtuf unless the training data is entirely clean of the original implementation, its tests and documentation, and any forks or other derivatives of it, it's still not a clean-room implementation. Research has shown that you can reconstruct entire book chapters with prompting because they were pulled into the training data.

@kevinr @lcamtuf In retrospect... "actual answer", "of course", "prima facie" are all red flags you're reading a bunch of nonsense blather.

@hopeless @lcamtuf no, you're just reading an educated asshole who happens to be right