If you ask AI to rewrite the entirety of an open-source program, do you still need to abide by the original license? In philosophy, this problem is known as the Slop of Theseus.
@lcamtuf actual answer: of course you do, it’s prima facie a derivative work, same as if you had rewritten the program by hand.

@kevinr @lcamtuf And if you ask it to write a detailed spec based on its implementation, and then separately to write an implementation of that spec?

https://www.allaboutcircuits.com/news/how-compaqs-clone-computers-skirted-ibms-patents-and-gave-rise-to-eisa/

Tales from 80s Tech: How Compaq’s Clone Computers Skirted IBM’s IP and Gave Rise to EISA

In the 1980s, Compaq was the first company to produce a portable IBM-compatible machine legally, but they flirted with breaking copyright law in the process.


@bgalehouse @kevinr @lcamtuf it's a tempting argument, but it kinda falls apart when "the entire library was in the training corpus anyway" is a given.

The fact that it is a terrible argument is of course not really going to stop anyone from making it.

@SnoopJ There’s the concept of clean room reimplementations (see the link by @bgalehouse): one group writes the spec -- possibly with access to the source.

The second group has never seen the source and only gets the spec. This second group then writes the program according to the spec.

You could simulate this if you had an AI that was provably not trained on the original source.

("provably not trained" most likely means re-training from scratch)

@bgalehouse @kevinr @lcamtuf

@ArneBab @SnoopJ @bgalehouse @lcamtuf

And the spec would need to carefully elide any details that would get it classed as a derivative work itself, which is much harder for an LLM to do than for a team of humans.

@kevinr and proving that the AI was not trained on the original source will be pretty hard, because FLOSS programs with compatible licenses can legally copy code from one project into another.

You’ll likely have to exclude all code from the project and all code that’s too similar from the training data. And then train an AI from scratch. Which would be extremely expensive.

@SnoopJ @bgalehouse @lcamtuf

@ArneBab @kevinr @SnoopJ @bgalehouse @lcamtuf I think it's more complicated. Consider program A licensed under GPL and program B licensed under BSD license. Code from program B can be copied into program A, but code from program A cannot be copied to program B without applying GPL to program B (changing the license). At least that's how it works as I understand it.
@thebluewizard @ArneBab @kevinr @SnoopJ @bgalehouse @lcamtuf The copied code in B is still GPL, so "B is BSD" is gone.

@ovrim though the code in program B that was BSD before is still BSD -- only the license of the program as a whole changes.

@thebluewizard @kevinr @SnoopJ @bgalehouse @lcamtuf