@bgalehouse @kevinr @lcamtuf it's a tempting argument to attempt but it kinda falls apart when "the entire library was in the training corpus anyway" is a given.
The fact that it is a terrible argument is of course not really going to stop anyone from making it.
@SnoopJ There’s the concept of clean-room reimplementation (see the link by @bgalehouse): one group writes the spec -- possibly with access to the source.
The second group has never seen the source and only gets the spec. This second group then writes the program according to the spec.
You could simulate this if you had an AI that was provably not trained on the original source.
("provably not trained" most likely means re-training from scratch)
@ArneBab @SnoopJ @bgalehouse @lcamtuf
And the spec would need to carefully elide certain details that would get it classed as a derivative work itself -- much harder for an LLM to do than for a team of humans
@kevinr and proving that the AI was not trained on the original source will be pretty hard, because FLOSS programs with compatible licenses can legally copy code from one project into another.
You’ll likely have to exclude the project’s code -- and any code that’s too similar -- from the training data, and then train an AI from scratch. Which would be extremely expensive.
@ovrim though the code in program B that was BSD before is still BSD -- only the license of the whole program changes.