Apparently chardet got Claude to rewrite the entire codebase from LGPL to MIT?

https://github.com/chardet/chardet/releases/tag/7.0.0

That is one way to launder LGPL code I guess?

Release 7.0.0 · chardet/chardet

Ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x. Just way faster and more accurate! Highlights: MIT license (previous versi...

@Foxboron lol right, because Claude certainly wasn't trained on GPL code

@scy US courts are leaning towards the view that LLM-generated code is fundamentally not copyrightable.

This is a different problem to the moral issues I have with this.

@Foxboron But does "is not copyrightable" mean that "is not a license violation of its input data"? I highly doubt it.
@scy A license violation usually implies that there is a copyright violation to begin with.

@Foxboron Yeah but that's what I mean: Just because the end result is not copyrightable, does that automatically mean that it can't be a copyright violation?

Like, changing the format or medium of something is not a copyrightable work.

So, by that logic, if I take a copyrighted MP3 and convert it to AAC and publish that, my AAC is not copyrightable, but it's not a copyright violation to take it and publish it?

That's what I mean.

@scy @Foxboron It's a bit complicated, actually. IANAL, but this is what I understand:

- The music notation is copyrightable, individual notes are not. A sequence of notes is debatable, and it depends highly on recognizability AFAIK.

- A music recording is copyrightable. Playing that music in a distinctly different arrangement, less of an issue.

- Arguably, a change in digital format is either still the same recording, or sufficiently indistinguishable from it.

- Copyright has an ancient...

@scy @Foxboron ... naming and goes back to a time when making copies and distributing them was the hard part.

This is a non-problem in the digital age, which is why it's fine to create backup copies of copyrighted works, so long as the people accessing them are always the people who purchased or licensed an original copy.

So LLMs training on GPL is not itself a copyright violation, and them reproducing similar code isn't either, but then publishing such sufficiently similar code is.

@scy @Foxboron TL;DR what others already wrote: if the result is similar enough to inputs, the copyright holder of the inputs could challenge it, yes.
@scy @Foxboron If courts decide to throw this out, I would personally *love* for someone to use the exact same argument to produce a minimally altered copy of Avatar, and have Hollywood throw a fit.
@scy @Foxboron Basically, let's not fight this, let the industry giants fight each other. Throw a few near-copies of Metallica songs in for good measure, so we get a v2 of that "Napster baaaad" animation with greedy gnome Lars Ulrich.
@scy @Foxboron Either LLMs will die on the spot, or Copyright does.