This case shows how Open Source will die. When anyone can pipe existing code and tests through an LLM and claim the result is "clean room" (which is hogwash), no license can protect your work from being appropriated and monetized by anyone. The commons are actively being shredded in front of our eyes.

https://github.com/chardet/chardet/pull/322

chardet 7.0: ground-up MIT-licensed rewrite by dan-blanchard · Pull Request #322 · chardet/chardet

Summary This PR is for a ground-up, MIT-licensed rewrite of chardet. It maintains API compatibility with chardet 5.x and 6.x, but with 27x improvements to detection speed, and highly accurate suppo...

mcc Mar 5

@tante I have heard people talking about this, but what I don't understand is what license the code was under before the "rewrite". The project's own self-description says "chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet", which sounds like something an LLM would write but doesn't tell me much. (And the PR is big enough to break GitHub's PR display feature, so it's a little hard to figure out what the project looked like before the +14526 -546715 patch.)

tante Mar 5

@mcc Before the Claude "clean room" reimplementation, chardet was licensed under the LGPL.
So the dude used Claude (which was probably trained on chardet/LGPL) to generate a new version of chardet with the same API etc., but put it under the MIT license.

mcc

@tante Thank you. And do I understand correctly that administratively the MIT rewrite* is "the same library", e.g. the maintainer flattened their own repository and hosted the new* thing at the same GitHub address, same PyPI address, same Read the Docs address? I hadn't used chardet previously, and search engines point only to this same project.

* "so to speak"

tante Mar 5

@mcc yes. They jumped from 6.0.0 (LGPL) to the Claude-generated version with a huge pull request that changed everything and relicensed it. So it is in the same tree. Same name/canonical URL and everything. So the "clean room" argument is at least weakened by putting it in direct succession.

mcc Mar 5

@tante Okay. It's clear now, thanks.

(Presumably, of course, the single maintainer did not write all of that code, and anyone who contributed under LGPL would have expected to be asked for consent before relicensing.)

tante Mar 5

@mcc yes, there are contributors who contributed under LGPL, especially Mark Pilgrim, who started the whole project under that license.

SomeVeganCheeseIsOk Mar 5

@tante @mcc does the fact that AI content can't be copyrighted impact this at all?

tante Mar 5

@SomeVeganCheeseIsOk @mcc "AI content can't be copyrighted" is a bit of an oversimplification. TBH for the matter at hand it is relevant only in the sense that Claude surely was trained on chardet so it supports the "derivative" argument a bit.