A new twist in the "AI license laundering of chardet" story https://github.com/chardet/chardet/issues/327
No right to relicense this project · Issue #327 · chardet/chardet

Hi, I'm Mark Pilgrim. You may remember me from such classics as "Dive Into Python" and "Universal Character Encoding Detector." I am the original author of chardet. First off, I would like to thank...

GitHub

But really, relicensing a GPL codebase to MIT is uninteresting.

Let's do the interesting one, which is: vibe code a "clean room" reimplementation of an entire proprietary codebase! After all, Microsoft released a "shared source" proprietary version of Windows. Now try seeing what happens if you run THAT through the "turn it into public domain" machine

Win-win outcome, no matter how it goes

Winning option 1: yes, you can vibe code proprietary codebases into the public domain, allowing us to bootstrap proprietary codebases quickly

Winning option 2: stopping laundering of copyleft codebases

Either of these are interesting outcomes!

omg I am just seeing now that the dude who did the "AI relicensing" fucking replied with an obvious slop response, of all the fucking disrespectful things to do, holy fucking shit https://github.com/chardet/chardet/issues/327#issuecomment-4005195078
@cwebber that whole relicensing and this slop reply are vomit inducing.
@soapdog @cwebber It's just the lack of understanding of what an LLM is that makes one's hand want to smack one's forehead. Or, preferably, his.
@soapdog @cwebber There is a real issue with people using LLMs to try to brute force their way out of a situation. Make a response that is long enough and plausible enough, and people will roll their eyes and often just give up. I have experienced this directly at work, and it drives me crazy.
@cwebber I love the sentence "If you are indeed the Mark Pilgrim..." So steeped in bad faith that they assume others are too.

@cwebber I'm not sure that's slop, but I won't discount the possibility... 🤔 But this part is funny in the dark humor sort of way:

"...explicitly instructed Claude not to base anything on LGPL/GPL-licensed code."

So, you see, no problem... 🙄

@cstanhope @cwebber

Claude after being explicitly instructed not to base anything on LGPL/GPL-licensed code

@cwebber these people don't know how to write on their own anymore lol
@cwebber
If he can't be bothered to write it, why should we bother to read it?
@cwebber I felt my brain getting smoother as I read that
@cwebber that's exactly where my mind went to. Any time I've rewritten something that was in copyleft because I needed it copycenter or even with such inspiration, I wouldn't let myself even look at the original code. But it would be a net boon to OSS if the same rules apply to proprietary stuff. The bad situation would be if corporate lawyers effectively made it so that only their code is protected from such reimplementation.

@cwebber good times! 😅

It's going to be fun to see how the boundaries of "human produced work" are defined over time, but I expect it will work out in whatever way benefits the big money players in software and media.

Does this only apply to "AI"? What does that mean? If I have a machine generated background crowd or vapour in some frames of my $300M blockbuster movie, can I still copyright it?

@cwebber the losing outcome is people use it, but it is shitty, and then it's so widely adopted as a general concept that you're forced to use shitty software

@vv yeah that's definitely the shitty outcome for usability

But... given that a lot of shittiness comes from an *uneven playing field* when it comes to copyright stuff, and people thinking they can wear down the commons with no consequences, I think it's worth pushing the needle on this approach

@cwebber I love the idea of weaponizing their reasoning in support of the working class.

Cynically though, I think there’s a third outcome: rules for thee, but not for me. In which Microsoft uses the full weight of their wallet to crush the common person, but is free to steal themselves, to profit off of the open source community. The rest of us are left to victimize each other with little legal recourse.

Is it logically consistent? Nope, but that’s the weird timeline we live in.

@Haste I increasingly see this as big tech and business having found a way to cut the legs off what has been a rapidly growing threat to their business models: FOSS.

GenAI can both copyright wash any source code, especially FOSS, and destroy the FOSS ecosystem. FOSS is terribly vulnerable right now and given the capture of governments and lawmakers by AI hype and lobbyists, I'm not sure it can survive in current form.

@cwebber

@cwebber What constitutes laundering of copyleft codebases?

@SprocketClown

The way I read it in this context is that an existing codebase has a license (whether GPL, LGPL, or proprietary or whatever), and that by "laundering" the codebase through an LLM, the output no longer retains the license terms. In the US at least, a federal appeals court has ruled that LLM output is uncopyrightable.

So as @cwebber highlights, either the licensewashing works, in which case LLMs can scrub licenses off proprietary codebases giving a leg up on "reproducing" proprietary codebases into the public domain; or it doesn't work, in which case LLM-produced code becomes subject to the licensing of the original code.

@cwebber Microslop committed to picking up the legal bill for anyone concerned about copyright issues with AI outputs from copilot so one could hypothetically use their tools to "clean room" implement Photoshop and then have Satya fight Adobe for your right to do so. Sounds fun to me!

https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/
Microsoft announces new Copilot Copyright Commitment for customers - Microsoft On the Issues

As customers ask whether they can use Microsoft's Copilot services without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved.

Microsoft On the Issues
@svines @cwebber call that Project Photoslop

@svines @cwebber

This is topical:

"To protect against this, customers ... must not attempt to generate infringing materials, including not providing input to a Copilot service that the customer does not have appropriate rights to use."

@cwebber As long as you don’t get sued into oblivion!
@cwebber fucking genius!
@cwebber it's just gonna launder wine code lol

@cwebber

I'd certainly watch the vid, if someone actually did this!

@cwebber Microsoft can still sue for patent violations. But Windows 7 is over 15 years old.  

Also, trademark violations should be carefully avoided.

@cwebber Well, the maintainer's point was that this is "clean room", by which they mean Claude was not given the existing codebase as input. The counter argument is that the existing codebase almost certainly forms part of Claude's training data, so the claim of it being genuinely clean room is bogus. So to make your idea work, you'd have to use the proprietary codebase as training data, rather than prompt input.
@cwebber and I suspect that if you made an LLM based on the specific code as training data, a court would probably rule differently to how they have ruled about LLM generated code in other cases. maybe.
@cyberia @cwebber it would need a controlled clean-room training data and training and context, so yeah it was trained on the original GPL code and is not a clean-room implementation
@cwebber I cynically fear that the likely outcome is that proprietary copyright holders with lots of lawyers and money could succeed in preventing re-licensing as open source, while copyleft advocates with few resources couldn't actually prevent re-licensing to closed.
@cwebber I think you're going to need one hell of a kickstarter to fund that one.
@cwebber I think the only sticking point with this scheme is the concept of a vibe coded "clean room implementation" is problematic. Like, have you SEEN Claude's room? It's absolutely FILTHY!
@msh @cwebber I'm now hoping to see a picture of Claude's room which is a messy room, *based on the room images from Microsoft Bob*.
@cwebber even funnier with *closed source* proprietary Java or C# apps (and Android, perhaps?!) as these can be decompiled to a very ugly IR code that can be somewhat usable to guide a LLM!
@cwebber I’ve been pondering this for a while. Another compelling use case would be speeding up reverse engineering of GPU drivers and other proprietary hardware blobs, because there would no longer be any need for true clean room work.

@cwebber I feel like I made the right call banning ai in my gplv2 project recently.

I'm very curious to see how this plays out

@cwebber significant popcorn moment
@cwebber I would very much like someone with a legal mind explain how software licenses interact with yesterday's ruling that AI gen work is not copyrightable. What exactly is the basis of the copyright here? I hope we get to see someone dive into this.
@cwebber blimey it’s really kicking off over there…

Wild that AI companies just grab open source code and want to "wash" the licenses. 😤

That's exactly the problem with the current AI hype: the big players think they can just use whatever is on the net. And when it gets legally tight, they just quickly change the license...

Respect to Mark Pilgrim for fighting back! Open source runs on trust and clear rules - not on maneuvers like this.

#OpenSource #AIEthics #Licensing

@ralph_social That's completely incorrect. No AI company did anything here (other than provide the model). The main maintainer of over 12 years rewrote the project from scratch and changed the license. Mark Pilgrim had migrated the thing from C to Python 15(?) years ago and set the license. He later stepped away and hasn't contributed anything to the project since (which is completely fine, of course).
@primeapple Thanks for the correction! You're completely right - I had misunderstood the situation. Good to know it was the maintainer and not some AI company. Important to get these things right! 👍
@cwebber Reading through all the comments there left me wondering if anyone has (yet) hooked up an LLM to be a project maintainer. Interactions via issues and just let it loose. People would be utterly mad to ever include it in their supply chain, and yet people do do mad things.
@cwebber Isn’t this what forks are for?