RE: https://sfba.social/@drahardja/116311524946860153

the recent “clean room implementation” tools really hammered this home for me.

generated images always felt icky because they're visual. code felt more diffuse and less emotional to me, even though the process is the same.

but now #FOSS projects can be targeted and stripped of author, copyright, and effort with a single command.

not that I expected much from them lately, but it sure would be nice if the #FSF had an opinion on the end of copyleft.

@hbons Phrased differently, it's also the end of copyright, and wasn't that always the goal of the Free Software movement? Copyleft was a necessary hack for the current system.

Next let's make sure the patent system goes down the same path.

What we're losing here is mostly attribution. (And we're gaining various other problems, but you talked about copyleft specifically.)

@slomo @hbons "the end of copyright" only benefits small entities in the mind of MIT techno libertarians of the '80s. Like all libertarians, they fundamentally don't understand the systems they deeply rely on. For instance: if you remove copyright, the people that will benefit are going to be the ones with the most power and resources, as it was before copyright was introduced.
@ebassi @hbons I'm not sure that's very different now. Who benefits most from the current copyright system if not publishers and e.g. Disney? In theory it should benefit the actual authors, but that doesn't seem to be the effect in practice.
@slomo @hbons sure, it's bad; do we need to just roll over and die, then? Removing *all* the guardrails is not going to usher in a better alternative.

@ebassi @slomo @hbons Indeed, the only thing that is going to usher in a better alternative is to work out what that better alternative is, campaign for it, and implement it.

(As always.)

@pwithnall @ebassi @hbons And whatever the alternative is, the situation has changed in a way that can't be turned back.
@slomo @pwithnall @hbons on this, I do not agree, and it's one of the reasons why I find discussing this stuff with a lot of people pointless: you have already given up, so there's no point in convincing you
@ebassi @pwithnall @hbons What's the alternative you're advocating for then?
@slomo @pwithnall @hbons the alternative is working to do harm reduction in projects that have already adopted permissive contribution guidelines; to introduce less permissive contribution guidelines in projects that haven't; to figure out licenses that strengthen the enforcement of licensing terms; to contribute to legal funds for license enforcement. In short, to put up a bit of resistance, instead of just folding like a lawn chair, and saying: "this is how it is"
@ebassi @slomo @pwithnall @hbons That's too vague for me to understand. Let's focus on one thing at a time. What do you mean by enforcement of licensing terms here?
@nirbheek @slomo @pwithnall @hbons we're talking about genAI-based "clean room" reimplementations; those should not be allowed, unless you can demonstrate that the training data set is actually clean. This should be encoded in the licensing terms and copyright law.

@ebassi @nirbheek @pwithnall @hbons Whether something is a "clean room" implementation or not is also not that easy to determine. Maybe the LLM saw other implementations of the same thing during training, but is that different from you having read some other implementation of something at some point in your life and then writing your own? In either case it's not like the LLM or you can recite* any of the originals, but you have an abstract model of it, and of anything else you ever learned, that you work from.

* Give it a try: let a recent LLM write you the FreeBSD implementation of /bin/yes, which was surely in its training data and is small enough, and then compare it to the original. Or maybe more interesting: let an LLM write Rust/GStreamer code, and you'll clearly see that it learned from code I have written. Just like most humans did, if you look over GitHub etc. Unlike with some humans, though, you don't see 1:1 copy&paste of whole little helper functions.

Very different from that is the case when you (or an LLM) actively look at, compare with, and copy from another implementation during development. (Which is also probably more common than we'd like to pretend, based on all the code I've seen over the years.)

I'm sure we're going to have lots of interesting philosophical discussions between lawyers and courts in the future, ideally with outcomes that don't backfire on us.
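
For anyone who wants to try the /bin/yes comparison from the footnote above, here is a minimal sketch of one way to put a number on "how much was recited". It is only an assumption about how such a comparison could be done, not an established test for copying: the function name `similarity_report` is made up for illustration, and the two inputs are whatever original and generated source texts you paste in yourself.

```python
import difflib


def similarity_report(original: str, generated: str) -> dict:
    """Compare two source texts line by line (hypothetical helper).

    Leading/trailing whitespace and blank lines are ignored, so pure
    reformatting does not hide or inflate a match.
    """
    orig_lines = [line.strip() for line in original.splitlines() if line.strip()]
    gen_lines = [line.strip() for line in generated.splitlines() if line.strip()]

    # autojunk=False: do not let difflib discard frequent lines such as "}".
    matcher = difflib.SequenceMatcher(None, orig_lines, gen_lines, autojunk=False)

    # Longest run of consecutive identical lines: a rough proxy for
    # "1:1 copy&paste of a whole helper function".
    longest_run = max(block.size for block in matcher.get_matching_blocks())

    return {
        "ratio": matcher.ratio(),  # 0.0 = nothing shared, 1.0 = identical
        "longest_identical_run": longest_run,
    }
```

A high ratio on a tiny program says little by itself, since there are only so many ways to write a loop around a print call; the longest identical run on larger code is probably the more telling number.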

@slomo @ebassi @nirbheek @hbons Just because humans are bad at license and copyright attribution on a small scale does not mean that LLMs should be allowed to get away with bad license and copyright attribution on a vast scale.

1/3

@slomo @ebassi @nirbheek @hbons As a thought experiment: if there were an LLM which had been trained purely on (say) LGPL-2.1+ code, had low environmental impact, was not funded by VCs who are counting down the time until they turn on the monetisation switch, and which ran local-only and didn’t exfiltrate your stuff to the cloud, and someone used it to rewrite my project, I think I would still be massively pissed off.

Why? Because it’s a social problem.

2/3

Pioneering the Future of Code Preservation and AI with StarCoder2 - Software Heritage

Software Heritage’s mission is to collect, preserve, and make the entire body of software source code easily available, especially emphasizing Free and Open Source Software (FOSS) as a digital commons...
