Seen on LinkedIn “If an LLM wrote your code, you can't really claim that it's proprietary”
The floor is open for discussion.
@aalmiray There is literally already a case where a judge ruled exactly that.
The case is not about source code, but from a legal point of view that makes no difference.
https://ifrro.org/resources/documents/General/German_Court_OpenAI_Memory_Output_Infringe_Copyright_NOV25.pdf
In short: GEMA was able to extract the full lyrics of a few well-known German songs from ChatGPT. And the court ruled that if this can be proven (which it was), then the prompt was NOT the intellectual effort that passed the threshold of originality. 1/2
@aalmiray 2/2
Instead, OpenAI infringed the copyright of the original musicians, who still own the copyright, including the parts you extract from OpenAI. If you apply all this to software, then the authors who originally invented/wrote the source code in the training data are STILL the copyright holders. And the original licenses of those source code parts also still apply.
In other words: if you vibe code something, then YOU do NOT own the copyright nor can you define the license!
@struberg This is the thing I'm uncertain about: whether the license of the data ingested for training is transitive or not. Either answer brings a host of issues and opportunities, but we seem to be operating as if it didn't matter at all.
Once law and regulation are in place we’ll know how to properly handle this situation. Myself, for now, I block any AI contributions to my FLOSS projects.
@aalmiray
Right, it's a legal minefield. And an ethical one: 'stealing' ideas is not nothing!
Assume a piece of code is licensed as, say, GPLv2 and an LLM is trained on it. If said LLM later generates code in your project that is even just similar to that code, then the generated code is legally also GPLv2. And due to the virality of the GPLv2, all your other code might be too.
In the end it doesn't matter whether OpenAI's etc. LLM took it from that other project or a human did, does it?
@struberg I think it’s worse than just generating code that may closely resemble the inputs. Just ingesting code is enough.
IIRC the engineers working on J9 could only rely on the spec to create their own JVM implementation, and were not allowed to look at the OpenJDK implementation for ideas/inspiration, as that could taint the result.
If this is how humans behave with code licenses, why are LLMs treated differently?
@struberg @nikolausf it doesn’t have to be trained with lots of GPL code. Just a single entry ingested is enough. That’s the virality of said license.
Anyhow, this shows that we're navigating with uncertainty, and that's dangerous.
@struberg Even more so: yesterday I had a conversation with a developer who believed that because he paid for the GenAI tool he could do whatever he wanted, and that this licensing thing is not an issue 🤯
As if just because you paid money you could disregard the ToS of the tool/service.