I have what may be a very ignorant question: if model-generated code may not be copyrighted due to a requirement of human authorship (current US Copyright Office policy), does it therefore follow that model-generated code may not be licensed under any terms whatsoever? Meaning anything from MIT to GPLv3?

I recognize no answers here would constitute legal advice, but I would love to hear from legal experts on this.

Licensing vs Copyright: Key Differences Creators Must Know

Understand the distinctions between licensing vs copyright. This article explores how licensing agreements affect copyright ownership and usage rights.

Copyright RPM

@fhekland

Can you license something without owning the copyright?

No, you need to own the copyright or have permission from the copyright holder to license the work.

So if no copyright is possible, no license is possible?

@fhekland Hey @cwebber ☝️ this was really bothering me. If the current precedent stands, it's absolutely the case that no open source license is enforceable on generated code, as copyright is a prerequisite for any license.

I imagine there's a test of amount still, like if most of the code is human-authored, you could still claim copyright. But for example, the tool I just made with Claude Code as an experiment? Full public domain, no terms available to me.

@mttaggart @cwebber This is a really interesting question for all open-source projects. How does code "plagiarised" through an LLM compare to a forked repo? Is the LLM output OK as long as one respects the original licence and gives proper credit? How on earth is one supposed to deal with this? Run your code through some plagiarism checker first?
Interesting times we live in..
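(The "plagiarism checker" idea above can be sketched, very roughly, with a textual similarity check. This is only a toy illustration using Python's standard library; the snippets and threshold are invented for the example, and real code-provenance tools are far more sophisticated than a diff ratio.)

```python
from difflib import SequenceMatcher

def similarity(generated: str, reference: str) -> float:
    """Return a 0..1 similarity ratio between two code snippets."""
    return SequenceMatcher(None, generated, reference).ratio()

# Hypothetical snippets, for illustration only.
generated = "def spa_open(name):\n    return lookup(name)\n"
reference = "def spa_open(name):\n    return lookup(name)\n"

score = similarity(generated, reference)
if score > 0.9:  # arbitrary threshold, not a legal standard
    print(f"High overlap ({score:.2f}): review licensing before shipping")
```

Of course, even a perfect match score says nothing about *which* license the reference code carries; it only flags that a human should look.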

@mttaggart @fhekland @cwebber This is accurate, yes. Illicitly acquired code works the same way: you don’t hold the copyright, so you don’t have the ability to license it to others.

There is an open question of what happens when the LLM emits a verbatim chunk of code against that code’s license terms. For example, if you told an LLM to implement ZFS’ spa_activate, it’s extremely likely to emit verbatim chunks of CDDL code without the attribution required by the license. A tool can’t be liable for the infringement, but does the liability rest with the company which included CDDL code in the training corpus, or does it rest with the user who didn’t verify that the output doesn’t infringe preexisting copyright?

@bob_zim @mttaggart @fhekland @cwebber Just like with written text on a very obscure subject, the LLMs are liable to spit out the ONLY source for a very specific, narrow technical problem. I have played with this on ChatGPT and the number of times you end up with a mishmash of the two public examples of "how to code X" (which doesn't run) is extremely high, with the same variable names and the same commenting and all. The risk of 100% regurgitation (IMHO) is very high for things that have only been coded and exposed to the world once or twice in the corpus.

@ai6yr Yeah I've had Copilot give me my own Rust code for Windows exploits.

@mttaggart @ai6yr @bob_zim @fhekland @cwebber

wow. we've automated mansplaining... shall we call it slopsplaining?

@mttaggart @ai6yr @bob_zim @fhekland @cwebber

An expert's guide to Copilot:
1) Do you recognize this as your code?
Yes: go ahead and use it
No: Don't use it, it has errors
2) No more steps, we are done here.

@bob_zim @mttaggart @fhekland @cwebber
don't worry. If you work for a big corp that has well-heeled corporate lawyers, your employer will be fine.

@mttaggart @fhekland Well, that's right, you can't *license* it, but the public domain is compatible with nearly every FOSS license

The problem is, *not every place has the public domain* and *we don't know that AI generated output will be considered in the public domain everywhere*

This was the motivation that led to CC0, a public domain declaration with a permissive fallback license

We simply *don't know* yet what the legal status of AI generated output is, sufficiently. If it was "public domain worldwide", you'd effectively be mixing its output with yours and contributing that, and it wouldn't likely be that big of a deal. For instance, it just might weaken some of the eligibility for coverage under copyleft, but not copyleft compatibility... same with proprietary licenses.

But we *don't know yet*!

@mttaggart @fhekland I read your article btw and thought it was great. I've been meaning to write a response!

@cwebber @fhekland Ah hey, thanks!

And yeah, this question was really to get at the "We don't know," related to your point a while ago about the danger of attempting to license generated code. Basically I wanted more citations on that claim, and it sure seems like the best case scenario is "We don't know," and the worst case scenario is "Almost certainly not licensable." Either way, definitely not safe for us in open source.

@mttaggart @fhekland Ah yeah. I also have been meaning to write a blogpost about the uncertain legal status of LLM based output. I really am worried it's much more uncertain than people are acting...

I think one thing that *is* positive is that I'm glad that the "hey look you can just clean room vibecode a replacement to any open source software" is being applied to leaked software from Anthropic. Now I hope someone does it with a leaked copy of Windows!

@mttaggart @fhekland To put the point there more directly, people feel like they can rewrite whatever from the commons because it's the commons, even though there are license terms attached to that. Well, does that work for proprietary software too?

@cwebber @mttaggart @fhekland Yes, it does. In fact, it may be easier for proprietary software: since the original software was not part of the AI training corpus, it's easier to prove it wasn't plagiarized. Since Anthropic has been bragging that Claude wrote Claude's code, Claude's code is not copyrightable. Now that it has been made public, it isn't a trade secret either. It is firmly in the public domain, at least in the U.S. Disclaimer: ianal.

@mttaggart @cwebber @fhekland

Nothing is safe for anything or anybody!

And exactly WHO the heck is deciding to enforce WHAT these days, anyhow?

War crimes are fine, human trafficking is cool, but walking around in public not being white can get a person sent to some gulag-- and in the middle of this mess, megacorps making a complete mockery of fair use and whatever copyright shit we may or may not get hit with starts to feel like a circus act with spinning wheels and tossed knives...

@cwebber @mttaggart @fhekland

If you can keep the generated source code secret, and keeping it secret provides you economic value, you might be able to protect it as a trade secret.

https://my.eng.utah.edu/~cs5060/notes/tradesecrets.pdf