I have what may be a very ignorant question: if model-generated code may not be copyrighted due to a requirement of human authorship (current US Copyright Office policy), does it therefore follow that model-generated code may not be licensed under any terms whatsoever? Meaning anything from MIT to GPLv3?

I recognize no answers here would constitute legal advice, but I would love to hear from legal experts on this.

Licensing vs Copyright: Key Differences Creators Must Know (Copyright RPM)

@fhekland

Can you license something without owning the copyright?

No, you need to own the copyright or have permission from the copyright holder to license the work.

So if no copyright is possible, no license is possible?

@fhekland Hey @cwebber ☝️ this was really bothering me. If the current precedent stands, it's absolutely the case that no open source license is enforceable on model-generated code, since copyright is a prerequisite for any license.

I imagine there's still some test of proportion: if most of the code is human-authored, you could still claim copyright. But the tool I just made with Claude Code as an experiment, for example? Fully public domain, no terms available to me.

@mttaggart @fhekland @cwebber This is accurate, yes. Illicitly acquired code works the same way: you don’t hold the copyright, so you don’t have the ability to license it to others.

There is an open question of what happens when the LLM emits a verbatim chunk of code against that code’s license terms. For example, if you told an LLM to implement ZFS’ spa_activate, it’s extremely likely to emit verbatim chunks of CDDL code without the attribution required by the license. A tool can’t be liable for the infringement, but does the liability rest with the company which included CDDL code in the training corpus, or does it rest with the user who didn’t verify that the output doesn’t infringe preexisting copyright?

@bob_zim @mttaggart @fhekland @cwebber Just like with written text on a very obscure subject, the LLMs are liable to spit out the ONLY source for a very specific, narrow technical problem. I have played with this on ChatGPT and the number of times you end up with a mishmash of the two public examples of "how to code X" (which doesn't run) is extremely high, with the same variable names and the same commenting and all. The risk of 100% regurgitation (IMHO) is very high for things that have only been coded and exposed to the world once or twice in the corpus.

@ai6yr @bob_zim @fhekland @cwebber Yeah, I've had Copilot give me my own Rust code for Windows exploits.

@mttaggart @ai6yr @bob_zim @fhekland @cwebber Wow. We've automated mansplaining... shall we call it slopsplaining?