Mastodawn

I have what may be a very ignorant question: if model-generated code may not be copyrighted due to a requirement of human authorship (current US Copyright Office policy), does it therefore follow that model-generated code may not be licensed under any terms whatsoever? Meaning anything from MIT to GPLv3?

I recognize no answers here would constitute legal advice, but I would love to hear from legal experts on this.

fhekland 6h ago

@mttaggart https://www.copyrighted.com/blog/licensing-vs-copyright#frequently-asked-questions

Licensing vs Copyright: Key Differences Creators Must Know

Understand the distinctions between licensing vs copyright. This article explores how licensing agreements affect copyright ownership and usage rights.

Taggart 6h ago

@fhekland

Can you license something without owning the copyright?

No, you need to own the copyright or have permission from the copyright holder to license the work.

So if no copyright is possible, no license is possible?

Taggart 6h ago

@fhekland Hey @cwebber ☝️ this was really bothering me. If the current precedent stands, it's absolutely the case that no open source license is enforceable on model-generated code, as copyright is a prerequisite for any license.

I imagine there's a test of amount still, like if most of the code is human-authored, you could still claim copyright. But for example, the tool I just made with Claude Code as an experiment? Full public domain, no terms available to me.

Zimmie 4h ago

@mttaggart @fhekland @cwebber This is accurate, yes. Illicitly acquired code works the same way: you don’t hold the copyright, so you don’t have the ability to license it to others.

There is an open question of what happens when the LLM emits a verbatim chunk of code against that code’s license terms. For example, if you told an LLM to implement ZFS’ spa_activate, it’s extremely likely to emit verbatim chunks of CDDL code without the attribution required by the license. A tool can’t be liable for the infringement, but does the liability rest with the company which included CDDL code in the training corpus, or does it rest with the user who didn’t verify that the output doesn’t infringe preexisting copyright?

AI6YR Ben 4h ago

@bob_zim @mttaggart @fhekland @cwebber Just like with written text on a very obscure subject, the LLMs are liable to spit out the ONLY source for a very specific, narrow technical problem. I have played with this on ChatGPT and the number of times you end up with a mishmash of the two public examples of "how to code X" (which doesn't run) is extremely high, with the same variable names and the same commenting and all. The risk of 100% regurgitation (IMHO) is very high for things that have only been coded and exposed to the world once or twice in the corpus.

Taggart 4h ago

@ai6yr Yeah I've had Copilot give me my own Rust code for Windows exploits.

@bob_zim @fhekland @cwebber

Nine Oh Real 4h ago

Paul_IPv6

@mttaggart @ai6yr @bob_zim @fhekland @cwebber

wow. we've automated mansplaining... shall we call it slopsplaining?