Don't use LLM-generated code in your projects yet! If for no other reason than that the case law is NOT ESTABLISHED YET.

I know there was the "copyright laundering" thing that went around a lot, but we actually don't know how that will shake out.

You'll see commenters everywhere on the internet say that "the US Supreme Court ruled that AI generated output is in the public domain". That's misinfo: they *declined to hear* a case from a lower court that came to that conclusion. The US Supreme Court hasn't yet ruled.

And this hasn't shaken out in an international setting yet either.

You may be surprised to hear: I actually think it's more dangerous and empowers centralized AI companies even more if it *isn't* the case that AI output is in the public domain (I'll follow up about that), but regardless, right now we just don't know.

But despite that, I'm STILL saying that you're putting yourself in legally dubious territory right now if you include LLM-generated code. We don't know yet.

That said, I think a lot of people think we can fight AI / LLM output on copyright grounds, and I actually think that's a losing strategy. Copyright almost always helps the big players, and it would here too!

You can see it already: they're counting on it, and hoping it will be the case.

What the big players want is for copyright to apply to AI generated output because then *only* the big players can provide LLM services. See also Sam Altman's "running intelligence as a metered utility" pitch.

And the reason they could do this: *they* can make deals with Disney, Netflix, etc. But open models can't.

But what about all the "little guys" stuff? Well, when you sign that ToS on GitHub, Stack Overflow, DeviantArt, etc etc etc, all those places, you give them a right to your content too.

And THOSE places get to sell your rights.

So fighting on copyright grounds won't be an even playing field. It helps the big AI companies win.

There are only two outcomes which are acceptable: either AI model output is completely illegal on copyright grounds (this is unlikely to happen because there is now too much money behind it), or AI model output is fully in the public domain, which has its own problems but at least is an even playing field.

There won't be a middle ground that is safe. Because they want something that looks like a "middle ground", but really, all it does is lock in the big players' control over information, forever.

@cwebber I think we should resist socially and politically, for as long as there is a point, and until we figure out "benign LLMs". I'm pretty sure that's possible.

@promovicz @cwebber

There is validity, with all kinds of different framing, to resisting the careless use of a complex and poorly understood technology as the answer to Life, the Universe, and Everything.

I think the thesis at hand, though, is that trying to use outdated, inadequate copyright law, poorly fit for this context, as the tool (a technology, heh) to do that is not likely to be productive. It will consume our resources without meeting our purposes.

@promovicz @cwebber

Part of the problem still being … what, exactly, IS our purpose in this melee?

@cwebber I’d settle for: if the models include licensed sources (proprietary or open source) and use those without a license, then the model needs to be published openly and usage needs to be free.

@cwebber

I fully expect well funded companies to repeatedly challenge "AI output cannot be copyrighted because it wasn't human generated", and I expect it will be continually chipped away. That's going to make things stupidly complicated for a lot of non-technical reasons for a long, long time.

The advice I've given is to absolutely and definitively denote exactly what code was AI generated, and keep detailed records of the history around it (including the source and date), because I guarantee that will become the crux of any future decision.
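
One lightweight way to keep such records (a sketch only — the `Assisted-by` and `AI-generated-files` trailer names here are made up, not any standard) is to note provenance in git commit trailers, which stay attached to the history and can be queried later:

```shell
# Hypothetical convention: record AI provenance as git commit trailers.
# The trailer names below are invented for illustration, not a standard.
git commit -m "Add retry logic to page fetcher

Assisted-by: <model name and version, with date>
AI-generated-files: src/fetch.py"

# Later, pull those records back out of the history:
git log --format='%h %(trailers:key=Assisted-by,valueonly)'
```

Because trailers are structured key/value lines in the last paragraph of the commit message, tooling can extract them mechanically rather than by grepping free-form prose.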

Until there's case law established, AI code is a liability.

@cwebber did you read the copyright office opinion doc? What’s your take on what it says?
@cwebber The UK has a third option: the person operating the AI is the author and the output is copyrighted. Would not surprise me if the industry lobbies more jurisdictions into similar legislation.
@MartyFouts Link to more info on UK case law?

@cwebber I don’t know of case law but the UK’s Copyright, Designs and Patents Act 1988, Section 9(3) states:

"In the case of a literary, dramatic, musical or artistic work which is computer-generated, the author shall be taken to be the person by whom the arrangements necessary for the creation of the work are undertaken."

It’s language any legislature might be lobbied into inserting in their copyright statute.

@cwebber Sorry in advance if I am stating something obvious, but these two options are not exactly mutually exclusive.

Output can, in principle, be non-copyrightable (what that means will likely differ between different flavors of Civil and Common law) and still violate copyright at the same time.

@cwebber Public domain. Let's make "Intellectual property is theft!" popular again. One can always go back to the copyleft strategy if it doesn't play out well.

Related Swedish classic: http://svenskefaen.no/cdne/

Copyright Does Not Exist

@sigismundninja @cwebber Hell yeah, if models can wash strong copyleft off my code, they can wash the trade secrets out of reverse-compiled proprietary binaries.
@cwebber Unlike US Federal law, it is not that cut-and-dried in the UK because of the Copyright, Designs & Patents Act 1988: https://www.briffa.com/blog/who-owns-the-copyright-in-ai-generated-works/ https://www.aoshearman.com/en/insights/ownership-of-ai-generated-content-in-the-uk
@cwebber Here in the UK, authors currently enjoy stronger protection against GenAI plagiarism at law because of "Fair dealing" as opposed to "Fair use", at least on paper, but it's a question of enforcing those rights. Traditional business measures such as trademarks are probably the way to go, rather than trying to rely on the GPL now, because of clean-room-as-a-service, IMHO, but IANAL.

@cwebber I think if you take the architecture of a "Class III - Open Model" (from this list: https://mot.isitopen.ai/models?sort=desc&order=Classification ) and train it _exclusively_* on "The Stack"**, AND add all the required attribution clauses, then you get something that can generate code that, even if "copyright tainted", is permissively licensed enough to be broadly distributed.

I _think_ there's a lot of money on both sides, so I'm less certain copyright issues are going to go the way OpenAI etc. want.

I also DO NOT want strong copyleft code to be training data and then "copyright washed" into the public domain. If that happens, I expect a lot of "pirate" models to be trained on (e.g.) reverse-compiled proprietary code.

I also do not see any "safe middle ground", so I continue to hope that copyrighted training data "taints" the model output.

*: You could possibly add other training data that you (and others?) can redistribute.

**: NOT "The Pile"


@cwebber "The law always bends to capital, and when it doesn't capital buys new laws" is how I've heard that fundamentally expressed. Nobody should be looking at the copyright term extension acts and seeing a tool that benefits the people or the common good.

https://en.wikipedia.org/wiki/Copyright_Term_Extension_Act


@cwebber This agrees with my intuition on the matter -- the problem is not that content is being "stolen", it's that free AI "labor" "steals" the revenue that creators need in order to survive. For me, that points towards UBI, not reinforcing the highly unjust systems that trickle media revenue back to (a select few) creators.

(...speaking as a lifelong creator who almost made $5 playing live one time.)

@cwebber Now I feel dumb. This is basically what my concern has been: that a situation would arise where the regulatory or legal situation turns it into an oligopoly and destroys smaller software companies. Yet I didn’t consider use of the output as a harm to OSS projects that use it (unless the code quality is bad), so I’ve been using it in a few OSS repos of mine on the grounds that my day job leaves me with insufficient time to do it all myself. And thinking it’ll get more expensive.
@cwebber Reckoning! Reckoning!

@cwebber
Biggest enshittification to come: LLM companies trying to claim rights to the Linux kernel and every open source project their software has touched.

From a copyright perspective, everyone is absolutely insane for doing this.

@cwebber the US is not a country of laws, period. What USPTO says doesn't matter.

The EU however, just 3 days ago, adopted a text: LLM scammers MUST comply with licenses, including payment, to train on copyrighted work, regardless of location. And purely LLM generated slop *cannot be copyrighted*. There MUST be significant human contribution.

So purely LLM generated slop to try and license wash something is pretty much definitively unlawful now.

https://www.europarl.europa.eu/news/en/press-room/20260306IPR37511/protecting-copyrighted-work-and-the-eu-s-creative-sector-in-the-age-of-ai


@cwebber and remember, these are the dipshits pissing off the old companies that have infinite dollars by stealing *their* stuff. The people who spent millions turning copyright into a way to maintain monopolies and permanent rent-seeking.
The people who have used copyright as a weapon for many decades are decidedly not fans of 'companies' stealing the things they own to generate and sell things based on it.
And the LLM grifters absolutely do not have the money to pay them off.

@cwebber
It used to not be copyrightable [1]. But considering the nazi track the US is slipping onto, the new copyright act, prepared by Bezos and Thiel over some bloody drink, will say:

1) anything produced by humanity belongs to whomever the tyrant wants, as we have it all in the LLM.
2) any royalties are going to us, see above.

[1] https://www.copyright.gov/newsnet/2025/1060.html


@cwebber In my opinion, the moment that personal information gets out into the public domain without proper consent, this becomes an actionable matter.
AI-generated code must be open source; doing it this way helps everybody to freely create.
The moment the $$$ gets into the picture, you are killing the true creative potential of the people.
@cwebber I can see the future: legal concerns over LLM written code results in people rewriting code by hand to circumvent potential LLM code licence violations.

@cwebber
The case was not so much "is PD" as "can you register a new copyright". Whether LLM output is a derived work of training input was not part of it.

(I agree with your "don't accept LLM output" conclusion.)

@cwebber even if it's accepted that code generated by an algorithm is public domain, there are a couple of issues:

Perhaps it's possible for generated code to contain excerpts from code it was trained on, perhaps large enough to be copyrightable, and that would be a copyright violation.

Take generated code and add a bit of hand-written code and you have a copyrightable work. Somebody could pick out the generated code and ignore the copyright, but that's hard to do, especially if it's compiled.