If you use AI-generated code, you currently cannot claim copyright on it in the US. If you fail to disclose/disclaim exactly which parts were not written by a human, you forfeit your copyright claim on *the entire codebase*.

This means copyright notices and even licenses folks are putting on their vibe-coded GitHub repos are unenforceable. The AI-generated code, and possibly the whole project, becomes public domain.

Source: https://www.congress.gov/crs_external_products/LSB/PDF/LSB10922/LSB10922.8.pdf

It'll be interesting to see what happens when a company pisses off an employee to the point where that person creates a public repo containing all the company's AI-generated code. I guarantee what's AI-generated and what's human-written isn't called out anywhere in the code, meaning the entire codebase becomes public domain.

While the company may have recourse based on the employment agreement (which varies in enforceability by state), I doubt there'd be any on the basis of copyright.

FWIW I'm not a lawyer and I'm not recommending that you do this. 😄 Even if companies have no legal standing on copyright, their legal team will try it. It *will* cost you money.

But man, oh man, I'm gonna have popcorn ready for when someone inevitably pulls this move.

@jamie I *am* an IP lawyer, and I (along with many others) have been saying for a while: if the position the "AI" companies are taking on the legality of scraping "publicly available" materials were true (that all "publicly available" materials are "public domain," free to be used as raw material without consent), then copyright ceases to exist, and all of their own materials will be free for everyone else to use the very first time they're leaked. That'll be fun for the companies.
@fsinn This is amazing
SeƔn Fobbe (@[email protected])

AI companies copy all written works they can get their hands on and call it fair use, if someone does it to their models it suddenly becomes "unauthorized distillation" and should be actionable in court. The double-standard is ridiculous. https://www.theregister.com/2026/02/14/ai_risk_distillation_attacks/

@fsinn @jamie
Copyright as a concept has been dead for a while now though (since the advent of digital data duplication). Society just has a hard time accepting and dealing with that. And the current "AI"-induced crisis is another symptom of that.
@max @fsinn @jamie That's not true. Media organisations and individual journalists make a share of their income from granting licenses for secondary use of their digital works, for copying them, or for offering them in libraries. Copyright is one of the few bedrocks of that income. It doesn't vanish through wishful thinking or ignoring it.

@christianschwaegerl @fsinn @jamie That's the classical model, yes, and it's unfortunate that they have to rely on such an external influence on their integrity; this needs to change.

And it slowly is, both legally (e.g. publicly financed journalism can be one solution to avoid this conflict of interest) as well as illegally (content is reused without permission for "AI" training, or simply shared online for free so that every human has access to the information)

@max @fsinn @jamie Copyright fees are not a negative external influence. In contrast to ad revenue (maximising attention and emotion for eyeball time) or financing by "public" state authorities (the risk of courting politicians so they don't cut funds), copyright fees do not create problematic incentives for the actual reporting. They are as good for financing journalism as user subscriptions.

@christianschwaegerl @fsinn @jamie I'm not sure how having to produce stuff that sells better instead of stuff with high quality is not a negative external influence.

Also please do not confuse stuff, I am not talking about state financing, I am talking about the financing by the general public without state influence.

@max @fsinn @jamie I think public financing models are important, but the state's influence and funding leverage are real, and a 100% public model would be disastrous. Plus, hunting for quotas is deeply ingrained. Therefore privately owned media are hugely important, and now that their ad business and search traffic are being killed by Big IT, subscription and licensing income is pivotal.
@fsinn @jamie also, wouldn't the veil/protections of trade secrets disappear, since the con is basically corporate espionage as a chatbot?
@blogdiva @fsinn @jamie not a lawyer but deciding to weigh in regardless for some reason: the legal existence of trade secrets does not seem to be directly threatened by the legal theory these corporations are advancing, in the way it directly opposes the basis of copyright infringement (see also Hachette v. Internet Archive for an attempt to develop new precedent, which also failed). however, precisely as you say, it may as a practical matter become harder to hold a particular employee to contract terms regarding trade secrets if the employer also subscribes to espionage-as-a-service
@fsinn @jamie My understanding was that training an AI model on copyrighted work was fair use, because the actual "distribution"--when the AI generates something from a prompt--uses a de minimis amount of copyrighted content from any individual work, except if the user explicitly prompted something like, "Give me Homer Simpson surfing a space orca," at which point the AI company would throw the user all the way under the bus.

@Azuaron @fsinn The argument has been that the model doesn't contain the copyrighted works directly. Like, you can't grep the model file on disk for a passage from a book it can still somehow reproduce.

It's a ridiculous argument, though, because the models deal in numbers, not text. Those numbers are converted to text for human consumption only, so of course it won't contain the raw text anywhere in the model.

@jamie @Azuaron @fsinn It's like saying sausages are vegan as long as they do not contain visible body parts.

@christianschwaegerl @jamie @Azuaron @fsinn

Yes. Any "direct quoting" of copyrighted works, as text files on a disk, for example, would > only be a bunch of numbers < too. ASCI, Unicode, UTF-8, etc. are ways of encoding text into numbers, and displaying text representations (glyphs) of them later.

So LLMs hold "indirect" and maybe "abstract" (or not) numbers related to the copyrighted works. Not sure how that will or should work out, from a legal perspective.
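To make the "text is numbers too" point concrete, here is a small Python sketch. It shows the same passage as the byte values a text file actually stores versus a token-ID sequence of the kind a model would see. The token IDs and `toy_vocab` are invented for illustration; real tokenizers have their own vocabularies.

```python
# Illustrative sketch: "text" on disk is already just numbers.
# UTF-8 maps characters to byte values; an LLM's tokenizer maps
# spans of text to integer token IDs.

passage = "It was the best of times"

# The bytes a text file actually stores: a sequence of integers.
byte_values = list(passage.encode("utf-8"))
print(byte_values[:6])  # [73, 116, 32, 119, 97, 115]

# A toy stand-in for a tokenizer: hypothetical IDs, not a real vocabulary.
toy_vocab = {"It": 901, " was": 117, " the": 42, " best": 5310,
             " of": 88, " times": 2764}
token_ids = [toy_vocab[t] for t in ["It", " was", " the", " best", " of", " times"]]
print(token_ids)  # [901, 117, 42, 5310, 88, 2764]

# Searching the token-ID sequence for the raw UTF-8 bytes finds nothing,
# even though both integer sequences represent the same passage.
print(byte_values[:6] == token_ids[:6])  # False
```

So "you can't grep the raw text out of the file" distinguishes encodings, not whether the work's content is represented.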

@JeffGrigg @christianschwaegerl @jamie @fsinn I think this is missing the point and the law (at least, US copyright law).

I buy a book. I then own that book. I can cut that book into individual pages. I can scan all those pages into my computer. I can have an image-to-text algorithm convert the text in the images into an ebook. I can do this to a billion books. I can run whatever algorithms I want on the text of those books. I can store the resulting text of my algorithms on my computer, in any format.

This is all legal, for both me and for any company. Copyright does not prevent use of a work after it has been sold, "use" meaning just about anything--short of distributing the work.

Because what copyright protects against is the reproduction and distribution of copyrighted works. For AI companies, that "distribution" doesn't happen until somebody puts a prompt into the AI, and receives back a result. That result is the distribution. To sue an AI company for copyright infringement, you would have to have a result that infringes on your copyright, and you would have to prove that the AI company was more than just a tool that the prompter used to infringe your copyright.

For the Disney example, if somebody prompted, "Darth Vader in a lightsaber duel with Mickey Mouse," it would be an uphill battle to prove the AI company is responsible for that instead of just the prompter. The argument that the AI company would make is that the prompter clearly used the AI as a tool to make infringing work, but just like you can't sue Adobe if someone used Photoshop to make the same image, you can't sue the AI company because someone used it as a tool to infringe copyright.

Now, I don't find that a wholly persuasive argument because of the, frankly, complicity in the creation that AI has that Photoshop doesn't, but that's definitely the argument they would make, and judges have seemed receptive to that and similar (and even worse) arguments.

As far as I'm concerned, the original point of this thread proves that the AI company should be mostly-to-wholly responsible, even if the prompter was deliberately asking for infringing works. After all, AI-generated work is not copyrightable because it is not human created, it is computer created.

If it's not human created, how can the human be responsible for the infringement?

If it is computer created, then isn't the computer's owner responsible for the infringement?

After all, if I ask a digital artist to create me "Darth Vader in a lightsaber duel with Mickey Mouse," and they do, the digital artist is on the hook for that infringement. They reproduced the work, and they distributed it. There is a "prompter" and a "creator" in both scenarios; it seems illogical that if the "creator" is a human, they're responsible, but if the "creator" is a computer, they aren't responsible.

This is, per @pluralistic, "It's not a crime, I did it with an app!" Why we let apps get away with crimes we'd never tolerate from people, I don't know. But that's where we are.

@Azuaron @JeffGrigg @jamie @fsinn @pluralistic For a start, you bought the book. I doubt AI hyperscalers have met that minimum requirement. Secondly, you buy the book for your private use, not for commercial purposes. Thirdly, you describe reproduction for private purposes. Reproduce and sell, and you infringe. Fourth, you don't use the book to instruct a machine to paraphrase the content, produce quotes and false quotes, and to write in the style of the author in an infinite number of cases.

@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic You're making a bunch of different arguments now. The topic at hand was, "Is it copyright infringement to make and have an AI model trained on millions of books?" The answer is no. This is wholly legal.

Storing copyrighted work is legal.

Modifying copyrighted work is legal.

Storing modified copyrighted work is legal.

It doesn't matter if they have a model that is literally just plain text of every book, or if the model is a series of mathematical weights that go into an algorithm. It's already legal to have and modify copyrighted works.

What becomes illegal is reproducing and distributing copyrighted material.

No, whether it was for "commercial" or "non-commercial" purposes doesn't matter when determining if something is infringing.

No, whether it was "sold" or "distributed for free" doesn't matter when determining if something is infringing.

"What about Fair Use?" Fair use is an affirmative defense. That means that you acknowledge you are infringing, but it's an allowed type of infringement. It's still an infringement, you just don't get punished for it.

But, as already stated, nothing is infringement until there's a distribution. Without a distribution, no further analysis is needed. When a distribution occurs, it is the distribution that is analyzed to determine if it is infringing, and, if so, if there is a fair use defense. Everything that happens prior to the distribution is irrelevant when determining if an infringement has occurred, as long as the accused infringer acknowledges they have the copyrighted work (which AI companies always acknowledge).

There is one further step, because it is illegal to make a tool that is for copyright infringement. The barrier to prove this is so high, though. As long as a tool has any non-infringing uses--and we must acknowledge AI can generate non-infringing responses--then it won't be nailed with being a "tool for copyright infringement". This has to be, like, "Hey, I made a cracker for DRM, it can only be used to crack DRM. It literally can't do anything legal."

Even video game emulators haven't been hit with being "tools for copyright infringement" because there are legitimate uses for them (personal backup, archival, etc.), even though everyone knows they're 99% for infringement.

@Azuaron @JeffGrigg @jamie @fsinn @pluralistic So if somebody invents a gun that simultaneously produces soap bubbles, shooting someone is OK? I doubt it.
You're trying to normalise LLMs with analogies to mundane private behaviour. That's fundamentally flawed.
LLMs have new characteristics and capabilities. There hasn't been a machine before that could churn out one million versions of a novel in the style of a contemporary author, or art in the style of living creators, in no time after being fed their work.
@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic Buddy, I'm not trying to "normalize" anything, especially not LLMs. I'm telling you how the law works. I never said the law was good.
@Azuaron @JeffGrigg @jamie @fsinn @pluralistic It's wide open how existing law will be interpreted and applied here, and which new laws will be created to capture the novelty of the technology. The Anthropic case is interesting. A large number of court cases will proceed, and the differences between a private book purchase and an all-purpose multi-billion content production technology will hopefully be apparent to judges.

@Azuaron
Not a copyright expert but I think you are wrong here.
You say yourself "What becomes illegal is reproducing and distributing copyrighted material." But then you only focus on distribution, ignoring reproduction.
If I scan a physical book, that's reproduction. Following your logic, that may be fair use if I don't distribute it, but it's still infringement. That's relatively clear.

...>

@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic

@Azuaron
...>
What about a digital book? If I copy a file from one computer to another, is that infringement? I don't know. Perhaps it is allowed explicitly by the author.
But one can argue that preparing a copy of a book to be digested by LLM training is reproduction, and hence infringement if not explicitly allowed by the copyright holder.
There is an ongoing debate regarding this last argument. It is not clear cut.

<erased a "not">
@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic

@yaarur @christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic If you reproduce something without distributing, that's not infringement. If you distribute something without reproducing it, that's just "reselling" and is not copyright infringement. To be copyright infringement, you have to reproduce and distribute.

I don't believe there's an ongoing debate about this--at least, not in the courts. LLMs are not actually different from other computing systems, and this has already been litigated extensively.

@Azuaron @yaarur @JeffGrigg @jamie @fsinn @pluralistic No, there's a large number of ongoing court cases. In Munich, a court has ruled in favour of GEMA over music rights. You keep describing LLMs in old categories. It's something new. It analyses copyrighted works to provide a service for emulating, plagiarising, quoting, and misquoting them, with the effect of damaging the livelihoods of creators. In other words, it's a first-of-its-kind vampire technology, a plagiarism factory.
@christianschwaegerl @yaarur @JeffGrigg @jamie @fsinn @pluralistic As I have said, I am talking about US copyright law. What happens in Munich is irrelevant to anything I've said, and anything I've said is irrelevant in Munich.

@Azuaron
Oh dear. A gentle reminder that U.S. laws do not apply around the world and infringing companies are losing. What happens outside of the U.S. matters more than ever with the rise of fascism in the U.S. American companies had to comply with privacy law under GDPR or be locked out of entire markets. France and other countries are pulling government use away from American companies and replacing them with domestic products.

@christianschwaegerl @yaarur @JeffGrigg @jamie @pluralistic

@Azuaron

I think your first sentence is wrong.

> Can I make copies of copyrighted material for personal use?
>
> No, you cannot make copies of copyrighted material for personal use. It is not permissible to reproduce copyrighted materials in any circumstance, without the written permission of the copyright holder, unless it falls under Fair Use policy.
 
https://nytlicensing.com/latest/methods/copyrighted-material-without-permission/

@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic


@Azuaron

If you claim otherwise, please back up your claim.

@christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic

@yaarur @christianschwaegerl @JeffGrigg @jamie @fsinn @pluralistic I think you got me, that seems to be the case. Huh.

It seems like there are broad Fair Use exceptions that almost completely swallow the ability to win a case against someone who's reproducing without distributing, but, yeah, still an infringement.

I hate our laws more than I already thought I did. Oof.

@Azuaron @yaarur @JeffGrigg @jamie @fsinn @pluralistic Please share how you see authors, artists, designers, musicians, journalists (…) making a living in the future when they are stripped of any protection for the commercial use of their work by others and a new technology industrialises unlimited plagiarism by anybody.

@christianschwaegerl @yaarur @JeffGrigg @jamie @fsinn @pluralistic Please share where I ever said I liked AI, or I thought AI was good, or I thought AI was useful, or that I thought US copyright laws were good, or that I thought US copyright laws were sufficient to handle AI.

I really need you to actually read what I'm saying, and not make up some bullshit in your head about what you think I'm saying. That question you just asked me? It's what we call "Not even wrong." You're not even close to being merely wrong. That's how bad your question is.

This is not the first time you've made up some bullshit in your head about what I'm saying, but this is your first and only warning. The next time will just be a block.

@Azuaron @yaarur @JeffGrigg @jamie @fsinn @pluralistic I've asked a question.

@christianschwaegerl Wow, there's not even a working brain in there, is there?

When someone tells you your question is so bad it can't even be considered wrong, you don't double-down on the question.

Sheesh.

And goodbye.

@yaarur Correct. I would also add, though, that "fair use" as it is mostly advanced by these "AI" companies and debated in the media is U.S.-centric, and I gently remind everyone that U.S. laws do not apply around the world. While the idea behind allowing exceptions is articulated in the Berne Convention, laws vary.

In Canada, for example, the "fair use" exception doesn't exist. We allow for "fair dealing," which has a specific test.

@Azuaron @christianschwaegerl @JeffGrigg @jamie @pluralistic

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

Fair dealing is different, but it's wholly incorrect to characterize it as narrow - the fair dealing exemption for e.g. course packs is broader than anything set in US law.

@pluralistic
Most of my work in commercial or not-for-profit, not in education, so for our purposes I find it narrow but I take your point and will rephrase.

For "fair dealing" there is a specific multi-part test as to whether the proposed use fits within the exemption from infringement. If the use fits within that test, the intent of the exemption is embraced and allows a broad application.

Better? If I'm overstating it, I'll correct.

@yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

I think you're conflating practice with law.

As with fair use, there is a multi-step test, but (as with fair use) judges often interpret those tests more broadly than industry assumes they will.

Which means that the lines firms ask their staff to colour within are more restrictive than the potential contours of the law, which is often expanded when novel uses arise.

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

The reason for this is that fair dealing was modernized after a string of technological changes that showed how risky it is to set copyright's contours too firmly - everything from home recording (first audio, then video) to software backups, to video game rentals, etc etc. All of these were beyond what Parliament could reasonably be expected to have contemplated when deliberating over the law.

@pluralistic I appreciate your thoughts on this, Cory, and I'll take the opportunity to thank you for all of your insightful and frequent writing and publishing generally (oft boosted).

I don't think I'm conflating, though I understand your point. When I advise on potential use, and when we need to enforce against infringement, we use the written law. "It's often accepted" is a risk position we discuss but is not guaranteed to hold.

@yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

@Azuaron

You're wrong on the law, and that needs to be called out because that's not a difference of opinion. Also, internationally, laws vary on the limited exceptions.

Nevertheless, "reproducing" the work is one infringement, and "distributing" the work is a further, secondary, infringement.

For ref, Canada's Copyright Act:

@yaarur @christianschwaegerl @JeffGrigg @jamie @pluralistic

@yaarur You've highlighted something that's often missed in the sweeping generalizations. The root issue is consent, not compensation. If consent, then the question of $.

If an artist hangs a painting in a gallery window, it's "available" for viewing. No one would seriously argue that any other person can photograph (copy) the work, print t-shirts with the image, alone or as one of many works on the shirt, and sell them.

@Azuaron @christianschwaegerl @JeffGrigg @jamie @pluralistic

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie Even where no copyright exception applies, copyright law frequently substitutes blanket licenses or set-rate payments for consent. Everything from the public lending right to the compulsory mechanical license for sound recordings to the blanket license for home recording and sound recording broadcasts, etc.

@pluralistic Apologies if this is a repeat of what you've said elsewhere, but would you say that these broadening exemptions are based on an evaluation of the reasonable expectations of the rights holder at the time of the initial grant? Essentially, that with consent-for-A it's reasonably understood that consent-for-B is an accepted and understood (or integral and necessary) part of A?

B/c there's no such argument for the t-shirt person.

@yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

No, absolutely not - by definition, a new exemption that implicates a technology that didn't exist at the time a work was made couldn't have been contemplated at that time.

Think of John Philip Sousa railing against the record player ("if these infernal talking machines are allowed to go on, man will not have a voice box, we will lose it as we lost our tails when we came down out of the trees").

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

By definition, he couldn't have contemplated the phonogram when he was writing his compositions.

@pluralistic Part of what makes the analysis challenging (and interesting) is that I find people often conflate the original purpose of the recognition of copyright with the purposes behind the introduction of patent laws, where the goal *is* distribution. I'm not suggesting that you are, only that I'm interested to go to your writings and see how you handle that part of the analysis so I can better address the argument(s).

@yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

@fsinn @yaarur @Azuaron @christianschwaegerl @JeffGrigg @jamie

Well, the original purpose of copyright law (the Statute of Anne) was to help English publishers wage a trade war against their Scots rivals.