Thousands of authors demand payment from AI companies for use of copyrighted works

https://lemmy.world/post/2191673

Thousands of published authors are requesting payment from tech companies for the use of their copyrighted works in training artificial intelligence tools, marking the latest intellectual property critique to target AI development.

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

There are plagiarism and copyright laws to protect the output of these tools: if the output is infringing, then sue them. However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

When you sell a book, you don’t get to control how that book is used. You can’t tell me that I can’t quote your book (within fair use restrictions). You can’t tell me that I can’t refer to your book in a blog post. You can’t dictate who may and may not read a book. You can’t tell me that I can’t give a book to a friend. Or an enemy. Or an anarchist.

Folks, this isn’t a new problem, and it doesn’t need new laws.

It’s 100% a new problem. There’s established precedent for things costing different amounts depending on their intended use.

For example, buying a consumer copy of a song doesn’t give you the right to play that song in a stadium or a restaurant.

Training an entire AI to make potentially an infinite number of derived works from your work is 100% worthy of requiring a special agreement. This even goes beyond simple payment to consent; a climate expert might not want their work in an AI which might severely mischaracterize the conclusions, or might want to require that certain queries are regularly checked by a human, etc.

My point is that the restrictions can’t go on the input; they have to go on the output. And we already have laws that govern such derivative works (or reuse / rebroadcast).

Well, fine, and I can’t fault new published material having a “no AI” clause in its terms of service. But that doesn’t mean we get to dream this clause into being retroactively for all the works ChatGPT was trained on. Even the most reasonable law in the world can’t be enforced on someone who broke it 6 months before it was legislated.

Fortunately the “horses out the barn” effect here is maybe not so bad. Imagine the FOMO and user frustration when ToS & legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything. Just stuff from before authors knew to include the “hands off” clause. It’s untenable, OpenAI will be forced to cave and pay up.

OpenAI and such being forced to pay a share seems far from the worst scenario I can imagine. I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don’t want their creations to be used by others for commercial purposes. That could really mean that large parts of humanity would be cut off from knowledge.

I can well imagine copyleft gaining importance in this context. But this form of licencing seems pretty worthless to me if you don’t have the time or resources to sue for your rights - or even to deal with the various forms of licencing you need to know about to do so.

I think it would be much worse if artists, writers, scientists, open source developers and so on were forced to stop making their works freely available because they don’t want their creations to be used by others for commercial purposes.

None of them are forced to stop making their works freely available. If they want to voluntarily stop making their works freely available to prevent commercial interests from using them, that’s on them.

Besides, that’s not so bad to me. The rest of us who want to share with humanity will keep sharing with humanity. The worst case imo is that artists, writers, scientists, and open source developers cannot take full advantage of the latest advancements in tech to make more and better art, writing, science, and software. We cannot let humanity’s creative potential be held hostage by anyone.

That could really mean that large parts of humanity would be cut off from knowledge.

On the contrary, AI is making knowledge more accessible than ever before to large parts of humanity. The only other comparable technologies that have done this in recent times are the internet and search engines. Thank goodness the internet enables piracy that allows anyone to download troves of ebooks for free. I look forward to AI doing the same on an even greater scale.

Shouldn’t there be a way to freely share your works without having to expect an AI to train on them and then be able to spit them back out elsewhere without attribution?
No, there shouldn’t because that would imply restricting what I can do with the information I have access to. I am in favor of maintaining the sort of unrestricted general computing that we already have access to.

The rest of us who want to share with humanity will keep sharing with humanity. The worst case imo is that artists, writers, scientists, and open source developers cannot take full advantage of the latest advancements in tech to make more and better art, writing, science, and software. We cannot let humanity’s creative potential be held hostage by anyone.

You’re not talking about sharing it with humanity, you’re talking about feeding it into an AI. How is this holding back the creative potential of humanity? Again, you’re talking about feeding and training a computer with this material.

Even the most reasonable law in the world can’t be enforced on someone who broke it 6 months before it was legislated.

Sure it can. Just because it is a new law doesn’t mean they get to continue benefiting from IP ‘theft’ forever into the future.

Imagine the FOMO and user frustration when ToS & legislation catch up and now ChatGPT has no access to the latest books, music, news, research, everything. Just stuff from before authors knew to include the “hands off” clause

How is this an issue for the IP holders? Just because you build something cool or useful doesn’t mean you get a pass to do what you want.

basically like the knowledge cutoff, but forever. It’s untenable,

Untenable for ChatGPT maybe, but it’s not as if it’s the end of ‘knowledge’ or the end of AI. It’s just a single company product.

The thing is, copyright isn’t really well-suited to the task, because copyright concerns itself with who gets to, well, make copies. Training an AI model isn’t really making a copy of that work. It’s transformative.

Should there be some kind of new model of remuneration for creators? Probably. But it should be a compulsory licensing model.

Copyright also deals with derivative works.
Derivative and transformative are quite different though.
The slippery slope here is that we are currently considering humans and computers to be different because (something someone needs to actually define). If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.” We have laws already that deal with this but honestly how many books and movies aren’t just remakes of Romeo and Juliet or Taming of the Shrew?!?

Well, Shakespeare has been dead for a few years now, there’s no copyright to speak of.

And if you make a book based on an existing one, then you totally need permission from the author. You can’t just e.g. make a Harry Potter 8.

But AIs are more than happy to do exactly that. Or to even reproduce copyrighted works 1:1, or only with a few mistakes.

If a person writes a fanfic Harry Potter 8, it isn’t a problem until they try to sell it or distribute it widely. I think where the legal issues get sticky here is who caused a particular AI-generated Harry Potter 8 to be written.

If the AI model attempts to block this behavior with contract stipulations and guardrails, and if it isn’t advertised as “a Harry Potter generator” but instead as a general-purpose tool, then reasonably the legal liability might fall on the user who decides to do this, rather than on the tool that makes such behavior possible.

Hypothetically, what if an AI was trained that never read Harry Potter, but it’s pretty darn capable, and I feed the entire Harry Potter novel(s) into it as context in my prompt and then ask it to generate an eighth story? Is the tool at fault, or am I?

Fanfic can actually be a legal problem. It’s usually not prosecuted, because it harms the brand to do so, but if a company was doing that professionally, they’d get into serious hot water.

Regarding your hypothetical scenario: If you train the AI with copyrighted works, so that you can make it reproduce HP8, then you are at fault.

If the tool was trained with HP books and you just ask really nicely to circumvent the protections, I would guess the tool (=> its creators) would certainly be at fault (since it did train on copyrighted material and the protections were obviously not good enough), and at the latest when you reproduce the output, you too are.

It seems like people are afraid that AI can do it when I can do it too. But their reason for freaking out is…??? It’s not like AI is calling up publishers trying to get Harry Potter 8 published. If I ask it to create Harry Potter 1 but change his name to Gary Trotter, it’s not the AI that is doing something bad, it’s me.

That was my point. I can memorize text, and it’s only when I play it off as my own that it’s wrong. No one cares that I memorized the first chapter and can recite it if I’m not trying to steal it.

That’s not correct. The issue is not whether you play it off as your own, but how much the damages are that you can be sued for. If you recite something that you memorized in front of a handful of friends, the damages are nonexistent and hence there is no point in suing you.

But if you give a large commercial concert and perform a cover song without permission, you will get sued, no matter if you say “This song is from <insert original artist> and not from me”, because it’s not about giving credit, it’s about money.

And regarding getting something published: This is not so much about big name art like Harry Potter, but more about people doing smaller work. For example, voice actors (both for movie translations and smaller things like announcements in public transport) are now routinely replaced by AI that was trained on their own voices without their permission.

Similar story with e.g. people who write texts for homepages and ad material. Stuff like that. And that has real-world consequences already now.

The issue is not whether you play it off as your own, but how much the damages are that you can be sued for.

I think that’s one and the same. I’m just not seeing the damages here, because the output of the AI doesn’t go any further than being AI output without a further human act. Authors are idiots if they claim “well someone could ask ChatGPT to output my entire book and you could read it for free.” If you want to go after that type of crime, then have ChatGPT report the users asking for it. If your book is accessible via a library, I’m not seeing any difference between you asking ChatGPT to write in someone’s style and asking me to write in their style. If you ask ChatGPT for lines verbatim, I can recite them too. I don’t know what legitimate damages they are claiming.

For example, voice actors

I think this is a great example, but again I feel like the law is not only lacking but would need to outlaw other human acts not currently considered illegal.

If you do impressions, you’re mimicking the tone, cadence and selection of language someone else uses. You aren’t recording them and playing back the recording; you are using your own voice box to create a sound similar to the celebrity. An AI sound generator isn’t playing back a recording either. It’s measuring tone, cadence, and language used, and creates a new sound similar to the celebrity. The only difference here is that the AI would be more precise than a human’s ability to use their voice.

If you say “AI read my book and output a similar story, you owe me money” then how is that different from “Joe read my book and wrote a similar story, you owe me money.”

You’re bounded by the limits of your flesh. AI is not. The $12 you spent buying a book at Barnes & Noble was based on the economy of scarcity that your human abilities constrain you to.

It’s hard to say that the value proposition is the same for human vs AI.

We are making an assumption that humans do “human things”. If I wrote a derivative work of your $12 book, does it matter that the way I wrote it was to use a pen and paper and create a statistical analysis of your work and find the “next best word” until I had a story? Sure, my book took 30 years to write, but if I followed the same math as an AI, would that matter?

It wouldn’t matter, because derivative works require permission. But I don’t think anyone’s really made a compelling case that OpenAI is actually making directly derivative work.

The stronger argument is that LLMs are making transformational work, which is normally fair use, but should still require some form of compensation given the scale of it.

But no one is complaining about publishing derived work. The issue is that “the robot brain has full copies of my text and anything it creates ‘cannot be transformative’”. This doesn’t make sense to me because my brain made a copy of your book too, it’s just really lossy.

I think right now we have definitions for the types of works that only loosely fit human actions mostly because we make poor assumptions of how the human brain works. We often look at intent as a guide which doesn’t always work in an AI scenario.

Yeah, that’s basically it.

But I think what’s getting overlooked in this conversation is that it probably doesn’t matter whether it’s AI or not. Either new content is derivative or it isn’t. That’s true whether you wrote it or an AI wrote it.

I agree with that, but do politicians and judges who know absolutely nothing about the subject?

I had a professor in college who taught about cyber security. He was renowned in his field and was asked by the RIAA to testify about some cases related to file sharing. I lost respect for him when he intentionally refrained from stating that it wasn’t possible for anyone outside of the home network to know what or who was actually downloading stuff. The technology was being ignored and an invalid view was presented for a judge who couldn’t ELI5 how the internet worked, let alone actual networking protocols.

It’s not even looking for the next best word. It’s looking for the next best token. It doesn’t know what words are. It reads tokens.
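To make the word-versus-token distinction concrete, here is a toy sketch. The vocabulary and the greedy longest-match rule are made up for illustration and are not any real model’s tokenizer; the only point is that the model is handed a list of integer IDs for sub-word pieces, not words.

```python
# Toy illustration only: this vocabulary is hypothetical, not a real model's.
VOCAB = {"un": 0, "believ": 1, "able": 2, "token": 3, "s": 4, " ": 5}

def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into token IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No piece in the vocabulary starts at position i.
            raise ValueError(f"no token covers {text[i]!r}")
    return ids

print(tokenize("unbelievable tokens"))  # [0, 1, 2, 5, 3, 4]
```

Here “unbelievable” becomes three pieces (un, believ, able), so “next best token” can mean picking a word fragment rather than a whole word.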

Good point.

I could easily see laws created where they blanket outlaw computer-generated output derived from other human-created data sets, and suddenly medical and technical advancements stop because the laws were written by people who don’t understand what is going on.

Challenge level impossible: try uploading something long to amazon written by chatgpt without triggering the plagiarism detector.
Focus: ChatGPT launches boom in AI-written e-books on Amazon

Until recently, Brett Schickler never imagined he could be a published author, though he had dreamed about it. But after learning about the ChatGPT artificial intelligence program, Schickler figured an opportunity had landed in his lap.

Reuters

I asked Bing Chat for the 10th paragraph of the first Harry Potter book, and it gave me this:

“He couldn’t know that at this very moment, people meeting in secret all over the country were holding up their glasses and saying in hushed voices: ‘To Harry Potter – the boy who lived!’”

It looks like technically I might be able to obtain the entire book (eventually) by asking Bing the right questions?

Then this is a copyright violation - it violates any standard for such, and the AI should be altered to account for that.

What I’m seeing is people complaining about content being fed into AI, and I can’t see why that should be a problem. Only the output can be problematic.

No, the AI should be shut down and the owner should first be paying the statutory damages for each use of registered works of copyright (assuming all parties in the USA)

If they have a company left after that, then they can fix the AI.

Again, my point is that the output is what can violate the law, not the input. And we already have laws that govern fair use, rebroadcast, etc.
I think it’s not just the output. I can buy an image on any stock platform, print it on a T-shirt, wear it myself or gift it to somebody. But if I want to sell T-shirts using that image, I need a commercial licence, even if I alter the original image extensively or combine it with other assets to create something new. It’s not exactly the same thing, but OpenAI and other companies certainly use copyrighted material to create and improve commercial products. So this doesn’t seem like the same kind of usage an average Joe buys a book for.

However, if the output of an AI would not be considered infringing for a human, then it isn’t infringement.

It’s an algorithm that’s been trained on numerous pieces of media by a company looking to make money off it. I see no reason to give them a pass on fairly paying for that media.

You can see this if you reverse the comparison, and consider what a human would do to accomplish the task in a professional setting. That’s all an algorithm is. An execution of programmed tasks.

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers. STEM companies regularly pay for access to papers and codes and standards. Why shouldn’t an AI have to do the same?

If I gave a worker a pirated link to several books and scientific papers in the field, and asked them to synthesize an overview/summary of what they read and publish it, I’d get my ass sued. I have to buy the books and the scientific papers.

Well, if OpenAI knowingly used pirated work, that’s one thing. It seems pretty unlikely and certainly hasn’t been proven anywhere.

Of course, they could have done so unknowingly. For example, if John C Pirate published the transcripts of every movie since 1980 on his website, and OpenAI merely crawled his website (in the same way Google does), it’s hard to make the case that they’re really at fault any more than Google would be.

well no, because the summary is its own copyrighted work
The summary is open to fair use by web crawlers. That was settled in Perfect 10 v Amazon.
Haven’t people asked it to reproduce specific chapters or pages of specific books and it’s gotten it right?
I haven’t been able to reproduce that, and at least so far, I haven’t seen any very compelling screenshots of it that actually match. Usually it just generates text, but that text doesn’t actually match.
Gotcha. This seems like a good way to test for it then, I think.
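One rough way to run that kind of test, rather than eyeballing screenshots, is to measure how many word sequences in the model’s answer also appear in the real text. This is only a sketch under my own assumptions: the function names and the choice of word n-grams are mine, and a high score shows verbatim overlap, not where the model got the text from.

```python
def ngrams(words, n):
    """All runs of n consecutive words, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output, reference, n=5):
    """Fraction of the output's word n-grams that also occur in the reference."""
    out = ngrams(output.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    if not out:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out & ref) / len(out)

reference = "the quick brown fox jumps over the lazy dog"
print(verbatim_overlap(reference, reference, n=3))  # 1.0
print(verbatim_overlap("a fast brown fox leaps over a sleepy dog", reference, n=3))  # 0.0
```

A paraphrase scores near 0 and an exact copy scores 1, which matches the difference between “text that sounds right” and “text that actually matches”.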

There is already a business model for compensating authors: it is called buying the book. If the AI trainers are pirating books, then yeah - sue them.

That’s part of the allegation, but it’s unsubstantiated. It isn’t entirely coherent.

It’s not entirely unsubstantiated. Sarah Silverman was able to get ChatGPT to regurgitate passages of her book back to her.
I don’t know if this holds water though. You don’t need to train the AI on the book itself to get that result. Just on discussions about the book, which for sure include passages from the book.

Her lawsuit doesn’t say that. It says,

when ChatGPT is prompted, ChatGPT generates summaries of Plaintiffs’ copyrighted works—something only possible if ChatGPT was trained on Plaintiffs’ copyrighted works

That’s an absurd claim. ChatGPT has surely read hundreds, perhaps thousands of reviews of her book. It can summarize it just like I can summarize Othello, even though I’ve never seen the play.

Silverman v. OpenAI, Inc., 3:23-cv-03416 - CourtListener.com

Docket for Silverman v. OpenAI, Inc., 3:23-cv-03416 — Brought to you by Free Law Project, a non-profit dedicated to creating high quality open legal information.

CourtListener

This is a little off. When you quote a book, you put the name of the book you’re quoting. When you refer to a book, you, um, refer to the book?

I think the gist of these authors’ complaints is that a sort of “technology laundered plagiarism” is occurring.

Copyright 100% applies to the output of an AI, and it is subject to all the rules of fair use and attribution that entails.

That is very different than saying that you can’t feed legally acquired content into an AI.

When you sell a book, you don’t get to control how that book is used.

This is demonstrably wrong. You cannot buy a book, and then go use it to print your own copies for sale. You cannot use it as a script for a commercial movie. You cannot go publish a sequel to it.

Now please just try to tell me that AI training is specifically covered by fair use and satire case law. Spoiler: you can’t.

This is a novel (pun intended) problem space and deserves to be discussed and decided, like everything else. So yeah, your cavalier dismissal is cavalierly dismissed.

I completely fail to see how it wouldn’t be considered transformative work
Typically the argument has been “a robot can’t make transformative works because it’s a robot.” People think our brains are special when in reality they are just really lossy.
Even if you buy that premise, the output of the robot is only superficially similar to the work it was trained on, so no copyright infringement there, and the training process itself is done by humans, and it takes some tortured logic to deny the technology’s transformative nature
Go ask ChatGPT for the lyrics of a song and then tell me that’s transformative work when it outputs the exact lyrics.
Well, they’re fixing that now. I just asked ChatGPT to tell me the lyrics to Stairway to Heaven and it replied with a brief description of who wrote it and when, then said “here are the lyrics:” and stopped 3 words into the lyrics.
In theory as long as it isn’t outputting the exact copyrighted material, then all output should be fair use.

Try it again and when it stops after a few words, just say “continue”. Do that a few times and it will spit out the whole lyrics.

It’s also a copyright violation if a human reproduces memorized copyrighted material in a commercial setting.

If, for example, I give a concert and play all of Nirvana’s songs without a license to do so, I am still violating the copyright even if I totally memorized all the lyrics and the sheet music.

This feels like a solution to a non-problem. When a person asks the AI “give me X copyrighted text” no one should be expecting this to be new works. Why is asking ChatGPT for lyrics bad while asking a human ok?
It’s not legal for anyone (human or not) to put song lyrics online without permission/license to do so. I was googling this to make sure I understood it correctly, and it seems that reproducing the lyrics to music without permission to do so is copyright infringement. There are some lyrics websites that work with music companies to get licensing to post lyrics, but most websites host them illegally and will take them down if they receive a DMCA request.

Wait wait wait. That is not a good description of what is happening. If you and I are in a chat room and you asked me the lyrics, my verbalization of them isn’t an issue. The fact it is online just means the method of communication is different, but that should be covered under fair use.

The AI is taking prompts and providing the output as a dialog. It’s literally a language model, so there is a process of synthesizing your question and generating your output. I think that’s something people either don’t understand or completely ignore. It’s not as if there are entire books verbatim stored as a contiguous block of data. The data is processed and synthesized into a language model that then generates an output that happens to match the requested text.
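A deliberately tiny sketch of that idea, assuming a word-pair count table as a stand-in for a language model. Real LLMs are neural networks with learned weights, not count tables, so treat this as an analogy only. The “model” below never stores the sentence as one block, only which word tends to follow which, and yet picking the most likely next word each time happens to regenerate the training text.

```python
from collections import Counter, defaultdict

def train(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_len=20):
    """Greedily emit the most frequent follower of the last token."""
    out = [start]
    while len(out) < max_len and out[-1] in counts:
        out.append(counts[out[-1]].most_common(1)[0][0])
    return out

text = "it was the best of times".split()
model = train(text)
print(" ".join(generate(model, "it")))  # it was the best of times
```

With more training text the followers start competing and the output drifts away from any single source, which is roughly why the same process can produce either a verbatim match or something new.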

This is why we can’t look at the output the same way we look at static text. In theory, if you kept training it in a way that opposed the statistical nature of your book or lyrics, you could eventually get to the point where asking the AI to generate your text would give a non-verbatim answer.

I get that this feels like semantics but creating laws that don’t understand the technology means we end up screwing ourselves over.

I get how LLMs work and I think they’re really cool. I’m just trying to explain why OpenAI is currently limiting these abilities to continue operating within our legal system.

Publishing lyrics publicly online is illegal while communicating them privately in a chatroom is probably fine. Communicating them in a public forum is a grey area, but you likely won’t be caught or prosecuted. If a big company hosts an AI chatbot which can tell you the lyrics to any song on demand, then that seems like an illegal service unless they have the rights.

Feel free to look up the legality of publishing lyrics online, all I saw was information saying that it is illegal but they don’t prosecute anyone but the larger companies.

I guess my question is why does it seem like an illegal service? Not saying it isn’t but it feels like non technical people will say “it knows the lyrics and can tell me them so it must contain them.”

To me the technology is moving closer to mimicking human memory than just plain storage retrieval. ChatGPT gets things wrong often because that process of presenting data is not copying but generation. The output is the output, so presenting anything copyrighted falls under the appropriate laws, but until the material is actually presented, some of the arguments being made feel wrong. If I can read a book and then write anything, the fact your story is in my head shouldn’t be a problem. If you prompt the AI for a book…isn’t that your fault for asking?

Go ask a human for the lyrics of a song and then tell me that’s transformative work.

Oh wait, no one would say that. This is why the discussion with non-technical people goes into the weeds.