Mastodawn

Simon Willison Jan 25, 2024

I'm quoted in this @arstechnica piece about that recent "AI generated" George Carlin special

I don't think it was written by AI

I found the whole thing grossly disrespectful, but I do slightly appreciate the meta-joke here that the AI generated text is fake and was actually written by humans

https://arstechnica.com/ai/2024/01/did-an-ai-write-that-hour-long-george-carlin-special-im-not-convinced/

Did an AI write that hour-long “George Carlin” special? I’m not convinced.

"Everyone is ready to believe that AI can do things, even if it can't."

Ars Technica

Simon Willison Jan 25, 2024

“The real story here is… everyone is ready to believe that AI can do things, even if it can't,” Willison told Ars. “In this case, it's pretty clear what's going on if you look at the wider context of the show in question. But anyone without that context, [a viewer] is much more likely to believe that the whole thing was AI-generated… thanks to the massive ramp up in the quality of AI output we have seen in the past 12 months.”

Simon Willison Jan 27, 2024

Confirmed by the New York Times:

> Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.
>
> “It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”

https://www.nytimes.com/2024/01/26/arts/carlin-lawsuit-ai-podcast-copyright.html

George Carlin’s Estate Sues Podcasters Over A.I. Episode

The lawsuit claims that an hourlong comedy special on YouTube violated Carlin’s copyright.

The New York Times

Ben Ramsey Jan 27, 2024

@simon I’m not able to read the article, but it sounds like a copyright claim issue. Why would it be any less of a copyright violation if it wasn’t A.I.? That is, they claim they wrote it and not A.I., so does that change the copyright infringement claim?

Simon Willison Jan 27, 2024

@ramsey I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

The lawsuit still has legs though, see point 81: "Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs"

That's "rights of publicity" which I believe is a separate thing from copyright

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

Ben Ramsey Jan 27, 2024

@simon > I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

This is where I’m interested in understanding how the court will respond to cases like this. In a sense, the author of the material trained their brain on George Carlin’s copyrighted material and produced a work that imitates his style.

How is an LLM any different?

Simon Willison Jan 27, 2024

@ramsey this is effectively the same argument that's core to the NYT lawsuit against OpenAI and Microsoft - the argument is that the LLM itself is a derivative work of the content that was used to train it, and that it falls outside of "fair use" criteria - that's the key question which needs to be decided in court

Ben Ramsey Jan 27, 2024

@simon How is the LLM responding when I ask it to quote from specific books? For example, I just prompted ChatGPT 3.5 to give me the first few paragraphs from The Hobbit, and it gave them to me verbatim.

Ben Ramsey Feb 1, 2024

@simon Not sure whether you saw my question here, but I’m still very curious and perplexed by this. If an LLM doesn’t store the full text of materials it was trained on, then how does it produce output like what I’m seeing?

Simon Willison Feb 1, 2024

@ramsey my current mental model is that memorization can happen if the model has seen multiple copies of the same text, such that it effectively encodes an extremely high probability for word 60 of that text following words 1 through 59
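
To make that mental model concrete, here is a toy sketch in Python. It is not how an LLM works internally (real models use neural networks over tokens, not lookup tables of word counts), and every name in it (`train`, `next_word_probability`, `greedy_continue`, the sample sentences) is made up for illustration. It only shows the statistical effect being described: when the same passage appears several times in the training text, the next word after a given run of words becomes close to certain, and always picking the most likely next word reproduces the passage verbatim.

```python
from collections import Counter, defaultdict


def train(corpus, context_size=3):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for text in corpus:
        words = text.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def next_word_probability(counts, context, word):
    """Estimated probability that `word` follows `context`."""
    seen = counts.get(tuple(context))
    if not seen:
        return 0.0
    return seen[word] / sum(seen.values())


def greedy_continue(counts, prompt, max_words=20, context_size=3):
    """Extend the prompt by always choosing the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        seen = counts.get(tuple(words[-context_size:]))
        if not seen:
            break  # this context never appeared in training
        words.append(seen.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training data: one passage duplicated five times,
# plus one unrelated sentence that shares its opening words.
passage = "the quick brown fox jumps over the lazy dog near the river bank"
other = "the quick brown cat sleeps"
model = train([passage] * 5 + [other])

# "fox" follows "the quick brown" in 5 of the 6 training texts.
p_fox = next_word_probability(model, ["the", "quick", "brown"], "fox")

# Greedy decoding from the first three words regenerates the passage.
regenerated = greedy_continue(model, "the quick brown")
```

Nothing in `model` is a stored copy of the passage as such, only counts of which word followed which context, yet `regenerated` comes out identical to `passage`.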

Ben Ramsey

@simon I guess the question the courts will have to answer is whether capturing the probability at such a high level is enough to constitute holding a copy of the work, since the work can be reproduced with such a low level of effort, when prompted.

Simon Willison Feb 1, 2024

@ramsey yeah that feels like the right question to me - and honestly I don't think there's an obvious "right" answer to it, no idea how this will shake out in court