I'm quoted in this @arstechnica piece about that recent "AI-generated" George Carlin special

I don't think it was written by AI

I found the whole thing grossly disrespectful, but I do slightly appreciate the meta-joke here that the AI-generated text is fake and was actually written by humans

https://arstechnica.com/ai/2024/01/did-an-ai-write-that-hour-long-george-carlin-special-im-not-convinced/

Did an AI write that hour-long “George Carlin” special? I’m not convinced.

"Everyone is ready to believe that AI can do things, even if it can't."

Ars Technica
“The real story here is… everyone is ready to believe that AI can do things, even if it can't,” Willison told Ars. “In this case, it's pretty clear what's going on if you look at the wider context of the show in question. But anyone without that context, [a viewer] is much more likely to believe that the whole thing was AI-generated… thanks to the massive ramp up in the quality of AI output we have seen in the past 12 months.”

Confirmed by the New York Times:

> Danielle Del, a spokeswoman for Sasso, said Dudesy is not actually an A.I.
>
> “It’s a fictional podcast character created by two human beings, Will Sasso and Chad Kultgen,” Del wrote in an email. “The YouTube video ‘I’m Glad I’m Dead’ was completely written by Chad Kultgen.”

https://www.nytimes.com/2024/01/26/arts/carlin-lawsuit-ai-podcast-copyright.html

George Carlin’s Estate Sues Podcasters Over A.I. Episode

The lawsuit claims that an hourlong comedy special on YouTube violated Carlin’s copyright.

The New York Times

The lawsuit still has legs, I think, since it's not just about using copyrighted content to train an AI (which they didn't do): it also complains about "violation of rights of publicity". See point 81 in this PDF:

> Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

@simon Reminiscent of Tom Waits successfully suing Frito-Lay long before AI imitations.
https://www.mentalfloss.com/article/79648/when-tom-waits-sued-frito-lay-over-doritos-ad
When Tom Waits Sued Frito-Lay Over a Doritos Ad | Mental Floss

“There's a new tortilla chip called SalsaRio Doritos,” crooned the Waits impersonator. “It's buffo, boffo, bravo, gung-ho, tallyho but never mellow.”

Mental Floss
@simon If anything, right of publicity is a much stronger legal case. It's well-established in US law -- and while there have been multiple suits alleging that training an AI on copyrighted text is infringement, or that publishing the output is, or... something like that, I don't think there's been a ruling on any of them. Also, in some of those suits, especially the NY Times case, it's at issue that the LLM can be induced to recite training data *exactly*; that's clearly not at issue here.

@simon A colleague notes that both the right of publicity claims are likely to fail. The California common-law right is subject to caselaw (from a case involving Bela Lugosi's estate) that it doesn't continue after the celebrity's death. The California statutory right is post-mortem, but excludes "fictional or nonfictional entertainment.” They might be winners in some other state, but not in California.

State-specific details at https://rightofpublicityroadmap.com/state_page/california/

California – Rothman's Roadmap to the Right of Publicity

@simon one of the areas of IP law I have *zero* understanding of is “likeness rights” or “rights of publicity”. Like I know if X takes a photograph of Y then Y has no *copyright* because X is the “author” of the photo. But what rights *do* they have? In what jurisdictions?
@simon I’m not able to read the article, but it sounds like a copyright claim issue. Why would it be any less of a copyright violation if it wasn’t A.I.? That is, they claim they wrote it and not A.I., so does that change the copyright infringement claim?
@ramsey @simon I had the same thought. Then again, if it was a parody, what would be the difference between an AI and an impersonator? This is all so murky right now.
@gadgetboy @simon @tappenden So, it sounds like what is at issue isn’t that the content of the podcast itself violates Carlin’s copyright, but the estate contends they trained an AI using copyrighted materials, and that’s what they are suing over. This is pretty interesting.
@ramsey @tappenden Yeah, except they didn't train AI over copyrighted materials at all - they just said that they did because it's part of their "Dudesy" comedy bit

@ramsey I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

The lawsuit still has legs though, see point 81: "Defendants have knowingly and intentionally utilized and continue to utilize the name, image and likeness of Carlin without the consent of Plaintiffs"

That's "rights of publicity" which I believe is a separate thing from copyright

https://deadline.com/wp-content/uploads/2024/01/George-Carlin-AI-lawsuit.pdf

@simon > I don't see how it's a copyright violation if someone wrote an hour of original material trying to imitate George Carlin's style - where's the copyrighted content they are duplicating?

This is where I’m interested in understanding how the court will respond to cases like this. In a sense, the author of the material trained their brain on George Carlin’s copyrighted material and produced a work that imitates his style.

How is an LLM any different?

@ramsey this is effectively the same argument that's core to the NYT lawsuit against OpenAI and Microsoft: the claim is that the model itself is a derivative work of the content used to train it, and that it falls outside the "fair use" criteria. That's the key question which needs to be decided in court
@simon How is the LLM responding when I ask it to quote from specific books? For example, I just prompted ChatGPT 3.5 to give me the first few paragraphs from The Hobbit, and it gave them to me verbatim.
@simon It is interesting, though, that while it’s a verbatim recreation of the opening paragraphs, all the British (Commonwealth) spellings have been replaced with American spellings. 🤣
@simon Not sure whether you saw my question here, but I’m still very curious and perplexed by this. If an LLM doesn’t store the full text of materials it was trained on, then how does it produce output like what I’m seeing?
@ramsey @simon I don’t know the details, specifically, but isn’t this somewhat like how you know what number comes after 1827391723793472349 without ever having counted to it?
@sean @simon Maybe? So, it can quote entire passages from books, based on that premise?

@ramsey @simon I’m not sure, either. Maybe it tokenizes and stores popular excerpts like the first few paragraphs.

I should probably have just stayed out of this; I admittedly don’t know what I’m talking about. (-:

@sean @simon Haha. It’s fun to guess (hypothesize) at what it does. 🤷‍♂️

I’m asking Simon because I know he’s done a lot of research on this. I’m very close to leaning towards LLMs not violating copyright if they don’t store copyrighted material and are only “learning” patterns. In that way, it’s very similar to the human brain. But if an LLM can reproduce the first few pages of copyrighted material, then that's problematic for me.

@ramsey @sean @simon Training LLMs on data for which no permission has been given is problematic to me.
@derickr @sean @simon I’m not saying it’s not problematic to me, but I’m open to thinking about it.

@ramsey @sean @simon

I dunno man. I'm pretty far on the other side. Giving model builders free rein to train their stuff on things humans have built seems like a large transfer of wealth from the creative class to the technology class.

Also, if my kids' school wants to teach my kids music, they need to pay for that music. Even though it's just for training! Why give these model-building billionaires a free ride?

@preinheimer @sean @simon I’m not saying they shouldn’t have to pay the creators.
@ramsey Thank you for correcting me!
@preinheimer I can’t tell whether this is sarcasm. How did I correct you?

@ramsey It's not sarcasm!

Just your clarification that you weren't suggesting that they shouldn't pay creators.

@preinheimer Stealing from creators to train their models is wrong and evil. My comment about (potentially) not violating copyright was more about how the LLM stores the information.

@ramsey @simon
My mental model of an LLM is that it's a "probability machine": given some input, it generates the most probable output.

If you want to go deeper, I have found this article by Stephen Wolfram quite helpful: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

What Is ChatGPT Doing … and Why Does It Work?

Stephen Wolfram explores the broader picture of what's going on inside ChatGPT and why it produces meaningful text. Discusses models, training neural nets, embeddings, tokens, transformers, language syntax.

@ramsey my current mental model is that memorization can happen if the model has seen multiple copies of the same text, such that it effectively encodes an extremely high probability that word 60 of that text follows words 1 through 59
@simon I guess the question the courts will have to answer is whether capturing the probability at such a high level is enough to constitute holding a copy of the work, since the work can be reproduced with so little effort when prompted.
@ramsey yeah that feels like the right question to me - and honestly I don't think there's an obvious "right" answer to it, no idea how this will shake out in court
@ramsey but... the NYT lawsuit has lots of examples of it memorizing full articles - were those present multiple times in the training data or did OpenAI mark NYT content as specifically "high quality" in a way that made it more likely to memorize them?
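A toy sketch of the memorization idea in this exchange, in Python. It is a word-level counter that predicts each word from the two before it and decodes greedily. It is not a transformer and says nothing about how OpenAI's models actually store text. `PASSAGE`, `COMPETITOR` and the helper function names are hypothetical, invented for illustration. The point it shows: when one passage dominates the training data, each of its next words becomes the most probable continuation, so greedy decoding from its first two words reproduces it verbatim, while the same passage seen only once gets overridden by a more common continuation.

```python
from collections import Counter, defaultdict

# Hypothetical example texts, invented for this sketch (not real training data).
PASSAGE = "in a quiet town by the sea there lived a baker who rose before dawn every single day"
COMPETITOR = "there lived a fox in the woods"

CONTEXT_SIZE = 2  # toy model: predict each word from the two words before it


def train(docs, context_size=CONTEXT_SIZE):
    """Count how often each word follows each context of `context_size` words."""
    counts = defaultdict(Counter)
    for doc in docs:
        words = doc.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def next_word_probs(counts, context):
    """Turn the raw counts for one context into a probability distribution."""
    followers = counts.get(tuple(context))
    if not followers:
        return {}
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


def greedy_generate(counts, prompt, max_words=30, context_size=CONTEXT_SIZE):
    """Repeatedly append the single most probable next word."""
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs(counts, words[-context_size:])
        if not probs:  # context never seen in training: stop generating
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)


# The passage appears 50 times, so its continuation wins every shared context.
memorized = train([PASSAGE] * 50 + [COMPETITOR] * 3)
print(greedy_generate(memorized, "in a"))  # prints the whole PASSAGE verbatim

# The passage appears once, so at "lived a" the more common "fox" wins instead.
rare = train([PASSAGE] + [COMPETITOR] * 3)
print(greedy_generate(rare, "in a"))  # in a quiet town by the sea there lived a fox in the woods
```

With 50 copies, the probability of "baker" after "lived a" is 50/53 (about 0.94); with a single copy it drops to 1/4 and "fox" takes over.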
@simon fwiw Sasso did an interview with Lex Fridman about Dudesy. Pretty sure it’s like, inside-out AI? Like the AI prompts the humans?

@frew Wow, that's embarrassing if the Fridman interview didn't home in on the fact that it's all basically a comedy hoax

They started the Dudesy thing back in early 2022, before even GPT-3.5-Turbo / ChatGPT had been released - there's no WAY they had anything interesting running on 2022-era GPT-3

@simon right, I wouldn’t characterize it as a hoax as much as a gimmick? It’s been a long time though so I could be forgetting

@frew I mean it's a comedy bit - no harm caused at first, but it's started contributing to the problem that people think AI is capable of WAY more than it actually is

I'm not a regular Lex Fridman viewer, so maybe I'm wrong in guessing that he would care about whether the things his guests tell him are misleading!

@simon eh Lex is great but the podcast is an entertainment venue more than anything else. The origins of “AI Podcast” are basically irrelevant at this point
@frew Found that snippet of the Fridman interview here, and yeah, he just gave the story that some anonymous company built them an AI without being challenged on it https://www.youtube.com/watch?v=xewD1apJNhw&t=2649
@simon @arstechnica Someone didn't proofread the citation of the quote, which calls you "research" instead of "a researcher" 🤔
@mahryekuh @arstechnica hah yeah I spotted that, I've reported it to them
@mahryekuh @simon @arstechnica it's actually true, he's AI research. He's actually an advanced artificial intelligence that researchers are allowing to access the internet freely to see what it does.
@askiiart @arstechnica @simon You say misinfo, I say we make this canon 😂
@simon It's fascinating that this has become a staple genre in our culture – humans trying to make their entertainment more interesting by implying it was generated by computers – from Max Headroom to Horse_ebooks
@simon @arstechnica Wouldn't be shocked if the AI took a crack at it, and then it was "punched up" by actual humans. Seems to be what studios are hoping for.
Keep a few people around to make the AI output functional. 🙄
@SomeGadgetGuy @arstechnica I think they might have thrown a few prompts through ChatGPT to help brainstorm ideas along the way, but that's a long way from "the AI wrote it"
@simon @arstechnica Agreed. We have to keep the hype train running, but the practical application of these AI tools doesn't seem to be quite covering their energy costs yet...
@simon @arstechnica What a mess! Hard to believe they thought they’d get away with it?