Typical ML argument: "If I can read something legally, why can't I train an LLM on it?"

Humans are capable of reading things and later writing a similar thing that is still a copyright violation. If I go and write a book that follows the plot line of Star Wars, that's still a copyright violation, even if no text is literally the same. If I play the melody to a song on my piano and release it without the appropriate mechanical cover license, that's also a copyright violation.

The reason this does not happen often is that, as humans, we know this is plagiarism and that there are rules against it. Sometimes it happens by accident, and people still get sued and lose.

LLMs have no such awareness and routinely output things which are blatant copyright violations when appropriately prompted. That means the model weights encode those works and are therefore themselves a derivative work.

Your brain encodes a massive amount of copyrighted information. You are not a walking copyright violation because humans aren't data, can't be copied and distributed en masse, have human rights, etc. This is why "mind reading machines" are a classic dystopian plot point (monetizing your thoughts etc).

An LLM is not a human, and has neither human rights nor human privileges. It is data, and if it encodes copyrighted information, that's a derivative work. If you aren't following the license of the training data, that's a copyright violation.

@lina If I retell Star Wars with different character names, not as a publication but just around a campfire with my family, that still isn't illegal, right? Lucasfilm/Disney only get to send in the legal team to black-bag me if I try to, e.g., publish a wax cylinder of my campfire stories (as sincere non-parody). Just saying a plot aloud isn't a problem, nor is sketching Mickey Mouse on a napkin, right?

@paul The story is still copyrighted, but telling it to your family wouldn't count as a "public performance", so it wouldn't infringe copyright. Telling it to a crowd at a park probably would, though.

Copyright of characters is complicated and varies by jurisdiction. That said, the original Steamboat Willie version of Mickey Mouse is in the public domain in the US now, so your sketch is totally fine as long as you aren't trying to sell it or pass it off as legitimate Disney merchandise (because Mickey is still trademarked).

(Disclaimer: IANAL, this is just my understanding.)

@lina On a related note, does camp fall under parody law if it's not intentional? Like, this image is probably covered because "Bugs Bunny + Spider-Man", but I didn't specifically prompt for that. Diffusion models are just bad at multi-subject stuff.

@paul I have no idea tbh... ^^;;

Parody rights are also not universal; it's a very jurisdiction-specific thing.

@lina Need to make sure you end up in a court with a sense of humor. So, like, avoid the 5th Circuit.

@lina @paul It's horrendous that people have truly tried to kill public storytelling & sharing of stories.

How is this not disgusting to more people?

@lispi314 @lina The legal framework does become philosophically intractable for a lot of edge cases. It's especially indefensible when the original IP depended on a folktale with unknown authorship. Feels like Monsanto patenting genes; very "Wait, they can DO that?"

@paul @lina Even in general, most of the remotely valid arguments for copyright that I've heard are really more arguments against capitalism.