Typical ML argument: "If I can read something legally, why can't I train an LLM on it?"

Humans are capable of reading things and later writing a similar thing that is still a copyright violation. If I go and write a book that follows the plot line of Star Wars, that's still a copyright violation, even if no text is literally the same. If I play the melody to a song on my piano and release it without the appropriate mechanical cover license, that's also a copyright violation.

The reason this does not happen often is that, as humans, we are aware that that's plagiarism and there are rules. Sometimes it happens by accident, and people still get sued and lose.

LLMs have no such awareness and routinely output things which are blatant copyright violations when appropriately prompted. That means the model weights encode that work, and therefore, are themselves a derivative work.

Your brain encodes a massive amount of copyrighted information. You are not a walking copyright violation because humans aren't data, can't be copied and distributed en masse, have human rights, etc. This is why "mind reading machines" are a classic dystopian plot point (monetizing your thoughts etc).

An LLM is not a human and does not have human rights or human privileges. It is data, and if it encodes copyrighted information, that's a derivative work. If you aren't following the license of the training data, that's a copyright violation.

@lina

> Yes, this means that anyone downloading "open" models potentially puts themselves in as much legal risk as torrenting a movie does.

Huggingface still seems to be operating?

Also, there is a big difference between bidirectionally torrenting something and "downloading" it... they are not interchangeable terms. Normal humans are not getting into legal peril for downloading / streaming movies or music (because it is not their "performance" of it but that of whoever is sending it to them).

@hopeless

> Huggingface still seems to be operating?

People get away with torrenting movies all the time too, doesn't mean it's legal.

Fair point on torrenting vs downloading though (because seeding), edited. But yes, people do get in trouble for just downloading too. It depends on the country.

@lina @hopeless

> People get away with torrenting movies all the time too, doesn't mean it's legal.

It really should be, and in a lot of places it is. Only the most awful of places have made it criminal to share things without profit.

@lispi314 @lina Yeah. I don't know why there is such a Copyright Maximalism speedrun going on here.