Today's threads (a thread)

Inside: Supreme Court saves artists from AI; and more!

Archived at: https://pluralistic.net/2026/03/03/its-a-trap/

#Pluralistic

1/

@pluralistic

> [..] all seek to establish that training an AI model is a copyright infringement. This is wrong [..]

This is such a strange hill to die on. Of course "training" infringes - it's compression.

Framing it as "mathematical analysis" or collecting "facts" is disingenuous.

A Fourier transform could just as easily be described the exact same way, except it's fully reversible.
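To make the reversibility claim concrete, here is a minimal Python sketch that is not from the thread itself: a naive discrete Fourier transform and its inverse, run on made-up sample values. The names `dft` and `idft` and the data are illustrative assumptions. It only shows that the transform round-trips; it says nothing either way about how model training behaves.

```python
import cmath

def dft(signal):
    """Forward discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse transform: rebuilds the samples from the frequency coefficients."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# Made-up sample values, purely for illustration.
samples = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]

# Transform, invert, and keep the real parts (imaginary parts are rounding noise).
restored = [value.real for value in idft(dft(samples))]

# Largest difference between any original sample and its restored value.
max_error = max(abs(a - b) for a, b in zip(samples, restored))
```

Here `max_error` stays at floating-point rounding level, which is the sense in which the coefficients carry the whole signal.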

"Facts" wrt copyright are not meant to be the building blocks needed to reconstruct a work to some degree.

@archo

> "Facts" wrt copyright are not meant to be the building blocks needed to reconstruct a work to some degree.

Literally you could write a book describing each brushstroke needed to reconstruct a copyrighted painting without violating that painting's copyright.

@archo Further: the cases against models are only tangentially related to memorization. It's undeniable that these suits would exist even if models had "guardrails" that *perfectly* prevented them from reproducing their training data verbatim. Memorization is *not* the crux of any of these complaints - *training* is.

@pluralistic Perfect memorization is also irrelevant - lossy compression is also a thing, and it doesn't stop copyright from applying (otherwise re-encoding videos would remove their copyright).

Ultimately though what matters for infringement is whether the original work was substantially used to produce the new work (regardless of the exact process) - and clearly that's the case.

Not even the AI companies dispute infringement; they argue that it's "fair use" instead.

@archo

> Ultimately though what matters for infringement is whether the original work was substantially used to produce the new work (regardless of the exact process) - and clearly that's the case.

This is completely incorrect.

You could cut a picture into a billion pixels and reconstitute them as a completely different picture without infringing anyone's copyright.

Copyright has nothing to do with whether the constituent components came from a copyrighted work. That's just completely wrong.

@pluralistic In that case you can argue that the particular work was not substantially used in the end result, if all you needed was a pile of pixels (or a palette) without their positional association to each other.

To me, cases like this sit far outside the "substantially used" circle.

AI "models", on the other hand, explicitly consider the positional associations of the pixels of the original images at various scales, not just their color.
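The difference between a pile of pixels and their positional association can be sketched in a few lines of Python. This is an illustrative toy only: the 4x4 grayscale grid is made up, the helper names are hypothetical, and sorting stands in for any rearrangement that ignores where each pixel came from.

```python
from collections import Counter

# Hypothetical 4x4 grayscale "image": a bright square on a dark background.
image = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]

def flatten(img):
    """Return all pixel values as one flat list, dropping row structure."""
    return [pixel for row in img for pixel in row]

def rearrange(img):
    """Rebuild a same-sized grid from the sorted pixels, ignoring original positions."""
    pixels = sorted(flatten(img))
    width = len(img[0])
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

def neighbor_pairs(img):
    """Count (left, right) adjacent pixel pairs: a crude positional statistic."""
    return Counter((row[i], row[i + 1]) for row in img for i in range(len(row) - 1))

rearranged = rearrange(image)

# The palette (which values occur, and how often) is identical...
same_palette = Counter(flatten(image)) == Counter(flatten(rearranged))

# ...but the positional relationships between pixels are not.
same_positions = neighbor_pairs(image) == neighbor_pairs(rearranged)
```

In this toy, `same_palette` is true and `same_positions` is false: the rearranged grid reuses every pixel but keeps none of the square's layout.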

@archo You could produce a painting that consisted of the "average" face by measuring and averaging out the features of every face in every painting extant and it would not infringe copyright.

@pluralistic If it's just measurement and not color, features or anything else, then probably yes.

Aggregate functions are naturally destructive - almost none of the source data comes through. Very much unlike autoencoders.
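As a rough illustration of why an aggregate is destructive, here is a small Python sketch with invented numbers. The two lists are hypothetical "face measurements" (say, eye spacing in millimetres); nothing here comes from real paintings, and it makes no claim about what an autoencoder retains.

```python
from statistics import mean

# Two made-up sets of measurements taken from two different sets of paintings.
paintings_a = [58.0, 62.0, 66.0]
paintings_b = [50.0, 62.0, 74.0]

# Averaging collapses each set to a single number.
average_a = mean(paintings_a)
average_b = mean(paintings_b)

# Different inputs, identical aggregate: the average alone cannot say which set produced it.
indistinguishable = average_a == average_b
```

Both averages come out the same even though the inputs differ, so the individual measurements cannot be recovered from the average.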