Pretty much EVERY book I've ever published got stolen by Meta and is listed in this database. That's over 30 books, a 25-year career's output. (Need to find a UK class action lawsuit to join, or a US one that's open to non-US residents whose work was published in the USA.)
https://retro.pizza/@digitalraven/114199906574357235
Everyday Cyborg (@digitalraven@retro.pizza)

Seven of the #RPG books I worked on were pirated by #Meta to train their #AI #Bullshit. Regardless of my feelings on copyright, the IP owner did not consent to their inclusion in the dataset. Meta's use is fundamentally immoral to the point that my own works will have an exclusion to their existing permissive licences to say "Fuck you and your idiot autocorrect" See if they've pirated your work here: https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/

@cstross that's a tough case, because they aren't copying, they're ingesting. The defense is that it's like borrowing a book and reading it, rather than taking it or making it available.
@quinn @cstross Wouldn’t this logic mean it should be okay to download and share ebooks, movies, and music as long as we delete the files when done with them?
@marcink @cstross downloading is distinct from sharing in the law, but I will note that deleting things before law enforcement gets to you has proven a pretty solid strategy in many, many cases.
@marcink @cstross (I'm really not trying to take a position, just pointing out that it's a pretty complicated case, and honestly a novel one too, so it's likely to be unguessable until litigated.)

@marcink @cstross ...and as Charlie has pointed out, it won't be one case; the law in each country is quite different when you get to that point.

it's kind of a fascinating mess. I know folks have a lot of strong feelings for a lot of good reasons, but my little law brain can't stop going "ohh! shiny! interesting!"

@quinn @marcink It's a scary mess, made even scarier because back in the 1970s there was a big push to harmonize copyright law internationally (the USA was a big outlier and miscreant), but then the USA pushed the WIPO treaty out globally via the WTO and froze everything in amber in a manner favourable to huge corporate owners rather than creators. And now you need to get buy-in from about 190 governments if you want to re-tool the foundations of copyright.
@cstross @marcink I might not completely agree on the miscreant front, given the use and abuse of copyright by major corps and powerful people, especially in Europe *cough* France *cough*, but we did lie a lot about moral rights.
@quinn @marcink Moral rights, in practical terms, aren't worth a bucket of warm spit. It's the right to be identified as the creator of a work, but with zero control over the uses of the work and no right to be paid for it.
@cstross @marcink not so in France! *ptsd laughter*

@cstross @quinn @marcink

I'm not sure it's as bad as having to negotiate with 190 governments to re-tool copyright.
E.g., a bloc such as the EU could create a new intellectual property right not to be data-mined for training LLMs. After all, the EU created the sui generis database right out of whole cloth a while back!
AI companies in Silicon Valley would scream, but the US has gone full torment nexus now, so to hell with them: other regions can make clean parts of the 'net free of AI slop.

@cstross @quinn @marcink
There's also an argument that LLM training is exactly what copyright was designed to prevent: the MB of model parameters are a lossy compression of the training corpus, and LLM outputs are a parametrized remix.
Sure, it's a remix of lots of small pieces, but my feeling is that with great (economic) power (and scale) comes great (copyright) responsibility, so a powerful company committing thousands of small thefts per second has to be against the idea of copyright, doesn't it?
@cstross @quinn @marcink *all* big problems that remain are because we don't have a world government.