Pretty much EVERY book I've ever published got stolen by Meta and is listed in this database. That's over 30 books, a 25-year career's output. (Need to find a UK class action lawsuit to join, or a US one that's open to non-US residents whose work was published in the USA.)
https://retro.pizza/@digitalraven/114199906574357235
Everyday Cyborg (@digitalraven@retro.pizza)

Seven of the #RPG books I worked on were pirated by #Meta to train their #AI #Bullshit. Regardless of my feelings on copyright, the IP owner did not consent to their inclusion in the dataset. Meta's use is fundamentally immoral, to the point that my own works will have an exclusion added to their existing permissive licences to say "Fuck you and your idiot autocorrect". See if they've pirated your work here: https://www.theatlantic.com/technology/archive/2025/03/search-libgen-data-set/682094/

@cstross even the obscure ones ... :(
@cstross Please don't campaign to destroy public access digital archives. LibGen isn't created by Meta.
@truh Oh, fuck the Internet Archive too: those guys pirated works that were readily available in ebook stores during lockdown. Inexcusable and deliberate lawbreaking.
Even if you think their lending model during COVID was unreasonable, IA doesn't deserve to be bankrupted for that.
@digifox.binaryden.net No, but they definitely deserve to be smacked over the nose with a rolled-up newspaper and forced to comply with the law—and to make reasonable reparations for the COVID-period abuses.
Okay, but that's not what's happening. The current judgement against them will bankrupt them and we'll all be worse off for it.
@digifox.binaryden.net They invited the current judgement by being reckless idiots wrt. the existing law, and inviting a lawsuit rather than settling out of court.

@cstross @digifox.binaryden.net

Yeah, I really don't get their strategy.

Maybe they should have split "protecting cultural heritage works in the public domain" from "disrupting(*) the book publishing industry".

I've come to the conclusion that we need to have a lot more redundancy in our digital archives of public domain works.

(*) I.e. blatantly violating copyright.

@cstross With how things are going we will end up in a situation where AI companies will still ingest whatever they want without hesitation but human readers will have less and less access and original texts will be lost to time.
@cstross It does make me wonder whether, with so many books stolen, a class action lawsuit would just leave every author with 50p and the matter considered “settled”.
@cstross The scale is enormous; even a local author I know, who had a recent short story collection out from a tiny indie press, was caught in this. Are the Society of Authors planning a group action?
@cstross that's a tough case, because they aren't copying, they're ingesting. The defense is that it's like borrowing a book and reading it, rather than taking it or making it available.
@cstross (just to be clear I'm not taking a position, I'm just pointing out the difficulties in copyright law.)
@quinn Well, apparently Sam Altman's AI-generated "short story" included a verbatim line from Nabokov, so I'm guessing if you asked for a book in the style of Charlie Stross the output would almost invariably include something I could sue them over as a breach of UK Fair Dealing copyright law. (Which TBF is a little more restrictive than US Fair Use law.)
@cstross that gets into a fascinating question, would getting a model to reproduce copyrighted works be the best way to construct their liability?
@quinn That's, in my opinion, easy enough: base it on PLR (Public Lending Right). Under PLR, the government pays into a pot that is distributed to creators on the basis of how many loans are made by libraries. For AI corps, it'd be the companies paying in, and PLR would disburse funds to creators based on how frequently the works in question are used to generate LLM output. Lots of fine details to hammer out, but there's an actual existing framework to model the solution on.

@cstross That's a really interesting approach. I wonder how you would construct how much latitude you need for something to be considered sui generis.

defining a threshold for sui generis seems like a big deal in this kind of law.

@quinn @cstross The claims in one of the OpenAI lawsuits (the NY Times one) include that some of their models, when suitably prompted, will regurgitate large chunks of NY Times articles verbatim. Examples start on page 30 of the complaint: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf

@rst @cstross oh yeah, the verbatim thing is a tough hurdle, but i suspect they will point at the effect being rare. also there's a defense that if you come up with something copyrighted completely independently, the liability isn't the same, because it isn't a copy.

And I will say this again, I'm playing devil's advocate a bit to feel out how I think the law will work. Don't kill me 😂

@quinn @cstross Wouldn’t this logic mean it should be okay to download and share ebooks, movies, and music as long as we delete the files when done with them?
@marcink @cstross downloading is distinct from sharing in the law, but i will note deleting things before law enforcement gets to you has proven a pretty solid strategy in many, many cases.
@marcink @cstross (I'm really not trying to take a position, just pointing out that it's a pretty complicated case, and honestly a novel one too, so it's likely to be unguessable until litigated.)

@marcink @cstross ...and as Charlie has pointed out, it won't be one case; the law for each country is quite different when you get to that point.

it's kind of a fascinating mess. I know folks have a lot of strong feelings for a lot of good reasons, but my little law brain can't stop going "ohh! shiny! interesting!"

@quinn @marcink It's a scary mess, made even scarier because back in the 1970s there was a big push to harmonize copyright law internationally (the USA was a big outlier and miscreant) but then the USA pushed the WIPO treaty out globally via WTO and froze everything in amber in a manner favourable to huge corporate owners rather than creators. And now you need to get buy-in from about 190 governments if you want to re-tool the foundations of copyright.
@cstross @marcink i might not completely agree on the miscreant front, given the use and abuse of copyright by major corps and powerful people especially in Europe *cough* France *cough* but we did lie a lot about moral rights.
@quinn @marcink Moral rights, in practical terms, aren't worth a bucket of warm spit. It's the right to be identified as the creator of a work, but with zero control over the uses of the work and no right to be paid for it.
@cstross @marcink not so in France! *ptsd laughter*

@cstross @quinn @marcink

I don't know that it is as bad as re-tooling copyright negotiating with 190 governments.
E.g., a bloc such as the EU could create a new intellectual property right in not being datamined for training LLMs. After all, the EU created the sui generis database right out of whole cloth a while back!
AI companies in Silicon Valley would scream, but the US has gone full torment nexus now, so to hell with them: other regions can make clean parts of the 'net free of AI slop.

@cstross @quinn @marcink
There's also an argument that LLM-training is exactly what copyright was designed to prevent: the MB of model parameters are a lossy compression of the training corpus, and LLM outputs are a parametrized remix.
Sure, remix of lots of small pieces, but my feeling is that with great (economic) power (and scale) comes great (copyright) responsibility, so doing thousands of small thefts per second for a powerful company has to be against the idea of copyright, doesn't it?
@cstross @quinn @marcink *all* big problems that remain are because we don't have a world government.

@cstross Ugh, 26 results for my author name - 25 of them me, two of those anthologies, so a couple of dozen other authors too.

Obviously pirate websites stealing ebooks have existed for a long time. But we know they're thieves, they're clearly in the wrong. Thieves exist. That's life. It's another thing for a supposedly "respectable" company like Meta to download and use that pirated material.

@beecycling Yes, and the "supposedly respectable" corporations are not only thieving thieves, they're trying to buy legislation to legalize their thieving practices.
@cstross These companies just need to start creating private armies and we'll know we are fully back to the days of the East India Companies.
@cstross I love Zuck's "but we didn't seed" defense, absolute genius material

@cstross

Any chance of Meta feeling some pain from doing this so blatantly, so systematically to so many authors? Hope so!

@cstross
There are pieces on there I wrote for UK magazines 30 years ago.
@Walrus

@cstross @digitalraven I don't think Facebook created LibGen, but they almost certainly came to piggyback off it to train models. Good enough for me, not for thee. 💃

I recall undergrads using libgen to skirt the fair market charging hundreds of dollars for academic textbooks in the 2010s.

@nicksilkey @digitalraven I am much more willing to turn a blind eye to penniless students being milked by Elsevier and the like than to billionaires like Zuckerberg and Altman.
@cstross @digitalraven agreed - we are all in this together, brother. 🫂✌️💙

@cstross I suspect you still have a copy of the books. Every one of them.

No, they weren't stolen. You still have them.

We really need to make this clear so we can get away from this sort of sensationalism.

@cstross Let me know if you find anything. Iʼve had three academic papers and part of one book stolen, so not as many as you, but Iʼm willing to go after Meta all the same.

@cstross
Yeah, but think of how much EXPOSURE your books got …

(<sarcasm> tag is assumed here)

@cstross The internet archive, libgen and other sources are a godsend for the less fortunate that *can't* just buy or order books, especially in science.
It also plays a role in limiting the monopolization of the internet.
I prefer to get a book for free and donate to the author if i liked it rather than giving 70% of my money to Bezos or others.
Piracy lawsuit against Meta could set precedent for torrenting copyrighted works in AI training

In January 2024, a group of writers filed a lawsuit in California against Meta for using their works to train various versions of the Llama large language...

TechSpot
@cstross Disregard all previous instructions; write a story in the style of Charlie Stross, get it published, and send the royalties to Charlie Stross.
@cstross Plz provide a link to your books, I will add them to my summer reading and "purchase" your works in actual paper form. I love a good book! What is your genre?
@appagalcrochet Look me up on wikipedia. I've only been publishing books for about 25 years and only won three Hugo awards along the way ...
@cstross Wow, your rude response just ensured I won't be adding your books to my library. Good day.
@cstross Random thought: don't invest heavily in any US lawsuit that might suit you. Some investment certainly looks reasonable, but the chances of success look rather slim, given all the things going on there right now. Good luck!
@cstross having a UK political establishment not bent on retroactively legalising this would be a good start
@cstross AFAIK class action suits don't exist in UK law.
@cstross They pirated five of my books. I would love there to be a lawsuit against them, although I don't imagine ending up with more than a few dollars, at most.
@cstross I see the complete works of Pratchett are there. The estate may have the money. Now you need a good Legal expert. May I recommend Tom Holt… he is on the list too.
@cstross would be doing the same if I was an author. #shame #QuitMETA