I've just discovered that Meta illegally used one of my books ("I Can't Believe It's Not Buddha") to train its AI.

They are paraphrasing (uncredited) the contents of my book to make money.

I am not okay with this.

Sarah Silverman and others have started lawsuits. I hope some of the less rich among us can team up and do likewise.

h/t @petergleick

https://www.theatlantic.com/technology/archive/2023/09/books3-database-generative-ai-training-copyright-infringement/675363/?gift=QdM39RGtR94-pmclr4oVwi9lpuV63yswBnoweowTTIM&utm_source=copy-link&utm_medium=social&utm_campaign=share&s=03

These 183,000 Books Are Fueling the Biggest Fight in Publishing and Tech

Use our new search tool to see which authors have been used to train the machines.

The Atlantic
@bodhipaksa @[email protected] I have a horrible fear that it's not illegal, because this time it suits the rich and powerful.
@alastair Copyright law is fairly clear cut, as I understand it.
@bodhipaksa @petergleick artists and writers need to team up and bring the mother of all class action lawsuits. A result of anything less than paid residuals for every single work sampled to every single creator whose work has been stolen would be optimal.
@bodhipaksa @petergleick wish I could edit on the mobile app; that starts off one thought and ends another and they're not entirely related. >.@
@solient @petergleick There are better apps than the official Mastodon one, which lacks functionality. I recommend Ice Cubes if you're on iPhone. For Android, I hear Tusky is good.

@bodhipaksa @petergleick I've heard decent things about Tusky; Ice Cubes sounds new to me. 🤔

(the official app is definitely lacking, but what it does have mostly works, at least)

@solient @petergleick Well, the things that work work. But the things that are missing are missing. Such as the ability to see alt text, which is necessary if you don't want to boost images lacking alt text into the feed of people with visual impairments — which is the Mastodon equivalent of tripping up a blind person you see walking down the street.
@bodhipaksa @petergleick *entering* alt text isn't that great, either.
@solient @bodhipaksa If you do have Android, give Fedilab a look; it can *translate* alt text too, plus a whole basket of other useful stuff.
@solient @petergleick In another post I share a link to an article about a class-action lawsuit. The article contains contact details of the law firm handling the suit. I'll be writing to them later today. First, breakfast!
@bodhipaksa @petergleick Sounds like a class action begging to happen.

@bodhipaksa

I'm in there too. I'm already concerned by the fact that contracts typically offer a flat fee rather than a share of subscription revenue. This isn't going to help matters.

CLARIFICATION -- I phrased this poorly. I was referring to subscription services which give electronic access to a publisher's catalogue, not the standard royalty on hard copy / e-book sales. Also, I don't know if the flat fee is typical. It is the case for me.

@inflatableink I'm sorry that your book has also been pirated. I'm not sure what you mean by this, though: book contracts usually involve royalties for every copy sold, with a flat-fee advance on those royalties.
@bodhipaksa Yes, I get an advance and royalties as you describe. However, the publisher also has a Netflix-style subscription service. For this service, readers pay a monthly fee and have access to a wide catalogue of books. I only get a flat fee for this. I don't know what proportion of sales go through this model -- but, unless it is renegotiated, as it grows it will eat into royalties.

@inflatableink Interesting. I've never heard of an arrangement like that.

I have wondered how Amazon's "all you can read" program works. Presumably authors get royalties, but I'm not confident in Amazon's ethics.

@bodhipaksa I wonder about Amazon too. I guess it's a similar model. In my sector -- tech/academic -- it's increasingly common. I think O'Reilly pioneered it (or at least adopted it early). One of the issues in the actors' strike seems to concern the transparency of the streaming services WRT the popularity of their shows. Without knowing how many views a show gets it's hard to negotiate a fair remuneration. I wonder if authors will face a similar issue in the medium term.
@bodhipaksa @petergleick Is it actually illegal? I don’t know if copyright extends to AI learning models, at this point?

@dubiago @petergleick Is taking a pirated copy of a book, feeding it into a computer, and then paraphrasing the contents of the book in order to make a profit illegal? Yes. Yes, it is.

Copyright law existed before LLMs were developed. New technologies aren't magically exempt from those laws just because they're new.

@bodhipaksa @petergleick Was it pirated? Or was it legally purchased?
@dubiago @petergleick It was pirated, along with many others, hence all the lawsuits OpenAI is facing.
@dubiago @petergleick At a certain point one has to lose patience and say, "RTFA."
@bodhipaksa @petergleick Yes. I did. There is a *claim* of piracy. But, we don’t know the source of these books. The article doesn’t specify—my guess is that the author doesn’t know. Ambiguity and assumption launched from that ambiguity is always fraught with danger…not that people with pitchforks care…

@dubiago @petergleick As the article says: "The data set, known as “Books3,” was based on a collection of pirated ebooks."

One of my books is in that collection, without my permission. No author whose work is in it gave permission for their book to be used in this way. Which is why there are now multiple lawsuits against OpenAI.

@bodhipaksa @petergleick Pirated according to who? Can the owner of the data set furnish legitimate purchase receipts? Was that question even asked?
@dubiago @petergleick Even if at some point the books were legitimately purchased, by being passed on to a third party they become pirated by definition.
@bodhipaksa @petergleick That’s for the courts to determine
@dubiago @petergleick Oh, boy. You are such a waste of time. Blocked.
@bodhipaksa @petergleick apologies if this is a dumb question but, what if the AI got that summary from say, the ingestion of a bunch of Reddit comments discussing your book but not actually your book itself?
@kobra_ @petergleick It's not a dumb question, but it's one you wouldn't have to ask if you read even the first paragraph of the article I linked to 🙂

@bodhipaksa @petergleick gotcha, I obviously don’t care enough to do that. Thanks for responding though, have a good day.

Edit: actually I DID go back and click the article and still didn’t see the information I asked about. I truly don’t care anymore though, no longer interested in learning more about this situation.

@bodhipaksa @petergleick 2 questions:

Are you in the US?

If so, is legal insurance a thing there?

@monsoonrains @petergleick I am in the US, although I had never heard of legal insurance.
@bodhipaksa @petergleick It's a thing in a lot of Euro countries... might be worth looking into there.
@monsoonrains @petergleick I imagine it's a bit late since the plagiarism has already happened. I recently joined the Authors Guild, and I'll be interested to hear what advice they give.