Mastodawn

John Carlos Baez

Hey, Anthropic owes me $9000! They illegally used at least 3 of my books on LibGen to create Claude. Now they're paying a $1.5 billion settlement, at $3000 per book. See if *your* books are on the list:

https://www.anthropiccopyrightsettlement.com/

If so, you have until March 23, 2026 to file a claim. The above website lets you file a claim, but this one explains everything more clearly:

https://authorsguild.org/advocacy/artificial-intelligence/what-authors-need-to-know-about-the-anthropic-settlement/#next-steps

Actually I exaggerated: the payment will be split between authors and publishers, but I have to make the claim - so the settlement is making me do some work my publisher should be doing for me. My coauthors and I will just get half, $4500. One of these books has 2 coauthors, one has 3, and one is a book I edited, with essays by lots of authors. So $1000 is a more realistic estimate of what I get. Oh well.
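The arithmetic behind that estimate can be sketched as follows. This is only a rough illustration: the $3000 per book and the half-and-half split with publishers come from the post, while the equal split among authors, the reading of "2 coauthors" as three people in total, and the contributor count for the edited volume are all assumptions made for the example.

```python
# Rough payout estimate for the three listed books.
# Anything marked "assumption" is NOT taken from the settlement documents.

PER_BOOK = 3000      # settlement amount per book, as stated in the post
AUTHOR_SHARE = 0.5   # authors and publishers split each payment in half

# Number of people sharing the author half of each book.
# Counting the poster alongside his 2 and 3 coauthors gives 3 and 4
# (assumption). The figure for the edited volume is a made-up
# placeholder (assumption).
authors_per_book = {
    "book with 2 coauthors": 3,
    "book with 3 coauthors": 4,
    "edited volume": 10,
}


def my_share(n_authors: int) -> float:
    """One author's cut of a single book, assuming an equal split."""
    return PER_BOOK * AUTHOR_SHARE / n_authors


headline = PER_BOOK * len(authors_per_book)
total = sum(my_share(n) for n in authors_per_book.values())

print(f"Headline figure: ${headline}")
print(f"Estimated personal share: ${total:.0f}")
```

Under these assumptions the personal share comes out close to the $1000 mentioned above; a different contributor count for the edited volume would shift it by a small amount.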

Bizarrely, my most popular book, Gauge Fields, Knots and Gravity, is not on the list. But I guess it's not surprising:

"The settlement agreement discloses that approximately 500,000 titles out of the 7 million copies of books that Anthropic reportedly downloaded from LibGen and PiLiMi meet the definition required to be part of the class."

Only books whose copyright is registered with the US Library of Congress meet that definition!

If you have a book on the list, you can opt out of the current settlement and join future lawsuits. But you have to take action to do that!!! For more information on that, see item 40 here:

https://www.anthropiccopyrightsettlement.com/faq

Homepage | Bartz v Anthropic Settlement Site

DougMerritt (log😅 = 💧log😄) Oct 28

@johncarlosbaez
> Bizarrely, my most popular book, Gauge Fields, Knots and Gravity, is not on the list.

Clearly you should raise its price to the general public to $1000 to $3000, since that's now the going rate for your books.

John Carlos Baez Oct 28

@dougmerritt - I suspect most readers get my books free from LibGen, just like Anthropic did. And I'm fine with that, since I didn't write books to make money: I just felt I needed a publisher to distribute them. My two most recent books, I just give away for free.

Except to people named Claude.

ianthe's inferno 🏴🇧🇿⚽️ Oct 28

@johncarlosbaez I love LibGen because when I was doing obscure theatre research there were some out of print books that the library didn't have that I just needed a cursory glance at

it's nice to see someone advocating for authors getting their due vs Anthropic without also saying LibGen should be shut down

foldworks Oct 28

@johncarlosbaez To qualify, the downloaded works need to have been registered with the US Copyright Office before being downloaded.

Unsurprisingly, many authors outside the US did not register their US copyright as registration is not needed in other countries (or their publisher could/should have registered the copyright but didn't). It costs about 50 USD per work.

Copyright exists as soon as the work is created, but US registration is needed for US statutory damages. Without registration, there is a higher bar to receive statutory damages. (AFAIK, only the US requires copyright registration like this).

As said elsewhere, this case is about Anthropic downloading from a shadow library -- training AI with copyrighted works appears to be fair use, so far.

US copyright history would make an interesting (but different) post.

(edited for clarity)
#copyright #Anthropic

Chris Fox Oct 29

@foldworks @johncarlosbaez As I mentioned in another post, copyright holders who are not US citizens shouldn't need to register to have their rights protected in the US since the US signed up to the Berne Convention. If the more restrictive terms are court-approved, then that's a breach.

foldworks Oct 29

@foxcj @johncarlosbaez

It looks like the judge and the lawyers for the class action and Anthropic wanted an expedient solution. Some expect the class action lawyers to take 25% of the 1.5bn USD compensation.

I can pursue my own case for my work that was downloaded, but they know that's unrealistic ☹️

From the FAQ https://www.anthropiccopyrightsettlement.com/faq

"Even though Anthropic downloaded approximately 7 million files from LibGen and PiLiMi, many of those files were duplicates of works, or unregistered works, or were empty, corrupted, or incomplete files.

About 40% of the files Anthropic downloaded (approximately 3 million) were duplicates. Duplicate copies are not eligible to be in the Works List.

Likewise, works not validly registered with the U.S. Copyright Office are not eligible to be in the Works List. Many of the works Anthropic downloaded were not registered. For example, non-English works have very low registration rates. Of the approximately 4 million unique works that Anthropic downloaded, around 2.5 million were written in languages other than English. Most of those 2.5 million non-English works were unregistered. In addition, many of the 1.5 million unique English works were not registered. And even among registered works, many failed to satisfy the date criteria in the Class definition, because registration occurred more than five years after publication or after Anthropic downloaded such works.

Finally, many works failed validation tests. For example, the Works List does not include files that were empty, corrupted, or incomplete."

#copyright #Anthropic
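The figures quoted from the FAQ can be cross-checked with a little arithmetic. The sketch below uses only the rounded numbers that appear in this thread (7 million files, roughly 3 million duplicates, 2.5 million non-English works, and the approximately 500,000 titles in the class), so the percentages are approximate.

```python
# Cross-check of the rounded figures quoted from the settlement FAQ.
# All inputs are the approximate numbers given in the thread.

total_files = 7_000_000    # files downloaded from LibGen and PiLiMi
duplicates = 3_000_000     # "about 40%" of the files
non_english = 2_500_000    # unique works not written in English
class_titles = 500_000     # titles that meet the class definition

unique_works = total_files - duplicates
english = unique_works - non_english

dup_rate = duplicates / total_files
share_of_files = class_titles / total_files
share_of_unique = class_titles / unique_works

print(f"Unique works: {unique_works:,}")
print(f"Unique English works: {english:,}")
print(f"Duplicate rate: {dup_rate:.1%}")
print(f"Class titles as share of all files: {share_of_files:.1%}")
print(f"Class titles as share of unique works: {share_of_unique:.1%}")
```

The numbers are consistent with each other: about 4 million unique works, 1.5 million of them in English, and the class covering a bit over 7% of the downloaded files.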

Chris Fox Oct 30

@foldworks @johncarlosbaez FWIW, the database seems to be flawed: some of my work is listed as eligible for compensation, but neither using my Library of Congress registered full-name, nor assigned ISBNs, retrieved it; I had to resort to searching by title. I also suspect other works of mine contained in the original sources may have been excluded for stupid reasons (e.g. some trivial difference in file sizes? Who knows...)

Jeffrey Harlan Oct 28

@johncarlosbaez

The definition is books that had a copyright registered with the US Copyright Office prior to the infringement, because the US Supreme Court set a precedent a while back (but within the last couple decades IIRC) that filing was a prerequisite for standing to sue for damages. I had one book that was part of the 7 million, but its copyright wasn't filed yet at that time, so I'm SOL. Whereas my books that were filed prior were not on the original list.

Chris Fox Oct 29

@Harlander @johncarlosbaez I understood that the conventional legal theory was that only US citizens have to register US copyright to have standing in the US, but that non-US citizens don't, as they automatically have standing under the Berne Convention. If this has changed, and rights under the Berne Convention are being ignored, then it would appear to be a breach of international treaty obligations (assuming those are still a thing).

Jeffrey Harlan Oct 30

@foxcj
@johncarlosbaez

I'm not clear on the details; it may just be for the purposes of the settlement that the class only includes US-registered authors. Authors outside the US may be able to file a different suit in their jurisdiction. But I'm not a lawyer.

Michael Hartle Oct 28

@johncarlosbaez Is this compensation for them using your work illegally, barring them from further use and keeping you in control, or compensation that "retroactively licenses" usage of your work in Anthropic's current and future LLMs?

John Carlos Baez Oct 28

@mhartle - I read this:

"Q: Does the settlement mean that now Anthropic can continue to use the pirated books to train AI?

A: No. The settlement does not give Anthropic—or any AI company—permission to use pirated books going forward. It only resolves Anthropic’s liability for past use of books. In fact, the agreement requires Anthropic to destroy all copies in its possession."

But I wonder if Anthropic will somehow remove the pirated training information from Claude, or build new LLMs based on this information. Perhaps only future lawsuits will clarify this.

Authors are free to opt out of the current settlement and join future lawsuits. But apparently opting out requires taking action on the website I listed, before March 23, 2026!!!

https://authorsguild.org/advocacy/artificial-intelligence/anthropic-settlement-faq/

Michael Hartle Oct 28

@johncarlosbaez So this is a settlement mechanism that requires you to do stuff the publisher should do, makes you share compensation with the publisher, prevents you from joining future lawsuits, and becomes binding by default if you do nothing?

Given how some people and companies got sued into oblivion on digital piracy for a lot less, this sounds like quite a bargain for Anthropic.

Charlie Stross Oct 28

@mhartle @johncarlosbaez It is *totally* a bargain for Anthropic. (But: AIUI the money for the pay-out is already in escrow. And I expect Anthropic and OpenAI to be in *serious* financial trouble within 6-18 months as they have no viable route to profitability and they're burning cash like it's rocket fuel and they're a moonshot. Which is why I'm taking the money on the table rather than joining another class action lawsuit: I have no confidence there'll be any more money later.)

Russell Phillips Oct 28

@cstross @mhartle @johncarlosbaez I hadn't realised that the money is already in escrow. That's excellent news - I've submitted claims for my two eligible books, but I'd half expected them to go bust before the payout and thus avoid paying.

I don't think it's enough, but I also think that it's the best we'll get 🙄

Carnildo Oct 28

@mhartle @johncarlosbaez

Neither. This is compensation for the act of downloading your work from Library Genesis. In the future, if one of the "training is not fair use" lawsuits is ruled in favor of authors, you'll be able to get another payout.

(I don't expect that to happen except in cases where an LLM can be prompted to regurgitate the training data verbatim, such as New York Times v Microsoft.)

Charlie Stross Oct 28

@johncarlosbaez @gnoll110 in my case, it's 29 books. The truncated settlement cash—minus publisher's cut—if it arrives is in the same order of magnitude as five years' backlist royalty payments (not counting the current year's book).

I want those fuckers to do serious jail time for theft pour encourager les autres, not pay a fine and skate.

ma𝕏pool Oct 28

@johncarlosbaez

Suggestion: I think you should consider putting your website and blog behind Cloudflare if they are not already, and use it to block AI crawlers or use their pay-per-crawl service https://blog.cloudflare.com/introducing-pay-per-crawl/

block: https://developers.cloudflare.com/bots/additional-configurations/block-ai-bots/

pay per crawl beta https://www.cloudflare.com/paypercrawl-signup/

Same for mathstodon.xyz btw. @christianp

John Carlos Baez Oct 28

@maxpool - I'm actually trying to maximize distribution of my personal essence before I die. For example: everyone please take copies of 2616 pages of my writings on math and physics here:

https://math.ucr.edu/home/baez/TWF.html

But that's just me: I've *decided* to give away my work for free. I'm not in favor of AI companies, or for that matter publishing companies, exploiting the work of authors against their will.

ma𝕏pool Oct 28

@johncarlosbaez

In that case, I recommend that you state your intention clearly with a Creative Commons (CC) license you prefer and attach it to your work. https://creativecommons.org/share-your-work/cclicenses/

Just because something is available does not mean that it can be used in good conscience when you don't give away any rights. A layman's ad hoc statement usually leaves something out and makes it less useful.

The CC0 Public Domain Dedication is the broadest and ensures the widest use. https://creativecommons.org/publicdomain/zero/1.0/legalcode.txt
If you want attribution, maybe CC BY 4.0

John Carlos Baez Oct 28

@maxpool - so far I have been too lazy to do Creative Commons licenses for all my output. I should really hire someone to do it.

ma𝕏pool Oct 28

@johncarlosbaez

WordPress has a guide on how to add a Creative Commons license to your page: https://wordpress.com/support/creative-commons/ If you add it to a proper template, it's done all at once.

John Carlos Baez Oct 28

@maxpool - this sounds like it works for webpages and posts run by Wordpress. I have tons of my pages on my UC Riverside website, and also many files, like papers. My webpages are in crude HTML, not CSS.

Alas, I don't want to think about this stuff much: my desire to do new math and physics always trumps my desire to faff around with software.

I can, however, afford to pay someone to do this stuff. I need to find someone trustworthy, intelligent yet not too expensive.

dmi 💽 Oct 28

@maxpool @johncarlosbaez @christianp please don’t centralize the internet any more than it already is. We have different solutions to this problem, all of which are much better than “putting your shit behind cloudflare”.

Bodhipaksa Oct 28

@johncarlosbaez They pirated three of my books, but I won't get a penny because my publishers didn't officially register copyright with the Library of Congress. Even if I wasn't in that situation I would think it was absurd that copyright registration was required before authors could be compensated. Authors hold copyright over their books, period. That should mean something.

John Carlos Baez Oct 28

@bodhipaksa - yes it should, but we live in a world where the rich and powerful run the show.... for now.

Tim Ward ⭐🇪🇺🔶 #FBPE Oct 28

@bodhipaksa @johncarlosbaez That depends where you live, apparently. In the UK you have copyright by virtue of having written something, you don't need to register it.

John Carlos Baez Oct 28

@TimWardCam - Also in the US you have copyright just by virtue of having written something - but the settlement in the case against Anthropic requires that the copyright for your book be registered with the Library of Congress, if you want to get $3000.

Even if your book is *on* the list, you can opt out of the settlement and join some other lawsuit against Anthropic. And if your book is *not* on the list, you certainly should.

MidgePhoto Oct 29

@bodhipaksa @johncarlosbaez

I suspect that other remedies may exist, including action in a jurisdiction other than the USA.

That would be a separate case, best tackled by a coalition, I'd suppose.

IANAL, and I don't know (yet) if the few copyrights I have have been infringed.

#copyright #conspiracy #money

@johncarlosbaez I checked to see if my dissertation was there, but it wasn't. It would have been nice to know that it was read by more than just my committee, even if it was a machine. 🙂

John Carlos Baez Oct 28

@zornslemmon - if it wasn't copyrighted in the US, it won't be on that list, even if Anthropic's Claude was trained on it. Less than 8% of the books Anthropic got from LibGen are on that list.

Is your dissertation on LibGen? What's it about?

@johncarlosbaez My comment was partly tongue-in-cheek, but I did actually check the link. My dissertation was about galactic cosmic ray electrons. It was written in the mid-nineties and I'm pretty certain it doesn't exist in any digital format I generated, which would have been PostScript. I vaguely recall having to provide proof-ready paper copies that were sent off to a company (UMI Microform) to be converted to microfiche.

I can find it online, however. There is a company, ProQuest, that has a digital copy posted online that was made from scanning the UMI microfiche. It's a direct scan, so it not only has my copyright on the title page, but it also has a copyright statement from UMI saying "this version was created with permission of the copyright holder."

John Carlos Baez Oct 29

@zornslemmon - galactic cosmic ray electrons are pretty interesting to me, though I don't know much about them.

I think it's good to have an electronic copy of your thesis. This is the kind of writing that really brings back memories (good and/or bad). I *don't* have my thesis in electronic form, only a printout. I should scan it and put it on my website!

Graham Downs Oct 28

@johncarlosbaez Bah. Unfortunately, none of my books are there. Guess I'm not famous enough. ;-)

John Carlos Baez Oct 28

@GrahamDowns - or the copyrights of your books weren't registered with the US Library of Congress.

Are your books on LibGen?

@johncarlosbaez does this settlement allow them to continue using the data they illegally scraped?

John Carlos Baez Oct 28

@dagi3d - I read

"The settlement does not give Anthropic—or any AI company—permission to use pirated books going forward. It only resolves Anthropic’s liability for past use of books. In fact, the agreement requires Anthropic to destroy all copies in its possession."

However, I don't think Anthropic is promising to destroy Claude's knowledge gained from these books! I imagine some issues will only be sorted out by future lawsuits.

Some details here:

https://authorsguild.org/advocacy/artificial-intelligence/what-authors-need-to-know-about-the-anthropic-settlement/

wirepair Oct 28

@johncarlosbaez holy shit thank you for mentioning this, they stole 7 of my mother's books, she's gonna get paiiiiidddd

JayMoore Oct 28

@johncarlosbaez I checked a month or two ago, before the settlement, and I noticed several of my scientific papers are in the list of things they used too. Will check into this further.

Toni Aittoniemi Oct 28

@johncarlosbaez What? 3000?? That’s nothing. They’ll make 1000x that for it.

This is not justice, this is cost of doing business.

What a scam!

TripTilt /// tt Oct 28

@johncarlosbaez
will they from now on leave those works OUT of their model? Or is this some late licensing... on their terms... with no say in whether they are allowed to use it at all?

John Carlos Baez Oct 28

@TripTilt - I read this:

"Q: Does the settlement mean that now Anthropic can continue to use the pirated books to train AI?

A: No. The settlement does not give Anthropic—or any AI company—permission to use pirated books going forward. It only resolves Anthropic’s liability for past use of books. In fact, the agreement requires Anthropic to destroy all copies in its possession."

But I wonder if Anthropic will somehow remove the pirated training information from Claude, or build new LLMs based on this information. Perhaps only future lawsuits will clarify this.

Authors are free to opt out of the current settlement and join future lawsuits. But apparently opting out requires taking action on the website I listed, before March 23, 2026!!!

https://authorsguild.org/advocacy/artificial-intelligence/anthropic-settlement-faq/

@johncarlosbaez The thing that bothers me here is, if I were on the list, by accepting the $3000 would I agree to my material continuing to be included in the Anthropic data set?

Greg Egan Oct 28

@mcc @johncarlosbaez

https://authorsguild.org/advocacy/artificial-intelligence/anthropic-settlement-faq/

“Does the settlement mean that Anthropic can now continue to use pirated books to train AI?

No. The settlement does not give Anthropic—or any AI company—permission to use pirated books going forward. It only resolves Anthropic’s liability for past use of books. In fact, the agreement requires Anthropic to destroy all copies in its possession.”

@gregeganSF @johncarlosbaez Thanks

John Carlos Baez Oct 28

@mcc - Long time no see! I read this:

"Q: Does the settlement mean that now Anthropic can continue to use the pirated books to train AI?

A: No. The settlement does not give Anthropic—or any AI company—permission to use pirated books going forward. It only resolves Anthropic’s liability for past use of books. In fact, the agreement requires Anthropic to destroy all copies in its possession."

But I wonder if Anthropic will somehow remove the pirated training information from Claude, or build new LLMs based on this information. Perhaps only future lawsuits will clarify this.

Authors are free to opt out of the current settlement and join future lawsuits. But apparently opting out requires taking action on the website I listed, before March 23, 2026!!!

https://authorsguild.org/advocacy/artificial-intelligence/anthropic-settlement-faq/

jorendorff Oct 28

@mcc @johncarlosbaez Yes, you give up your right to sue Anthropic over these claims. This is bothersome, now I have to decide what I think about all this

Carnildo Oct 28

@mcc @johncarlosbaez

Yes and no. No, they can't continue to use the copy they pirated. But they can still use your book to train AI if they purchase a fresh copy: the lawsuits claiming that training does not constitute fair use are still working their way through the courts. (They can also do that if you refuse, since none of the "training is not fair use" lawsuits has been settled or even produced an injunction.)

@carnildo @mcc @johncarlosbaez The ruling in this case was clear that (at least here) training *is* fair use. I sure hope that gets overturned somewhere else but the courts seem pretty well in the bag for the "AI industry" on this point. In fact it seems like Anthropic might have had more leeway if they could have said that all the pirated books *were* for training? One of the facts in the case relates to the fact that maybe they pirated a bunch of books and then *didn't* use them for training

@carnildo @mcc @johncarlosbaez as legally incoherent as "training on copyrighted inputs and then producing potentially identical derivative works to destroy the market power of the original authors is 'fair use'" is, "if you're an AI company, you can just use the pirate bay for whatever, who cares if it's training" would admittedly have been MORE of a batshit ruling, so, yay I guess

Carnildo Oct 30

@glyph @mcc @johncarlosbaez The actual ruling is rather nuanced. The key line is "Authors do not allege that any infringing copy of their works was or would ever be provided to users by the Claude service" -- that is, training is fair use *if it is done in a way that the resulting model cannot produce output that can be considered copyright infringement*. The authors did not allege the production of infringing works, so the court made no ruling either way on that.

@carnildo @mcc @johncarlosbaez ah, thank you for the correction

Wolfgang Lutz Oct 28

@dimsumthinking this might also apply to you

dimsumthinking Oct 28

@WLBORg thank you

Alavi | علوی Oct 28

@johncarlosbaez
Imagine how much they must be making off of your books, given that they agreed to share a bread crumb of it with you for 9000.
Sad days we live in.

John Carlos Baez Oct 28

@alavi - sad days indeed. But I give away my books for free these days.

@johncarlosbaez : but by accepting the money, you validate their use of your book, right?

John Carlos Baez Oct 28

@ploum - I read this:

"Q: Does the settlement mean that now Anthropic can continue to use the pirated books to train AI?

A: No. The settlement does not give Anthropic—or any AI company—permission to use pirated books going forward. It only resolves Anthropic’s liability for past use of books. In fact, the agreement requires Anthropic to destroy all copies in its possession."

But I wonder whether Anthropic will somehow remove the pirated training information from Claude, or whether it will build new LLMs based on this information. Perhaps only future lawsuits will clarify this.

Authors are free to opt out of the current settlement and join future lawsuits. But apparently opting out requires taking action on the website I listed, before March 23, 2026!!!

https://authorsguild.org/advocacy/artificial-intelligence/anthropic-settlement-faq/

@johncarlosbaez @ploum so authors who can't take the settlement will get their content pulled from their dataset and the models destroyed?

Cuz to me it seems that it's an exception from the norm.

Also it won't help people outside the #USA at all...

https://infosec.space/@kkarhan/115457098510779725

Kevin Karhan :verified: (@[email protected])

@[email protected] so again like all #ClassAction|s it just is a way for offenders to simply *#pay and continue* and not *pay damages and undo the violation(s)* as would be the norm in any decent jurisdiction… - Bonus points for the #US-only bs, since in many jurisdictions (i.e. #Germany) there is no *"Copyright Office"* as all works are *automatically copyrighted* at time of authoring *unless explicitly licensed permissively otherwise*! This is just like the *"#OtherOSsettlement"* with the #PS3: - Another case where it's clear that rich #corporations can do anything - and get away with it *if they can just throw #money after it*. Cuz IMHO not only should #Anthropic be forced to pay the settlement to *EVERYONE affected* but also be forced to pull the offending model(s) and access to those from the public & market!

John Carlos Baez Oct 29

@kkarhan @ploum

"so authors who can't take the settlement will get their content pulled from their dataset and the models destroyed?"

No. Authors who have a book on this list

https://www.anthropiccopyrightsettlement.com/

who fill out a form by January 7, 2026 saying they "opt out" of the settlement will continue to have the right to sue Anthropic regarding their use of that book.

"Also it won't help people outside the #USA at all..."

You don't have to be in the USA for any of this to apply. But outside the US you can sue Anthropic in your own country.