Anthropic destroyed millions of print books to build its AI models
Company hired Google's book-scanning chief to cut up and digitize "all the books in the world."
https://arstechnica.com/ai/2025/06/anthropic-destroyed-millions-of-print-books-to-build-its-ai-models/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social
@arstechnica I am totally blind. When I scan print books, I often ruin them because I have to either press down on them if I use a flatbed scanner, or hold them open if I use a document scanner. Making books available online (provided they're in accessible formats) means that more won't have to be destroyed, and those of us who must rely on screen readers and OCR won't have to spend hours scanning just so we can read the books.

@dandylover1 @arstechnica Have you contacted people who scan books on a large scale, like Carl Malamud's projects?

They have tools that seem to be able to do scanning without destroying the books. And they don't have a lot of money. They are friendly folks who are probably more than happy to share ideas, plans, and know sources for such machines.

@karlauerbach @dandylover1 @arstechnica Specifically, it's usually not one but two scanners, and instead of scanners they're normal off-the-shelf digital cameras. Books simply rest comfortably on two plates of glass that meet at an angle, with each camera pointed at its own page, which means there is no need to destroy, or even carefully unbind, anything. You just open the book as far as it was designed to be opened in the first place.
@TheRealPomax @karlauerbach @arstechnica I would love something like that. I have a Pearl camera, which has a wonderful stand and guide to help place the book, but it still just lies flat on the table, so if it's prone to closing or doesn't open all the way, I still have to hold it to ensure that everything scans properly.
@dandylover1 @karlauerbach @arstechnica there are a few "how to make one yourself" tutorials out there, but I'll be honest, if I had to make one myself I'd find a much handier friend to make one for me and then pay them in food and/or drinks =D
@karlauerbach @arstechnica No. Most of what I read is in the public domain, and the only books I scan are ones I own, so usually, a little warping to the spine is okay. It only annoys me if the book is an antique. However, this is truly a wonderful source, and I may ask if they can scan Male and Female Costume by Beau Brummell, so that it can be made accessible for all!

@dandylover1 @arstechnica sure but you are one person who only has access to a flatbed scanner.

Industrial scanners exist that can hold a book open at an angle and scan the page while in the book without damaging them. A company with billions in funding can afford that.

@indiealexh @arstechnica I agree about that. Usually, even I use my Pearl camera, which is far less damaging. But if these scanners are available to more than just libraries, museums, and other such institutions, they should definitely use them! Why destroy books if it's not necessary to do so?

@dandylover1 @arstechnica Contrary to quasi-religious belief, it's absolutely A-OK to destroy a paperback that has had many printings in the process of media transformation for accessibility.
It may even save some books from vanishing. I still have tons of obscure, old media that’ll never be made available electronically that I want to transform to save it.
Telling a blind person about destruction-free book scanning of paperbacks is misguided.

(Still: make sure to feed your authors and poets!)

@chris @arstechnica Yes. There is a huge difference between scanning a modern copy of an old book and scanning the original! What drives me crazy is when modern paperbacks are copied from old ones, and instead of actually retyping or scanning and correcting them so that the new one is clean and legible, they just take a picture of it, so that the new hard copy has the same handwriting, fading, discolouration, ripped pages, etc. as the original! It basically makes the book useless to me as an alternative to an online scanned copy in similar condition, because my software would scan the printed copy as badly as the PDF!
@dandylover1 @arstechnica That's a really bizarre, low-effort way of reprinting a book. I can understand that creating a real facsimile has value, but most current books are just a medium to transport the content.
Some are beautiful works of art or craft, of course, but most are not. (Which does not lessen the value of their content.)
@dandylover1 @arstechnica Yeah, but that's not what they did.