"Cleaning PDFs" ("PDFs putzen"): @jplie has posted a link tip on this in netbib:
https://netbib.hypotheses.org/78649526
"Many #PDFs have too little metadata and could use more. But the opposite also occurs: some carry too much, and you may want to clean them so they don't inadvertently reveal too much. Journalism is one example, particularly where source protection is concerned. Issue 58 of the Online-Recherche Newsletter presents several tools for the job."
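To illustrate what "too much metadata" can look like, here is a minimal sketch (not one of the newsletter's tools) that scans raw PDF bytes for common document-information keys and can blank their values in place. The function names are hypothetical, and the approach assumes the values are stored as uncompressed literal strings; it does not see XMP metadata streams, compressed object streams, encrypted files, or earlier revisions kept by incremental saves, so it is no substitute for a dedicated cleaning tool where source protection matters.

```python
import re

# Common document-information keys. /Title is left out on purpose because
# the same key is also used for bookmark (outline) entries.
INFO_KEYS = ("Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")

# Matches "/Key (literal string)". Handles backslash escapes, but not
# nested parentheses or hex strings (an assumption of this sketch).
_PATTERN = re.compile(
    rb"/(" + "|".join(INFO_KEYS).encode("ascii") + rb")\s*\(((?:\\.|[^\\()])*)\)"
)


def find_info_metadata(pdf_bytes: bytes) -> dict:
    """Return the first visible value for each metadata key found."""
    found = {}
    for key, value in _PATTERN.findall(pdf_bytes):
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


def blank_info_metadata(pdf_bytes: bytes) -> bytes:
    """Overwrite matched values with spaces, keeping the byte length unchanged."""
    def _blank(match):
        prefix_len = match.start(2) - match.start(0)  # up to and including "("
        return match.group(0)[:prefix_len] + b" " * len(match.group(2)) + b")"
    return _PATTERN.sub(_blank, pdf_bytes)
```

Overwriting with spaces of the same length, rather than deleting, keeps the byte offsets in the file's cross-reference table valid. A cautious workflow would run `find_info_metadata` on the result as well and still inspect the file with a proper PDF tool.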
yfedoseev/pdf_oxide: The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0.

"The fastest PDF library for Python and Rust. Text extraction, image extraction, markdown conversion, PDF creation & editing. 0.8ms mean, 5× faster than industry leaders, 100% pass rate on 3,830 PDFs. MIT/Apache-2.0."

https://github.com/yfedoseev/pdf_oxide

#documents #pdfs #python #rust
@BertrandCaron @anj @dsalo I’m getting back to my pile of #pdfs and hope to have some files that represent PDF-HUL errors to share on the #JHOVE wiki.
🍔 Welcome to the Library of Congress' attempt to digitize your greasy spoon nostalgia with all the charm of a government form. 📚 Why settle for syrup-soaked memories when you can drown in a sea of ISSNs and acronyms? 🥞 Because nothing says "Classic American Diner" like a downloadable PDF! 🤦‍♂️
https://blogs.loc.gov/picturethis/2026/04/the-classic-american-diner/ #LibraryOfCongress #DigitalNostalgia #AmericanDiner #FoodHistory #PDFs #GreasySpoon #HackerNews #ngated
The Classic American Diner | Picture This

Diners have long been a unique part of the American restaurant scene, and the Prints & Photographs Division collections are full of colorful visual representations of the genre. Explore some examples with us in this blog post.

The Library of Congress
🐢💤 Oh wow, hold the phone everyone! The #GNU #libc #atanh is now "correctly rounded", a riveting #update sure to shake the very foundations of #digital #arithmetic. 📈😴 Meanwhile, the entire planet remains blissfully unaware, as another groundbreaking #breakthrough is drowned in a sea of #jargon and #PDFs. 📚🔍
https://inria.hal.science/hal-05591661 #HackerNews #ngated
The GNU libc atanh is correctly rounded

We prove the binary64 hyperbolic arc-tangent function from GNU libc 2.43, released end of January 2026, is correctly rounded.
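For readers wondering what "correctly rounded" means here: the returned double is the binary64 value nearest to the exact mathematical result. The sketch below is not the paper's proof method; it only spot-checks individual inputs by comparing `math.atanh` with a 60-digit decimal evaluation of atanh(x) = ln((1+x)/(1-x))/2. The function names are hypothetical, 60 digits is an assumed working precision that could in principle fall short for a hard-to-round input, and the formula loses accuracy for very tiny |x|.

```python
import math
from decimal import Decimal, localcontext


def atanh_reference(x: float) -> float:
    """Nearest double to a 60-digit decimal evaluation of atanh(x), for |x| < 1."""
    with localcontext() as ctx:
        ctx.prec = 60                         # assumed working precision
        d = Decimal(x)                        # exact value of the binary64 input
        exact = ((1 + d) / (1 - d)).ln() / 2  # atanh(x) = ln((1+x)/(1-x)) / 2
    return float(exact)                       # rounds to the nearest double


def ulp_error(x: float) -> float:
    """Distance between math.atanh(x) and the reference, in units in the last place."""
    ref = atanh_reference(x)
    return abs(math.atanh(x) - ref) / math.ulp(ref)
```

A result of 0.0 from `ulp_error` means the system's `atanh` agreed with the reference for that input. Python's `math.atanh` typically relies on the platform's C library, so the outcome depends on which libm is installed.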

boss: we want to train a ServiceBot on our #documentation

me: here you go <gives #PDFs>

boss: ServiceBot says these PDFs are too big

me: isn't parsing vast amounts of #data to answer questions and distill salient takeaways supposed to be a key feature of these #LLM bots?

boss:

me: ;)

boss: >:(

me: =D

Veil – Dark mode PDFs without destroying images, runs in the browser

https://veil.simoneamico.com/

#HackerNews #Veil #DarkMode #PDFs #BrowserImages #PDFTools

veil - Dark mode PDFs without destroying your images

Dark mode PDFs without destroying your images. Open source, no AI, 100% Private.

Tip of the day: Smart groups are a good way to view items matching specific criteria, like all the flagged #PDFs in your #DEVONthink database. While it’s easy enough to make a smart group, if you find yourself doing the same search over and over, you can use it to create a smart group in a few clicks. #pkm #productivity #tipoftheday #workflow https://www.devontechnologies.com/blog/20240102-reuse-search-devonthink-smartgroup
Tip of the day: When it comes to #PDF documents, what you see and what is actually there can be two different things. Just because you can read words in the document doesn’t mean it is searchable. Or perhaps you have a searchable #PDF but it’s still not found in a search in #DEVONthink. Here are a few ways to deal with such #PDFs. #macos #paperless #pkm #productivity #tipoftheday https://www.devontechnologies.com/blog/20230418-pdf-searchable