FYI, it took roughly 341 hours to process 175K PDF files with #veraPDF #arlington. I haven’t started looking at the results yet. #digipres
A week ago I started running (24/7) veraPDF Arlington on my pile of 175K PDFs and it has processed ~100K in that time. I’m hoping it’s finished by this time next week. #pdf #veraPDF #digipres
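For scale, the two run-time figures quoted in these posts can be turned into a rough per-file cost. This is only a back-of-the-envelope sketch: it assumes the "~100K in a week" and "175K in 341 hours" figures are exact and that the run was truly continuous. The function name is made up for illustration.

```python
SECONDS_PER_HOUR = 3600


def seconds_per_file(hours: float, files: int) -> float:
    """Average wall-clock seconds spent per file (hypothetical helper)."""
    return hours * SECONDS_PER_HOUR / files


# Figures taken from the posts; treated as exact for this estimate.
first_week = seconds_per_file(7 * 24, 100_000)   # ~100K files after one week
full_run = seconds_per_file(341, 175_000)        # whole pile of 175K files

# Total hours the full pile would have needed at the first-week pace.
projected_hours = 175_000 * first_week / SECONDS_PER_HOUR

print(f"first week: {first_week:.2f} s/file")
print(f"full run:   {full_run:.2f} s/file")
print(f"projected total at first-week pace: {projected_hours:.0f} h")
```

On these numbers the average works out to roughly 6 to 7 seconds per file, and the full run took somewhat longer than the first-week pace would suggest.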
Out of these 175K+ files, #veraPDF indicated that 85 were ‘invalid PDFs’. #JHOVE thinks that 31 of these 85 #pdfs are valid. Hmmm… #digipres #digitalpreservation

As #veraPDF feature reports may be very long, I created a basic XSL file to extract only elements identified by @bitsgalore in his post https://bitsgalore.org/2023/05/25/identification-of-pdf-preservation-risks-with-verapdf-and-jhove.html

Certainly not perfect, but it seems to work...

https://gist.github.com/BertrandCaron/a4e07f2e0c9f1db79cb265d112bc05ab

#digipres


#veraPDF 1.28.2 is out now! This release sees updates to dependencies that had vulnerability warnings, updates to PDF/UA-2 and WTPDF, and fixed validation of Unicode ✍️🧑‍💻 Read more: https://openpreservation.org/news/verapdf-1-28-2-released/?q=3

As always, a HUGE shout-out to the Dual Lab development team for all their efforts! #digipres

I don't care if #Acrobat thinks something with an alt-text shouldn't be embedded in something else that also has an alt-text. Both #PAC and #VeraPDF are happy with the PDF/UA-1, and so am I... https://publishup.uni-potsdam.de/opus4-ubp/frontdoor/deliver/index/docId/67222/file/pardes30.pdf #TeXLaTeX #accessibility

Sort of a #PDF #doubleTrouble release day at the Open Preservation Foundation today, with new releases of both #veraPDF (which checks conformance to PDF/A and PDF/UA) and the #Arlington PDF Checker (which checks conformance to the Arlington PDF model):

https://openpreservation.org/news/verapdf-and-arlington-1-28-released/

Good stuff!

Hello @Georgia !

I wanted to use the online public demo #veraPDF #Arlington model at https://arlington.verapdf.org/ but it returns a 502 "Bad gateway" error...

🚀 Join @carl today at #iPRES2024 where he demonstrates #veraPDF's #Arlington, an invaluable resource for anyone developing or testing #PDF #DigitalPreservation tools.

📍 De Bijloke - Kraakhuis

Check the release notes here; our thanks go to all contributors, particularly the Dual Lab development team, for their work on this: https://openpreservation.org/news/arlington-pdf-model-checker-released/


Just found out there's now a development prototype of veraPDF-rest, which exposes #VeraPDF's functionality through a REST API:

https://github.com/veraPDF/veraPDF-rest

Will need to try this out, but this definitely looks really useful!

This could also be good for developing performant VeraPDF wrappers in other programming languages, like Python (similar to how Tika-python currently wraps around #Apache #Tika's REST API).
