@mutanthumb

Yes, #ApacheTika will extract what the PDF alleges it is.

These are some of the fields that I'll focus on in the #digipresBakeoff #ipres2025 #ipresBakeOff.

These include PDF/A and PDF/X. The hasMarkedContent field suggests PDF/UA.
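
A rough sketch of how those alleged claims could be read from a Tika metadata dump (for example its JSON output loaded into a dict). The function name is hypothetical, and the key names (`pdfaid:part`, `pdfaid:conformance`, `pdfxid:version`, `pdfx:version`, `pdfuaid:part`, `pdf:hasMarkedContent`) are assumptions about what Tika emits; check them against your own output. It reports only what the file claims and validates nothing.

```python
def alleged_pdf_flavors(metadata):
    """Summarize the PDF flavors a file *claims* in its metadata.

    `metadata` is a plain dict of extracted fields. The key names used
    below are assumptions, not a confirmed list of Tika's keys.
    """
    claims = []

    # PDF/A: part (1, 2, 3, ...) plus an optional conformance level (A, B, U).
    part = metadata.get("pdfaid:part")
    if part:
        level = str(metadata.get("pdfaid:conformance", "")).lower()
        claims.append(f"PDF/A-{part}{level}")

    # PDF/X: report whatever version string is present, unmodified.
    pdfx = metadata.get("pdfxid:version") or metadata.get("pdfx:version")
    if pdfx:
        claims.append(f"PDF/X claim: {pdfx}")

    # PDF/UA: an explicit identifier is a claim; marked content is only a hint.
    ua_part = metadata.get("pdfuaid:part")
    if ua_part:
        claims.append(f"PDF/UA-{ua_part}")
    elif str(metadata.get("pdf:hasMarkedContent", "")).lower() == "true":
        claims.append("possibly PDF/UA (marked content present)")

    return claims


if __name__ == "__main__":
    sample = {
        "pdfaid:part": "1",
        "pdfaid:conformance": "B",
        "pdf:hasMarkedContent": "true",
    }
    print(alleged_pdf_flavors(sample))
```

Marked content alone is treated as a hint rather than a claim, which matches "suggests" above.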