How many links are buried inside a large PDF — and where do they really go?
I extracted every URL from a 291-page Voron assembly manual, isolated the shortlinks, resolved their redirects, and built a TSV (tab-separated values) manifest of video durations and titles using:
pdfgrep
awk
curl
yt-dlp
A practical method for auditing technical PDFs and embedded media.
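The exact commands are in the walk-through. What follows is only a rough Bash sketch of the same four stages, under stated assumptions. The shortlink domain (`bit.ly`) is a hypothetical placeholder, because the manual's actual shortener isn't named here. The function names and the TSV column layout are also illustrative. pdfgrep only sees the PDF's text layer, so link annotations without a visible URL, and URLs wrapped across lines, will be missed or truncated.

```shell
#!/usr/bin/env bash
# Rough sketch of the PDF link-audit pipeline; not the exact commands from the walk-through.
set -euo pipefail

# Hypothetical shortlink domain; substitute the one the manual actually uses.
SHORT_DOMAIN="${SHORT_DOMAIN:-bit.ly}"

# 1. Pull every http(s) URL out of the PDF's text layer (pdfgrep -o prints only the match).
#    Trailing punctuation such as ")" or "." may need trimming afterwards.
extract_urls() {
  pdfgrep -o 'https?://[^ ]+' "$1" | sort -u
}

# 2. Keep only lines that contain the shortlink domain (plain substring match in awk).
filter_shortlinks() {
  awk -v d="$SHORT_DOMAIN" 'index($0, d) > 0'
}

# 3. Follow redirects (-L) and print the final URL curl ended up at.
resolve_url() {
  curl -s -L -o /dev/null -w '%{url_effective}' "$1" </dev/null || true
}

# 4. Ask yt-dlp for duration (seconds) and title; --print implies no download.
video_info() {
  yt-dlp --print duration --print title "$1" </dev/null 2>/dev/null || true
}

main() {
  printf 'short_url\tfinal_url\tduration_s\ttitle\n'
  extract_urls "$1" | filter_shortlinks | while IFS= read -r short; do
    final=$(resolve_url "$short")
    # yt-dlp prints one field per line, in the order the --print flags were given.
    { IFS= read -r dur || dur=""; IFS= read -r title || title=""; } < <(video_info "$final")
    printf '%s\t%s\t%s\t%s\n' "$short" "$final" "${dur:-NA}" "${title:-NA}"
  done
}

# Run only when a PDF path is given, so the functions can also be sourced on their own.
if [[ $# -ge 1 ]]; then
  main "$1"
fi
```

Saved as (hypothetically) `audit.sh`, it would be run as `bash audit.sh manual.pdf > links.tsv`. Non-video targets simply get `NA` in the duration and title columns.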
Full walk-through:
https://salemdata.net/johnpress/?p=523
#PDF #Linux #OpenSource #CommandLine #DataExtraction #UnixTools
#Documentation #DigitalPreservation

