Mastodawn

Jason B.

Dec 13, 2022

#Linux and #Python community, anyone aware of an open source library that works for opening an existing PDF file, and re-saving it with some level of compression?

Finding a lot of expensive, licensed options, and options that have problematic dependencies.

It has to work within a Lambda/Azure Functions-type environment, i.e. all the libraries needed can be pulled in via requirements.txt.


codetholdory

Dec 13, 2022

@neogodless I've used PyPDF2 previously to open and read PDFs. It also saves back to PDF, but I'm not sure about compression options.

https://github.com/py-pdf/PyPDF2

GitHub - py-pdf/PyPDF2: A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files



codetholdory

Dec 13, 2022

@neogodless this documentation deals with PDF size reduction.

https://pypdf2.readthedocs.io/en/latest/user/file-size.html

Reduce PDF Size — PyPDF2 documentation


Jason B.

@codetholdory Sadly, there were a few issues. #PyPDF4 isn't really ready: there's no documentation (though in theory it may still be identical to PyPDF2). And the #PyPDF2 documentation is all outdated! All of the methods were renamed from snake case to camel case, and some were given completely new names. I managed to figure all that out and reproduce the compression example, but it actually results in a slightly larger file instead!

#python #pdf #compression
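The thread doesn't diagnose why the re-saved file came out larger. One general point worth knowing: PDF's FlateDecode filter is plain zlib/deflate, so it only pays off on data that still has redundancy, such as uncompressed content streams. Image data (e.g. JPEG) or streams that are already Flate-encoded have little left to squeeze. The stdlib-only sketch below illustrates the effect with `zlib`; the content-stream bytes are made up, and seeded random bytes stand in for already-compressed data.

```python
import random
import zlib

# A made-up, highly repetitive page content stream (text-drawing operators).
content = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 200

# What a FlateDecode filter does to an uncompressed stream.
once = zlib.compress(content, 9)

# High-entropy bytes (stand-in for JPEG or already-deflated data):
# deflating them again only adds header and checksum overhead.
already_compressed = random.Random(0).randbytes(4096)  # Python 3.9+
twice = zlib.compress(already_compressed, 9)

print(f"plain stream:      {len(content)} -> {len(once)} bytes")
print(f"high-entropy data: {len(already_compressed)} -> {len(twice)} bytes")
```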


codetholdory

Dec 14, 2022

@neogodless That's a bit unfortunate. Let me know if you find another library; I do quite a lot of reading from PDF and writing PDF from HTML.

Could you have a look at this library and see what it uses?

https://pypi.org/project/pdfkit/

pdfkit

Wkhtmltopdf python wrapper to convert html to pdf using the webkit rendering engine and qt
