Tutorial: Paperless-NGX in an LXC Container on Proxmox - Part 1
Important notes:
- I am not a leading expert on any of the topics covered here. This is surely not the "ultimate guide". If you stumble over things I could have done better, I will gladly listen to your advice.
- This tutorial will come in multiple parts. At the end I will link them together and collect them in a blog post.
Background
Paperless-NGX is open-source document management software. It lets you organize your documents efficiently. You can find the software and extensive documentation here: https://docs.paperless-ngx.com/
LXC is a system container technology. It aims to provide something more lightweight than a virtual machine that still feels like one: you log in to an LXC container and manage it mostly like a Linux virtual machine. You can find a more in-depth explanation here: https://linuxcontainers.org/
Proxmox VE is a virtualization platform similar to VMware. It can run full-fledged virtual machines as well as LXC containers, and it has become my virtualization platform of choice. Both a free and a commercial version are available here: https://www.proxmox.com/en/products/proxmox-virtual-environment/overview
Why Paperless-NGX?
In 2009 we had to take over the paperwork of elderly relatives who could no longer handle it themselves. To make things even more complicated, my wife and I were living in different towns during the week for work. We had also recently bought a house, which significantly increased our joint paperwork. To work on documents together, we needed them in digital form.
So I bought a scanner. Better said: I bought multiple scanners until I settled for a Brother ADS-1600W.
I don't want to go into too much depth, but two points really matter in a scanner:
Luckily I am experienced with processes, so the best decision I made was to establish a process my wife and I both adhered to:
- Every document that arrived on paper first received a serial number. For this I bought a pagination stamp that automatically advances the serial number with each stamp. This was one of my best purchasing decisions.
- Then the document was scanned.
- Once scanned, the document was stamped "Scanned".
- The paper document was archived in folders ordered by serial number. That allows fast retrieval of any original copy.
- The scanned file was named according to a scheme that included Serial#, Recipient, Sender, Purpose, Date and "tags" like "relevant for taxes" or "healthcare". After naming it was sorted into one or more directories.
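To make the naming scheme above concrete, here is a minimal sketch of how such file names could be built and parsed, and of what retrieval by file name amounts to. The exact layout is an assumption for illustration only: the hypothetical `_` field separator, `+` tag separator, five-digit zero-padded serial, ISO date format and the helper names `ScanName`, `parse_filename` and `find` are not from my actual setup.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical layout: fields joined by "_", tags joined by "+".
SEP = "_"
TAG_SEP = "+"


@dataclass
class ScanName:
    serial: int       # number from the pagination stamp
    recipient: str
    sender: str
    purpose: str
    doc_date: date
    tags: list[str] = field(default_factory=list)

    def to_filename(self) -> str:
        # Fields must not contain the separators, otherwise parsing breaks.
        for text in (self.recipient, self.sender, self.purpose):
            if SEP in text:
                raise ValueError(f"field {text!r} contains {SEP!r}")
        for tag in self.tags:
            if SEP in tag or TAG_SEP in tag:
                raise ValueError(f"tag {tag!r} contains a separator")
        parts = [
            f"{self.serial:05d}",        # zero-padded so names sort by serial
            self.recipient,
            self.sender,
            self.purpose,
            self.doc_date.isoformat(),   # YYYY-MM-DD
            TAG_SEP.join(self.tags),
        ]
        return SEP.join(parts) + ".pdf"


def parse_filename(name: str) -> ScanName:
    """Split a file name produced by to_filename back into its fields."""
    stem = name.removesuffix(".pdf")
    serial, recipient, sender, purpose, iso_date, tags = stem.split(SEP)
    return ScanName(
        serial=int(serial),
        recipient=recipient,
        sender=sender,
        purpose=purpose,
        doc_date=date.fromisoformat(iso_date),
        tags=tags.split(TAG_SEP) if tags else [],
    )


def find(filenames: list[str], term: str) -> list[str]:
    """Retrieval by file name only: a case-insensitive substring match."""
    term = term.lower()
    return [n for n in filenames if term in n.lower()]


if __name__ == "__main__":
    doc = ScanName(1234, "Mother", "HealthInsurance", "Invoice",
                   date(2024, 3, 1), ["healthcare", "taxes"])
    name = doc.to_filename()
    print(name)  # 01234_Mother_HealthInsurance_Invoice_2024-03-01_healthcare+taxes.pdf
    print(find([name], "taxes"))
```

The sketch also shows the limit of this approach: `find` can only match what someone typed into the file name, never the text inside the document, and every typo in a correspondent's name makes a document harder to find.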
Retrieving a document meant searching file names, and this is where the process started breaking down after 16 years. By now we had about 5,000 documents stashed there, and finding them became more and more difficult. We needed full-text search that works across multiple platforms (Windows, macOS, iOS).
The second reason to start with Paperless-NGX was that more and more documents arrived digitally (downloads, email). I had only rudimentarily adjusted the workflow by giving such documents an "empty" serial#. I didn't want to print them, number them and then scan them again; that would have felt stupid. But the "empty" serial number was not a good solution either.
The third reason was the amount of manual effort involved. Naming the files felt like a chore. I put it off for months until hundreds of documents were waiting, and usually tax season was the ultimate deadline. Manual processes are also error-prone: mostly I misspelled a correspondent's name or typed a wrong number. There was no automation assisting me.
I considered using AI to partially automate naming and retrieval. But then I stumbled upon Paperless-NGX and had the impression that it would solve most of my problems "out of the box".
Design choices
The standard way to run Paperless-NGX is in a Docker container. I use Proxmox VE, which does not support running Docker containers directly. There were two options to address this:
I chose option 2 because it eliminates a technology layer completely. Docker is not extremely complex, but in my experience, adding a technology layer to make your life simpler always backfires at some point. Furthermore, Docker makes it easy to never understand how the application really works. When debugging or updating, this can carry a hefty price tag.
This has one big advantage for the reader of this tutorial: you can apply 95% of it to installing Paperless-NGX on a virtual machine or even on bare-metal hardware.