The German Literature Archive Marbach (DLA Marbach) recently ran a large-scale job with FileTrove, the open-source forensic indexer I’m developing. In a single run, they processed around 4 million files in just over four hours, including metadata extraction and checks against the National Software Reference Library (NSRL).
Thanks to the team in Marbach (@lignum, @harvey) for testing FileTrove on such a large dataset and sharing the results. Around 4 million files in just over four hours works out to somewhere around 270 files per second, which is pretty fast.






