Got #PDF? 8 million PDFs/8TB. Derived from #CommonCrawl. We refetched 2 million truncated files.
https://pdfa.org/new-large-scale-pdf-corpus-now-publicly-available/