3) I reran Tika, 'file' and #siegfried on all the files.

You can explore the mimes via datasette: https://corpora.tika.apache.org/datasette

Or, download the whole sqlite db: https://corpora.tika.apache.org/base/share/tika-mimes-20230714.db.gz
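If you go the download route, a minimal sketch like the one below can unpack the file and show what is inside. It assumes nothing about the schema: table and column names are read from the database itself. The helper names `gunzip` and `list_tables` are made up for illustration.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def gunzip(src: Path, dest: Path) -> Path:
    """Decompress a .gz file to dest and return dest."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


def list_tables(db_path: Path) -> dict[str, list[str]]:
    """Map each table name to its column names, as reported by SQLite."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()


# Example use, assuming the .gz was saved to the working directory:
#   db = gunzip(Path("tika-mimes-20230714.db.gz"), Path("tika-mimes-20230714.db"))
#   for table, columns in list_tables(db).items():
#       print(table, columns)
```

Once the real table and column names are known, the per-tool comparisons can be written as ordinary SQL against them.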

I mean, who wouldn't want to spend the weekend looking for differences btwn #siegfried and #file and #ApacheTika?!

#filefun #digipres #fileformat #fileformatology

What say we run 'file' and #siegfried against #ApacheTika's 600k 'application/octet-stream's in the most recent #CommonCrawl crawl?
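A rough sketch of what that comparison could look like, under a few assumptions: `file` is on the PATH, the Tika labels are already in hand, and the helper names `file_mime` and `tally_disagreements` are hypothetical. Siegfried would need its own wrapper, which is left out here.

```python
import subprocess
from collections import Counter
from typing import Iterable


def file_mime(path: str) -> str:
    """Ask the `file` utility for a MIME type (requires `file` on PATH)."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def tally_disagreements(pairs: Iterable[tuple[str, str]]) -> Counter:
    """Count (label_a, label_b) pairs where the two detectors disagree."""
    return Counter((a, b) for a, b in pairs if a != b)


# Example use, assuming `paths` lists files Tika called application/octet-stream:
#   pairs = (("application/octet-stream", file_mime(p)) for p in paths)
#   for (tika, other), n in tally_disagreements(pairs).most_common(20):
#       print(n, tika, "->", other)
```

Sorting the tally by count puts the most common types that Tika misses but `file` recognizes at the top of the list.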

Anyone else want to join in the fun?

https://issues.apache.org/jira/browse/TIKA-3992

#filefun #fileophiles #digipres #mimedetection

[TIKA-3992] Add common missing mimes based on Common Crawl data - ASF JIRA

TFW, during a vendor's demo, the vendor's tool hits a stack overflow on the quines in our unit tests. 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 🤣 #filefun #quine #filequines #filehumor