Tim Allison

@tallison

547 Followers

318 Following

2.4K Posts

Files and search. Founder Rhapsode Consulting LLC. Chair/VP Apache Tika, committer Apache PDFBox, Apache POI, Apache Lucene/Solr, Apache Nutch, Apache OpenNLP. Philologist emeritus.

#ApacheTika #ApachePDFBox #ApachePOI #FileFormats #FileForensics #ApacheSolr #OpenSearch #ApacheNutch #ApacheStormCrawler #JavaSecurity
#foss #OpenSource #bassist #fedi22 #🏳️‍🌈🏳️‍⚧️Ally

github	https://github.com/tballison
linkedin	https://www.linkedin.com/in/tim-allison-5a6722/

Tim Allison 1d ago

Character bigrams and naive Bayes can get you pretty darned far.

Oh, and a couple of agents and a boatload of data.

And, I guess all of the researchers whose shoulders I'm standing on...
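
For readers who haven't seen the combination, here is a rough sketch of what a character-bigram naive Bayes classifier can look like. The post doesn't say what is being classified or how it is built, so everything here is an assumption for illustration: the class name BigramNaiveBayes, the add-one smoothing, and the toy English/German training sentences are hypothetical and not taken from any Apache Tika code.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the overlapping character bigrams of a lowercased string."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramNaiveBayes:
    """Multinomial naive Bayes over character bigrams (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # additive (Laplace) smoothing constant
        self.counts = {}              # label -> Counter of bigram counts
        self.doc_counts = Counter()   # label -> number of training samples
        self.vocab = set()            # every bigram seen during training

    def fit(self, samples):
        for text, label in samples:
            grams = bigrams(text)
            self.counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            total = sum(counter.values())
            # Log prior plus the sum of smoothed log likelihoods per bigram.
            score = math.log(self.doc_counts[label] / total_docs)
            for gram in bigrams(text):
                score += math.log(
                    (counter[gram] + self.alpha)
                    / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, made up for this sketch.
model = BigramNaiveBayes().fit([
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is the house that they thought about", "en"),
    ("der schnelle braune fuchs springt ueber den faulen hund", "de"),
    ("das ist ein haus und ich schlafe nicht", "de"),
])
```

With four training sentences this is only a toy; the "boatload of data" is what would make such a model useful in practice.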

Tim Allison May 5

Voting is underway for #ApacheTika 4.0.0-alpha-1! 🎉

Started work on the 4.x branch in October 2024. Lots has changed, core principles remain.

Many, many thanks to the community of fellow devs and users!

Onwards towards 4.0.0!

https://lists.apache.org/thread/bjowzh4ssgtrghqjk7g2dtn9hs3qmyrv

Tim Allison Apr 9

Preview revamp of our website for #ApacheTika 4.x is live: https://tika.apache.org/docs/4.0.0-SNAPSHOT/

Let us know what you think and/or open PRs! Please!

Tim Allison Mar 18

Voting is underway for #ApacheTika 3.3.0! Please give it a try and let us know if there are any surprises!

https://lists.apache.org/thread/pq4zjvqf3w5zbm5yoyg14qvr2kpd2by3

Tim Allison Feb 13

Living the dream... 🤖

Tim Allison Feb 12

mhoye Feb 12

Anthropomorphizing the technology is just one more way humans try to escape accountability. “The AI contributed a patch”, “the AI wrote the blog post”, “the car hit the pedestrian” and “the knife killed the victim”, those are all the same framing.

https://swecyb.com/@anderseknert/116056950299738296