Matt Miller

Blog: Visualizing 14,000 Released Epstein Emails.

I built a viz of the emails released as part of the 20K House Oversight Committee docs.

https://thisismattmiller.com/post/email-visualization/

- Provides a clustered, high-level view of the emails exchanged by contact across time
- Allows you to zoom into individual email exchanges and open the source documents

New Post: PEN America Banned Books 2025 dataset

https://thisismattmiller.com/post/book-bans-2025/

Looking at school district book bans

- Interactive Map interface to the books banned in 2024-2025
- A faceted browse interface to the 3700 books
- Subject heading analysis

New Blog Post.
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images
https://thisismattmiller.com/post/lc-flickr-commons/

- Organizing 95K photo comments by embedding clustering.
- Viewer to explore user georectified images
- Folksonomy tagging vs LCSH Vocabulary
- Placing into the Wiki* knowledge graph

New blog post, three interfaces to explore the 50K 1929 HathiTrust resources that entered the public domain last month:

https://thisismattmiller.com/post/hathi-pd-2025/

Including this one, which lets you find literature/fiction books by genre and LCSH.

Hathi PD 2025

Data and tools to explore 50,000 1929 public domain titles in HathiTrust

If you have 11+ million names, like in the LC Name Authority File, how many of them anagram to each other? A lot: https://thisismattmiller.com/post/lcnaf-anagrams/
LCNAF Anagrams

Names Names Names Names.
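
For anyone curious how that kind of matching could work, here is a minimal Python sketch. It is not the code behind the post: it assumes two names count as anagrams when their letters match after lowercasing and dropping spaces and punctuation. The function names and the sample names are made up for illustration.

```python
from collections import defaultdict


def anagram_key(name: str) -> str:
    """Sorted lowercase letters of a name; spaces, digits, and punctuation are ignored."""
    return "".join(sorted(ch for ch in name.lower() if ch.isalpha()))


def group_anagrams(names):
    """Return lists of names that share the same letters (groups of two or more)."""
    groups = defaultdict(list)
    for name in names:
        key = anagram_key(name)
        if key:  # skip entries that contain no letters at all
            groups[key].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample, not taken from the real authority file.
sample = ["Roland, Amy", "Smith, John", "Arnold, May"]
print(group_anagrams(sample))
```

Keying on the sorted letters keeps it to one pass over the list, so the same idea should scale to millions of names without comparing every pair.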

Browse 1928 books in HathiTrust that entered the public domain this week by popularity.

I made a couple of interfaces that allow you to browse and explore by Library of Congress Classification:

https://thisismattmiller.github.io/hathi-pd-2024/

#publicdomain

I wrote a blog post about political GIFs in Library of Congress Web Archives (https://thisismattmiller.com/post/animated-gifs-in-us-elections/) and included some examples, and now, years later, I'm getting shaken down by a copyright troll over one of the images.

Is an Obama Blingee gonna cost me $400? 🙃

Animated Gifs in US Elections

The usage of animated gifs in US Election websites

They're moving us into another building at work, and everyone is throwing away their old stuff. And I found a printout of the LC homepage from 2001.

i'm a bit of a web archivist myself...

Made some improvements to:

https://pomodoro.semlab.io/

An OCR tool for complicated docs that lets you manually select what text to extract. You can now structure the text into fields and download as JSON. It now also supports multipage PDFs. New tutorial video on the home page.

A little viz to browse HathiTrust resources that are flipping to public domain today. Narrow the 58K by LCC and then scroll for the list of titles.
https://thisismattmiller.github.io/hathi-pd-2023/
#publicdomain