Matt Miller

I wrote a little Chrome extension to make animated WebP (“weppy”) files from a region of a webpage:

https://chromewebstore.google.com/detail/weppysnap/kepcjkdedeocincdhenhepinpkkoghmn

I use it when writing documentation and want to show a short animation of how to do something (great for a GitHub README, for example). It's a bit simpler than a WebM video and more modern than an animated GIF.

I'll try to port it to Firefox at some point.

Very happy to introduce a new tool, BookReconciler! You can take spreadsheets with book data and add subject headings, descriptions, ISBNs, HathiTrust IDs, & more. You can also cluster editions & variations of the same "Work." Led by @[email protected] and supported by @[email protected].

Example and analysis of how AI web scrapers are breaking small and medium cultural heritage sites.

AI Scrapers vs Wikibase
https://semlab.io/blog/ai-scrapers-vs-wikibase.html

- Analysis of 13 million requests to our Wikibase over 2 weeks
- Good vs Bad bots (self-identifying vs camouflaged)
- Close look at how GPTBot interacts with the Wikibase instance
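
The self-identifying vs camouflaged split can be illustrated with a small sketch. This is not the method used in the blog post; it only assumes that a "self-identifying" bot is one whose User-Agent string contains a token such as "bot", "crawler", or "spider". The function names and the sample User-Agent strings below are hypothetical. Note that a camouflaged bot, by definition, lands in the "no bot token" bucket alongside real browsers, so telling those apart needs other signals (request rate, IP ranges, crawl patterns).

```python
import re
from collections import Counter

# Assumption: a self-identifying bot announces itself in its User-Agent.
BOT_TOKEN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def label_user_agent(user_agent: str) -> str:
    """Label a single User-Agent string by whether it announces a bot."""
    if BOT_TOKEN.search(user_agent):
        return "self-identifying bot"
    return "no bot token"


def tally(user_agents):
    """Count requests per label for an iterable of User-Agent strings."""
    return Counter(label_user_agent(ua) for ua in user_agents)


# Hypothetical sample values, not taken from the analysed logs.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    "ExampleCrawler/2.0",
]
counts = tally(sample)
print(counts)
```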

Blog: Visualizing 14,000 Released Epstein Emails.

I built a viz of the emails released as part of the 20K House Oversight Committee docs.

https://thisismattmiller.com/post/email-visualization/

- Provides a clustered, high-level view of the emails exchanged by contact across time
- Allows you to zoom into individual email exchanges and open the source documents

Future shape of the Library of Congress?

https://www.congress.gov/bill/119th-congress/house-bill/6028/text

Librarian appointed by Congress, Copyright Office removed from the Library.

Blog Post: LCNAF & Trie – Storing +11M unique names in 50MB data structure in the browser

https://thisismattmiller.com/post/lcnaf-trie/

- Optimizing LCNAF authorized headings into a trie data structure
- In-browser MARC file (binary & XML) name reconciliation tool
- In-browser LCNAF search tool
- OpenRefine / API / Command line tools for name reconciliation

Looks at what applications are possible when you can represent large indexes like LCNAF in a small data footprint.
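
For readers unfamiliar with the structure, here is a minimal trie sketch. It is not the implementation from the blog post, which applies further optimization to reach the 50MB footprint; it only shows the core idea that names sharing a prefix share nodes. The class names and the three sample headings are illustrative assumptions.

```python
class TrieNode:
    """One character position; children are keyed by the next character."""
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}
        self.terminal = False  # True if a full name ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.node_count = 1  # counts the root

    def insert(self, name: str) -> None:
        node = self.root
        for ch in name:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode()
                node.children[ch] = nxt
                self.node_count += 1
            node = nxt
        node.terminal = True

    def _walk(self, prefix: str):
        """Return the node reached by following prefix, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, name: str) -> bool:
        node = self._walk(name)
        return node is not None and node.terminal

    def with_prefix(self, prefix: str):
        """Return all stored names starting with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results, stack = [], [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.terminal:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)


# Illustrative headings only, not drawn from the LCNAF data.
names = ["Miller, Matt", "Miller, Mary", "Milton, John"]
trie = Trie()
for n in names:
    trie.insert(n)

print(trie.contains("Miller, Matt"))
print(trie.with_prefix("Miller, M"))
print(trie.node_count, "nodes for", sum(len(n) for n in names), "characters")
```

Because shared prefixes are stored once, the node count grows more slowly than the total character count, which is what makes a large name index feasible in a small footprint.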

Halloween blog post: Italian Giallo Horror Films

https://thisismattmiller.com/post/giallo/

- Using the vision language model Qwen2.5-VL to analyze a 70-film corpus (🧟) / 80,000 frames
- Automatically build “trope clusters” finding similar scenes across movies
- Plotting tropes across the run time of movies to see patterns

If nothing else, it's probably the longest eye-acting supercut you’ve ever seen: https://youtu.be/cGrmkOwut6k

New Post: PEN America Banned Books 2025 dataset

https://thisismattmiller.com/post/book-bans-2025/

Looking at school district book bans

- Interactive Map interface to the books banned in 2024-2025
- A faceted browse interface to the 3700 books
- Subject heading analysis

New Blog Post.
Library of Congress & Flickr Commons: Analysis of user interactions on 40,000 images
https://thisismattmiller.com/post/lc-flickr-commons/

- Organizing 95K photo comments by embedding clustering.
- Viewer to explore user georectified images
- Folksonomy tagging vs LCSH Vocabulary
- Placing into the Wiki* knowledge graph

Trying out workflows that use multimodal LLMs for validation and QA.

In this blog post I walk through a test using 1000+ Siskel and Ebert videos to extract key video frames and other data.

https://thisismattmiller.com/post/building-datasets-from-video-collections-using-local-cloud-llms/

Building datasets from video collections using local & cloud LLMs

Using Qwen2.5-VL, Gemini 2.5 and Whisper to build a Siskel and Ebert dataset
