Dr. Liz Fischer

241 Followers
31 Following
25 Posts

Medievalist DH researcher | Consultant @WeAreAVP
working with GLAM on AI/data R&D

Writing book on #NetworkAnalysis in #BookHistory

(they/she)

Website: https://www.lizmfischer.com
GitHub: https://github.com/lizfischer/
TikTok: https://www.tiktok.com/@betta_fisch
Twitter: https://twitter.com/lizfischer0
Any #Gephi power users out there know why sometimes the direct select option stops showing connecting nodes on hover, and instead shows just the edges? This drives me *nuts* and I am in the middle of teaching a Gephi unit. #DH #datavis #networkanalysis
& Check out the original Twitter thread I posted discussing this in 2022 (!) for more context & an example of it in use https://threadreaderapp.com/thread/1539839040303816705.html
Thread by @lizfischer0 on Thread Reader App

@lizfischer0: This is the part of my dissertation I've been working on for the last couple of months! It's a tool to help split PDF-bound documents (so far, mostly scans of printed books) into "units of...

Would love to hear feedback from anyone else who tries it! Initially, I had plans for even more advanced rule-writing capabilities, but had to scale development back to an MVP that would support my specific research at the time.
It also includes an interface for splitting & merging entries, and for editing the OCR text.
You write rules about the size of the whitespace between pieces of text, so the tool can group text in ways that Tesseract's layout detection can't.
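To make the whitespace-rule idea concrete, here is a minimal sketch in Python. It is not the tool's actual code: the `Line` class, the `split_on_gaps` function, the `min_gap` parameter, and the sample catalog entries are all hypothetical, and it assumes you already have a top and bottom pixel coordinate for each line of text on a page.

```python
from dataclasses import dataclass


@dataclass
class Line:
    top: float     # y-coordinate of the top of the text line, in pixels
    bottom: float  # y-coordinate of the bottom of the text line, in pixels
    text: str


def split_on_gaps(lines: list[Line], min_gap: float) -> list[list[Line]]:
    """Group lines into units, starting a new unit whenever the vertical
    whitespace above a line is at least min_gap pixels (a hypothetical rule)."""
    units: list[list[Line]] = []
    for line in sorted(lines, key=lambda l: l.top):
        if units and line.top - units[-1][-1].bottom < min_gap:
            units[-1].append(line)   # small gap: same unit as the previous line
        else:
            units.append([line])     # first line or large gap: start a new unit
    return units


# Made-up sample page: two catalog entries separated by a wide gap.
page = [
    Line(100, 120, "12. Psalter, s. xiii."),
    Line(124, 144, "Vellum, 180 leaves."),
    Line(190, 210, "13. Book of Hours, s. xv."),
    Line(214, 234, "Vellum, 96 leaves."),
]

units = split_on_gaps(page, min_gap=30)
for unit in units:
    print(" / ".join(line.text for line in unit))
```

With `min_gap=30`, the 4-pixel gaps within an entry keep its lines together, while the 46-pixel gap between entries starts a new unit. A real rule set would likely also need horizontal gaps and per-document thresholds.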
This tool guides you through the process of turning your PDF into machine-readable text, split up into intellectually meaningful units (like individual entries in a catalog). It takes advantage of the ways publishers use whitespace to communicate page layout
The tool & especially the documentation are very much works in progress, but at a point where other people can (hopefully!) actually use them. You can read the user guide here: https://github.com/lizfischer/document-segmentation/wiki/User-Guide
User Guide

Browser-based app for segmenting & OCRing PDF pages based on whitespace rules. To assist researchers (especially in the humanities) with turning their materials into machine-actionable datasets...

GitHub

Happy to finally be able to share this tool!

If you need to split a PDF into intellectually meaningful pieces, this can help you! Whitespace-based segmentation in your browser. No training of a model, just good ol' fashioned rules-based "AI" #DH

https://github.com/lizfischer/document-segmentation?tab=readme-ov-file

GitHub - lizfischer/document-segmentation: Browser-based app for segmenting & OCRing PDF pages based on whitespace rules. To assist researchers (especially in the humanities) with turning their materials into machine-actionable datasets.

For a project I'm working on 🤫 I need #medieval -inspired fonts, so I made these and am SUPER pleased with how they're coming along #dh