I've got a new paper in Digital Scholarship in the Humanities. It gives a high-level overview of a method for producing specialised corpora from large digitised newspaper datasets, with accompanying code. It also discusses the importance of going beyond keyword search and presents a case study of philosophical writing in pre-1900 New Zealand newspapers. I've written some more about the paper here: https://joshua.wilsonblack.nz/post/digital-scholarship-in-the-humanities/
Code & data: https://osf.io/7crgt/
I hope it's useful!
New Publication: Creating Specialised Corpora from Digitized Historical Newspaper Archives | Joshua Wilson Black
One of the promises of digital humanities for the ‘historical sciences’ is that we’ll be able to, in Tim Hitchcock’s words, shift from ‘piles of books’ to ‘maps of meaning’. That is, we’ll be able to produce high-level representations of phenomena across large collections of text.