There seems to be a lot of interest in the question (thanks for the boosts!), but not so many suggestions yet. So I thought I’d share what I have found so far:
Re web scraping, I think this paper by Black is a really good high-level overview: Black, Michael L. 2016. “The World Wide Web as Complex Data Set: Expanding the Digital Humanities into the Twentieth Century and Beyond through Internet Research.” IJHAC 10 (1): 95–109. https://doi.org/10.3366/ijhac.2016.0162.
The Journal of Open Humanities Data (JOHD) aims to be a key part of a thriving community of scholars sharing humanities data. The journal features peer-reviewed publications describing humanities research objects or techniques with high potential for reuse. Humanities subjects of interest to JOHD include, but are not limited to, Art History, Classics, History, Library Science, Linguistics, Literature, Media Studies, Modern Languages, Music and Musicology, Philosophy, and Religious Studies. Submissions that cross one or more of these traditional disciplines are particularly encouraged.