In a methods / #DigitalHumanities class next semester, I want to cover basic corpus creation. In particular, I’ll probably focus on #OCR/#HTR/#ATR and #WebScraping. I find it incredibly hard to find good papers that can serve as a general introduction to these topics. All I find are either practical tutorials or very specialized papers about specific approaches. Do you have any favorite readings about how to get to a text corpus in DH in the first place? Please share!
@felwert
Paige, J. (2024) ‘The Legality and Ethics of Web Scraping in Archaeology’, Advances in Archaeological Practice, 12(2), pp. 98–106. http://doi.org/10.1017/aap.2023.42 may be worth a look? It's intended to act as an introduction to web scraping as a research method.
@BlckheathHopper That looks very interesting, thanks!