Second day at #ChartingDSEA! In the panel on Digital Resource Building in Japanese Studies, Kazumi Hasegawa on Global Christian Missionary Archives, and how both the Japanese section of the archive and Japanese collections abroad function as local contributions to global and transcultural research questions.

Next, Anatole Bernet on how #DigitalHumanities tools help understand Imperial Japan's swift transformation of medical science, as visible in the annual reports of the Sanitary Bureau - a mine of information and statistics so extensive that computational analytics (OCR, table detection, results visualisation) are necessary to look beyond its annual framing; contextual analysis can then help make sense of the patterns.
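A minimal sketch of the "beyond the annual framing" idea: once OCR and table detection have turned each yearly report into a table, merging the per-year tables yields time series that no single report shows. The disease names and figures below are invented placeholders, not data from the talk.

```python
def aggregate_across_years(yearly_tables):
    """Merge per-year {category: count} tables into category -> [(year, count)]."""
    series = {}
    for year, table in sorted(yearly_tables.items()):
        for category, count in table.items():
            series.setdefault(category, []).append((year, count))
    return series

# Hypothetical extracted tables, one per annual report
yearly_tables = {
    1897: {"cholera": 894, "dysentery": 91077},
    1898: {"cholera": 655, "dysentery": 88077},
}
series = aggregate_across_years(yearly_tables)
```

The cross-year series (`series["cholera"]` etc.) can then be passed to any plotting library for the visualisation step.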

#ChartingDSEA

Followed by Taka Oshikiri, who stresses a comparative approach to studying cultural and literary modernisation in Japan, Russia and Ottoman Turkey in the 19th century, developing a database ( #Nonwestlit) on cultural and literary criticism in Meiji-period literary magazines.
#DigitalHumanities tools help to understand the discourse on enlightenment and innovation in these journals, and to overcome issues of access, scan quality, copyright restrictions, and the Japanese language.
#ChartingDSEA
In "Launching a Decade-Long Project Model Building in the Humanities through Data-Driven Problem Solving", Nobuhiko Kikuchi demonstrates how projects using the Union Catalog Database of Japanese Texts at the National Institute of Japanese Literature strengthen infrastructures for researching premodern texts and circumventing the associated challenges (API endpoints, image viewers, machine-learning-supported image analysis, open data sharing, etc.).
#ChartingDSEA

And concluding this panel, Nobutake Kamiya and Tamako Kitaoka ask how students, researchers and teachers can find and use the information they need among vast collections of Japanese resources. They propose the 'Collaborative Resource Guide for Japanese Studies and Humanities in Japan', and highlight the collaboration needed at #EAJRS and the National Institutes for the Humanities to maintain such infrastructure and keep the research guide up to date.

#ChartingDSEA

The following #ChartingDSEA panel is on #OpticalCharacterRecognition in East Asian scripts. First, Wayne de Fremery on 'Authoring Optical Character Recognition (OCR) Solutions for East Asian Special Collections', who warns that failure to reproduce historical scripts digitally endangers the survival of these languages! OCR is an infrastructure that shapes how we think and argue about humanity. To conduct humanities research, we must be able to apply OCR to various scripts and styles.
 
Next at #ChartingDSEA - millions of pre-modern Chinese titles and archival documents face a digitisation backlog. To counter this, Colin Brisson suggests CHAT_models, guiding us "Towards Massive Production of Open Digital Corpora for the Study of Pre-Modern China".
He urges: datasets need to become reusable (XML TEI), not constantly recreated - and models should be published with their code!
Representing Japanese OCR and the complexities of Japanese script, which ranges from character and syllable alphabets to sinitic scripts and kuzushiji - Alíz Horváth on challenges and opportunities in pre-modern J-OCR, which has so far produced tools like Miwo and KuroNet but still faces the major issues of Kanbun (directionality, variation, reading aids, commentaries, character size). Some tools that offer solutions remain proprietary.
#ChartingDSEA

And the last speaker on the #ChartingDSEA OCR panel: Matthias Arnold on "Towards fulltext of Republican Chinese newspapers" - how can segmentation work in complex newspaper layouts? Crowdsourcing, machine learning, annotation, and OCR classification as processes for recognising registers and contents.

~ "Ground truths" essential for working with neural networks! ~
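One standard use of such ground truths is scoring an OCR model: the character error rate (CER) compares the model's output against a hand-verified transcription. A minimal sketch using the classic Levenshtein edit distance (the metric choice here is my assumption, not something stated in the talk):

```python
def cer(ground_truth, ocr_output):
    """Character error rate: edit distance divided by ground-truth length."""
    m, n = len(ground_truth), len(ocr_output)
    prev = list(range(n + 1))  # edit distances for the empty prefix
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ground_truth[i - 1] == ocr_output[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution / match
        prev = cur
    return prev[n] / m if m else 0.0
```

A perfect transcription scores 0.0; one wrong character in a three-character line scores 1/3.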