🧵 1/ It's #ResearchSupportPartnershipUiO #Wednesday! I started it by talking to a colleague of many years about practical data management (hint: older software for organising & classifying digitized archival materials). We also talked about ways of automatically recognizing #Arabic texts - a hot topic here in Oslo!
They will set up a small working group to produce manual transcriptions and jointly train models for their materials. The newly established #SustainableDSEs network will support them!
🧵 2/ Continuing the morning w/ interviews to find a new research assistant for my bilingual #EthicaComplementoria #DSE to help quality check the #Markup and #Annotations and prepare the TEI/XML export from #Transkribus.
The Dept. of Archaeology, Conservation and History has many eager master's students who expressed interest in working on the project. Exciting!
🧵 3/ After lunch, I have a meeting w/ other research support staff from the Dept. and a colleague from the #UniversityLibrary about our institutional repositories for publications and research data.
@arockenberger Are you annotating directly in Transkribus?
@catominor I am! That is: we are using structural and textual tags to prepare smart TEI export; we're also using the "comment" tag rather wildly to catch all kinds of oddities, which we will decide how to annotate properly at a later point.
We're using #Transkribus to prepare digital editions in a rather minimalistic way, which hopefully supports more re-use scenarios.
@arockenberger How do you deal with tags going across pages? :) And issues with TEI export (like problems with overlapping elements and non-valid tags)? I really appreciate your answers. I am doing some tagging in Transkribus but for non-critical things.
@arockenberger I hope I am not mistaken, but most workflows I have seen in digital editing used the PAGE XML export from Transkribus, then converted it to TEI and annotated afterwards. So I am just curious.
@catominor There's a transformer written by Dario Kampkaspar, and it works well enough for my purposes. There's lots of data cleaning in the actual XML file afterwards, but I am OK with that; it's still a massive relief not having to manually transcribe texts, especially in an environment like Oxygen (no hate, just personal preference).
@arockenberger I completely understand. I was thinking more about the annotating part (not the transcribing itself) and the Transkribus -> TEI transformation. And if you are annotating in Transkribus, how do you deal with its limitations, like tagging across pages?
@arockenberger (I am using Transkribus as well :). Actually, I will be at the Transkribus User Conference this week. Are you going to be there as well?)
@catominor I never ran into that problem, actually. Transkribus doesn't let you create tags that span over more than 3 lines anyways, so we have resolved this problem by tagging them line-by-line when needed. It will be fixed in the XML file afterwards.
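For illustration only, here is a minimal Python sketch of what that kind of clean-up in the XML file could look like. It assumes a tag that was applied line-by-line shows up in the export as sibling elements separated by a line break, e.g. `<hi>…</hi><lb/><hi>…</hi>`; the element names, the helper `merge_split_tags`, and the sample text are all hypothetical, and the real output of the transformer may look different. The sample is un-namespaced for brevity (real TEI uses the `http://www.tei-c.org/ns/1.0` namespace), and the sketch does not compare attributes before merging.

```python
import xml.etree.ElementTree as ET


def merge_split_tags(parent, tag, br="lb"):
    """Merge <tag>a</tag><lb/><tag>b</tag> into <tag>a<lb/>b</tag>, in place.

    Only merges when nothing but whitespace sits between the three elements.
    """
    i = 0
    while i + 2 < len(parent):
        first, mid, third = parent[i], parent[i + 1], parent[i + 2]
        adjacent = not (first.tail or "").strip() and not (mid.tail or "").strip()
        if first.tag == tag and mid.tag == br and third.tag == tag and adjacent:
            parent.remove(mid)
            parent.remove(third)
            # The line break moves inside the first element; the text of the
            # second element becomes the text that follows the line break.
            mid.tail = third.text or ""
            first.append(mid)
            first.extend(list(third))   # carry over any nested children
            first.tail = third.tail     # keep the text after the merged span
            # Do not advance i: the tag may continue onto yet another line.
        else:
            i += 1
    return parent


# Hypothetical sample: one highlighted phrase tagged on two consecutive lines.
sample = ET.fromstring(
    "<p>see <hi>first half</hi><lb/><hi>second half</hi> here</p>"
)
merge_split_tags(sample, "hi")
print(ET.tostring(sample, encoding="unicode"))
```

After the call, the paragraph contains a single `hi` element with the line break inside it, which is closer to how one would encode the phrase by hand.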
@catominor I'm creating a SMALL edition of an early modern book with a relatively simple layout.
@arockenberger Definitely. Unfortunately, there are still few tools that make simple digital editions easy to prepare. I think displaying the digital edition afterwards is much harder than the encoding itself (which is already difficult).
@catominor Ah, my favourite pet peeve! I am a code purist: I mostly want to see the encoding, no need to render it into a "pretty" text later :)