I have asked Claude Opus 4.6 (via GitHub Copilot Chat) to summarize various approaches to XML-plaintext-NLP-XML roundtripping, providing it with the respective GitHub repositories (listed in the report).

Claude identifies five different approaches. IMHO, it sometimes skims over points where it should have gone into detail, but as an overview it is quite good. What do you think?

https://pad.gwdg.de/wwNnTvaETHKuzFiyhIYHog?view#Response-Report-Approaches-to-XML%E2%86%94Plaintext-Conversion-with-Annotation-Preservation

@eeditiones @davidlassner @TEIConsortium
@aboutgeo @cmboulanger

#TEIXML #NLP #StandoffAnnotation #TEIPublisher #Recogito

(TEI) XML plaintext Roundtripping Review - HedgeDoc

@anwagnerdreas @eeditiones @davidlassner @TEIConsortium @cmboulanger

Thanks for sharing! I'll read through the details later with interest!

FWIW: Claude's assessment of "Family E" is perhaps based more on the eeditiones repo. Either way, it relates only vaguely to text-annotator-js ;-)

As far as Recogito is concerned, here's what we are using for exactly the described use case instead:

https://github.com/recogito/tei-standoffconverter-js

It's essentially a TypeScript port of the "Family A" code, slightly adapted to our use case.

GitHub - recogito/tei-standoffconverter-js: Convert between TEI/XML and plaintext without losing markup context.


@aboutgeo

Yes, that's something I noticed in the report too: it seems somewhat arbitrary where Claude goes into detail and actually looks things up, and where it relies on hunches, speculation, or keywords that merely suggest how something is solved. Running the whole thing in GitHub Copilot gives it access to the explicitly provided repositories but prevents it from doing other web searches. Maybe it would have been better to ask the question in a different setting, with autonomous web search enabled...