It is neat to see Meta AI using LaTeXML productively for arXiv preprocessing in their Nougat OCR work.

Good discussion in "5.2 Text modalities": there is indeed a lot of hidden complexity when recovering TeX input strings.

Rather tempting to wish for a way to normalize to "canonical" expressions...
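
To make the wish concrete, here is a toy sketch (my own illustration, not anything from Nougat or LaTeXML) of what normalizing could look like for math-mode strings. The `ALIASES` table and `normalize_tex` are hypothetical names, and the rules cover only a few cases: macro synonyms, insignificant whitespace, and braces around a single-character script.

```python
import re

# Hypothetical alias table: a few TeX macros that are standard synonyms.
ALIASES = {r"\le": r"\leq", r"\ge": r"\geq", r"\ne": r"\neq"}

# A token is a control word (\alpha), a control symbol (\,), or any
# single non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")
CONTROL_WORD = re.compile(r"\\[A-Za-z]+")


def normalize_tex(src: str) -> str:
    """Map a few equivalent math-mode spellings to one form (toy only)."""
    # Tokenizing drops all whitespace; aliases are swapped per token.
    tokens = [ALIASES.get(tok, tok) for tok in TOKEN.findall(src)]
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # Keep one space only where it is needed to end a control word.
        if CONTROL_WORD.fullmatch(tok) and nxt[:1].isalpha():
            out.append(" ")
    text = "".join(out)
    # Drop braces around a single-character script: x^{2} -> x^2
    return re.sub(r"([_^])\{([A-Za-z0-9])\}", r"\1\2", text)


print(normalize_tex(r"x^{2} \le y"))   # x^2\leq y
print(normalize_tex(r"x^2\leq y"))     # x^2\leq y
```

Both inputs collapse to the same string here, but this assumes pure math mode (it would mangle spaces inside `\text{...}`) and knows nothing about user-defined macros, which is exactly where the hidden complexity lives.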

project homepage: https://facebookresearch.github.io/nougat/

arXiv preprint:
https://arxiv.org/abs/2308.13418

#latexml #arxiv
