A few years back, a friend of mine was part of the team behind #DoReCo, the DOcumentation REference COrpus, which collected and compiled lots of #language samples for the preservation of #culture and for further #research.
Project website: http://doreco.info/
Their paper is here:
https://aclanthology.org/2020.lrec-1.324/






