Good news: according to the study below, more widely spoken languages tend to be more complex. Read the other way, less widely spoken (underserved) languages tend to be simpler, so they are more likely to be learned well from a smaller corpus, which could help a bit with the rich-get-richer problem of LLMs and existing corpora.
https://phys.org/news/2025-02-complex-languages-efficient-communication.html
Study finds complex languages may be more efficient for communication
How do languages balance the richness of their structures with the need for efficient communication? To investigate, researchers at the Leibniz Institute for the German Language (IDS) in Mannheim, Germany, trained computational language models on a vast dataset covering thousands of languages.

