And there are certainly more, just search on Hugging Face...
You can find models like https://huggingface.co/Davlan/afro-xlmr-large-114L or even Apertus, which boasts "1811 natively supported languages": https://huggingface.co/swiss-ai/Apertus-70B-2509 ...
Some remarks and outlook:
- Only 41 (2%) of African languages are substantially covered
- Only Latin, Arabic & Ge'ez scripts are covered
- Fewer than 10 languages are frequently supported
- 18 GB of data across 23 datasets
- Focus on classification
- Focus on specialized small language models
- In Africa, research is often community-driven: participatory research led not by universities but by communities like Masakhane
- It remains challenging to even find speakers/annotators
- It is necessary to invest in scalable infrastructure, ethical frameworks, and context-sensitive evaluation
10/10 (fin)
#MultilingualDH