@theklan some of those frequency lists are based on the Bible. For each of the 1001 #languages, #Unilex scraped various *open* online resources: Wikipedias, Bible translations, WordPress blogs. It is used in all our Android phones for text autocomplete. Sometimes only the Bible was available, so the frequency list reflects it. I would prefer larger, more diverse raw data, but they made do with what they had. I do not know of any better hyperlingual open #corpus. cc @alexture
https://github.com/unicode-org/unilex