I didn't expect anyone to think I was asking for a general AI.
I just want a semi-accurate heuristic to spare users from having to tag the language manually.
Proprietary APIs like Google Translate are out of the question. I'm already using WhatLanguage, and it's not working well. If I can't find anything, welp, there's that. Not a big deal.
@hatsuki @Gargron seems to me if it contains Unified Han and kana you call it "Japanese"; if it contains hangul with or without Unified Han you call it "Korean"; and if it contains Unified Han but no significant amount of those other things, you call it "Chinese." Done, and close enough. Telling Swedish from Norwegian is much harder.
As for needing a "lang" tag to choose fonts, nearly all the Web has that problem anyway.
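The script-based rule suggested above (Han plus kana means Japanese, hangul with or without Han means Korean, Han alone means Chinese) can be sketched roughly as follows. This is only an illustration, not code from any existing library: the function names, the `min_share` threshold standing in for "no significant amount", and the simplified code point ranges are all assumptions. It also treats kana-only text as Japanese, which goes slightly beyond what the comment states.

```python
# Hypothetical helpers; the code point ranges are a deliberate simplification
# and do not cover every CJK block (e.g. the rarer Han extensions).
HAN_RANGES = ((0x3400, 0x4DBF), (0x4E00, 0x9FFF))    # Extension A, CJK Unified Ideographs
KANA_RANGES = ((0x3040, 0x309F), (0x30A0, 0x30FF))   # Hiragana, Katakana
HANGUL_RANGES = (
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xAC00, 0xD7A3),  # Hangul Syllables
)


def _in_ranges(cp, ranges):
    """Return True if code point cp falls inside any (low, high) range."""
    return any(lo <= cp <= hi for lo, hi in ranges)


def count_scripts(text):
    """Count Han, kana, and hangul characters in text."""
    counts = {"han": 0, "kana": 0, "hangul": 0}
    for ch in text:
        cp = ord(ch)
        if _in_ranges(cp, HAN_RANGES):
            counts["han"] += 1
        elif _in_ranges(cp, KANA_RANGES):
            counts["kana"] += 1
        elif _in_ranges(cp, HANGUL_RANGES):
            counts["hangul"] += 1
    return counts


def guess_cjk_language(text, min_share=0.1):
    """Guess "ja", "ko", or "zh" from script counts; None if no CJK text.

    min_share is an assumed cutoff: kana or hangul must make up at least
    this fraction of the CJK characters to count as "significant".
    """
    counts = count_scripts(text)
    total = sum(counts.values())
    if total == 0:
        return None  # no CJK characters: leave it to another detector
    kana, hangul = counts["kana"], counts["hangul"]
    if counts["han"] and max(kana, hangul) / total < min_share:
        return "zh"  # Han with no significant kana or hangul
    return "ko" if hangul >= kana else "ja"
```

Calling `guess_cjk_language` on a post returns a language code that could be used as a default tag, or `None` for non-CJK text, which would still need a general-purpose detector. Like the rule it follows, it assumes one language per post.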
@mattskala @Gargron It's broken everywhere, so don't fix it. That sounds strange, but I am not the one doing the work, so I have nothing to complain about.
On detecting languages, though: this assumes single-language posts. At the very least, it should be clear to the user which lang tag will be added when tooting.
@Gargron #NLProcer here. Have you taken a look at https://github.com/diasks2/ruby-nlp?
Most NLP stuff isn't in Ruby (Python, Java, etc. are more prevalent).