Wait hold on I just realized. Is

八人入

A reasonable Chinese sentence?

…Also waaaa why did the character rendering change so much when I copied from Pleco to Tusky. Who gave eight a hat

In Pleco they look like this. I don't know if this is a different but regular hanzi font or if the CJK unification is messing me up somehow

EDIT: I currently think Tusky is showing me Japanese character variants https://social.mildlyfunctional.gay/@artemist/116146010272716935

This is what Tusky looks like.
WAIT WTF this is an actual Chinese IME and it seems to be showing me Japanese characters. Ok I think Lenovo is fucking with me, one minute

Okay I now believe the problem is neither Tusky nor Lenovo but rather that Android is not a serious product and never has been. It seems Android may outright refuse to show scripts unless you've whitelisted the language. Problem: I think this menu is asking me which version of Chinese I want but the menu is in Chinese. I want to look at Chinese text so I can learn Chinese. I don't know it yet. I feel like I'm playing an adventure game.

* I may explore a PR later anyway.

Actually I'm pretty sure 简 already means simplified, so I selected simplified at the top level, and this second menu is asking… I don't know. Locale? TTS dialect?!

@mcc
This is the problem with han unification; we're partway back to code pages and picking the right font to render a particular language.

Like telling Danes and Swedes that ä and æ are the same character and so we'll just make them the same in Unicode.

@jannem Mmm, not sure about that. In my experience, “text encoding” and “language” are 2 orthogonal axes, and proper text handling requires you to know both.

This is one of the minor annoyances of Mastodon — it doesn't seem to be possible to mark parts of a post as being in different languages.

I don't have a huge problem with Han unification. I think it's a valid technical decision.

@mcc

@krans @mcc
The bigger problem is that on the web and in apps there's usually no information on what language something is written in. Which means a browser or an app can only guess what font to render Unicode han characters in. And when a user has installed support for more than one language, it is certain to go wrong frequently.

Edit: you don't need to know the language to always render "ä" correctly. You do need to know the language in order to render "骨".

@jannem I agree. The root cause is that file formats, protocols and most programs are written almost entirely by English-speakers, who assume that only English-speaking people use computers and that all content will be in English.

For my entire lifetime, support for multilingual text has always been an afterthought — and many development frameworks make it incredibly difficult.

@mcc

@jannem Also: “rendering” is necessary, but not sufficient. Collation, dictionary selection, punctuation, text-to-speech, etc. are all language-dependent.
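
A toy Python sketch of the collation point: the same three words sort differently under Swedish rules (å, ä, ö come after z) and German dictionary rules (ä sorts like a). The two key functions are hand-rolled simplifications that only handle lowercase words; real software would use proper collation data (for example ICU) rather than anything like this.

```python
# Swedish alphabet order: a-z followed by å, ä, ö.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

# German dictionary-style folding: umlauts sort like their base letters.
GERMAN_FOLD = str.maketrans(
    {"\u00e4": "a", "\u00f6": "o", "\u00fc": "u", "\u00df": "ss"}
)

def swedish_key(word: str) -> list[int]:
    """Sort key: position of each letter in the Swedish alphabet."""
    return [SWEDISH_ALPHABET.index(c) for c in word]

def german_key(word: str) -> str:
    """Sort key: the word with umlauts folded to base letters."""
    return word.translate(GERMAN_FOLD)

# "\u00e4pple" is "äpple", written with an escape to avoid encoding surprises.
words = ["zebra", "\u00e4pple", "apa"]

print(sorted(words, key=swedish_key))  # ä-word last
print(sorted(words, key=german_key))   # ä-word between "apa" and "zebra"
```

Same bytes in, two different "correct" orders out, and nothing in the text says which one the reader expects.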

@mcc

@jannem @mcc and like telling everybody in the West, plus the Greeks, plus everybody in Eastern Europe, that actually all "A"s are the same character.
And that English "B" or "H" and Cyrillic "В" or "Н" are also the same (hint: these Cyrillic letters are actually for "v" and "n")
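
For contrast with the han case, a quick Python check shows that Unicode did keep these lookalikes apart. The characters are written as escapes so the look-alike glyphs cannot get mixed up in the source. The `script_of` helper is a crude assumption for illustration: it just takes the first word of the Unicode character name, which is not a real script lookup.

```python
import unicodedata

# Latin, Greek and Cyrillic capitals that look alike in most fonts.
A_LOOKALIKES = ["\u0041", "\u0391", "\u0410"]  # Latin A, Greek Alpha, Cyrillic A
B_LOOKALIKES = ["\u0042", "\u0412"]            # Latin B, Cyrillic Ve ("v")
H_LOOKALIKES = ["\u0048", "\u041d"]            # Latin H, Cyrillic En ("n")

def script_of(ch: str) -> str:
    """Crude script guess: the first word of the Unicode character name."""
    return unicodedata.name(ch).split()[0]

for group in (A_LOOKALIKES, B_LOOKALIKES, H_LOOKALIKES):
    for ch in group:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    # Every lookalike in the group has its own code point.
    assert len({ord(ch) for ch in group}) == len(group)
```

So the script is recoverable from the text alone here, which is the information that is missing for unified han characters.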