Wait hold on I just realized. Is

八人入

A reasonable Chinese sentence?

…Also waaaa why did the character rendering change so much when I copied from Pleco to Tusky. Who gave eight a hat

In Pleco they look like this. I don't know if this is a different but regular hanzi font or if the CJK unification is messing me up somehow

EDIT: I currently think Tusky is showing me Japanese character variants https://social.mildlyfunctional.gay/@artemist/116146010272716935

This is what Tusky looks like.
WAIT WTF this is an actual Chinese IME and it seems to be showing me Japanese characters. Ok I think Lenovo is fucking with me, one minute

Okay I now believe the problem is neither Tusky nor Lenovo but rather that Android is not a serious product and never has been. It seems Android may outright refuse to show scripts unless you've whitelisted the language. Problem: I think this menu is asking me which version of Chinese I want but the menu is in Chinese. I want to look at Chinese text so I can learn Chinese. I don't know it yet. I feel like I'm playing an adventure game.

* I may explore a PR later anyway.

Actually I'm pretty sure 简 already means simplified, so I selected simplified at the top level, and this second menu is asking… I don't know. Locale? TTS dialect?!

@mcc
This is the problem with Han unification; we're partway back to code pages and picking the right font to render a particular language.

Like telling Danes and Swedes that ä and æ are the same character and so we'll just make them the same in Unicode.

@jannem Mmm, not sure about that. In my experience, “text encoding” and “language” are 2 orthogonal axes, and proper text handling requires you to know both.

This is one of the minor annoyances of Mastodon — it doesn't seem to be possible to mark parts of a post as being in different languages.

I don't have a huge problem with Han unification. I think it's a valid technical decision.

@mcc

@krans @mcc
The bigger problem is that on the web and in apps there's usually no information on what language something is written in. Which means a browser or an app can only guess which font to render Unicode Han characters in. And when a user has installed support for more than one language, it is certain to frequently go wrong.

Edit: you don't need to know the language to always render "ä" correctly. You do need to know the language in order to render "骨".
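
A rough sketch of that point, in Python: the character itself is a single code point no matter which language it is meant to be read in, so the language has to travel alongside the text. Here it is carried in an HTML `lang` attribute, which renderers generally consult when picking a Han font. The helper name `tag_lang` is made up for illustration, and how any particular app or browser reacts to the tag is an assumption, not something shown in this thread.

```python
import unicodedata
from html import escape


def tag_lang(text: str, lang: str) -> str:
    """Wrap text in a <span> carrying a BCP 47 language tag (hypothetical helper)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


bone = "骨"

# One code point, shared by Chinese and Japanese text under Han unification.
codepoint = f"U+{ord(bone):04X}"
name = unicodedata.name(bone)

# The bytes of the character are identical in both cases; only the
# out-of-band language tag tells a renderer which glyph variant is wanted.
zh = tag_lang(bone, "zh-Hans")
ja = tag_lang(bone, "ja")

print(codepoint, name)
print(zh)
print(ja)
```

Without the tag, the two spans would be byte-for-byte the same, which is why a renderer is left guessing.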

@jannem I agree. The root cause is that file formats, protocols and most programs are written almost entirely by English-speakers, who assume that only English-speaking people use computers and that all content will be in English.

For my entire lifetime, support for multilingual text has always been an afterthought — and many development frameworks make it incredibly difficult.

@mcc

@jannem Also: “rendering” is necessary, but not sufficient. Collation, dictionary selection, punctuation, text-to-speech, etc. are all language-dependent.
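
To make the collation part concrete, here is a small Python sketch. It assumes two simplified rules: Swedish places å, ä, ö after z, and German dictionary order sorts ä together with a. Real collation (for example the Unicode Collation Algorithm with locale tailoring) is far more involved; the key functions below are illustrative only and handle just lowercase letters from the listed alphabets.

```python
import unicodedata

# Simplified Swedish alphabet: å, ä, ö come after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

# Simplified German dictionary rule: umlauts sort with their base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str) -> list[int]:
    """Sort key using Swedish letter order (raises ValueError on other characters)."""
    word = unicodedata.normalize("NFC", word).lower()
    return [SWEDISH_ALPHABET.index(ch) for ch in word]


def german_key(word: str) -> str:
    """Sort key that folds umlauts onto base letters before comparing."""
    word = unicodedata.normalize("NFC", word).lower()
    return word.translate(GERMAN_FOLD)


words = ["zebra", "äpple", "apa"]

swedish_order = sorted(words, key=swedish_key)
german_order = sorted(words, key=german_key)

print(swedish_order)  # ['apa', 'zebra', 'äpple']
print(german_order)   # ['apa', 'äpple', 'zebra']
```

Same strings, same code points, two different correct orders: which one is right depends entirely on the language, and nothing in the text itself says which applies.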

@mcc