Mastodawn

mcc Feb 28

Wait hold on I just realized. Is

八人入

A reasonable Chinese sentence

mcc Feb 28

…Also waaaa why did the character rendering change so much when I copied from Pleco to Tusky. Who gave eight a hat

mcc Feb 28

In Pleco they look like this. I don't know if this is a different but regular hanzi font or if the CJK unification is messing me up somehow

EDIT: I currently think Tusky is showing me Japanese character variants https://social.mildlyfunctional.gay/@artemist/116146010272716935

mcc Feb 28

This is what Tusky looks like.

mcc Feb 28

WAIT WTF this is an actual Chinese IME and it seems to be showing me Japanese characters. Ok I think Lenovo is fucking with me, one minute

mcc Feb 28

Okay I now believe the problem is neither Tusky nor Lenovo but rather that Android is not a serious product and never has been. It seems Android may outright refuse to show scripts unless you've whitelisted the language. Problem: I think this menu is asking me which version of Chinese I want but the menu is in Chinese. I want to look at Chinese text so I can learn Chinese. I don't know it yet. I feel like I'm playing an adventure game.

* I may explore a PR later anyway.

mcc Feb 28

Actually I'm pretty sure 简 already means simplified, so I selected simplified at the top level, and this second menu is asking… I don't know. Locale? TTS dialect?!

Janne Moren

@mcc
This is the problem with han unification; we're partway back to code pages and picking the right font to render a particular language.

Like telling Danes and Swedes that ä and æ are the same character and so we'll just make them the same in Unicode.

Peter Brett Feb 28

@jannem Mmm, not sure about that. In my experience, “text encoding” and “language” are 2 orthogonal axes, and proper text handling requires you to know both.

This is one of the minor annoyances of Mastodon — it doesn't seem to be possible to mark parts of a post as being in different languages.

I don't have a huge problem with Han unification. I think it's a valid technical decision.

@mcc

Janne Moren Feb 28

@krans @mcc
The bigger problem is that on the web and in apps there's usually no information on what language something is written in. Which means a browser or an app can only guess what font to render Unicode han characters in. And when a user has installed support for more than one language it is certain to frequently go wrong.

Edit: you don't need to know the language to always render "ä" correctly. You do need to know the language in order to render "骨".
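
To make the point above concrete, here is a minimal Python sketch. It shows that "骨" is a single code point whatever language the text is in, so the language has to travel separately, for example as an HTML `lang` attribute. The helper name `tag_language` and the tags `zh-Hans` and `ja` are illustrative assumptions, not something anyone in the thread proposed.

```python
import unicodedata
from html import escape


def tag_language(text: str, lang: str) -> str:
    """Wrap text in an HTML span that carries a language tag (hypothetical helper)."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


ch = "\u9AA8"  # 骨

# The code point and its name are identical for Chinese and Japanese text.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# Only the out-of-band language tag tells a renderer which glyph variant to pick.
print(tag_language(ch, "zh-Hans"))
print(tag_language(ch, "ja"))
```

Whether a given browser or app actually switches fonts based on that tag depends on the renderer and on which fonts are installed.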

Peter Brett Feb 28

@jannem I agree. The root cause is that file formats, protocols and most programs are written almost entirely by English-speakers, who assume that only English-speaking people use computers and that all content will be in English.

For my entire lifetime, support for multilingual text has always been an afterthought — and many development frameworks make it incredibly difficult.

@mcc

Peter Brett Feb 28

@jannem Also: “rendering” is necessary, but not sufficient. Collation, dictionary selection, punctuation, text-to-speech, etc. are all language-dependent.

@mcc
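
As a rough illustration of language-dependent collation, the sketch below sorts the same three words two ways. The key functions are hand-rolled toys, not a real collation library: `swedish_key` assumes the Swedish alphabet order (å, ä, ö after z), and `german_key` assumes the dictionary-style convention of treating ä like a. The word list is made up for the example.

```python
# Toy collation keys; real software would use a proper collation library.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

# Assumed German dictionary-style folding: umlauts sort like their base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str) -> list[int]:
    """Rank each letter by its position in the Swedish alphabet."""
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]


def german_key(word: str) -> str:
    """Fold umlauts to their base letters before comparing."""
    return word.lower().translate(GERMAN_FOLD)


words = ["zon", "ära", "arm"]

print(sorted(words, key=swedish_key))  # ä sorts after z
print(sorted(words, key=german_key))   # ä sorts next to a
```

The same code points produce two different orderings, so a sort routine has to be told which language it is working in.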

Inga stands with 🇺🇦 🇵🇸 Feb 28

@jannem @mcc and like telling everybody in the west plus the Greek plus everybody in the eastern Europe that actually all "A"s are the same character.
And that English "B" or "H" and Cyrillic "В" or "Н" are also the same (hint: these Cyrillic letters are actually for "v" and "n")
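
For contrast, a small Python sketch of how Unicode treats these look-alike letters today: Latin, Greek and Cyrillic each get their own code points, while each Han character from the start of the thread has one code point shared across languages. The `describe` helper and the `LOOKALIKES` table are illustrative names only.

```python
import unicodedata

# Visually similar capitals that Unicode encodes separately per script.
LOOKALIKES = {
    "Latin A": "\u0041",
    "Greek Alpha": "\u0391",
    "Cyrillic A": "\u0410",
    "Latin B": "\u0042",
    "Cyrillic Ve": "\u0412",
    "Latin H": "\u0048",
    "Cyrillic En": "\u041D",
}

HAN = "\u516B\u4EBA\u5165"  # 八人入, each unified across Chinese and Japanese


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for label, ch in LOOKALIKES.items():
    print(f"{label:12} {ch} {describe(ch)}")

for ch in HAN:
    print(ch, describe(ch))
```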