Mastodawn

mcc Feb 28

Wait hold on I just realized. Is

八人入

A reasonable Chinese sentence


mcc Feb 28

…Also waaaa why did the character rendering change so much when I copied from Pleco to Tusky. Who gave eight a hat


mcc Feb 28

In Pleco they look like this. I don't know if this is a different but regular hanzi font or if the CJK unification is messing me up somehow

EDIT: I currently think Tusky is showing me Japanese character variants https://social.mildlyfunctional.gay/@artemist/116146010272716935


mcc Feb 28

This is what Tusky looks like.


mcc

WAIT WTF this is an actual Chinese IME and it seems to be showing me Japanese characters. Ok I think Lenovo is fucking with me, one minute


mcc Feb 28

Okay I now believe the problem is neither Tusky nor Lenovo but rather that Android is not a serious product and never has been. It seems Android may outright refuse to show scripts unless you've whitelisted the language. Problem: I think this menu is asking me which version of Chinese I want but the menu is in Chinese. I want to look at Chinese text so I can learn Chinese. I don't know it yet. I feel like I'm playing an adventure game.

* I may explore a PR later anyway.


mcc Feb 28

Actually I'm pretty sure 简 already means simplified, so I selected simplified at the top level, and this second menu is asking… I don't know. Locale? TTS dialect?!


mcc Feb 28

Update: I solved the problem, not by adding Chinese as an alternate language for my Android, but by deleting Japanese as an alternate language. Not sure when I added Japanese in the first place or what I was trying to accomplish but I question Google's decision that informing it I may look at text in Japanese makes it conclude I DEFINITELY won't be looking at Chinese!


mcc Feb 28

Anyways I think the sentence was wrong to start with because it's missing 个s or something


mcc Mar 2

鸡机

Attempting to imagine the Chicken Machine


udonchy Mar 2

@mcc fun fact 雞雞 is baby talk for penis lol

@ionchy nice lol

@mcc AT-ST

@mcc one needs not imagine the chicken machine https://youtu.be/dl9beG4LbJU

chair chicken

YouTube


My name is Gordo Mar 2

@mcc I would love to understand this joke, but the present lack of context is good too, lol.


rk: it’s hyphen-minus actually Feb 28

@mcc

From what I understand, the 个 is not optional and must be included except that it’s entirely optional and is dropped half the time or something.


Heliograph Feb 28

@rk love this!!! no idea what this means (sorry to barge in the thread door) but will pinch this 个 and use for houses 😁 (or as fancy arrow)
@mcc


mcc Feb 28

@Heliograph @rk The 个 is a friend that you give to a number so that it does not get lonely


jonathankoren™ Feb 28

@mcc @Heliograph @rk I prefer to think of it as the units people (and other things) come in. As in, “Going down to the bar to drink a couple of pints, and maybe bring back a ge or two.”


Heliograph Feb 28

@jonathankoren o-0 a "G" or two? @mcc @rk


jonathankoren™ Feb 28

@Heliograph @mcc @rk
Narrator: They brought home zero ges of companions


Heliograph Feb 28

@jonathankoren ゲ ✓ @mcc @rk


Mister Dave Feb 28

@mcc @Heliograph @rk my mind went immediately to Knuth up-arrow, which gives numbers lots of friends


slowtiger Feb 28

@mcc @Heliograph @rk
It's a Totoro umbrella.


R Feb 28

@mcc not sure about the context it'll be used in, but the choices are:

* Mainland China
* Macau (you'd never guess this without looking it up)
* Hong Kong
* Singapore


mcc Feb 28

@r thank you very much.


artemist Feb 28

@r @mcc i mostly remember macau as "the one that has 门 in it".

Android will render text from languages not in your list, that's why pleco shows the right forms. it just won't do so unless explicitly told to with an android.text.style.LocaleSpan, which most apps don't bother to do.

You can get the same problem in web browsers if it isn't told what language to use. I regularly see japanese forms in chinese subtitles because google isn't setting lang="zh-Hans" for their subtitles.
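A minimal sketch of the web-side fix mentioned above, written in Python for illustration: wrap the text in an element that declares its language, so the browser has something better than a guess when picking glyphs. The helper name `tag_lang` is hypothetical and not part of any library; `zh-Hans` and `ja` are standard BCP 47 language tags.

```python
from html import escape


def tag_lang(text: str, lang: str) -> str:
    """Wrap text in a <span> that declares its language (hypothetical helper)."""
    # Escape both the attribute value and the content so the output is valid HTML.
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'


# The same three code points, declared once as Simplified Chinese and once as Japanese.
print(tag_lang("八人入", "zh-Hans"))
print(tag_lang("八人入", "ja"))
```

Both spans contain identical code points; only the declared language differs. Which glyph shapes actually get drawn still depends on the browser and the fonts installed.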


Perry Lee Feb 28

@mcc The second menu displays locations like Hong Kong and Singapore.


Janne Moren Feb 28

@mcc
This is the problem with han unification; we're partway back to code pages and picking the right font to render a particular language.

Like telling Danes and Swedes that ä and æ are the same character and so we'll just make them the same in Unicode.


Peter Brett Feb 28

@jannem Mmm, not sure about that. In my experience, “text encoding” and “language” are 2 orthogonal axes, and proper text handling requires you to know both.

This is one of the minor annoyances of Mastodon — it doesn't seem to be possible to mark parts of a post as being in different languages.

I don't have a huge problem with Han unification. I think it's a valid technical decision.

@mcc


Janne Moren Feb 28

@krans @mcc
The bigger problem is that on the web and in apps there's usually no information on what language something is written in. Which means a browser or an app can only guess what font to render Unicode han characters in. And when a user has installed support for more than one language, it is certain to go wrong frequently.

Edit: you don't need to know the language to always render "ä" correctly. You do need to know the language in order to render "骨".
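To make the contrast concrete, here is a small Python sketch using only the standard `unicodedata` module. It shows that ä and æ have their own code points, while 骨 is one unified code point whichever language it appears in. The `describe` helper and the two post dictionaries are made up for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of one character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


print(describe("\u00e4"))  # ä
print(describe("\u00e6"))  # æ
print(describe("\u9aa8"))  # 骨

# Hypothetical posts: the text is identical, so the language can only travel as metadata.
zh_post = {"lang": "zh-Hans", "text": "\u9aa8"}
ja_post = {"lang": "ja", "text": "\u9aa8"}
print(zh_post["text"] == ja_post["text"])
```

Because the two strings compare equal, a renderer that drops the `lang` field has nothing left in the text to tell the Chinese form from the Japanese one.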


Peter Brett Feb 28

@jannem I agree. The root cause is that file formats, protocols and most programs are written almost entirely by English-speakers, who assume that only English-speaking people use computers and that all content will be in English.

For my entire lifetime, support for multilingual text has always been an afterthought — and many development frameworks make it incredibly difficult.

@mcc


Peter Brett Feb 28

@jannem Also: “rendering” is necessary, but not sufficient. Collation, dictionary selection, punctuation, text-to-speech, etc. are all language-dependent.
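A rough Python sketch of the collation point: the same word list sorts differently under Swedish rules (å, ä, ö come after z) and under German dictionary-style rules (ä sorts like a). The alphabet string, the fold table, and both key functions are hand-written simplifications for illustration only; real software should rely on CLDR/ICU collation data instead.

```python
# Simplified Swedish alphabet: å, ä and ö are separate letters placed after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"

# Simplified German dictionary-style folding: umlauts sort like their base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def swedish_key(word: str) -> list[int]:
    """Sort key by position in the simplified Swedish alphabet.

    Raises ValueError for characters outside that alphabet.
    """
    return [SWEDISH_ALPHABET.index(c) for c in word.lower()]


def german_key(word: str) -> str:
    """Sort key that folds umlauts onto their base letters."""
    return word.lower().translate(GERMAN_FOLD)


words = ["zebra", "äpple", "apa"]
print(sorted(words, key=swedish_key))  # ä sorts after z
print(sorted(words, key=german_key))   # ä sorts with a
```

The characters are identical in both runs; only the assumed language changes the result, which is why the language has to be known separately from the encoding.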

@mcc


Inga stands with 🇺🇦 🇵🇸 Feb 28

@jannem @mcc and like telling everybody in the West plus the Greeks plus everybody in Eastern Europe that actually all "A"s are the same character.
And that English "B" or "H" and Cyrillic "В" or "Н" are also the same (hint: these Cyrillic letters are actually for "v" and "n")
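For comparison, Unicode did not merge these alphabets: the lookalike Latin, Greek, and Cyrillic capitals each have their own code point. A short Python check using the standard `unicodedata` module; the `name_of` helper is made up for illustration.

```python
import unicodedata


def name_of(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Latin, Greek and Cyrillic capitals that look alike in most fonts.
lookalikes = ["A", "\u0391", "\u0410", "B", "\u0412", "H", "\u041d"]
for ch in lookalikes:
    print(name_of(ch))

# Visually similar, but never equal as text.
print("B" == "\u0412")
```

The Cyrillic entries are named VE and EN, which matches the "v" and "n" hint above.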


Rachel Stantz Feb 28

@mcc han unification strikes again


rex❗ (he/him) ♿🏴 Feb 28

@mcc somehow i keep forgetting about your cursed tablet.


dcbaok Feb 28

@mcc are you on ubuntu still?

https://www.thomasvanderberg.nl/blog/fix-cjk-font-order-linux/

Fix CJK font order on Linux (Ubuntu) | Thomas van der Berg

A short post for something I’ll probably need to remember in the future. In Unicode, Chinese simplified, traditional, and Japanese style characters all use t...


mcc Feb 28

@dcbaok if I've somehow got onto Ubuntu I'm more screwed than I thought