Mastodawn

mcc Feb 28

Wait hold on I just realized. Is

八人入

A reasonable Chinese sentence

mcc Feb 28

…Also waaaa why did the character rendering change so much when I copied from Pleco to Tusky. Who gave eight a hat

mcc Feb 28

In Pleco they look like this. I don't know if this is a different but regular hanzi font or if the CJK unification is messing me up somehow

EDIT: I currently think Tusky is showing me Japanese character variants https://social.mildlyfunctional.gay/@artemist/116146010272716935

mcc Feb 28

This is what Tusky looks like.

mcc Feb 28

WAIT WTF this is an actual Chinese IME and it seems to be showing me Japanese characters. Ok I think Lenovo is fucking with me, one minute

mcc Feb 28

Okay I now believe the problem is neither Tusky nor Lenovo but rather that Android is not a serious product and never has been. It seems Android may outright refuse to show scripts unless you've whitelisted the language. Problem: I think this menu is asking me which version of Chinese I want but the menu is in Chinese. I want to look at Chinese text so I can learn Chinese. I don't know it yet. I feel like I'm playing an adventure game.

* I may explore a PR later anyway.

mcc Feb 28

Actually I'm pretty sure 简 already means simplified, so I selected simplified at the top level, and this second menu is asking… I don't know. Locale? TTS dialect?!

mcc

Update: I solved the problem, not by adding Chinese as an alternate language for my Android, but by deleting Japanese as an alternate language. Not sure when I added Japanese in the first place or what I was trying to accomplish but I question Google's decision that informing it I may look at text in Japanese makes it conclude I DEFINITELY won't be looking at Chinese!

mcc Feb 28

Anyways I think the sentence was wrong to start with because it's missing 个s or something

mcc Mar 2

鸡机

Attempting to imagine the Chicken Machine

udonchy

Mar 2

@mcc fun fact 雞雞 is baby talk for penis lol

@ionchy nice lol

@mcc AT-ST

@mcc one need not imagine the chicken machine https://youtu.be/dl9beG4LbJU

chair chicken

My name is Gordo Mar 2

@mcc I would love to understand this joke, but the present lack of context is good too, lol.

rk: it’s hyphen-minus actually Feb 28

@mcc

From what I understand, the 个 is not optional and must be included except that it’s entirely optional and is dropped half the time or something.

Heliograph Feb 28

@rk love this!!! no idea what this means (sorry to barge in the thread door) but will pinch this 个 and use for houses 😁 (or as fancy arrow)
@mcc

mcc Feb 28

@Heliograph @rk The 个 is a friend that you give to a number so that it does not get lonely

jonathankoren™ Feb 28

@mcc @Heliograph @rk I prefer to think of it as the units people (and other things) come in. As in, “Going down to the bar to drink a couple of pints, and maybe bring back a ge or two.”

Heliograph Feb 28

@jonathankoren o-0 a "G" or two? @mcc @rk

jonathankoren™ Feb 28

@Heliograph @mcc @rk
Narrator: They brought home zero ges of companions

Heliograph Feb 28

@jonathankoren ゲ ✓ @mcc @rk

Mister Dave Feb 28

@mcc @Heliograph @rk my mind went immediately to Knuth up-arrow, which gives numbers lots of friends

slowtiger Feb 28

@mcc @Heliograph @rk
It's a Totoro umbrella.

Gaelan Steele Mar 2

@mcc hmm, to me (learned Mandarin as a first language in parallel with English but am now extremely rusty) it reads as archaic but understandable; I think the use of 入 as a standalone verb also contributes here? like for something like “eight people entered [the room]” my natural translation would be 八个人进去了 (or 进来了, depending on my perspective)

…god I need to find an excuse to properly unrust my Mandarin

mcc Mar 2

@Gaelan I thought this was a good solution https://mastodon.social/@noone2333/116146702971358411

mcc Mar 2

@Gaelan Gaelan how do you feel about wuxia and/or girls who are really really *really* good friends

Gaelan Steele Mar 2

@mcc uh, I’m not particularly familiar with Wuxia but am myself a girl who is really^3 good friends with a number of girls

mcc Mar 4

@Gaelan i was gonna recommend a Chinese net animation called soulmate adventure / 风灵玉秀 which is a very cute, everyone-claims-this-is-GL YA-ish adventure series about two young female kung fu masters traveling together. bilibili used to have a youtube video with the entire first season but they took it down

Erin Feb 28

@mcc the witch who cursed you: this isn't what I anticipated at all

Rachel Barker Feb 28

@mcc I have to wonder if this is downstream of Unicode's choices around CJK unification. Because I seem to remember reading that it ended up causing some situations where, in order to correctly render a block of text, you need out-of-band knowledge of which language it's in.
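
A small Python sketch of the point about out-of-band knowledge, using only the standard library. It shows that the string itself holds nothing but unified code points, so the same stored text is used whether it is meant as Chinese or Japanese, and the renderer has to learn the language from somewhere else (a system locale list or an HTML `lang` attribute, for example). The variable names are illustrative only.

```python
import unicodedata

text = "八人入"

# Each character is one unified code point; nothing in the string says
# whether it should be drawn with Chinese or Japanese glyph shapes.
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# These are the only bytes that get stored or sent, whatever the language.
encoded = text.encode("utf-8")
print(encoded.hex(" "))
```

Which glyph shapes appear on screen is then decided by the font the platform picks, which is consistent with the fix reported upthread of changing the device's language list.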

John-Mark Gurney Mar 1

@rachelplusplus
Yeah, I remember this from doing some i18n work, and the Wikipedia article appears to agree.

https://en.wikipedia.org/wiki/CJK_Unified_Ideographs

@mcc

xrvs Feb 28

@mcc isn’t this more of a font issue? i’m sure i’ve seen 八 and 入 with a top hook in chinese subtitles sometimes

abadidea Feb 28

@mcc unfortunately there’s not really a good solution to this problem and Android, like everyone else, just has to pick a resolution method and stick with it. If you’ve heard of “Han Unification,” well it sounds like something that happened violently in 2200 BC but actually it happened quite recently in a Unicode meeting room and it causes this exact specific intractable issue

abrasive Feb 28

@0xabad1dea @mcc also an act of violence, I would argue

groxx Feb 28

@0xabad1dea @mcc I suppose the only actually reliable approach would be to store the IME locale per character or something so that it can be accurately rendered as it was written... or are these truly identical graphemes, and there's no chance of confusion in context? Even when people use multiple languages simultaneously?

(late edit after reading a lot more: ah, I see they DID just add a variant-selector character to effectively specify the locale... that seems a bit unlikely to gain major use, but technically I like it I guess)

Maybe one day we'll have UTF-8-2 and it'll just be infinitely extendable, rather than using a limited length prefix.
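
For readers unfamiliar with the "limited length prefix" mentioned here, a rough Python sketch: in UTF-8 the high bits of the first byte announce how many bytes the sequence has, and the current standard stops at four. The function name `utf8_sequence_length` is made up for this illustration, and the sketch does not reject every invalid lead byte (it accepts 0xC0 or 0xF5, for instance).

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the sequence length announced by a UTF-8 lead byte."""
    if lead_byte < 0x80:            # 0xxxxxxx: single byte (ASCII)
        return 1
    if lead_byte >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    # Continuation bytes (10xxxxxx) and longer prefixes land here.
    raise ValueError("not a UTF-8 lead byte")


for ch in ("a", "é", "八", "😁"):
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), utf8_sequence_length(encoded[0]))
```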

mcc Feb 28

@groxx @0xabad1dea There are various existing solutions but just because the solutions exist does not mean people follow them correctly

groxx Feb 28

@mcc @0xabad1dea definitely agreed. even technically, it seems very unlikely to me that any IME is going to choose to, like, add variant selectors *to every single character* and confuse their users when it's blended with other text or in a size-limited scenario. those characters already take up a ton of space, making it worse won't go over well.
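
A quick Python sketch of the size cost described above, under one loud assumption: the pairing of 八 with VARIATION SELECTOR-17 (U+E0100) is used purely to measure size, and the sketch makes no claim that this particular sequence is registered or that any renderer treats it specially.

```python
base = "八"
vs17 = "\U000E0100"  # VARIATION SELECTOR-17, from the supplementary selector range

plain = base
tagged = base + vs17  # hypothetical sequence, for size comparison only

# Code point count, then UTF-8 byte count, for each form.
print(len(plain), len(plain.encode("utf-8")))
print(len(tagged), len(tagged.encode("utf-8")))
```

Tagging every character this way would double the code point count and take each character from 3 to 7 bytes in UTF-8.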