Mastodawn

mcc

Somebody linked me RFC 7565, which linked to RFC 7564, and if that's the place to look, this appears to be the list of disallowed characters in a Fediverse username. I'm cracking up because it's *mostly* stuff you'd expect, except the very first category of banned characters, specifically, is "pre-1700 Korean characters".

The fediverse is welcoming to all. EXCEPT KOREAN TIME TRAVELERS. Did you just wake up from being frozen in ice during the Joseon dynasty? The IETF is targeting you PERSONALLY.

Cassandra is only carbon now Feb 20

@mcc I get it, but the exclusion of "Q" property characters is an interesting and odd one.

Charles A-M Feb 20

@xgranade @mcc Reminds me of how CIRA decided that anyone who buys a .ca domain automatically gets reserved all accented character variations: https://www.cira.ca/en/ca-domains/register-your-ca/domains-french-accented-characters/

Tiff Feb 20

@mcc well, darn, I guess I don't comply with the PRECIS IdentifierClass profile

Yes, *that* dawn person Feb 20

@mcc Why did it turn out that way?

@thatdawnperson I thiiiiink that the way they fit the antique Korean jamo into Unicode requires a really awkward hack that they just don't want these systems to have to deal with

@thatdawnperson But seeing them lead with that just makes it seem oddly vindictive

the Hearth

@mcc ...is there any reasoning given for this?? and for the latter two, those seem weird too
-F

Athena L.M. Feb 20

@mcc @Hearth @xgranade I'm guessing Q and R are disallowed to mitigate homoglyph attacks. Maybe Old Hangul too, which presumably contains some homoglyphs with modern Hangul.

@alilly @Hearth @xgranade ohhhh wait that would make so much sense :O with the old jamo

the Hearth

@mcc @alilly @xgranade that makes sense! homoglyph attacks are still possible with e.g. replacing latin o with greek ο or cyrillic о, though?

...unless that's what section Q is talking about, i don't know exactly what it means
-F
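To make the o/ο/о example concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It assumes nothing about how any Fediverse server actually validates usernames; it only shows that the three letters are distinct code points and that Unicode normalization does not merge them.

```python
import unicodedata

# Three letters that look alike in most fonts but are distinct code points.
lookalikes = {
    "latin": "\u006f",     # o
    "greek": "\u03bf",     # ο
    "cyrillic": "\u043e",  # о
}

# Official Unicode character names, keyed by script label.
names = {script: unicodedata.name(ch) for script, ch in lookalikes.items()}

# NFKC (compatibility composition) leaves each letter unchanged,
# so normalization alone cannot collapse these homoglyphs.
nfkc = {script: unicodedata.normalize("NFKC", ch) for script, ch in lookalikes.items()}
still_distinct = len(set(nfkc.values())) == 3

for script, ch in lookalikes.items():
    print(f"{script:9} U+{ord(ch):04X} {names[script]}")
print("distinct after NFKC:", still_distinct)
```

Telling these apart takes something beyond normalization, such as script-mixing rules, which is why the problem is harder than banning a block of characters nobody uses for names.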

Athena L.M. Feb 20

@mcc @Hearth @xgranade Yeah but that's much harder to do anything about, unless you want to ban modern speakers of languages written using Cyrillic from using names in their native language, which… don't do that.

the Hearth

@alilly @mcc @xgranade yeah, i guess the difference with the hangul thing is that it's a safe assumption no one is using those characters to write their names in modern times, which is not the case for greek or cyrillic
-F

James Henstridge Feb 20

@Hearth The "Q" section is mostly about accented latin alphabet characters.

For example, "á" can be represented as either the single code point U+00E1, or as a pair of code points U+0061 U+0301. The second version is the code point for the letter "a" followed by "COMBINING ACUTE ACCENT" to add the accent to the previous code point.

Since they render identically (not just similarly), you probably don't want both sequences to be valid in names humans are meant to distinguish.
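The two spellings of "á" described above can be checked with Python's standard `unicodedata` module. This is only a sketch of canonical equivalence in general; the variable names are illustrative, and it is not a claim about what any particular PRECIS implementation does.

```python
import unicodedata

precomposed = "\u00e1"   # á as the single code point U+00E1
decomposed = "a\u0301"   # U+0061 followed by U+0301 COMBINING ACUTE ACCENT

# A raw comparison looks at code points, so the two strings differ.
raw_equal = precomposed == decomposed

# NFC composes to the single code point; NFD decomposes to the pair.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)

print("raw equal:", raw_equal)
print("NFC code points:", [f"U+{ord(c):04X}" for c in nfc])
print("NFD code points:", [f"U+{ord(c):04X}" for c in nfd])
```

Comparing names only after normalizing both sides to the same form is one way to keep the two sequences from counting as different identifiers.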

André van Schoubroeck Feb 21

@jamesh @Hearth it makes sense to say Unicode forms should be normalised: one form for identical characters. Something like RFC 7613.

Athena L.M. Feb 20

@Hearth @xgranade @mcc … Damn, that might be a valid argument in favor of Han unification. How dare things I already made up my mind on have nuance I didn't consider?

R Feb 20

@alilly @mcc @Hearth @xgranade ... just saw this boosted out of context and was very confused why @q and I would be disallowed from something

chris martens Feb 20

@mcc this was a subplot in Analog surely

@chrisamaphone so remember, part of the revanchist ideology in Analogue involved enforcement of writing in Hanja

Elizabeth

@mcc Oh! Yeah. It's because they don't have a well-defined canonical composition order, unlike modern Jamo, which do.

A weird bit of trivia: there is no composition for hanzi/kanji/hanja/chữ Hán characters (what many call "Chinese characters"). You can't just build one in Unicode. If you could, they'd also be in this list, for the same reason that Old Hangul Jamo are disallowed (which were only added because scholars needed them).

Ridley @ WATCH LYCORECO Feb 20

@Elizafox @mcc I regret to inform you, https://en.wikipedia.org/wiki/Chinese_character_description_languages#Ideographic_Description_Sequences
though afaik no implementation actually renders these sequences composed
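As a small illustration of an Ideographic Description Sequence, the Python sketch below builds one for "two 木 side by side" and compares it with the separately encoded character 林. The choice of example characters is mine, not from the thread; the point is only that normalization leaves the sequence as three code points and never turns it into the encoded character.

```python
import unicodedata

# U+2FF0 is the "left to right" ideographic description character.
ids = "\u2ff0\u6728\u6728"   # ⿰木木, a description of two 木 side by side
encoded = "\u6797"           # 林, the separately encoded character

ids_operator_name = unicodedata.name(ids[0])

# Normalization does not compose a description sequence into a character.
nfc_ids = unicodedata.normalize("NFC", ids)

print(ids_operator_name)
print("code points in IDS:", len(nfc_ids))
print("equals encoded character:", nfc_ids == encoded)
```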

@rcombs @Elizafox I AM NOW VERY EXCITED ABOUT USING THESE COMBINERS ON EMOJI, EVEN IF NOBODY CAN RENDER IT

Elizabeth

@mcc @rcombs Jamo are canonicalised to a glyph according to a formula. There’s no such thing for the Chinese character composition characters. Unfortunately.
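The formula for modern jamo can be sketched in Python as follows. The constants are the ones the Unicode Standard gives for Hangul syllable composition; the function name `compose_syllable` and its error handling are hypothetical, written for illustration only.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_syllable(lead, vowel, trail=None):
    """Compose modern conjoining jamo into one precomposed syllable (hypothetical helper)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    # Index 0 means "no trailing consonant"; real trailing jamo are 1..27.
    t_index = ord(trail) - T_BASE if trail is not None else 0
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("leading or vowel jamo is outside the modern range")
    if trail is not None and not (0 < t_index < T_COUNT):
        raise ValueError("trailing jamo is outside the modern range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


han = compose_syllable("\u1112", "\u1161", "\u11ab")  # ᄒ + ᅡ + ᆫ
print(han, f"U+{ord(han):04X}")

# NFC applies the same arithmetic to modern jamo...
print(unicodedata.normalize("NFC", "\u1112\u1161\u11ab") == han)

# ...but an old jamo such as U+1140 falls outside the formula's range,
# so NFC leaves the sequence uncomposed.
old_sequence = "\u1140\u1161"
print(unicodedata.normalize("NFC", old_sequence) == old_sequence)
```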

Philip Guenther Feb 20

@mcc It doesn't come through in the RFC, but afaict it's more like "Hangul is too harmonic for our feeble algorithms to handle". Without reasonably interoperable "does <this> equal <that>?" algorithms, IDNA would be unreliable...

To quote selectively from https://www.alvestrand.no/pipermail/idna-update/2008-February/001117.html

"<...>the fact that Hangul is designed so well structured on so many levels (feature, phoneme, syllable) is actually the very reason for why there are so many (fundamentally, not only superficially) different proposals for encodings, [...]. Encoding designers all saw the beauty, but they differed on which level to consider most important. All the other, not-so-well-thought-through scripts give the encoders much less options to work (and mess) with."

technomancy Feb 20

@mcc I choose to interpret this as a personal slight to the self-proclaimed crown prince of the Joseon dynasty (who totally deserves it after what he did to Freenode)

https://en.wikipedia.org/wiki/Andrew_Lee_(entrepreneur)

Gabe Feb 20

@mcc what will they have done to deserve this??