Somebody linked me RFC 7565, which linked to RFC 7564, and if that's the place to look this appears to be the list of disallowed characters in a Fediverse username, and I'm cracking up because it's *mostly* stuff you'd expect, except the very first category of banned characters, specifically, is "pre-1700 Korean characters".
The Fediverse is welcome to all. EXCEPT KOREAN TIME TRAVELERS. Did you just wake up from being frozen in ice during the Joseon dynasty? The IETF is targeting you PERSONALLY.
@Hearth The "Q" section is mostly about accented Latin alphabet characters.
For example, "á" can be represented as either the single code point U+00E1, or as a pair of code points U+0061 U+0301. The second version is the code point for the letter "a" followed by "COMBINING ACUTE ACCENT" to add the accent to the previous code point.
Since they render identically (not just similarly), you probably don't want both sequences to be valid in names humans are meant to distinguish.
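If you want to poke at this yourself, here's a minimal sketch in Python using the standard-library `unicodedata` module. The variable names are just mine, and this only shows the two spellings of "á" collapsing under NFC normalization; it isn't meant as the exact check the RFC performs.

```python
import unicodedata

precomposed = "\u00e1"      # "á" as the single code point U+00E1
combining = "\u0061\u0301"  # "a" followed by COMBINING ACUTE ACCENT

# As raw code point sequences, the two strings are different.
print(precomposed == combining)  # False

# After NFC normalization, both become the same sequence.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", combining)
print(nfc_a == nfc_b)  # True
print([f"U+{ord(c):04X}" for c in nfc_b])  # ['U+00E1']
```

So a system that normalizes names before comparing them treats both spellings as the same name.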
@mcc Oh! Yeah. It's because they don't have a well-defined canonical composition order, unlike modern Jamo, which do.
A weird bit of trivia: there is no composition for hanzi/kanji/hanja/chữ Hán characters (what many call "Chinese characters"). You can't just build one in Unicode. If you could, they'd also be in this list, for the same reason that Old Hangul Jamo (which were only added because scholars needed them) are disallowed.
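The modern-vs-old Jamo difference is easy to see with Python's `unicodedata`, at least as a rough illustration. The specific code points below are ones I picked as examples (U+1140 is an old initial consonant), and this only shows what NFC does with them, not the full reasoning behind the RFC's list.

```python
import unicodedata

# Modern conjoining Jamo: initial HIEUH + vowel A + final NIEUN.
modern = "\u1112\u1161\u11ab"
# Old initial consonant PANSIOS (U+1140) followed by the same vowel A.
old = "\u1140\u1161"

modern_nfc = unicodedata.normalize("NFC", modern)
old_nfc = unicodedata.normalize("NFC", old)

# The modern sequence composes into one precomposed syllable, U+D55C.
print([f"U+{ord(c):04X}" for c in modern_nfc])  # ['U+D55C']

# The old sequence has no precomposed syllable to compose into,
# so it stays as two separate code points.
print([f"U+{ord(c):04X}" for c in old_nfc])  # ['U+1140', 'U+1161']
```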
@mcc It doesn't come through in the RFC, but afaict it's more like "Hangul is too harmonic for our feeble algorithms to handle." Without reasonably interoperable "does <this> equal <that>?" algorithms, IDNA would be unreliable...
To quote selectively from https://www.alvestrand.no/pipermail/idna-update/2008-February/001117.html
"<...>the fact that Hangul is designed so well structured on so many levels (feature, phoneme, syllable) is actually the very reason for why there are so many (fundamentally, not only superficially) different proposals for encodings, [...]. Encoding designers all saw the beauty, but the differed on which level to consider most important. All the other, not-so-well-thought-through scripts give the encoders much less options to work (and mess) with."
@mcc I choose to interpret this as a personal slight to the self-proclaimed crown prince of the Joseon dynasty (who totally deserves it after what he did to Freenode)