Consider a Mastodon/Fediverse handle, like @[email protected] . What kinds of restrictions are there on "username"? Can I assume any valid unicode could go in there?

Somebody linked me RFC 7565, which linked to RFC 7564, and if that's the place to look this appears to be the list of disallowed characters in a Fediverse username, and I'm cracking up because it's *mostly* stuff you'd expect, except the very first category of banned characters, specifically, is "pre-1700 Korean characters".

The fediverse is welcome to all. EXCEPT KOREAN TIME TRAVELERS. Did you just wake up from being frozen in ice during the Joseon dynasty? The IETF is targeting you PERSONALLY

@mcc ...is there any reasoning given for this?? and for the latter two, those seem weird too
-F
@mcc @Hearth @xgranade I'm guessing Q and R are disallowed to mitigate homoglyph attacks. Maybe Old Hangul too, which presumably contains some homoglyphs with modern Hangul.
@alilly @Hearth @xgranade ohhhh wait that would make so much sense :O with the old jamo

@mcc @alilly @xgranade that makes sense! homoglyph attacks are still possible with e.g. replacing latin o with greek ο or cyrillic о, though?

...unless that's what section Q is talking about, i don't know exactly what it means
-F

@mcc @Hearth @xgranade Yeah but that's much harder to do anything about, unless you want to ban modern speakers of languages written using Cyrillic from using names in their native language, which… don't do that.
@alilly @mcc @xgranade yeah, i guess the difference with the hangul thing is that it's a safe assumption no one is using those characters to write their names in modern times, which is not the case for greek or cyrillic
-F
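
For anyone who wants to poke at the o / ο / о example, here is a minimal Python sketch using only the standard `unicodedata` module. It shows that the three lookalikes are distinct code points from three scripts, and that Unicode normalization (NFKC is used here as an example form) leaves them distinct. The variable names are made up for illustration, and nothing here claims to be what any Fediverse server actually runs.

```python
import unicodedata

# Three visually similar letters from three different scripts.
lookalikes = ["o", "\u03bf", "\u043e"]

# Look up the official Unicode name of each character.
names = [unicodedata.name(c) for c in lookalikes]
for char, name in zip(lookalikes, names):
    print(f"U+{ord(char):04X} {name}")

# Normalization does not merge them: none of these letters has a
# decomposition mapping, so each one normalizes to itself.
normalized = {unicodedata.normalize("NFKC", c) for c in lookalikes}
still_distinct = len(normalized) == 3
print(still_distinct)
```

So normalization alone does nothing about cross-script lookalikes; catching those would need a separate confusables check.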

@Hearth The "Q" section is mostly about accented latin alphabet characters.

For example, "á" can be represented as either the single code point U+00E1, or as a pair of code points U+0061 U+0301. The second version is the code point for the letter "a" followed by "COMBINING ACUTE ACCENT" to add the accent to the previous code point.

Since they render identically (not just similarly), you probably don't want both sequences to be valid in names humans are meant to distinguish.
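
A quick way to see this for yourself, sketched in Python with the standard `unicodedata` module. The two strings below are the two encodings of "á" described above; normalizing to NFC (one possible canonical form, chosen here as an assumption rather than because any particular server uses it) collapses the two-code-point version into the single-code-point one.

```python
import unicodedata

precomposed = "\u00e1"   # "á" as the single code point U+00E1
decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT

# The raw strings render the same but do not compare equal.
raw_equal = precomposed == decomposed
print(raw_equal)

# After NFC normalization, both spellings become the same string.
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
nfc_equal = nfc_decomposed == precomposed
print(nfc_equal)

# Length in code points before and after normalization.
print(len(decomposed), len(nfc_decomposed))
```

Comparing names only after normalizing both sides is the usual way to avoid treating these as two different usernames.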

@jamesh @Hearth it makes sense to say unicode forms should be normalised. One form for identical characters. Something like RFC 7613
@Hearth @xgranade @mcc … Damn, that might be a valid argument in favor of Han unification. How dare things I already made up my mind on have nuance I didn't consider?
@alilly @mcc @Hearth @xgranade ... just saw this boosted out of context and was very confused why @q and I would be disallowed from something