This is a very funny page https://unicode-explorer.com/list/large
List of Super-Wide Symbols - Unicode Explorer


@mcc three⸻em dash can't hurt you
@gsuberland @mcc you underestimate my moisturizing routine 💅
@gsuberland @mcc developed by 3m during the cold war
@mcc the glyph '﷽' was mentioned in a discussion I saw recently about computing character widths as *rendered* in a terminal, and the fundamental futility of this task
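A rough sketch of why this is hard, using only Python's standard unicodedata module. The heuristic here is an assumption for illustration: it is similar in spirit to libc's wcwidth() but does not use the same tables. Characters with a nonzero canonical combining class get 0 cells, East Asian Wide or Fullwidth characters get 2, and everything else gets 1. Neither '﷽' (U+FDFD) nor the three-em dash '⸻' (U+2E3B) is East Asian Wide, so this heuristic should count each as a single cell, however wide a font actually draws them.

```python
import unicodedata


def naive_cell_width(text: str) -> int:
    """Estimate terminal cells for `text` from Unicode properties alone.

    Assumed heuristic, not a standard: nonzero combining class -> 0 cells,
    East Asian Wide (W) or Fullwidth (F) -> 2 cells, anything else -> 1 cell.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Most combining accents have a nonzero combining class and
            # attach to the previous character instead of taking a cell.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    # The last two samples are the wide glyphs from this thread; the property
    # data knows nothing about how wide a particular font renders them.
    for sample in ["abc", "中文", "e\u0301", "\ufdfd", "\u2e3b"]:
        print(repr(sample), naive_cell_width(sample))
```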
@SnoopJ @mcc it might have been a mistake to put "an entire prayer, or perhaps a dozen angels dancing on the head of a pin" as a single code point
@rotopenguin @mcc alas, the Unicode Consortium has limited authority over the evolution of human language in general
@SnoopJ @rotopenguin in my opinion, they have much more than they should
@mcc @rotopenguin oh? Anything in particular?

@SnoopJ @rotopenguin Well, for example, if the people of China decide to invent a new hanzi, effectively now they just can't

Or they can, but they have to ask someone for permission. They'd have to do some complex set of steps with a PUA codepoint. Before computer encoding they could just draw it

@mcc @rotopenguin nothing stops them from doing it and not encoding it (e.g. seal forms) but sure the reality is that someone's gonna want to put the thing on the computer at some point, and someone's gonna be in charge of that encoding. Not sure that problem has any solution other than "fuck it all text is purely graphical now"

I'd point to U+32FF SQUARE ERA NAME REIWA as an example of UTC acting in good faith here, but I don't follow along very closely with the massive volume of communication with their colleagues working on standards bodies in China. What I have read makes it seem like a pretty good working relationship

@SnoopJ @mcc @rotopenguin there is the option of looking at how folks actually go about composing new characters out of existing ones (already a thing people study) and construct an encoding for that.

that wouldn’t give complete flexibility, but it could be similar to what alphabetic language users have.

@tryst @mcc @rotopenguin it's also something Unicode already has, for languages where it is clear how to specify it without making a mess (e.g. the Hangul Jamo block and associated combiner semantics)

As the ligature above demonstrates, this is a hard problem, and almost always harder than one thinks
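The Hangul case works because syllable composition is algorithmic: the Unicode Standard derives each precomposed syllable's code point arithmetically from the indices of its leading consonant (L), vowel (V), and optional trailing consonant (T) jamo. A small Python sketch of that arithmetic, checked against the standard library's NFC normalization (the function name compose_hangul is made up for this example):

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose one precomposed syllable from conjoining L, V, and optional T jamo."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("not a valid leading consonant + vowel pair")
    if trail:
        # T_BASE itself is one below the first trailing consonant,
        # so a real trailing jamo has index 1..27.
        t_index = ord(trail) - T_BASE
        if not 0 < t_index < T_COUNT:
            raise ValueError("not a valid trailing consonant")
    else:
        t_index = 0  # index 0 means "no trailing consonant"
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    jamo = "\u1112\u1161\u11ab"  # HIEUH + A + NIEUN
    syllable = compose_hangul(*jamo)
    # NFC normalization performs the same canonical composition.
    print(syllable, syllable == unicodedata.normalize("NFC", jamo))
```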

@SnoopJ @tryst @rotopenguin so for the record this exists except (1) arguably it doesn't exist and (2) you can't add new radicals (although arguably you can co-opt radicals from anything that exists in unicode already) https://social.treehouse.systems/@rcombs/116101096263014337
Ridley @ WATCH LYCORECO (@[email protected])

@Elizafox @[email protected] I regret to inform you, https://en.wikipedia.org/wiki/Chinese_character_description_languages#Ideographic_Description_Sequences though afaik no implementation actually renders these sequences composed

@mcc @tryst @rotopenguin oh yes def not for hanzi
@mcc @SnoopJ @rotopenguin yep :) i was thinking something that captures a bit more than that (like the example in wikipedia of ⼟ vs ⼠).
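Ideographic Description Sequences are written in prefix notation: each Ideographic Description Character (IDC) names a layout and is followed by its components, which may themselves be sequences. A minimal Python parser, assuming only the original twelve IDCs U+2FF0..U+2FFB (U+2FF2 and U+2FF3 take three components, the rest take two; later Unicode versions added more IDCs, which this sketch ignores). For example, 好 can be described as ⿰女子, a left-to-right layout. The resulting tree records only layout, so it has no place for relative stroke lengths, which is the kind of gap the ⼟ vs ⼠ example points to.

```python
from typing import Union

# Ideographic Description Characters U+2FF0..U+2FFB mapped to operand counts.
# Assumption: IDCs added in later Unicode versions are not handled.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2ff2"] = 3  # left to middle and right
IDC_ARITY["\u2ff3"] = 3  # above to middle and below

Tree = Union[str, tuple]


def parse_ids(sequence: str) -> Tree:
    """Parse an IDS into nested (idc, *components) tuples; leaves are characters."""
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if pos >= len(sequence):
            raise ValueError("sequence ended before all components were read")
        ch = sequence[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch  # an ordinary character is a leaf component
        children = [parse() for _ in range(arity)]
        return (ch, *children)

    tree = parse()
    if pos != len(sequence):
        raise ValueError("trailing characters after a complete sequence")
    return tree


if __name__ == "__main__":
    print(parse_ids("\u2ff0女子"))
    print(parse_ids("\u2ff0言\u2ff0身寸"))  # nested: the right part is itself a sequence
```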

@SnoopJ @rotopenguin my impression is that the unicode body is *very* keen on working with stakeholders, authorities, influential organizations. there are processes for things. if the chinese government wanted to introduce some new hanzi, they would not have difficulty working with the unicode body.

however *people* lack the ability to spontaneously do things under such a system, acting democratically or anarchically. they'd have to work their way up a pyramid

@SnoopJ @rotopenguin and one can imagine this system breaking under some future conditions. the simplification of Chinese under Mao is incredibly political and was heavily resisted by non-mainland Chinese speaking communities, afaik *because* it was associated with Mao. The Unicode body was lucky enough to emerge after the regularization of relations between the west and China. What if a country America hates introduces a new script? What if a *legitimately morally loathsome* government does?
@SnoopJ @rotopenguin There's an unspoken prime directive of Unicode which is "Unicode must not be forked". They will go to any lengths to prevent you *needing* a second standard. That currently leads to a maximalist approach that immunizes them from politics. Is there a future politics which is so rancid that the immunity fails?

@mcc @rotopenguin in your opinion, are there any standards bodies that operate at similar scale but empower individuals better? what does the UTC need to change to make "we accept proposals directly, from anyone" more effective and less "working up a pyramid"?

Edit: to clarify, I'm legitimately interested in your answers to these questions, they are not barbs.

@SnoopJ @rotopenguin I am attempting to raise a conceptual problem with standards bodies as a concept. We have attempted to systemize human thought and in the process we created, without particularly meaning to, restrictions on human thought. I am not at this time requesting changes.

@mcc @rotopenguin oh, okay. That makes sense, a lot of what I'm hearing sounds like fundamental problems with standards existing.

I don't know if there's a better way, but thank you for letting me pick your brain along these axes for a bit

@SnoopJ @mcc @rotopenguin With respect to "fuck it, text is graphical," can custom emoji such as those on Mastodon or Discord (intentionally picking two wildly divergent examples here) be an imperfect and limited kind of escape hatch?

I think back to all the controversies about adding the Cool S to Unicode, which seems like a quintessential example of spontaneous new language development without governmental recognition that Unicode can work from?

@xgranade @mcc @rotopenguin I think so, at the expense of most of the "directly represented in the data" semantics. Or at least, it's *much* harder to encode those alongside a visual representation.

I'm not aware of any proposal for cool S or controversy about same, but that's absolutely My Bullshit, do you happen to have a pointer to more reading handy?

@SnoopJ @mcc @rotopenguin I don't have one handy, sorry, remembering something (possibly misremembering even) I read a couple years ago.

@xgranade @SnoopJ @mcc @rotopenguin

You're not misremembering; I saw it discussed here at some point in the last couple years, with someone sharing a link to the formal submission of the "cool S" proposal to the Unicode committee and some of the comments on it.

@SnoopJ @rotopenguin There's also this, which is quite small but fascinating to me: in UAX 31, Unicode comes up with a set of recommendations for which characters should be allowed in programming language variable names. They specifically bar variables from "clerical scripts".

In other words, without meaning to, the Unicode body backed themselves into being a body *holding the power to decide what is and is not a religion*
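For a concrete feel of UAX #31's default identifier syntax (roughly: one XID_Start character followed by any number of XID_Continue characters), Python is a convenient probe, since PEP 3131 adopted that syntax and str.isidentifier() implements it. Python also NFKC-normalizes identifiers while parsing, so a compatibility spelling and a plain spelling can name the same variable. The sample strings are arbitrary choices for this sketch; U+12219 is a cuneiform sign.

```python
import unicodedata

# pi, "2x", a cuneiform sign, and "file" spelled with a FULLWIDTH LATIN SMALL LETTER F
samples = ["lugal", "\u03c0", "2x", "\U00012219", "\uff46ile"]

for name in samples:
    # isidentifier() applies Python's XID_Start/XID_Continue-based syntax check.
    print(repr(name), name.isidentifier(), repr(unicodedata.normalize("NFKC", name)))

# Identifiers are NFKC-normalized at parse time, so assigning to the
# fullwidth spelling binds the plain ASCII name "file".
namespace = {}
exec("\uff46ile = 1", namespace)
print(namespace["file"])
```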

@mcc @SnoopJ that's ... separation of Church and program state

@mcc @rotopenguin I think I'd quibble about "without meaning to" but I concede that this is a funny/tragic reading of UAX #31's guidance

Wondering what the equivalent quip is for moving Bopomofo to the limited use set

@mcc @rotopenguin I *will* give them a lot of credit for not making UAX 31 a cop, though. I.e. the spec bending over backwards to accommodate "do whatever you want as long as you write it down somewhere" profiles

But so it goes, a standards document must define things, and they have the unenviable task of writing standards about something deeply political
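To illustrate the "write it down somewhere" idea, here is a purely hypothetical profile, not one that UAX #31 or any language actually defines: keep the default syntax (via Python's str.isidentifier()) but additionally reject anything from the Cuneiform block. The EXCLUDED_RANGES list is the written-down part; the range choice and the function name are invented for this sketch.

```python
import unicodedata

# Hypothetical profile declaration: code point ranges this tool refuses in
# identifiers. The Cuneiform block (U+12000..U+123FF) is an arbitrary example.
EXCLUDED_RANGES = [(0x12000, 0x123FF)]


def allowed_by_profile(name: str) -> bool:
    """Default identifier syntax plus the declared exclusion list above."""
    if not name.isidentifier():
        return False
    # Check the NFKC form, since that is what Python-style parsers compare.
    normalized = unicodedata.normalize("NFKC", name)
    return not any(
        lo <= ord(ch) <= hi for ch in normalized for lo, hi in EXCLUDED_RANGES
    )


if __name__ == "__main__":
    for candidate in ["lugal", "\U00012219", "2x"]:
        print(repr(candidate), allowed_by_profile(candidate))
```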

@SnoopJ @rotopenguin i'm not saying they *shouldn't* do this thing. only that it's a very large thing that a person with a different college degree would go "this should be done with incredible thoughtfulness!" and it was not done with thoughtfulness. that is interesting to me
@mcc 𒐫
@mcc this is how you say "go fuck yourself" to a font renderer

@mcc dunno about you, I use those all the time.

Hope you are enjoying the U+2614, aka ☔.

(horribly roundabout route to get that here: in R (!), save the unicode char in a variable, as a <- "\U2614", then print a, then copy-paste the result)
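The same trick sketched in Python instead of R (a swapped-in language, not what the poster used): a string escape, a lookup by character name, and chr() all produce the same character, and unicodedata.name() confirms which one it is.

```python
import unicodedata

by_escape = "\u2614"  # same idea as R's "\U2614"
by_name = "\N{UMBRELLA WITH RAIN DROPS}"  # lookup by official character name
by_number = chr(0x2614)  # build it from the code point

assert by_escape == by_name == by_number
print(by_escape, f"U+{ord(by_escape):04X}", unicodedata.name(by_escape))
```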

@mcc

Ha

@mattdm @mcc I'd like to see us return to cuneiform number writing...
@mcc My fav: ‱ - people should actually know how to use this one.
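For anyone who does want to use it: ‱ is U+2031 PER TEN THOUSAND SIGN, so n‱ means n/10,000. That makes 1‱ = 0.01% = 0.0001, the same size as a basis point in finance. A tiny conversion sketch with exact fractions (the helper names are made up for this example):

```python
from fractions import Fraction
import unicodedata

PER_TEN_THOUSAND = "\u2031"  # ‱


def from_per_ten_thousand(n: int) -> Fraction:
    """Convert a value written in ‱ (per ten thousand) to a plain fraction."""
    return Fraction(n) / 10_000


def as_percent(value: Fraction) -> Fraction:
    """Express a plain fraction in percent."""
    return value * 100


if __name__ == "__main__":
    rate = from_per_ten_thousand(25)  # 25‱
    print(unicodedata.name(PER_TEN_THOUSAND), rate, float(rate), float(as_percent(rate)))
```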

@mcc

King vs King is badass:

𒈙

@mcc

I've been thinking about this some more (because it's important) and I think in this age of AI we need to introduce new dash sizes.

The em dash is the width of an "M", and the en dash is the width of an "N" - so why not have a "Lugal vs Lugal dash" and other dashes of various widths, so that we can identify human-written content and reclaim dash usage from the word-repeating machines?

Take this magnificent character:

𒈙

As a dash:

—————

It's very eye-catching ————— it could perhaps indicate a pregnant pause in conversation.

@mcc ‱ it's a caterpillar!
@mcc ooh ohh, I call dibs on this guy
. 🎩
@mcc ﷺ no way I can type this nice
@mcc
Seeing them hang over the bounds of the nice fixed-width display boxes, I am reminded of the Grook 'On Problems':
'Our choicest plans have fallen through,
our airiest castles tumbled over,
because of lines we neatly drew
and later neatly stumbled over.'