Except in Safari, whose maxlength implementation seems to treat all emoji as length 1. This means that the maxlength attribute is not fully interoperable between browsers.
I filed a WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=252900
Following up on that, as I was thinking of some examples of what I mean...
Take kanji, for example. 漢字 is 2 characters, but it's 6 bytes in UTF-8, so is the length 2 or 6?
Or the phrase "Góða nótt" in Icelandic. It's 9 characters (counting the space in the middle), but it's 12 bytes in UTF-8. So, should this fail the maxlength check if the maxlength is 10?
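To make the question concrete, here is a minimal sketch that measures both examples three ways. It assumes UTF-8 as the byte encoding and precomposed (NFC) letters, and `measure` is just a hypothetical helper name, not anything from a spec.

```javascript
// Escapes are used so the precomposed (NFC) forms are unambiguous.
const kanji = "\u6F22\u5B57";               // "漢字"
const goodNight = "G\u00F3\u00F0a n\u00F3tt"; // "Góða nótt"

// Hypothetical helper: count one string in three different units.
function measure(text) {
  return {
    utf8Bytes: new TextEncoder().encode(text).length, // bytes when encoded as UTF-8
    codePoints: [...text].length,                     // string iteration yields code points
    codeUnits: text.length,                           // UTF-16 code units (JS string length)
  };
}

console.log(measure(kanji));     // { utf8Bytes: 6, codePoints: 2, codeUnits: 2 }
console.log(measure(goodNight)); // { utf8Bytes: 12, codePoints: 9, codeUnits: 9 }
```

So with a maxlength of 10, "Góða nótt" passes if you count code units or code points and fails only if you count UTF-8 bytes.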
@ramsey @jkt @simevidas yes, but working on bytes means that the encoding has to be carried through the different layers, and a cut might split UTF-8 sequences apart (assuming UTF-8 is the default encoding)
With either code points or grapheme clusters you at least get a valid (if not always sensible) result.
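A rough sketch of that difference, assuming UTF-8 and using "été" as an arbitrary sample: cutting at a byte offset can land inside a multi-byte sequence, while cutting at a code point boundary cannot.

```javascript
const text = "\u00E9t\u00E9"; // "été": 3 code points, 5 UTF-8 bytes (C3 A9 74 C3 A9)

// Byte-based cut: keep the first 4 bytes, which splits the final "é" in half.
const bytes = new TextEncoder().encode(text);
const cutByBytes = new TextDecoder().decode(bytes.slice(0, 4));

// Code-point-based cut: keep the first 2 code points.
const cutByCodePoints = [...text].slice(0, 2).join("");

console.log(JSON.stringify(cutByBytes));      // ends in U+FFFD, the replacement character
console.log(JSON.stringify(cutByCodePoints)); // "ét"
```

The default `TextDecoder` is lenient and substitutes U+FFFD for the dangling byte; a stricter layer might reject the whole string instead.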
@ramsey @jkt @simevidas Safari may be Unicode-aware, but is it HTML-aware? The specification is clear on this point: "A string’s length is the number of code units it contains."
Kinda wondering what the rules are: code points, bytes? What if the page is UTF-32 or ASCII? (Hopefully that insanity is gone)
Thanks to your link I did some digging and I came to the same conclusion. It even says that JavaScript strings are UTF-16. However, a quick check in JavaScript on both Firefox and Safari shows that the JS implementation behaves the same in both.
Kinda weird that the HTML5 spec suggests UTF-8. (Also, Mastodon counts 👩‍👩‍👧‍👧 as a single character.)
@DevWouter @simevidas unfortunately, the WHATWG Infra Standard defines “length” as UTF-16 code units. https://infra.spec.whatwg.org/#string-length
So Safari’s behavior is technically wrong.
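For illustration, a small sketch of what counting in code units means for a family emoji. `fitsMaxLength` is a hypothetical helper that mirrors the code-unit definition quoted in this thread; it is not how any browser actually implements maxlength.

```javascript
// Family emoji as a ZWJ sequence: woman, ZWJ, woman, ZWJ, girl, ZWJ, girl.
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F467}";

// Hypothetical helper: a length check in UTF-16 code units.
function fitsMaxLength(value, maxLength) {
  return value.length <= maxLength;
}

console.log(family.length);             // 11 code units (4 surrogate pairs + 3 ZWJ)
console.log([...family].length);        // 7 code points
console.log(fitsMaxLength(family, 10)); // false
```

Under this definition a single visible emoji already exceeds a maxlength of 10, whereas an implementation that treats it as length 1 would accept ten of them.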
@jens @chucker @DevWouter Speaking of spec: I wanted to look up how maxlength is defined and got rewarded with this example:
The following extract shows how a messaging client's text entry could be arbitrarily restricted to a fixed number of characters, thus forcing any conversation through this medium to be terse and discouraging intelligent discourse.
<label>What are you doing? <input name=status maxlength=140></label>
On strong passwords, @simevidas:
Sometimes I wonder if I should use accented characters when using a passphrase in my native French. But I often see systems breaking them (e.g. "été", summer, becomes "Ã©tÃ©").
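One common way this kind of breakage happens (an assumption about those systems, not something confirmed here) is that UTF-8 bytes get read back as a single-byte encoding such as Latin-1. A minimal sketch:

```javascript
const original = "\u00E9t\u00E9"; // "été"

// Encode as UTF-8: each "é" becomes the two bytes C3 A9.
const utf8Bytes = new TextEncoder().encode(original);

// Misread every byte as one Latin-1 character (byte value = code point).
const garbled = String.fromCharCode(...utf8Bytes);

console.log(garbled);        // "Ã©tÃ©"
console.log(garbled.length); // 5, while the original has length 3
```

Besides looking wrong, the garbled form is longer, so a passphrase could also trip a length limit it originally satisfied.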
As for your example with the Unicode family members in <p>...</p>: you probably parsed e.textContent and treated it as a product (e.g. the first code point is number 1, etc.), as in <p>1*2*3*4</p>
I say that because I notice that the boy is the last one. I imagine that if you put another one at the end, it'll be the last one
@simevidas you could add a reference to https://infra.spec.whatwg.org/#string-length that specifies that the length of a string is the number of UTF-16 code units.
(Alas, I personally would prefer that graphemes be the length; disappearing children or others tend to surprise users)
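A sketch of the "disappearing children" effect and of a grapheme-based alternative, assuming a runtime that ships `Intl.Segmenter`. `truncateGraphemes` is a hypothetical helper, and the sample text is made up for illustration.

```javascript
// "hi" followed by a family ZWJ sequence: woman, ZWJ, woman, ZWJ, girl, ZWJ, girl.
const family = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F467}";
const text = "hi" + family; // 13 UTF-16 code units

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Split a string into grapheme clusters (user-perceived characters).
function graphemesOf(value) {
  return Array.from(segmenter.segment(value), (part) => part.segment);
}

// Hypothetical helper: keep at most `max` grapheme clusters.
function truncateGraphemes(value, max) {
  return graphemesOf(value).slice(0, max).join("");
}

// A code-unit cut at 10 lands inside the ZWJ sequence: the last girl is dropped.
const cutByCodeUnits = text.slice(0, 10);

console.log(graphemesOf(text).length);           // 3: "h", "i", and the whole family
console.log([...cutByCodeUnits].length);         // 7 code points: h, i, woman, ZWJ, woman, ZWJ, girl
console.log(truncateGraphemes(text, 2) === "hi"); // true: the family is dropped as a unit
```

With grapheme-based truncation the family is either kept whole or removed whole, which is probably closer to what a user expects.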
@simevidas Ugh, don't you just LOVE when browser makers go off on their own and refuse to adhere to convention…
…thus making it the web devs' problem. 🙄
@wilmhit, see also https://www.unicode.org/emoji/charts/emoji-zwj-sequences.html
Unicode going from static code points to a DSL is one of my least favorite modern developments, not just for emoji but also Zalgo text. I do have extra distaste for emoji though because they are so tiny and I have to look them up every time for the meaning. IMHO accessibility standards should require title text in user agents.
Cc: @simevidas
@cnx @simevidas @wilmhit I've been pushing for the ability to expand an emoji when long-pressing on it. IMO this should be a common accessibility feature of all apps that display emoji. If not long press, then something else de facto standard.