If you drag an emoji family with a string size of 11 into an input with maxlength=10, one of the children will disappear.

Except in Safari, whose maxlength implementation seems to treat all emoji as length 1. This means that the maxlength attribute is not fully interoperable between browsers.
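A quick way to see where the 11 comes from (a sketch in JavaScript; the family emoji is one example of a multi-code-point emoji):

```javascript
// The family emoji is four emoji joined by zero-width joiners (ZWJ):
// U+1F468 ZWJ U+1F469 ZWJ U+1F467 ZWJ U+1F466
const family = "👨‍👩‍👧‍👦";

// String.prototype.length counts UTF-16 code units:
// 4 surrogate pairs (2 each) + 3 ZWJs = 11
console.log(family.length); // 11
```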

I filed a WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=252900

252900 – HTML maxlength attribute treats emoji of string length 11 as length 1

@simevidas dang not unicode aware :')
@jkt @simevidas In this case, Safari is the one that’s Unicode aware. The other browsers are treating maxlength as the number of bytes rather than the number of characters. 🙂

@jkt @simevidas

Following up on that, as I was thinking of some examples of what I mean...

Take kanji, for example. 漢字 is 2 characters, but it's 6 bytes, so is the length 2 or 6?

Or the phrase "Góða nótt" in Icelandic. It's 9 characters (counting the space in the middle), but it's 12 bytes. So, should this fail the maxlength check, if the maxlength is 10?
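The character/byte split in those two examples can be checked in Node (a sketch; `.length` counts UTF-16 code units, and `Buffer.byteLength` gives the UTF-8 byte count):

```javascript
// UTF-16 code units vs. UTF-8 bytes for the examples above
console.log("漢字".length);                          // 2 code units
console.log(Buffer.byteLength("漢字", "utf8"));      // 6 bytes

console.log("Góða nótt".length);                     // 9 code units
console.log(Buffer.byteLength("Góða nótt", "utf8")); // 12 bytes
```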

@ramsey @jkt @simevidas bytes assume an encoding. Codepoints vs. grapheme clusters is the distinction in experience, I guess.
@johannes @jkt @simevidas I thought it would be the other way around. The same grouping of bytes could represent different codepoints, based on the encoding.

@ramsey @jkt @simevidas yes, but working on bytes means that the encoding has to be carried through the different layers and might cut utf-8 sequences apart (assuming utf-8 being the default encoding)

With either codepoints or grapheme clusters you at least get some valid (while not always sensible) result.
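The three counts (code units, code points, grapheme clusters) can all diverge on the same string; a sketch using Intl.Segmenter, which is available in modern browsers and Node 16+:

```javascript
const family = "👨‍👩‍👧‍👦"; // man + ZWJ + woman + ZWJ + girl + ZWJ + boy

// UTF-16 code units
console.log(family.length); // 11

// Unicode code points — string iteration is code-point aware
console.log([...family].length); // 7

// Grapheme clusters — what a user perceives as one "character"
const seg = new Intl.Segmenter(undefined, { granularity: "grapheme" });
console.log([...seg.segment(family)].length); // 1
```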

@johannes @ramsey @jkt @simevidas I think most OSes and language stdlibs/runtimes use something other than utf8 internally. NSString on Apple platforms is UTF-16, and str in Python 3 is actually a custom variant of UTF-32!
@daisy @johannes @ramsey @simevidas the HTML spec is UTF-16, surely safari is correct here?
@daisy @johannes @ramsey @simevidas ah the spec says the length is based on utf16 length so Safari is wrong as @simevidas stated initially
@jkt @daisy @ramsey @simevidas interesting choice to require a specific encoding. Probably that was the time when one assumed UTF-16 would be the encoding all operating systems etc. would use, and then tying it to storage buffers etc. makes some sense. Especially also pre-emoji ...
@johannes @daisy @ramsey @simevidas yeah utf16 was picked as a base in HTML and browsers before all of this.