If you drag a family emoji with a string length of 11 into an input with maxlength=10, one of the children will disappear.

Except in Safari, whose maxlength implementation seems to treat all emoji as length 1. This means that the maxlength attribute is not fully interoperable between browsers.

I filed a WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=252900

252900 – HTML maxlength attribute treats emoji of string length 11 as length 1

@simevidas

Kinda wondering what the rules are: code points? Bytes? What if the page is UTF-32 or ASCII? (Hopefully that insanity is gone.)

@DevWouter @simevidas As I understand the spec, it’s “code units”, i.e., 2-byte UTF-16 units, probably for historical or compatibility reasons. It wouldn’t make sense IMO if you were starting in a modern “code point” world. https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#attr-fe-maxlength

@ujay68 @simevidas

Thanks to your link I did some digging and came to the same conclusion. It even says that JavaScript strings are UTF-16. And a quick check in JavaScript shows that the JS implementation is the same on both Firefox and Safari.

Kinda weird that the HTML5 spec suggests UTF-8. (Also, Mastodon counts 👩‍👩‍👧‍👧 as a single character.)

@DevWouter @simevidas Yes, JavaScript strings have been UTF-16 since the beginning of time. I think that’s where many of the compatibility issues come from. The Go language, e.g., has a more modern approach combining UTF-8 byte sequences and code points for characters (“runes”).

@DevWouter @simevidas From an end-user point of view, the only concept that would make sense as a measure of length IMO is what Unicode calls a “grapheme cluster”, i.e., a sequence of code points that displays or prints as ONE visible symbol, ONE (possibly complex composite) emoji or ONE (possibly multiply accented) character.
@ujay68 @DevWouter I guess this could be based on text caret steps (when the user presses the Arrow Left/Right keys to move the caret).

@DevWouter @simevidas Unfortunately, the WHATWG Infra Standard defines “length” as UTF-16 code units. https://infra.spec.whatwg.org/#string-length

So Safari’s behavior is technically wrong.


@chucker @DevWouter However, the spec defines maxlength both as a “length” and a “number of characters”, and “characters” is defined as code points, not code units. In this case the “length” is 11 and the “number of characters” is 7; the spec is malformed.
@jens @DevWouter so there is hope yet!
@chucker I feel quite confident that any correction will be towards the UTF-16 interpretation, for “compatibility”
@jens @chucker Yeah, the maxlength attribute was defined a long time ago. Browsers will not risk changing it now and breaking a bunch of websites in the process. However, a new attribute (maxchars or similar) could be proposed.

@jens @chucker @DevWouter Speaking of spec: I wanted to look up how maxlength is defined and got rewarded with this example:

The following extract shows how a messaging client's text entry could be arbitrarily restricted to a fixed number of characters, thus forcing any conversation through this medium to be terse and discouraging intelligent discourse.

<label>What are you doing? <input name=status maxlength=140></label>