If you drag an emoji family like 👩‍👩‍👧‍👧, which has a JavaScript string length of 11, into an input with maxlength=10, one of the children will disappear.

Except in Safari, whose maxlength implementation seems to treat all emoji as length 1. This means that the maxlength attribute is not fully interoperable between browsers.
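The arithmetic behind the disappearing child can be sketched in the console (a sketch of the string math only — exactly how each browser truncates on drop may differ):

```javascript
// 👩‍👩‍👧‍👧 is four emoji joined by three zero-width joiners (ZWJ).
// Each emoji is a surrogate pair (2 UTF-16 code units), so:
const family = '👩‍👩‍👧‍👧';
console.log(family.length); // 11 (4 × 2 code units + 3 ZWJs)

// Truncating to 10 code units — roughly what a maxlength=10 input does —
// cuts the last surrogate pair in half, leaving a lone high surrogate:
const truncated = family.slice(0, 10);
console.log(truncated.endsWith('\uD83D')); // true
```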

I filed a WebKit bug: https://bugs.webkit.org/show_bug.cgi?id=252900

252900 – HTML maxlength attribute treats emoji of string length 11 as length 1

@simevidas

Kinda wondering what the rules are: code points? Bytes? What if the page is UTF-32 or ASCII? (Hopefully that insanity is gone.)

@DevWouter @simevidas As I understand the spec, it’s “code units”, i.e., 2-byte UTF-16 units, probably for historical or compatibility reasons. Wouldn’t make sense IMO if you started in a modern “codepoint” world. https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#attr-fe-maxlength
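The difference between the spec’s code units and code points is easy to see in the console:

```javascript
const family = '👩‍👩‍👧‍👧';
console.log(family.length);      // 11 — UTF-16 code units, what maxlength counts per spec
console.log([...family].length); //  7 — code points (4 emoji + 3 ZWJs)
```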

@ujay68 @simevidas

Thanks to your link I did some digging and came to the same conclusion. The spec even says that JavaScript strings are UTF-16. However, a quick check in JavaScript on both Firefox and Safari shows that the JS string length is the same in both.

Kinda weird that the HTML5 spec suggests UTF-8. (Also, Mastodon counts 👩‍👩‍👧‍👧 as a single character.)

@DevWouter @simevidas Yes, JavaScript strings have been UTF-16 since the beginning of time. I think that’s where many of the compatibility issues come from. The Go language, e.g., has a more modern approach, combining UTF-8 byte sequences with codepoints for characters (“runes”).