How would I go about decoding, in Ruby, each codepoint in this \u-escaped JavaScript Unicode string containing a bunch of emojis? I keep getting `invalid codepoint 0xD83D in UTF-8`. However, it decodes perfectly fine in the Chrome DevTools JS console and in Node.js.

"\uD83D\uDE80"

#ruby #utf8

@postmodern I recognize that surrogate codepoint prefix (0xD8) from some Unicode work I was doing a couple of years ago. JavaScript uses UTF-16 internally, so JavaScript strings use surrogate pairs for any codepoints above 0xFFFF. The simplest approach might be to just use JSON.parse, e.g. JSON.parse '"\\ud83d\\ude01"' # => "😁"
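A quick sketch of both approaches in Ruby: the JSON.parse trick from above, plus a hand-rolled decoder (the helper name `decode_js_escapes` is mine) that treats the `\uXXXX` escapes as UTF-16 code units and transcodes them, which is what makes the surrogate pair come out as a single 🚀:

```ruby
require 'json'

# Easiest: let the JSON parser handle the surrogate pairs.
# Note the single quotes: the \u sequences stay literal in Ruby source.
JSON.parse('"\uD83D\uDE80"') # => "🚀"

# By hand: collect the \uXXXX escapes as 16-bit code units,
# pack them as big-endian UTF-16, then transcode to UTF-8.
def decode_js_escapes(str)
  units = str.scan(/\\u([0-9a-fA-F]{4})/).map { |hex,| hex.to_i(16) }
  units.pack('S>*').force_encoding('UTF-16BE').encode('UTF-8')
end

decode_js_escapes('\uD83D\uDE80') # => "🚀"
```

The transcode step is the key: Ruby's UTF-16BE-to-UTF-8 conversion knows how to combine surrogate pairs, whereas interpreting each escape as a standalone codepoint is exactly what produces the `invalid codepoint 0xD83D` error.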
@nick_evans now I just need to figure out the other UTF-8/JSON-related bug: how to embed binary Strings into generated JSON. Apparently JSON is UTF-8 by default, so it will try to convert any binary string to UTF-8, attempt to escape special chars, and then hit an invalid byte sequence error.
@postmodern JSON is a silly encoding! 😂 Mandates UTF-8, but also mandates surrogate pairs! I don't think binary data is natively possible in JSON. You just need to use Base64 (or some other UTF-8-safe encoding) for anything like that.
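A minimal sketch of that Base64 workaround in Ruby (the bytes and the `"data"` key are made up for illustration):

```ruby
require 'json'
require 'base64'

# Arbitrary raw bytes that are not valid UTF-8; .b marks the
# string as binary (ASCII-8BIT), so binary.to_json would fail.
binary = "\xDE\xAD\xBE\xEF".b

# Base64 turns the bytes into 7-bit ASCII, which is always valid UTF-8.
payload = JSON.generate("data" => Base64.strict_encode64(binary))

# The receiver reverses both layers to recover the exact bytes.
decoded = Base64.strict_decode64(JSON.parse(payload).fetch("data"))
decoded == binary # => true
```

The cost is the usual Base64 overhead (roughly 4 bytes of JSON for every 3 bytes of payload), but the round trip is byte-exact.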
@nick_evans @postmodern I had not come across this surprise in JSON, fortunately, because it also allows you to write emoji without escaping. However, I've had to deal with software which (incorrectly) exports XML with escaped surrogate code points (like `&#xD83D;&#xDE80;`) but only imports correctly escaped characters (like `&#x1F680;`).
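Repairing that kind of broken export is mechanical arithmetic: a sketch (the helper name `surrogate_pair_to_scalar` is mine) that combines a surrogate pair back into the single scalar value a correct XML character reference would use:

```ruby
# Combine a UTF-16 high/low surrogate pair into a Unicode scalar value:
# subtract each surrogate's base, shift the high half by 10 bits,
# and add the 0x10000 offset of the supplementary planes.
def surrogate_pair_to_scalar(hi, lo)
  ((hi - 0xD800) << 10) + (lo - 0xDC00) + 0x10000
end

format('&#x%X;', surrogate_pair_to_scalar(0xD83D, 0xDE80)) # => "&#x1F680;"
```

Running each bogus surrogate-pair reference through something like this yields the single `&#x1F680;`-style reference that a conforming XML parser will actually accept.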