https://www.fontspace.com/unicode/analyzer looks like a great site for breaking down Unicode text (h/t @alice)
https://www.babelstone.co.uk/Unicode/whatisit.html is similar but a bit less pretty
@adamhotep If you can demonstrate a way to reliably get binary input on a web page, I'd gladly throw together a single page tool to do the breakdown.
Unfortunately… I don't think it's that simple. The page's own encoding and the browser's choice of encoding for submitted data both get in the way, and all text access in JS works on Unicode characters (strictly, UTF-16 code units), not the underlying binary data.
🤔 Could run through and generate the UTF-8 manually, just to highlight…
Dang. Now I've got a new night project.
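For anyone following along, "generate the UTF-8 manually" could look roughly like this. It's only a sketch in Python rather than the JS a real page would use, and `utf8_bytes` and `breakdown` are names made up for illustration, not part of any existing tool:

```python
def utf8_bytes(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable scalar value: {code_point!r}")
    if code_point < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def breakdown(text: str) -> list:
    """Return (character, code point label, UTF-8 bytes as hex) per character."""
    return [(ch, f"U+{ord(ch):04X}", utf8_bytes(ord(ch)).hex(" ")) for ch in text]


for row in breakdown("a\u20ac"):
    print(row)
```

The point is that the bytes shown are derived from the code points, so they are what UTF-8 would be, not necessarily what the user's original input actually was.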
\xe2\x82\xac vs &euro; vs &#8364; vs &#x20ac;, etc). Consuming binary would be cumbersome, and while you could do it via base64, I don't really see the point.
@adamhotep Weirdly, in most places (practically everywhere) I never bother to encode. My HTML files explicitly declare their UTF-8 encoding, so… why?
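To make the comparison concrete, here's a small sketch showing that the raw UTF-8 bytes, the HTML entity forms, and a base64 wrapper all describe the same euro sign. It only uses Python's standard `html` and `base64` modules; the variable names are just for illustration:

```python
import base64
import html

euro = "\u20ac"  # the euro sign

# Raw UTF-8 bytes of the character.
utf8 = euro.encode("utf-8")
print(utf8)  # b'\xe2\x82\xac'

# Named, decimal, and hexadecimal HTML references for the same character.
entities = ["&euro;", "&#8364;", "&#x20ac;"]
decoded = [html.unescape(entity) for entity in entities]
print(decoded)

# Base64 is one way to carry the exact bytes through a text-only channel.
b64 = base64.b64encode(utf8).decode("ascii")
print(b64)  # 4oKs
print(base64.b64decode(b64).decode("utf-8") == euro)
```

With the page declared as UTF-8, the literal character is as valid as any of the entity spellings, which is why skipping the escaping works.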
" " ← non-breaking space, for example. ⌥␣ on a macOS keyboard. Even my CSS icons have all largely switched to name tables, letting you use "user" as the actual named glyph… "character". (Might actually be a ligature? There's absolutely a proper name table in the font, though.)
Most compact would be without secondary encoding/escaping if possible.
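A quick way to check the "most compact" claim, using the non-breaking space as the example. This is a sketch with illustrative names (`forms` is not from any library); it just counts bytes for each spelling:

```python
import html
import unicodedata

nbsp = "\u00a0"  # non-breaking space

# Byte cost of each way to write the same character in a UTF-8 HTML file.
forms = {
    "literal (UTF-8)": nbsp.encode("utf-8"),
    "named entity": b"&nbsp;",
    "decimal reference": b"&#160;",
    "hex reference": b"&#xa0;",
}

for label, raw in forms.items():
    print(f"{label:18} {len(raw)} bytes  {raw!r}")

# Every escaped spelling decodes back to the same character.
print(all(html.unescape(raw.decode("ascii")) == nbsp
          for label, raw in forms.items() if label != "literal (UTF-8)"))
print(unicodedata.name(nbsp))
```

The literal costs two bytes in UTF-8 against six for each escaped form, at the price of being invisible in most editors.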
@adamhotep As an example of how far this can go, some programming languages offer _extensible_ encoding/decoding:
def 📢(✉️): print(✉️)
📢("✋ 🌏")
Yes; that's valid Python. It even prints out "hello world". Looks like a joke. Is not joke. https://pypi.org/project/emoji-encoding/
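Worth noting that emoji aren't legal identifiers in stock Python, so the snippet relies on a codec translating the source first. I haven't checked how that package hooks itself in, so the following is only a toy sketch of the general mechanism, `codecs.register`, with a made-up codec name (`toy_emoji`) and a made-up two-entry mapping:

```python
import codecs

# Hypothetical mapping for illustration; NOT the table the real package uses.
EMOJI_TO_WORD = {"\u270b": "hello", "\U0001f30f": "world"}  # raised hand, globe


def toy_decode(data, errors="strict"):
    """Bytes -> str: decode UTF-8, then swap known emoji for words."""
    text = bytes(data).decode("utf-8", errors)
    for emoji, word in EMOJI_TO_WORD.items():
        text = text.replace(emoji, word)
    return text, len(data)


def toy_encode(text, errors="strict"):
    """Str -> bytes: swap known words for emoji, then encode as UTF-8."""
    out = text
    for emoji, word in EMOJI_TO_WORD.items():
        out = out.replace(word, emoji)
    return out.encode("utf-8", errors), len(text)


def find_toy_codec(name):
    """Codec search function: answer only for our made-up codec name."""
    if name == "toy_emoji":
        return codecs.CodecInfo(name="toy_emoji",
                                encode=toy_encode, decode=toy_decode)
    return None


codecs.register(find_toy_codec)

decoded = "\u270b \U0001f30f".encode("utf-8").decode("toy_emoji")
print(decoded)
```

Once a codec like this is registered, anything that accepts an encoding name can use it, which is the "extensible" part. Making it apply to source files additionally requires the codec to be registered before the file is compiled, which this sketch doesn't attempt.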