Mastodawn

danibanez Apr 4, 2023

is there a site like https://float.exposed for utf-8? Like where you paste in a UTF-8 string and see how it's broken up into Unicode code points?

Float Exposed

Floating point format explorer – binary representations of common floating point formats.
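
A rough local stand-in for the kind of page being asked about. This minimal Python sketch (the function name `breakdown` is only illustrative) iterates over a string, which in Python yields code points, and lists each one with its UTF-8 bytes and Unicode name. It does not group code points into user-perceived characters (grapheme clusters), so an emoji with a skin-tone modifier, for example, would show up as several rows.

```python
import unicodedata


def breakdown(text: str) -> list[tuple[str, str, str, str]]:
    """Return (char, code point, UTF-8 bytes as hex, Unicode name) per code point."""
    rows = []
    for ch in text:
        utf8_hex = ch.encode("utf-8").hex(" ")    # space-separated hex bytes
        name = unicodedata.name(ch, "<no name>")  # control chars have no name
        rows.append((ch, f"U+{ord(ch):04X}", utf8_hex, name))
    return rows


if __name__ == "__main__":
    for row in breakdown("na\u00efve \u20ac"):
        print(*row, sep="\t")
```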

Julia Evans Apr 3, 2023

https://www.fontspace.com/unicode/analyzer looks like a great site for breaking down Unicode text (h/t @alice)

https://www.babelstone.co.uk/Unicode/whatisit.html is similar but a bit less pretty

Unicode Text Analyzer | FontSpace

Find out the real characters in a string of text. Great for finding hidden or similar Unicode codepoints!

fontspace

r12a Apr 3, 2023

@b0rk @alice
Perhaps also try https://r12a.github.io/uniview/?charlist=%E1%B0%A3%E1%B0%A6%E1%B0%A3%E1%B0%A4%E1%B0%A7%E1%B0%B3%E1%B0%B6%E1%B0%80%E1%B0%A6 . The benefit is that there are images for all characters in Unicode except Han/Tangut/Korean, and you can also do much more analysis on the characters. hth

UniViewSVG 15

Deborah Pickett Apr 3, 2023

@ri You made that page? I’ve been using (versions of) that page for years! Thank you! @b0rk @alice

Jef Poskanzer Apr 3, 2023

@b0rk @alice I have one that is specialized for generating programming-language representations. http://acme.com/unicode/decode.html

De-Unicode

Adam Katz Apr 3, 2023

@b0rk @alice
That Font Space page is great! Too bad it doesn't do a byte count or per-byte breakdown (UTF-8).

Rev. GothAlice Apr 3, 2023

@adamhotep If you can demonstrate a way to reliably get binary input on a web page, I'd gladly throw together a single-page tool to do the breakdown.

Unfortunately… I don't think it's that simple. The page's encoding, the browser's choice of encoding for submitted data, and the fact that all text access in JS is to decoded characters (strictly, UTF-16 code units), not the underlying bytes, all get in the way.

🤔 Could run through and generate the UTF-8 manually, just to highlight…

Dang. Now I've got a new night project.

@b0rk
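
To make the JS point concrete: JavaScript strings are sequences of UTF-16 code units, so `String.length` counts those rather than code points or bytes (in a browser, `new TextEncoder().encode(s)` is the usual way to get at the UTF-8 bytes). A small Python sketch, with a hypothetical helper name, that counts the same text all three ways:

```python
def unit_counts(text: str) -> dict[str, int]:
    """Count the same text as code points, UTF-16 code units, and UTF-8 bytes."""
    return {
        # A Python str is indexed by code point
        "code points": len(text),
        # Each UTF-16 code unit is 2 bytes; "-le" avoids a byte-order mark
        "utf-16 units": len(text.encode("utf-16-le")) // 2,
        "utf-8 bytes": len(text.encode("utf-8")),
    }


print(unit_counts("\u270b \U0001f30f"))  # the "✋ 🌏" string from later in the thread
```

For that string the three counts are 3, 4, and 8: the globe emoji lies above U+FFFF, so it needs a surrogate pair in UTF-16 and four bytes in UTF-8.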

Adam Katz Apr 3, 2023

@alice
I was just thinking of a simple byte count and hex dump (with options for \xe2\x82\xac vs € vs &euro; vs &#x20ac;, etc.). Consuming binary would be cumbersome, and while you could do it via base64, I don't really see the point.
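
The byte count plus the escape styles listed above fit in a few lines of Python. This sketch assumes a single code point as input and uses the standard library's `html.entities.codepoint2name`, which only covers the HTML 4 named entities. The function name `representations` is made up for the example.

```python
from html.entities import codepoint2name


def representations(ch: str) -> dict[str, str]:
    """Show one character (a single code point) in several escape styles."""
    cp = ord(ch)  # raises TypeError if ch is not exactly one code point
    utf8 = ch.encode("utf-8")
    reps = {
        "literal": ch,
        "utf-8 byte count": str(len(utf8)),
        "byte escapes": "".join(f"\\x{b:02x}" for b in utf8),
        "hex char ref": f"&#x{cp:x};",
        "decimal char ref": f"&#{cp};",
    }
    # Only some characters have an HTML 4 named entity
    if cp in codepoint2name:
        reps["named entity"] = f"&{codepoint2name[cp]};"
    return reps


for label, value in representations("\u20ac").items():
    print(f"{label:>16}: {value}")
```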

Rev. GothAlice Apr 3, 2023

@adamhotep Weirdly, in most places (practically everywhere) I never bother to encode. My HTML files explicitly declare their UTF-8 encoding, so… why?

" " ← non-breaking space, for example. ⌥␣ on a macOS keyboard. Even my CSS icons have all largely switched to name tables, letting you use "user" as the actual named glyph… "character". (Might actually be a ligature? There's absolutely a proper name table in the font, though.)

Most compact would be without secondary encoding/escaping if possible.
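
A literal non-breaking space looks exactly like an ordinary space, which is the kind of thing the analyzer pages mentioned above are good at catching. This minimal Python sketch (the helper `hidden_chars` is hypothetical) flags characters in the space-separator (Zs), format (Cf), and control (Cc) categories, skipping the usual space, tab, and newline characters.

```python
import unicodedata

# Categories that tend to render as blank space or not at all
INVISIBLE_CATEGORIES = {"Zs", "Cf", "Cc"}
# Whitespace that is expected and not worth reporting
ORDINARY = {" ", "\t", "\n", "\r"}


def hidden_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', name) for characters that look like nothing or like a space."""
    found = []
    for i, ch in enumerate(text):
        if ch not in ORDINARY and unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<control>")))
    return found


print(hidden_chars("a\u00a0b\u200bc"))
# [(1, 'U+00A0', 'NO-BREAK SPACE'), (3, 'U+200B', 'ZERO WIDTH SPACE')]
```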

Rev. GothAlice Apr 3, 2023

@adamhotep As an example of how far this can go, some programming languages offer _extensible_ encoding/decoding:

def 📢(✉️): print(✉️)

📢("✋ 🌏")

Yes; that's valid Python. It even prints out "hello world". Looks like a joke. Is not joke. https://pypi.org/project/emoji-encoding/

emoji-encoding

Module providing Emoji encoding for Python

PyPI
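
Python's extensible codec registry is the mechanism that lets a third-party package add a new encoding. The sketch below is not how emoji-encoding works internally. It registers a toy codec under the made-up name `rot13utf8` (rot13 on ASCII letters, then ordinary UTF-8) just to show the `codecs.register` and `codecs.CodecInfo` plumbing.

```python
import codecs


def _encode(text: str, errors: str = "strict") -> tuple[bytes, int]:
    # rot13 the ASCII letters, then store the result as ordinary UTF-8
    return codecs.encode(text, "rot_13").encode("utf-8", errors), len(text)


def _decode(data, errors: str = "strict") -> tuple[str, int]:
    # The input may arrive as a memoryview, so copy it to bytes first
    raw = bytes(data)
    return codecs.decode(raw.decode("utf-8", errors), "rot_13"), len(raw)


def _search(name: str):
    # The registry calls this with the normalized (lower-cased) encoding name
    if name == "rot13utf8":
        return codecs.CodecInfo(_encode, _decode, name="rot13utf8")
    return None


codecs.register(_search)

encoded = "hello \u20ac".encode("rot13utf8")
print(encoded)
print(encoded.decode("rot13utf8"))
```

Once registered, the name works anywhere Python accepts an encoding name, including `str.encode` and `bytes.decode` as above.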

Adam Katz Apr 3, 2023

@alice
I work in email, so there's lots of ASCII and quoted-printable encoding, sometimes frivolously so as a form of obfuscation.
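
For readers who haven't met it: quoted-printable keeps ASCII text readable and writes every other byte as `=` followed by two hex digits, so each UTF-8 byte of a non-ASCII character shows up separately. Python's standard `quopri` module is enough for a quick look; the sample body here is made up.

```python
import quopri

# A UTF-8 message body containing one non-ASCII character
body = "Price: 5 \u20ac\nThanks!".encode("utf-8")

encoded = quopri.encodestring(body)
print(encoded.decode("ascii"))  # the euro sign's three bytes become =XX escapes

# Decoding must give back the exact original bytes
assert quopri.decodestring(encoded) == body
```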

Julia Evans Apr 3, 2023

@alice @adamhotep someone pointed me to this https://mothereff.in/utf-8 but i'd really like it to highlight the utf-8 bytes and explain what code point each byte sequence corresponds to

UTF-8 encoder/decoder

An online, on-the-fly UTF-8 encoder/decoder.
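
What's being asked for here is essentially an annotated decoder. This minimal Python sketch (the name `annotate_utf8` is hypothetical) walks raw bytes, uses each lead byte's high bits to decide how many bytes belong to the sequence, and reports the code point each group decodes to. It assumes mostly well-formed input and, unlike a real decoder, does not reject overlong forms or encoded surrogates.

```python
def annotate_utf8(data: bytes) -> list[tuple[str, str]]:
    """Pair each UTF-8 byte sequence (as hex) with the code point it encodes."""
    rows = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the sequence length and its payload bits
        if lead < 0x80:              # 0xxxxxxx
            length, cp = 1, lead
        elif lead >> 5 == 0b110:     # 110xxxxx
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:    # 1110xxxx
            length, cp = 3, lead & 0x0F
        elif lead >> 3 == 0b11110:   # 11110xxx
            length, cp = 4, lead & 0x07
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        seq = data[i:i + length]
        if len(seq) < length:
            raise ValueError(f"truncated sequence at offset {i}")
        for cont in seq[1:]:
            # Continuation bytes are 10xxxxxx; each contributes 6 more bits
            if cont >> 6 != 0b10:
                raise ValueError(f"bad continuation byte 0x{cont:02X} at offset {i}")
            cp = (cp << 6) | (cont & 0x3F)
        rows.append((seq.hex(" "), f"U+{cp:04X}"))
        i += length
    return rows


for hex_bytes, code_point in annotate_utf8("A\u20ac\U0001f30f".encode("utf-8")):
    print(f"{hex_bytes:<12} -> {code_point}")
```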

Eana Hufwe (mastodon.social) Apr 3, 2023

@b0rk
https://www.babelstone.co.uk/Unicode/whatisit.html for web
https://github.com/arp242/uni for local

What Unicode character is this ?

amos Apr 3, 2023

@b0rk not that I'm aware - I go over it in https://fasterthanli.me/articles/working-with-strings-in-rust but none of it is interactive. I'm getting more into interactive viz lately but you'll probably beat me to this one! (we should really have a coordinated index at some point)

Working with strings in Rust

There's a question that always comes up when people pick up the Rust programming language: why are there two string types? Why is there String , and &str ? My Declarati...

fasterthanli.me

Julia Evans Apr 3, 2023

@fasterthanlime what interactive visualizations have you been working on?

Rev. GothAlice Apr 3, 2023

@b0rk UTF-8 specific… that's harder to recall.

https://www.fontspace.com/unicode/analyzer is one option for general; it hand-waves the encoding bit, a bit, though.

It's a… shockingly simple variable-width integer encoding, so even an online hex editor with the right "template" (or such) applied to it could theoretically work.
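
As a variable-width integer encoding, UTF-8 stores a code point's bits in one to four bytes: the lead byte's prefix (`0`, `110`, `1110`, or `11110`) says how long the sequence is, and each continuation byte starts with `10` and carries six more bits. A hand-rolled Python encoder (the name `encode_utf8` is only illustrative), checked against the built-in one:

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point by hand, following the UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:       # 0xxxxxxx: 7 payload bits
        return bytes([cp])
    if cp < 0x800:      # 110xxxxx 10xxxxxx: 11 payload bits
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 1110xxxx 10xxxxxx 10xxxxxx: 16 payload bits
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: 21 payload bits
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Cross-check against Python's built-in encoder
for ch in "A\u00e9\u20ac\U0001f30f":
    assert encode_utf8(ord(ch)) == ch.encode("utf-8")
```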

Genders: ♾️, 🟪⬛🟩; Soni L. Apr 3, 2023

@b0rk there's unicode.link

slerpy Apr 3, 2023

@b0rk not a website, but I can recommend this CLI tool

https://github.com/lunasorcery/utf8info

GitHub - lunasorcery/utf8info: Reads UTF-8 on stdin and prints out the raw Unicode codepoints. Useful for seeing exactly what a string consists of.

GitHub

AT Apr 3, 2023

@b0rk https://r12a.github.io/app-analysestring/ by @ri! (Includes Unicode properties & links to the UCD)

Analyse string tool

Tool to analyse what characters are in a string and list information about them.

zahirtezcan Apr 3, 2023

@b0rk This with cyberchef may be of help https://gchq.github.io/CyberChef/#recipe=Escape_Unicode_Characters('%5C%5Cu',true,4,true)&input=RkVSRMSwIHZlIMWfw7xyZWthc8Sx

CyberChef

The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis
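
The linked recipe appears to escape characters with a `\u` prefix and 4-digit padding. A rough Python equivalent (the name `escape_unicode` is hypothetical, and it escapes only non-ASCII characters, which may not match the recipe's options exactly) needs one extra step for code points above U+FFFF: they are written as a UTF-16 surrogate pair, the form that JavaScript, Java, and JSON string escapes use.

```python
def escape_unicode(text: str, prefix: str = "\\u") -> str:
    """Escape non-ASCII characters as prefix + 4 uppercase hex digits.

    Code points above U+FFFF become a UTF-16 surrogate pair, since
    4 hex digits can only hold values up to 0xFFFF.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)  # plain ASCII passes through unchanged
        elif cp <= 0xFFFF:
            parts.append(f"{prefix}{cp:04X}")
        else:
            offset = cp - 0x10000            # 20 bits left to split
            high = 0xD800 + (offset >> 10)   # top 10 bits
            low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
            parts.append(f"{prefix}{high:04X}{prefix}{low:04X}")
    return "".join(parts)


print(escape_unicode("na\u00efve \U0001f30f"))
```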

Laura DeCicco Apr 3, 2023

@b0rk Another option I haven't seen in your replies:

https://apps.timwhitlock.info/unicode/inspect

Unicode character inspector

Examine Unicode characters in UTF-8 encoded strings

apps.timwhitlock.info

Ell 🏳️‍⚧️ Apr 3, 2023

@b0rk Others have mentioned @ri's tools, but the full list is at http://r12a.github.io/applist, there are several tools for inspecting, breaking down, converting, and finding Unicode characters. The code converter has long been one of my favorite tools on the internet. Fairly utilitarian, but very functional!

r12a >> apps

Small web apps written in html and javascript.

Codepoints.net Apr 3, 2023

@b0rk https://codepoints.net/analyze (by me) might be helpful.
This part of the site is brand-new, might still have some bugs, and I’m planning on expanding it.

The one thing that I try to make sure is to have a glyph rendered as often as possible for every code point, so that people know what it looks like instead of just tofu.