Mastodawn

It's not the languages that have the interaction with #Unicode. The vulnerability in the languages is their ability to take code that is constructed at runtime in string form and interpret and execute it.

It's the text editors, IDEs, and pagers (that display the commit diffs) hiding the #PrivateUseArea (and, yes, unassigned code point) characters by rendering them as zero width.

But quite a lot of them don't. The code snippet in the article doesn't actually look like the screenshot given.

In text editors such as Neovim and Vim, and pagers such as less, more, most, and console-tty37-viewer, these characters are either emitted as narrow-width glyphs, which at minimum display as mystery strings of replacement characters, or turned into reverse-video hexadecimal code point values.
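
For anyone who wants to check a file themselves, here is a rough Python sketch of the hexadecimal treatment: it rewrites private-use and unassigned code points as visible markers. The function name `reveal_invisible` and the `<U+XXXX>` marker format are made up for illustration. This is not how any of the named editors or pagers implement it, only the same idea in a few lines.

```python
import unicodedata


def reveal_invisible(text: str) -> str:
    """Replace private-use and unassigned code points with <U+XXXX> markers.

    Hypothetical helper for illustration only. It relies on the Unicode
    general categories: "Co" is private use, "Cn" is unassigned.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) in ("Co", "Cn"):
            # Make the character visible as its hexadecimal code point value.
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


# A line of source with two Private Use Area characters appended.
sample = "x = 1\ue000\ue001"
print(reveal_invisible(sample))
```

Run over a suspicious diff, anything that a renderer would have drawn as zero width shows up as an explicit code point value instead.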

@otfrom

JdeBP 5d ago

@otfrom @simon_brooke

Given that the article provides two distinct bogus expansions of the PUA initialism when it comes to Unicode, there's certainly a whiff of verified-by-bullshit-generator about the article.

Plus, of course, there are the facts that (a) PUA glyphs are not zero-width, (b) programmers like to use 'nerd' fonts, and (c) even without that they'll show up as replacement characters in things like terminal emulators.

#Unicode #PrivateUseArea #ArsTechnica #AIslop #journalism