Mastodawn

"The code points represent every letter of the US alphabet when fed to computers, but their output is completely invisible to humans. People reviewing code or using static analysis tools see only whitespace or blank lines. To a #JavaScript interpreter, the code points translate into executable code."

Sigh.

https://arstechnica.com/security/2026/03/supply-chain-attack-using-invisible-code-hits-github-and-other-repositories/

Supply-chain attack using invisible code hits GitHub and other repositories

Unicode that's invisible to the human eye was largely abandoned—until attackers took notice.

Ars Technica

Chip Butty 5d ago

@simon_brooke as this is from Ars, do we actually know it happened? (I think I need to filter them after their Gen AI disinfo issues)

Simon Brooke 5d ago

@otfrom I don't know, but it is entirely plausible to me that JavaScript, at least, would behave in this way.

And, indeed, the language I've been working on in odd moments would also be somewhat vulnerable to this attack, since it allows symbols in UTF characters.

Chip Butty 5d ago

@simon_brooke it sounds plausible, just that the source article there has burned their credibility

JdeBP 5d ago

@otfrom @simon_brooke

Given that the article provides two distinct bogus expansions of the PUA initialism when it comes to Unicode, there's certainly a whiff of verified-by-bullshit-generator about the article.

Plus, of course, there are the facts that (a) PUA glyphs are not zero-width, (b) programmers like to use 'nerd' fonts, and (c) even without that they'll show up as replacement characters in things like terminal emulators.

#Unicode #PrivateUseArea #ArsTechnica #AIslop #journalism

Simon Brooke 5d ago

@JdeBP @otfrom I have a lot of more useful things I ought to be doing today, but I am tempted to see if I can write a proof-of-concept attack, simply so that we can find out what languages (and thus what projects) are vulnerable.

There are something like 135,000 code points in these 'private use areas', but I have a hunch that the problem is much wider; UTF-32 has a potential four billion code points, and very many of those are unassigned. What happens if you use them?
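As a rough check on those figures, here is a minimal sketch that tallies general categories with Python's standard `unicodedata` module. The function name `count_categories` is only illustrative. Two caveats: the Unicode codespace stops at U+10FFFF (1,114,112 code points), so a 32-bit code unit cannot legally carry four billion distinct code points, and the unassigned count depends on which Unicode version the interpreter ships with.

```python
import sys
import unicodedata
from collections import Counter


def count_categories() -> Counter:
    """Tally the Unicode general category of every code point in the codespace."""
    return Counter(
        unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1)
    )


counts = count_categories()

# "Co" is private use, "Cn" is unassigned (including noncharacters).
print(f"codespace size:   {sys.maxunicode + 1:,}")
print(f"private use (Co): {counts['Co']:,}")
print(
    f"unassigned (Cn) in Unicode {unicodedata.unidata_version}: {counts['Cn']:,}"
)
```

The private-use total comes to 137,468 (6,400 in the Basic Multilingual Plane plus 65,534 in each of planes 15 and 16), which is close to the "something like 135,000" estimate.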

JdeBP

@simon_brooke

It's not the languages that have the interaction with #Unicode. The vulnerability in the languages is their ability to take code that is constructed at runtime in string form and interpret and execute it.

It's the text editors, IDEs, and pagers (that display the commit diffs) hiding the #PrivateUseArea (and, yes, unassigned code point) characters by rendering them as zero width.

But quite a lot of them don't. The code snippet in the article doesn't actually look like the screenshot given.

In text editors such as Neovim and Vim, and pagers such as less, more, most, and console-tty37-viewer, these characters are either emitted as narrow-width glyphs, which at minimum display as mystery strings of replacement characters, or turned into reverse-video hexadecimal code point values.

@otfrom
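The "hexadecimal code point values" behaviour described above can be approximated in a review script. This is a sketch under assumptions, not what any of the named editors or pagers actually do internally: the helper name `reveal_hidden` and the choice of categories to flag are hypothetical. It makes private-use (`Co`), unassigned (`Cn`), and format-control (`Cf`, which includes the bidi controls) characters visible as `<U+XXXX>`.

```python
import unicodedata

# Categories a reviewer may not see, depending on the tool and font:
# private use, unassigned, and format controls (bidi overrides, ZWJ, etc.).
SUSPECT_CATEGORIES = {"Co", "Cn", "Cf"}


def reveal_hidden(text: str) -> str:
    """Replace suspect characters with a visible <U+XXXX> marker."""
    out = []
    for ch in text:
        if unicodedata.category(ch) in SUSPECT_CATEGORIES:
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


# A string literal hiding one private-use character and one bidi override.
sample = "const x = '\ue000\u202e';"
print(reveal_hidden(sample))
```

Variation selectors have general category `Mn` and would slip past this filter, so a real check would need an extra rule for them.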

🔏 Matthias Wiesmann 13h ago

@JdeBP @simon_brooke @otfrom Also I suspect one compounding factor is that many web-editors will do some processing on ASCII letters for code formatting and syntax highlighting, but will basically forward all the “weird stuff” to the web-browser.

The same issue appears with bidi injections:

https://wiesmann.codiferes.net/wordpress/archives/20031

Understanding Bidi Injections

🔁 Most injection attacks follow the same pattern: a character or a sequence with a special meaning is not properly handled in user provided data, and ends up being interpreted the wrong way…

Thias の blog