RE: https://mastodon.social/@arstechnica/116223713911650708

This is kind of neat.

I'm reminded, however, of a cheap old 'solution'.

If the size of the source file != the number of bytes rendered on your screen, you might have a problem.

Original research, with links to compromised repos: https://www.aikido.dev/blog/glassworm-returns-unicode-attack-github-npm-vscode

#infosec #supplychain

Just cat-ing the file is enough to show that something is wrong, but in my experience most #developers aren't really looking at the changes themselves, only the changelog, if that.

Also, a strange code change in a package.json file... an odd choice, because to me that stands out. Again, devs (in the enterprise setups I've been at) aren't usually looking at that these days; they only care about rapidly releasing features with the help of AI.

Let's see if my idea works: size of the file vs. what's on screen, along with some other simple techniques.
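
A rough Python sketch of the size-vs-screen idea, not the exact commands I ran. It assumes the file is UTF-8 and treats format characters, private use characters, control characters, and variation selectors as "not drawn". The names `is_hidden` and `size_vs_visible` are made up for illustration.

```python
import unicodedata

# Heuristic only: categories that usually draw nothing in a plain terminal.
# Cf = format (zero-width chars), Co = private use, Cc = control.
HIDDEN_CATEGORIES = {"Cf", "Co", "Cc"}
VISIBLE_CONTROLS = {"\n", "\r", "\t"}


def is_hidden(ch: str) -> bool:
    """Guess whether a character is likely to be invisible on screen."""
    cp = ord(ch)
    # Variation selectors are category Mn, so check their ranges explicitly.
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return True
    if ch in VISIBLE_CONTROLS:
        return False
    return unicodedata.category(ch) in HIDDEN_CATEGORIES


def size_vs_visible(data: bytes) -> tuple[int, int]:
    """Return (bytes on disk, bytes left after removing hidden characters)."""
    text = data.decode("utf-8", errors="replace")
    hidden = sum(len(ch.encode("utf-8")) for ch in text if is_hidden(ch))
    return len(data), len(data) - hidden


if __name__ == "__main__":
    payload = ("x = '" + "\U000E0100" * 4 + "'\n").encode("utf-8")
    total, visible = size_vs_visible(payload)
    print(f"{total} bytes on disk, {visible} bytes visible")
```

A gap between the two numbers is only a prompt to look closer. Emoji sequences legitimately use zero-width joiners and variation selectors, and some fonts do draw glyphs for private use code points.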

#infosec #supplychain

So far the tally is one indicator: code in a package.json file.

Another indicator: in vim, within a terminal using #nerdfonts, something looks off too. It is subtle, and I'm not sure it would have shown up if I didn't have a Nerd Font in play.

#infosec #supplychain

I thought strings could help. Had to use some switches I never expected to exist or ever use :) The problem is that the switches I used limit strings to 7-bit characters, which drops legitimate UTF-8 characters, emoji, etc. When I use -eS, which accepts 8-bit bytes and so lets UTF-8 through, it still includes the PUA (Private Use Area) range.
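
What I was really after from strings is something like the following: keep legitimate UTF-8 as-is, but spell out anything from the PUA or the variation selector blocks so it can't hide. This is a minimal Python sketch under that assumption, not a reimplementation of strings; `reveal` is a hypothetical helper name.

```python
import unicodedata


def reveal(text: str) -> str:
    """Replace private use and variation selector characters with <U+XXXX>."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Variation Selectors and Variation Selectors Supplement blocks.
        is_variation_selector = (
            0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF
        )
        # Category "Co" covers the BMP PUA and the two private use planes.
        if is_variation_selector or unicodedata.category(ch) == "Co":
            out.append(f"<U+{cp:04X}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(reveal("const key = '\U000E0100\U000E0101';"))
```

Accented letters and plain emoji pass through untouched, which is the part the 7-bit mode couldn't give me. Emoji that carry U+FE0F will show the selector spelled out, so expect some noise there.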

So, it's not as easy as I thought with simple default tools (not that I know all the options and tools!). I guess any adequate protection against this kind of exploit would be to limit the range of characters allowed in your own source code and in any source code you include.
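
A minimal sketch of what such a character allowlist could look like in Python. The default (printable ASCII plus tab, LF, CR) and the `extra_ranges` parameter are my own assumptions for illustration, and `find_disallowed` is a hypothetical name, not an existing tool.

```python
# Default allowlist: tab, line feed, carriage return, printable ASCII.
ASCII_ALLOWED = [(0x09, 0x0A), (0x0D, 0x0D), (0x20, 0x7E)]


def find_disallowed(text, extra_ranges=()):
    """Return (line, column, code point) for every character outside the allowlist."""
    ranges = ASCII_ALLOWED + list(extra_ranges)
    hits = []
    # Split on "\n" only: str.splitlines() would silently swallow
    # separators such as U+2028 instead of reporting them.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if not any(lo <= cp <= hi for lo, hi in ranges):
                hits.append((line_no, col, f"U+{cp:04X}"))
    return hits


if __name__ == "__main__":
    sample = "let a = 1;\nlet b = '\U000E0100';\n"
    for line_no, col, code_point in find_disallowed(sample):
        print(f"line {line_no}, col {col}: {code_point}")
```

Passing something like `extra_ranges=[(0x00A0, 0x00FF)]` opens up Latin-1 letters, but every range you add for another language widens what can slip through.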

So while my solution "sort of" works, it's not practical for today's worldly developers, especially if you want to include strings in other regional languages in your code.

Good exercise though and a reminder of utf-8/unicode usage as well!

#infosec #supplychain