@aral adding arbitrary Unicode-high constants (like 0xFE00 and 0xE0100) to text buffers to obfuscate payloads is something that malicious actors have been doing for a long time (on top of tricks like hiding code 10k spaces after the end of a line).
The thing is this is exactly where automated reviews could be useful.
Even a simple grep test to check if your code runs eval() should at the very least trigger a warning in npm.
The presence of non-readable Unicode characters in text files (again, something that grep+sed can easily spot) should also be a big red flag.
And they could definitely train a small AI reviewer to identify common patterns (lines with too many spaces, cheap source code obfuscation techniques or execution via eval() of stuff that comes from arbitrary externally-controlled buffers). It’s not like we’ve never seen these things in the past 3 decades or so, we’d have plenty of material to train a good model.
Extensions uploaded to Mozilla for example go through a similar pipeline of checks before being approved.
The fact that the npm (and even pip) registries don’t run these basic automated checks, and packages are taken down only when shit has already hit the fan, is really disturbing.