A popular Python library just became a backdoor to your entire machine

https://www.xda-developers.com/popular-python-library-backdoor-machine/

It's one of the most popular Python libraries for interacting with large language models [...] It has over 40,000 stars on GitHub, and it's an important dependency in a lot of AI tooling. It's also been compromised on PyPI, and the malicious versions are stealing everything they can find on your machine.

Sorry but... 🍿

@Khrys Wait, what? Python has a place you can install the Python equivalent of LD_PRELOAD code that gets injected into every program, and packages from their package manager can just drop stuff in there? Who came up with that shit and why isn't it fixed??

@dalias There's a fun story about embedding Python https://www.postgresql.org/docs/current/plpython.html . "meaning it does not offer any way of restricting what users can do in it" is load-bearing.

That it's categorically impossible to prevent Python from doing fopen(), or any other thing, is just wild, and was understood decades back. It's a design feature.

@davidfetter That's different and expected. It's not an embedded language. But the package manager should not allow a package installed with the intent just to use it in certain programs (which might all run in an isolated privilege domain) to install backdoor code that runs in all Python programs in any privilege domain.
@dalias @davidfetter a Python package can run arbitrary code on installation, so by then it's already too late

@joshix @davidfetter OK, that's already a much deeper flaw that's shared with a lot of package managers.

But it still should mandate manual review whenever someone attempts to publish a new package version that adds/changes any code that would run at install time into a nominally-curated package repo.

@dalias @joshix that's a trivial and easily circumvented attack. Hit a dependency that's standard, and you're in. Hit several that are commonly used together, and finding just exactly what happened can get way harder, as can getting an accurate idea that the attack is actually over.

@joshix @dalias yep. Also, there's no way to prevent it from loading packages, or from doing anything else the shell that spawned it could do.

Python's popularity stems from...well, it's complicated, but there's a gigantic ecosystem now, and that's important when schedules set by people who have no part in writing, testing, or supporting code force people doing those things to cut corners. That usually works even after the first bug causes out-of-SLA downtime, irretrievable data loss, etc. It fails catastrophically when there are attackers.

@dalias @Khrys it's a hook from the 'site' module, which is what implements support for user-installable package locations, and it can be disabled completely. if your threat model allows malware to be installed to those locations, you are already compromised anyway. the hook isn't great, sure (it's an old design that's difficult to replace without major downstream breakage), but there are many other ways you can amplify the attack, regardless.
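
For anyone who wants to check their own machine, here is a minimal sketch of an audit for the hook described above. It relies on the documented behaviour that the `site` module executes any line in a `.pth` file that starts with `import` followed by a space or tab. The function names `executable_pth_lines` and `site_directories` are made up for this illustration, and the script assumes an interpreter where `site.getsitepackages()` is available. A hit is not proof of compromise, since some legitimate packages ship such lines too.

```python
import site
from pathlib import Path


def executable_pth_lines(directory):
    """Yield (path, line) for each .pth line the site module would exec."""
    for pth in sorted(Path(directory).glob("*.pth")):
        try:
            text = pth.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        for line in text.splitlines():
            # site executes lines that begin with "import" plus a space or
            # tab; other non-comment lines are treated as path entries.
            if line.startswith(("import ", "import\t")):
                yield pth, line.strip()


def site_directories():
    """Collect the global and per-user site-packages directories that exist."""
    dirs = list(site.getsitepackages())
    dirs.append(site.getusersitepackages())
    return [d for d in dirs if Path(d).is_dir()]


if __name__ == "__main__":
    for directory in site_directories():
        for pth, line in executable_pth_lines(directory):
            # Truncate long lines so an obfuscated payload stays readable.
            print(f"{pth}: {line[:120]}")
```

As for disabling it: `python -S` skips importing `site` at startup, so no `.pth` files are processed, but site-packages is then not on `sys.path` either. `python -s` only skips the per-user site directory, and `python -I` (isolated mode) implies `-s` and also ignores the `PYTHON*` environment variables.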

@dalias Oh this is so awesome! Thanks to this I learned that it is possible to Rickroll a Python user (i.e. anyone who launches anything that is run using Python) at very unexpected times just by placing 83 bytes into the right file in the user's home directory.

Yes, this is a horrible misfeature.
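
A harmless way to see the mechanism without touching a real site-packages directory, as a sketch only: write a `.pth` file into a temporary directory and hand that directory to `site.addsitedir()`, which processes `.pth` files the same way interpreter startup does for the real site directories. The marker name `PTH_DEMO_RAN` and the function `run_pth_demo` are invented for this example.

```python
import os
import site
import sys
import tempfile
from pathlib import Path


def run_pth_demo():
    """Return the marker value set by an 'import' line in a .pth file."""
    marker = "PTH_DEMO_RAN"  # hypothetical environment variable name
    os.environ.pop(marker, None)
    original_path = list(sys.path)
    with tempfile.TemporaryDirectory() as tmp:
        # A .pth line that starts with "import " is executed, not treated
        # as a path; everything after the first statement runs as well.
        Path(tmp, "demo.pth").write_text(
            f"import os; os.environ['{marker}'] = '1'\n", encoding="utf-8"
        )
        # Process the temporary directory as if it were a site directory.
        site.addsitedir(tmp)
    # addsitedir() appended the temp directory to sys.path; undo that.
    sys.path[:] = original_path
    return os.environ.pop(marker, None)


if __name__ == "__main__":
    print("payload ran:", run_pth_demo() is not None)
```

The payload here only sets an environment variable, but the same one-line file placed in the per-user site-packages directory would run at the start of every Python process that loads `site` for that user.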

@Khrys