So You Want to Solve Python Packaging: A Practical Guide

First, the technical: Python is used by vastly different groups of people, some of whom don't identify as "developers". Those groups often have disparate expectations about how packaging should work. Some don't even know what a package is.

Some don't even know they're using Python! Here are some examples: Python's in the Linux Standard Base, and a bunch of critical Linux stuff is written in Python. Distros gotta package those & their deps into their own package databases (deb/rpm).

Most distros want nothing to do with the language-specific package manager. They want to manage everything through rpm/deb/portage/whatever, and they don't want you fucking around with system packages. Ever got burned by the Python included with macOS? Yeah, same deal.
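For a concrete taste of how distros enforce this today: PEP 668 lets a distro drop a file named `EXTERNALLY-MANAGED` into the interpreter's stdlib directory, and installers like pip then refuse to install into that Python unless you're inside a virtual environment. Here's a minimal sketch of that check. It's a simplification, not pip's actual code (pip also reads the file for the error message to show you).

```python
import os
import sys
import sysconfig


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """Roughly the PEP 668 test: should installers leave this Python alone?

    The distro's marker file lives in the stdlib directory, and it is
    ignored when running inside a virtual environment.
    """
    if in_virtualenv():
        return False
    stdlib_dir = sysconfig.get_path("stdlib")
    return os.path.isfile(os.path.join(stdlib_dir, "EXTERNALLY-MANAGED"))


if __name__ == "__main__":
    print("in a virtualenv:", in_virtualenv())
    print("externally managed:", is_externally_managed())
```

Run it with a distro's `/usr/bin/python3` and then from inside a venv to see the difference.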

So OS vendors want Python to be invisible to the user. They want it for system purposes, and they want to distribute python apps, scripts, and packages on their own terms. Cool. Let's pick another group: academics and researchers.

They want to do their research. They don't want to program Python. They want to work with their data, create visualizations, and, very importantly, they want fellow researchers to be able to use their code. These folks don't really want to think about packaging.

The packages they use, however, are complex fucking monsters. They're a mix of C, C++, Fortran, Haskell, Julia, and god knows what else. They don't want to waste time installing build tools and compiling these things. Their packages need to be precompiled and ready to go.
Precompilation is *hard*, especially for high-performance libraries. You can't just distribute a build with all the fancy vector extensions enabled, 'cause someone on a different processor won't be able to use it. You wanna see a nightmare? Look at TensorFlow.
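One way to see why: a wheel's filename (PEP 427) records the Python tag, ABI tag and platform tag, and that's all the installer matches on. The sketch below parses one; the `numpy` filename is just an example. The platform tag pins the architecture (`x86_64`) and a minimum glibc, but there's no slot for "needs AVX2", so a wheel built with such extensions would install happily on an older CPU and then fail at runtime. So libraries either compile for a conservative baseline or do their own CPU dispatch. (This parser doesn't expand compressed tag sets like `py2.py3`; they stay as dot-joined strings.)

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    # PEP 427 layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    # Hyphens inside the name are escaped to "_" in wheel filenames,
    # so splitting on "-" is safe.
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, version, build, python, abi, plat)


if __name__ == "__main__":
    tags = parse_wheel_filename(
        "numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )
    # Architecture and glibc floor, but nothing about SSE/AVX levels.
    print(tags.platform)  # manylinux_2_17_x86_64.manylinux2014_x86_64
```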
Fundamentally these users do not want to think about this shit, and they're a *huge* group of users. You know who does think about this shit? Web developers, and every time someone comes along with "Python packaging sucks and someone should fix it" they're a web dev.

That's because web devs have different expectations. They *expect* to work with a packaging tool. They expect to find and install dependencies. They don't expect to work with a ton of native dependencies. They don't have the same problems.

This only scratches the surface of the technical complexity here. The reason there are so many tools for managing Python dependencies is because Python is not a monoculture and different folks need different things.

But let's assume for a moment that you can overcome those technical challenges. You can create a tool and workflow that works for the vast majority. Now you have to deal with people. You gotta convince a bunch of unpaid volunteers that you're right and that they should help.
You gotta convince a bunch of unpaid volunteers maintaining existing tools to give up their projects for your solution. Projects they built from the ground up for their own use case. You gotta write several PEPs and get them accepted.
You gotta deal with the politics: the PyPA, which is entirely volunteer-run, has all the responsibility for maintaining existing tools and practically no real authority or resources. They aren't a unified body, more of a loose collection of people that chat sometimes.
You gotta deal with the Python Core team and the steering council. They have consistently abdicated the details of packaging to the community. They aren't, at this time, very interested in taking over packaging and telling the community how to manage their dependencies.
You gotta deal with downstream distributors and major users. Linux distros, Apple, Google, AWS, Anaconda, and so many more. Google's using Bazel to build their Python projects, good luck with that one!
You gotta deal with the users and the stans. Wanna know why I stopped working on Python packaging? I got harassed for *months* because KR picked a fight with Reddit right when I dared to include pipenv on http://packaging.python.org. Fuck that.
Python Packaging User Guide

The Python Packaging User Guide (PyPUG) is a collection of tutorials and guides for packaging Python software.

So you want to fix Python packaging: you fucking can't. Get lost.

@stargirl Rephrasing @corbin, I'd say that "Python Packaging" is actually solved. Mostly. Everything-in-pyproject.toml (maybe with PEP 725) is comprehensive for what's truly in scope for "python packaging". There will necessarily be some native stuff that is absolutely out of scope for "python packaging"; in fact, most of it. There will always have to be a "system layer" under "python packages".

Nixpkgs is one consumer that integrates PEP-compliant "python packages" with its "native" stuff.

@stargirl @corbin Nobody asked for this opinion, but I think what needs solving instead is Python imports. Specifically, first-class multi-tenancy support, which would remove the urge to write things like https://discuss.python.org/t/allowing-multiple-versions-of-same-python-package-in-pythonpath/2219
Allowing Multiple Versions of Same Python Package in PYTHONPATH

TLDR; I wanted to get feedback on a potential feature that may be added to nixpkgs that allows multiple versions of the same python package to be installed in the same PYTHONPATH. This is a general approach that is not specific to nixpkgs and could be used in other package managers. The only nix specific part is the tooling to allow for the building of these specialized packages. All of the materials/demo is in this repo https://github.com/costrouc/python-multiple-versions. Sorry discourse preve...

Discussions on Python.org

@nobody @stargirl I'll note that this suffers from Zooko's triangle, a trilemma in nominal logic: https://en.wikipedia.org/wiki/Zooko%27s_triangle

Imagine every Python module were content-addressed somehow, and packages too. This only kicks the can down the road, because you'd probably rather import from a petname than from a SHA-256 hash, so somebody must now set up a petname-to-hash map, and all of the existing political issues of the Cheeseshop reappear upon the maintenance of that map. (This is why politics is an inevitable part of maintaining large ports trees too.)

Zooko's triangle - Wikipedia
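To make the trilemma concrete, here's a toy sketch (every name in it is hypothetical): modules addressed by the SHA-256 of their source, with a petname map layered on top. The hashes are secure and need no central authority, but nobody wants to type them; the petname map is readable, but whoever controls `publish` decides what a name like `requests` means. That's the "centralized" corner of the triangle, and the place the politics moves to.

```python
import hashlib
from typing import Dict


def content_address(source: bytes) -> str:
    # The content-addressed name is just the hash of the module's bytes.
    return "sha256:" + hashlib.sha256(source).hexdigest()


class PetnameMap:
    """Hypothetical human-friendly name -> content hash table."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}   # petname -> digest (policy)
        self._blobs: Dict[str, bytes] = {}  # digest -> source (mechanism)

    def publish(self, petname: str, source: bytes) -> str:
        digest = content_address(source)
        self._blobs[digest] = source
        # Last writer wins: a policy choice someone has to own.
        self._names[petname] = digest
        return digest

    def resolve(self, petname: str) -> bytes:
        # Readable, but only as trustworthy as whoever maintains the map.
        return self._blobs[self._names[petname]]

    def fetch(self, digest: str) -> bytes:
        # Secure and unambiguous, but unreadable.
        return self._blobs[digest]


if __name__ == "__main__":
    index = PetnameMap()
    old = index.publish("requests", b"VERSION = 1\n")
    new = index.publish("requests", b"VERSION = 2\n")
    print(index.resolve("requests"))  # b'VERSION = 2\n'
    print(index.fetch(old))           # b'VERSION = 1\n'
```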

@corbin @stargirl Um to be clear, the intention would not be to make developers write `import from package_name_${hash}`, but to generate hints for the import system at build/install time
@corbin @stargirl Maybe I should elaborate: imagine you have a project that uses protobuf-python, and you want to use it together with tensorflow. Chances are, tensorflow's outdated protobuf does not work for you. Rather than trying to patch your project or patch tensorflow for compatibility with the older/newer protobuf, I'd want a solution where we instruct Python to resolve `import protobuf` into one "package" when the import happens under `${prefix}/tensorflow`, and another package elsewhere.
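A minimal sketch of what such install-time hints might look like, assuming a table keyed by (importer path prefix, module name) and picking the most specific matching prefix. The table format and the `/site/...` and `/store/...` paths are hypothetical; nothing in CPython reads them. Matching on whole path components rather than raw string prefixes keeps `/site/tensorflow` from accidentally capturing `/site/tensorflow_addons`.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

# Hypothetical hints: (importer path prefix, module name) -> location of
# the copy of that module which importers under the prefix should get.
Hints = Dict[Tuple[str, str], str]

HINTS: Hints = {
    ("/", "protobuf"): "/store/protobuf-new",
    ("/site/tensorflow", "protobuf"): "/store/protobuf-old",
}


def _under(path: PurePosixPath, prefix: PurePosixPath) -> bool:
    # Component-wise containment, not a string prefix check.
    return path == prefix or prefix in path.parents


def resolve(hints: Hints, importer_file: str, module: str) -> Optional[str]:
    """Return the location for `module` as seen from `importer_file`."""
    importer = PurePosixPath(importer_file)
    best_depth, best = -1, None
    for (prefix, name), location in hints.items():
        p = PurePosixPath(prefix)
        if name == module and _under(importer, p) and len(p.parts) > best_depth:
            best_depth, best = len(p.parts), location
    return best


if __name__ == "__main__":
    print(resolve(HINTS, "/site/tensorflow/core/ops.py", "protobuf"))
    print(resolve(HINTS, "/home/me/app/main.py", "protobuf"))
```

The lookup itself is the easy part; the hard parts are the ones listed next in the thread.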

@corbin @stargirl There are two obvious challenges:

1) multi-tenancy support in python import cache (needs implementing),

2) potential symbol collision issues for native libraries (potentially solved by dlmopen, I suspect, but maybe not on windows and what do I know actually)
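On challenge 1, here's why it needs implementing: `sys.modules` is keyed by module name alone, so once one copy is imported, every later `import` of that name gets the cached copy no matter what `sys.path` says. The sketch below shows that with a made-up library, `tenantlib`, written to a temp dir, then the usual workaround of loading each copy under a distinct alias. Aliasing is fragile: any absolute import inside the library still refers to the real name, and it does nothing for challenge 2.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def write_version(root: Path, version: int) -> Path:
    # Create <root>/v<N>/tenantlib.py exposing VERSION = N.
    d = root / f"v{version}"
    d.mkdir()
    (d / "tenantlib.py").write_text(f"VERSION = {version}\n")
    return d


def load_as(alias: str, path: Path) -> ModuleType:
    # Load a file under a private alias so it gets its own sys.modules slot.
    spec = importlib.util.spec_from_file_location(alias, str(path))
    assert spec is not None and spec.loader is not None
    module = importlib.util.module_from_spec(spec)
    sys.modules[alias] = module
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    v1 = write_version(Path(tmp), 1)
    v2 = write_version(Path(tmp), 2)
    importlib.invalidate_caches()

    # Normal imports: one cache slot per *name*.
    sys.path.insert(0, str(v1))
    first = importlib.import_module("tenantlib")
    sys.path[0] = str(v2)  # point the search path at version 2...
    second = importlib.import_module("tenantlib")
    sys.path.pop(0)
    print(first.VERSION, second.VERSION)  # 1 1  (...the cached copy wins)

    # Side by side only works by giving each copy a distinct key.
    a = load_as("tenantlib__v1", v1 / "tenantlib.py")
    b = load_as("tenantlib__v2", v2 / "tenantlib.py")
    print(a.VERSION, b.VERSION)  # 1 2

for name in ("tenantlib", "tenantlib__v1", "tenantlib__v2"):
    sys.modules.pop(name, None)
```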

@corbin @stargirl Same for e.g. combining packages that use pydantic1 and pydantic2

@nobody @stargirl Sure. Those hints would be populated from a mapping either maintained via Cheeseshop or via local choices. Either way, there's politics going on; each individual mapping is a policy choice. In terms of Zooko's triangle, this would be the "centralized" option, which means that the import keyword would now functionally depend on whatever build/runtime configuration you've imagined.

Not that this is a bad thing! Check out PEP 302 for the hooks you'd need to prototype an actual implementation of your idea. Allen Short explained how to use PEP 302 here https://web.archive.org/web/20180411011138/http://washort.twistedmatrix.com/2011/01/introducing-exocet.html and I used it to add hot-reload plugin functionality to a Python Minecraft server.

Introducing Exocet

Last time I talked about the deficiencies of Python's module system. Now I'd like to talk about a solution to them. There are two questions...
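A minimal sketch of the "local choices" flavor using the modern form of those hooks: a finder on `sys.meta_path`. PEP 302 introduced the hooks; the `find_spec` method used here comes from PEP 451, and the older `find_module` has long been deprecated. The `PinnedFinder` class, the `greeting` module and the temp file are all made up for the demo. Note that `find_spec` only learns the module's name (plus the parent package's path), not who is importing it, so per-importer resolution would need extra machinery, such as inspecting the caller.

```python
import importlib.abc
import importlib.util
import sys
import tempfile
from pathlib import Path
from typing import Dict


class PinnedFinder(importlib.abc.MetaPathFinder):
    """Resolve selected module names to explicitly chosen .py files."""

    def __init__(self, table: Dict[str, Path]) -> None:
        self.table = table  # the local policy: module name -> file

    def find_spec(self, fullname, path=None, target=None):
        location = self.table.get(fullname)
        if location is None:
            return None  # not ours: fall through to the normal finders
        return importlib.util.spec_from_file_location(fullname, str(location))


with tempfile.TemporaryDirectory() as tmp:
    pinned = Path(tmp) / "pinned_greeting.py"
    pinned.write_text("MESSAGE = 'hello from the pinned copy'\n")

    finder = PinnedFinder({"greeting": pinned})
    sys.meta_path.insert(0, finder)  # consulted before the sys.path finders
    try:
        import greeting
        print(greeting.MESSAGE)  # hello from the pinned copy
    finally:
        sys.meta_path.remove(finder)
        sys.modules.pop("greeting", None)
```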

@corbin @stargirl Thanks, I'll read that. I've no idea what the Cheeseshop is, though, but if we had this multi-tenancy support I'd just go and implement a setup hook for Nixpkgs to generate these hints. I wouldn't really care for much beyond that.

@corbin @stargirl Nice, I'm not sure how I missed this. I suppose the PEP 302 example you shared is exactly the "local choices" implementation. I also just noticed this: https://github.com/mitsuhiko/multiversion. Similarly local.

> means that the import keyword would now functionally depend on whatever build/runtime configuration you've imagined.

Well, it does today too.

GitHub - mitsuhiko/multiversion: A hack that allows you to use different versions of the same library in the same Python process without clashes

GitHub

@corbin @nobody @stargirl Treating petnames as display-only solves this problem (among others).