[Python] Should packages be externally managed in a container image?

superlazyname
Posts: 2
Joined: 2024-04-19 12:38

[Python] Should packages be externally managed in a container image?

#1 Post by superlazyname »

Hello,

Quick summary
I have some questions about why the EXTERNALLY-MANAGED file is in libpython3.11-stdlib https://packages.debian.org/bookworm/am ... b/filelist
- Based on the information I was able to find, there is a case to be made that this file makes sense for a desktop install of Debian, since desktop Debian seems to ship with system utilities written in python
- From what I was able to find, this file does not seem to make sense for container images
- Was having the Docker image pull in libpython3.11-stdlib (and thus making the container externally managed) a deliberate design decision by Debian or an accidental / convenient carryover from desktop installs because both happen to use the same package?
- If it was a deliberate design decision to put that file in container images (not desktop installs), can you help me understand what problem Debian was trying to solve with that?
- If I use apt for C/C++ dependencies and python-is-python3, python3-pip, and python3-venv, and then use pip for installing all other python packages, am I going to break anything?
- Semi-related, I noticed some inconsistencies in the system python scripts I found in /usr/bin on a live system; the distro doesn't seem to be consistent about whether it thinks system python code should be run in a venv or not. Those cases might be bugs.

Possible motivations for marking the base environment externally managed

To be honest I had not heard about this before, so I did some research to try to understand what problem it was trying to solve.

Here's what I found about why a distro might want to mark a base environment as externally managed per PEP 668:
- Some distros use python for internal system tools
- It's possible that somebody could install a package using pip that breaks a system utility
- There was some debate on whether PEP 668 makes sense for container images. It appears that the author of PEP 668 had the view that it did apply to containers, but the official guidance seems to leave it up to distro maintainers
- The opinion in favor of marking container image base python environments as externally managed appears to be this: even in a container image that runs only one piece of code (e.g. a web server), it's considered a best practice to use a venv, because that code might in theory call a system utility that is written in python, and that system utility might not work if an incompatible package had been installed with pip.

From what I have been able to gather, Debian has adopted PEP 668.
- It would appear that the pypa developers have delegated the decision to distro maintainers

Further reading

NOTE: I am not any of the users on this Github issue list, this is the first time I've posted anything about this anywhere.

- https://github.com/pypa/pip/issues/10556
- https://discuss.python.org/t/pep-668-ma ... aged/10302

Container use case

What I typically do with Docker images:
- I usually copy some python scripts into them and install their packages, then run the python code as a web server, etc...
- Some of these python packages are very obscure and probably not something Debian really wants to bother adding to its apt repos, and sometimes we make internal packages that are also too niche to submit to Debian's repos
- I occasionally install apt packages if needed but it's rather rare
- I use apt to install python-is-python3, python3-pip, and python3-venv
- I use apt for C/C++/etc. dependencies (like libblas, etc...) and I use pip for python dependencies (requests, numpy, etc...)
- I do not use apt to install any python packages after the initial setup. Sometimes the python packages use the C/C++ packages, and sometimes they come bundled with their own compiled code. (A rough Dockerfile sketch of this setup follows this list.)
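
For concreteness, here's a rough sketch of that setup as a Dockerfile. The requirements.txt, app.py, and package names are just placeholders, not my real project; on bookworm, the pip step is exactly where the error quoted further down shows up.

Code: Select all

FROM debian:12

# C/C++ and tooling dependencies come from apt.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        python-is-python3 python3-pip python3-venv libblas-dev && \
    rm -rf /var/lib/apt/lists/*

# Python dependencies come from pip. On bookworm this is the step that
# fails with "error: externally-managed-environment" unless a venv (or
# an override) is used.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

COPY . /app
CMD ["python", "/app/app.py"]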

For my testing, I'm using debian:12 from https://hub.docker.com/_/debian
- As far as I can tell, the debian:12 container image from Docker Hub does not have any version of python installed in it at all, I tried which python, python2, python3, and got no results
- I did a grep for "python" in /usr/bin and it came back with no results
- The act of installing python using e.g. apt-get install python3-pip will pull in libpython3.11-stdlib

Here's what I'm not understanding:
- If there's no python installed in the debian base image to start with, then there are no core system utilities that use Python
- If there's no system utilities in the debian image using python, is there any concern of breaking anything by installing packages to the system directories?
- Is it that the debian image doesn't currently use python for system utilities, but it might want to in the future?
- If I use apt for C/C++ dependencies (after installing python-is-python3, python3-pip, and python3-venv) and use pip for installing all other python packages, am I going to break anything?

It seems like if I use apt for C/C++ dependencies, and pip for python dependencies, the only way I could possibly get in trouble is if I installed an apt package that just happened to include some utility written in python that expected the distro's exact environment; is that correct? If that's the case, is there a particular package that the Debian maintainers have run into that is known to cause problems?

My concerns about venv:
- Basically I'm scared that if I go with venv in containers, I'm replacing a problem I may run into (but have not run into yet: possibly breaking a system utility) with a different problem where I might fall between the cracks of documented behavior
- I'm a bit concerned about file permissions. Like one of the people in that GitHub issue, I run container images in OpenShift, which runs them with a random uid / gid every time; does somebody know for sure that a venv will work if it's created by e.g. root and used by uid 242542?
- I'm not sure it really solves the problem of isolating the "stuff running on the system" from "the system". If I install C/C++ dependencies using apt, it's not like the app I'm running is really "portable" or isolated from the system as a whole anyway.
- It would appear that advocating for venv amounts to drawing a dividing line between "the system" and "stuff running on the system", but in reality all of these parts are tightly coupled: if a python library in a venv needs a specific version of libsomething-dev installed, there's not really much of a separation
- The "activate" script isn't viable in Dockerfiles because it only affects the current shell; I can't "activate" it in the Dockerfile and have that stick when the container image is run, because that's a different shell. I have to manually add the venv's bin directory to the front of PATH (a minimal sketch of that follows below)
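
Here's a minimal sketch of what I mean by adding the venv to PATH instead of sourcing activate. The /opt/venv path and the requests package are just examples.

Code: Select all

FROM debian:12

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-venv && \
    rm -rf /var/lib/apt/lists/*

# Create the venv once at build time.
RUN python3 -m venv /opt/venv

# "Activate" it for every later build step and for the running container
# by putting its bin directory first in PATH; sourcing bin/activate in a
# RUN step would only affect that one shell.
ENV PATH="/opt/venv/bin:$PATH"

# pip now resolves to /opt/venv/bin/pip, so there is no PEP 668 error.
RUN pip install requests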

Why it's not so great to add the venv's bin directory to PATH:
- Sometimes python packages have command line utilities, a good example of this is pip.
- In some OSes (e.g. Windows) those command line utilities live in a directory named "Scripts", and in Windows, whenever I specify which python directory is "active", I add and remove those "Scripts" directories too.
- Rather unfortunately, on Linux, python utilities go in /usr/bin along with all the other important system utilities like ls, etc.
- That means that, if I activate a venv and run my script in it, and the script then tries to run "somecommand" that isn't installed in the venv, it will search the rest of PATH (/usr/bin), find the system copy, and potentially do something bad instead of giving an error
- I think this can be worked around by doing "python -m somecommand" instead of "somecommand", which is what I would do in this case, but that doesn't help me with all the other packages that might just do "somecommand"
- It's also no help if e.g. command-a (in the venv) runs command-b (not in the venv, so the system one is used), which then breaks because it sees different packages (see the sketch after this list for an approach that avoids touching PATH at all)
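
A variation I'm considering that sidesteps the PATH-shadowing problem above: never put the venv on PATH at all, and always call it by absolute path. Again, /opt/venv, gunicorn, and app:app are just example names.

Code: Select all

FROM debian:12

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-venv && \
    rm -rf /var/lib/apt/lists/*

# Install into the venv by calling its pip explicitly; PATH is never
# touched, so /usr/bin stays exactly as apt left it.
RUN python3 -m venv /opt/venv && \
    /opt/venv/bin/pip install gunicorn

COPY . /app
WORKDIR /app

# Start the app from the venv explicitly instead of relying on PATH lookup.
CMD ["/opt/venv/bin/python", "-m", "gunicorn", "--bind", "0.0.0.0:8000", "app:app"]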

My guesses as to why the container image is set up like this:
- It's accidental: The debian container uses the same libpython3.11-stdlib as the desktop version of Debian, and the container pulls in the same file
- It's deliberate: Debian at some point in the future wants to distribute more python system utilities, or this is a way of encouraging all python developers to use apt to install all python packages, instead of pip

Why it matters:
- If this was accidental or some carryover from Debian desktop, I'll just delete that file, because it does not apply to container users (a sketch of that is below, after this list).
- If this was a deliberate design decision, I want to know what Debian was concerned about.
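
If it does turn out to be accidental, "just delete that file" would look something like this. The marker path below is what I believe bookworm's python3.11 uses; treat it as an assumption and adjust for other Python versions.

Code: Select all

FROM debian:12

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Remove the PEP 668 marker so pip will install into the system
# environment again. Roughly equivalent to passing
# --break-system-packages on every pip invocation.
RUN rm -f /usr/lib/python3.11/EXTERNALLY-MANAGED

RUN pip install requests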

Possible solutions:
- Maybe there could be an alternate version of libpython3.11-stdlib (and all the other things that python3-pip pulls in) that does not have this file? Or maybe something like a python3-desktop package that does have it, which only the CD install pulls in?
- I would totally understand if the developers of Debian prefer to prioritize the desktop use case over the container use case; I know wrangling package dependencies can be a pain, and I would not expect the maintainers to maintain a totally separate stack of python-related apt packages based on two distinct base packages (e.g. python-externally-managed vs python-container). If the maintainers agree there isn't a whole lot of value in externally managed python packages for a container, but it's not worth making a whole separate "container stack" of python packages, I'll just delete the file.

To make it easier for people to find this post, here's the error most people search for.

Code: Select all

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
Semi-related: Debian desktop python utility shebang inconsistencies

If it matters, for my testing I ran these commands in the live environment of debian-live-12.4.0-amd64-mate.iso

I started up the live image of Debian from the CD, and sure enough, there is python3 installed on it, plus a couple of packages (pip isn't installed, but I see some entries in /usr/lib/python3/dist-packages), so I can see the case for the CD-installed version of Debian having "externally managed" python packages.

The following scripts in /usr/bin have shebangs of #!/usr/bin/env python3, which would pick up a venv's python instead of the system python whenever a venv is on PATH. I did a grep for "python" in /usr/bin and found these; there may be more:
- mesa-overlay-control.py
- pdb3 (this is probably OK, I would want pdb3 to run with the python in PATH)
- pygettext3
- route1
- rrsync

Conversely, these scripts use shebangs that are hard coded to /usr/bin/python3, which is what I would expect if python packages are considered "externally managed". You could make the case that these utilities could in theory be broken if packages were installed into the system python:
- orca
- py3clean
- py3versions
- pydoc3

These inconsistencies might just be bugs or judgment calls on a case by case basis (pdb for example makes sense), I'm not really familiar with these utilities.

Thanks in advance!

EDIT: I would like to add a bit more information about how other container images deal with this

python:3.11-bookworm: Compiles python from source

https://hub.docker.com/_/python/

It looks like they, among others I've found on the internet (e.g. the team responsible for devcontainers), have just opted to build python from source on top of Debian.
- This does allow users to run pip without the possibility of breaking system packages, since this version of python lives in /usr/local/
- I'm not sure about this as a big picture solution. It seems a little weird to have every app dev use a version of python compiled from source (instead of an apt package) just so they can install packages.

I may end up just using python:3.11-bookworm instead of using debian:12. I'm not thrilled with this solution but I feel better about it than venvs.
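
For comparison, here's roughly what starting from the official image looks like (requirements.txt and app.py are placeholders again). Because this python lives in /usr/local and ships no EXTERNALLY-MANAGED marker, pip just works.

Code: Select all

FROM python:3.11-bookworm

# apt is still available for C/C++ dependencies.
RUN apt-get update && \
    apt-get install -y --no-install-recommends libblas-dev && \
    rm -rf /var/lib/apt/lists/*

# This pip belongs to the /usr/local python built from source, which has
# no PEP 668 marker, so it installs without complaint.
COPY requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

COPY . /app
CMD ["python", "/app/app.py"]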

RedHat's solution for UBI: Using ENV to activate the venv

https://catalog.redhat.com/software/con ... dockerfile

This might be a little more palatable as a solution than building from source?
- Their Dockerfile creates a venv and sets its permissions
- It sets the ENV, BASH_ENV, and PROMPT_COMMAND environment variables so that the venv's activate script gets sourced before any command runs (a rough sketch of the pattern follows this list)

I tried this container image with every edge case I could think of, and it appears to be about as reliable as I could hope. Shebangs are respected.
- The comments in this Dockerfile (by RedHat) say that this approach will work with OpenShift and its random uids, but I did not test this personally.
- I'm a bit reassured by the way the Dockerfile is set up, even though the ENV / BASH_ENV / PROMPT_COMMAND method seems frankly a bit hacky, and I feel like there are some edge cases that might surprise somebody. Maybe this will be a good option for OpenShift users.
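
To make that concrete, here's my rough reading of the pattern as a Dockerfile. This is a paraphrase of the idea on a Debian base, not RedHat's actual Dockerfile, and /opt/venv is an assumed path.

Code: Select all

FROM debian:12

RUN apt-get update && \
    apt-get install -y --no-install-recommends python3-venv && \
    rm -rf /var/lib/apt/lists/*

# Create the venv and loosen permissions so an arbitrary uid (like the
# random uids OpenShift assigns) can still read and use it.
RUN python3 -m venv /opt/venv && \
    chgrp -R 0 /opt/venv && \
    chmod -R g+rwX /opt/venv

# Have every shell source the activate script itself:
#  - BASH_ENV covers non-interactive bash (RUN steps, shell-form CMD),
#  - ENV covers interactive POSIX sh,
#  - PROMPT_COMMAND covers interactive bash sessions.
ENV BASH_ENV=/opt/venv/bin/activate \
    ENV=/opt/venv/bin/activate \
    PROMPT_COMMAND=". /opt/venv/bin/activate"

# Use bash for build/run shells so BASH_ENV actually applies
# (Debian's default /bin/sh is dash, which ignores it).
SHELL ["/bin/bash", "-c"]

# The shell sources activate first, so this is the venv's pip.
RUN pip install requests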

Another workaround

If the Debian maintainers agree that the container image doesn't need to be externally managed, maybe the live CD could install an extra package for python, something like python-externally-managed, and that could be left off of the container image? Just an idea.

Other thoughts

This is a bit abstract, but maybe the name of the "python" package is overloaded at this point. Assuming the thing Debian is trying to avoid is somebody installing a python package from pip that breaks another apt package, it seems like we've got two distinct use cases for python:
- People who want to program with python
- People who want to install something that uses python

For python programmers, the setup in Debian is rather confusing. It sounds like "we have a python3 package, you can install it, but you can't do anything with it; don't touch it." Python devs aren't expecting python to be installed as a normal dependency like libflac-dev, something you install once for a program and forget about. It might be hard to get the word out about that. So far nobody I know has even heard about running a venv in a container image.

Edge case not solved by venv: 'system' code that's executed with "python script.py"

Let's say you had a script like this named systempython.py

Code: Select all

#!/usr/bin/python
print("I am a system python script, I want to be run by /usr/bin/python, don't run me in the venv")
- If you ran this script as ./systempython.py from a venv (assuming it's executable), it would still do the right thing, because the shebang points at /usr/bin/python
- If you ran this script using "python systempython.py" from a venv, it would use the venv's environment, because when it resolves "python" it's going to pick the first one in PATH, which is the venv's python.

That means that any python code distributed by apt that calls other python scripts on the command line must run them like './systempython.py' (or with an explicit interpreter path), and never like 'python systempython.py'.

Cross-platform python devs may be tempted to do 'python systempython.py' because, as far as I know, Windows via CMD or PowerShell can't do ./systempython.py but it can do 'python systempython.py'. (People on Windows running bash probably could, but that's a rather obscure use case.) It might be necessary to edit community packages before letting them into apt: adding shebangs and changing os.system / subprocess calls so they use the system python instead of whatever shows up first in PATH.

This might be me just worrying about nothing, maybe there's already something in the apt package review process for stuff like this.
Last edited by superlazyname on 2024-04-19 19:28, edited 1 time in total.

superlazyname
Posts: 2
Joined: 2024-04-19 12:38

Re: [Python] Should packages be externally managed in a container image?

#2 Post by superlazyname »

Hello, any updates?

- Could somebody explain the reasoning behind marking packages as externally managed in the Debian image?
- Is what the python:3.11-bookworm image does the recommended best practice? (Compiling python from source and having that be a totally separate installation, not done through apt)
- Are the inconsistent shebangs a bug or intended behavior?
- Do the devs review apt packages in the official repos and edit packages that do something like "python systemscript.py" so that they use the system python instead of a venv?

I would like to hear the developers' opinions on this, thanks,

Aki
Global Moderator
Posts: 3078
Joined: 2014-07-20 18:12
Location: Europe
Has thanked: 76 times
Been thanked: 416 times

Re: [Python] Should packages be externally managed in a container image?

#3 Post by Aki »

Hello,

I'm not a Debian developer, but perhaps I can give you some hints.
superlazyname wrote: 2024-05-01 14:23 Could somebody explain the reasoning behind marking packages as externally managed in the Debian image?
The reason is likely to be found here: The EXTERNALLY-MANAGED file [1] in the python3.11 source package [2] reports the following:

Code: Select all

[externally-managed]
Error=To install Python packages system-wide, try apt install
 python3-xyz, where xyz is the package you are trying to
 install.

 If you wish to install a non-Debian-packaged Python package,
 create a virtual environment using python3 -m venv path/to/venv.
 Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
 sure you have python3-full installed.

 If you wish to install a non-Debian packaged Python application,
 it may be easiest to use pipx install xyz, which will manage a
 virtual environment for you. Make sure you have pipx installed.
 
superlazyname wrote: 2024-05-01 14:23 - Are the inconsistent shebangs a bug or intended behavior?
Do the devs review apt packages in the official repos and edit packages that do something like "python systemscript.py" so that they use the system python instead of a venv?
It depends on the upstream developer and the Debian developer/maintainer involved in the packaging of each Debian package. If the upstream source code is modified, it is documented in the Debian source package.
superlazyname wrote: 2024-05-01 14:23 I would like to hear the developers' opinions on this, thanks,
Further information can be found in the references below. If you strictly need a Debian Developer's opinion about python, you can ask on the debian-python mailing list. Hope this helps.

---
[1] https://sources.debian.org/src/python3.11/3.11.9-1/debian/EXTERNALLY-MANAGED.in/
[2] https://tracker.debian.org/pkg/python3.11
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
⠈⠳⣄⠀
