Python Package Management
Sachin Verma
Most new Python developers and users want to jump-start and code their desired functionality right away. Many do not take the time to understand how Python packages are installed and updated on their system. But if you really want to stay away from package installation troubles and distribute your modules to the world reliably, you should consider going through the basics of Python package management.
Having said that, I must admit that Python package management is somewhat complex, as a number of tools and utilities are currently in use depending on the Python version and platform you are using.
The Python user base can be broadly divided into two groups:
- Users: anyone who is just using Python packages to write an application or to run an automated test setup doing continuous integration.
- Developers: someone who creates a module or functionality and wants to share it with others.
What does a Python distribution look like when installed on the filesystem?
- On modern *nix systems, the standard Python installation goes to the directory /usr/lib/pythonX.Y.
- Any additional packages you download go to the directory /usr/local/lib/pythonX.Y/site-packages or /usr/local/lib/pythonX.Y/dist-packages.
- If a user wants to install packages just for herself, she can use the --user flag with the pip command, and the package will be installed in $HOME/.local/lib/pythonX.Y/site-packages/.
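You can ask the interpreter itself where these directories are on your machine. The sketch below relies only on the standard-library modules sysconfig and site. The helper name install_locations is illustrative. The paths it reports depend on your platform and distribution; Debian-based systems, for instance, use dist-packages.

```python
import site
import sys
import sysconfig


def install_locations() -> dict:
    """Collect the directories this interpreter uses for packages."""
    paths = sysconfig.get_paths()
    return {
        # X.Y version string, as used in names like /usr/lib/pythonX.Y
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Standard library shipped with the interpreter
        "stdlib": paths["stdlib"],
        # Default target directory for third-party pure-Python packages
        "purelib": paths["purelib"],
        # Per-user directory used by `pip install --user`
        "user_site": site.getusersitepackages(),
    }


for name, path in install_locations().items():
    print(f"{name:10} {path}")
```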
How can I install new Python packages on my machine?
- High-level package managers like pip and easy_install can be used to install additional Python packages. Alternatively, if you have the source archive of a Python package, you can simply run python setup.py install.
- pip is the recommended installation tool for the following reasons:
  - It can both install and uninstall packages.
  - It resolves the dependencies of the package being installed and installs them automatically.
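A common way to drive pip is python -m pip, which makes sure the pip you run belongs to the interpreter you care about. The sketch below only builds and prints the command lines and does not run them. The helper pip_command and the package name requests are illustrative. The --user option and uninstall's -y (skip the confirmation prompt) are standard pip options.

```python
import shlex
import sys


def pip_command(action: str, package: str, user: bool = False) -> list:
    """Build a pip command line bound to the current interpreter."""
    # `python -m pip` runs the pip that belongs to sys.executable
    cmd = [sys.executable, "-m", "pip", action, package]
    if action == "install" and user:
        cmd.append("--user")  # install into the per-user site-packages
    if action == "uninstall":
        cmd.append("-y")  # skip the confirmation prompt
    return cmd


# Print the commands instead of running them; pass a list to
# subprocess.run(cmd, check=True) to actually perform the action.
print(shlex.join(pip_command("install", "requests", user=True)))
print(shlex.join(pip_command("uninstall", "requests")))
```

When pip actually runs an install, it also fetches and installs the dependencies the package declares. This is the automatic dependency handling mentioned above.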
How difficult is it to install and use a Python module?
- Not at all difficult. You can place a Python module in any directory and then add that directory to the PYTHONPATH environment variable. This variable holds a list of directories on your system where Python should look for modules. When the interpreter starts, it adds these directories to its module search path (sys.path). As soon as you import the module from your code, Python can find and load it for your program.
- Installing and using a module in this ad-hoc manner is fine for local use. But if we want to deploy the same module on thousands of other systems, we need to organize how packages are distributed and adopt a standard way to install or remove a module from a system.
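The ad-hoc approach can be mimicked inside a single process by putting a throwaway directory on sys.path, which is where PYTHONPATH entries end up. In the sketch below, greeting is a hypothetical module name. From a POSIX shell you would get a comparable effect with something like PYTHONPATH=/path/to/dir python your_script.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway directory containing a tiny module.
# "greeting" is a hypothetical module name used only for this demo.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "greeting.py").write_text(
    'def hello():\n    return "hello from greeting"\n'
)

# PYTHONPATH entries end up on sys.path at startup; adding the
# directory at runtime has a similar effect for this process only.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()  # let the import system notice the new file

greeting = importlib.import_module("greeting")
print(greeting.hello())  # prints: hello from greeting
```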
What constitutes a package?
A Python package consists of the following elements:
- Python source files
- data and resources, if any
- shared libraries, if any
- metadata
What should a package management system accomplish?
For any package management system, the primary goals are:
- A package must be installable and usable on any target system.
- Package dependencies should be easy to track.
- It should be easy for other package managers to repackage your package.
These goals pose a tough challenge for any package management system.
Python modules can be of the following types:
- pure Python: platform-independent code written only in Python.
- Python extension: some functionality of your package is coded in a lower-level language, such as C/C++ for CPython or Java for Jython.
Packaging, building and distributing pure Python code is not so difficult, as the code can be compiled and run on any platform that provides a Python virtual machine implementation. But the process is not so simple when we need to do the same for Python extensions.
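You can see the difference between pure Python modules and extension modules on your own interpreter. The sketch below uses an illustrative helper, module_kind, to sort modules into three kinds:
- compiled into the interpreter,
- loaded from a shared library whose file name ends in one of importlib.machinery.EXTENSION_SUFFIXES,
- loaded from Python source.

Which standard modules are built in and which ship as shared libraries varies between builds.

```python
import importlib
import importlib.machinery
import sys


def module_kind(name: str) -> str:
    """Classify an importable module by how it is implemented."""
    if name in sys.builtin_module_names:
        return "built-in"  # compiled directly into the interpreter binary
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None) or ""
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"  # shared library, typically compiled from C
    return "pure python"  # plain .py source (or its bytecode)


for name in ("json", "math", "sys"):
    print(f"{name:6} {module_kind(name)}")
```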
For instance, if I were to write a Python extension in C, I would have the following options:
- Option 1
  - Bundle the C source files along with the Python package.
  - Any user who chooses to install this package would have to install the compilation/build tools needed to compile my C source files for their platform.
- Option 2
  - The C file could be pre-compiled, packaged along with the Python package and stored on an index server like PyPI.
  - The limitation of this approach is that we would need to create multiple such packages, with the C code compiled for many different platforms.
  - The advantage of this approach is that target systems would not need to install C compilation/build tools or wait for the libraries to compile.
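Option 2's limitation comes from the fact that a compiled extension is generally tied to several things at once:
- the interpreter implementation,
- its version,
- the operating system and CPU architecture it was built for.

The sketch below prints those values for the current interpreter. The helper name build_target is illustrative. In general, each distinct combination needs its own precompiled package.

```python
import importlib.machinery
import sys
import sysconfig


def build_target() -> dict:
    """Describe what a compiled extension for this interpreter is tied to."""
    return {
        # Python implementation, e.g. "cpython" or "pypy"
        "implementation": sys.implementation.name,
        # Major.minor version; extensions are usually built per version
        "version": "%d.%d" % sys.version_info[:2],
        # OS and CPU architecture, e.g. "linux-x86_64"
        "platform": sysconfig.get_platform(),
        # File name endings the import system accepts for extensions here
        "extension_suffixes": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


for key, value in build_target().items():
    print(f"{key:20} {value}")
```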
Evolution of Python Package Management:
- distutils is born
  - The initial package management toolset, which allowed users to build, install and distribute Python packages.
  - Available as part of the Python standard library.
  - Although quite capable, distutils could not keep pace with evolving requirements because it was part of the standard library. Anything in the standard library faces stiff resistance to change requests.
- setuptools arrives on the scene
  - The reluctance of the standard library packaging tool distutils to incorporate new features led to the development of a third-party packaging toolset called setuptools.
  - Added support for dependency management while installing modules.
  - Supports the .egg package format.
  - Added a module, easy_install, a high-level package manager that can fetch packages, including their dependencies, from the PyPI server.
- distribute (fork of setuptools)
  - Among other reasons, distribute was created to expedite work on easy_install.
- distutils2 succeeds distutils
  - Created to add important features to the standard packaging tool.
  - Abandoned later on.
- setuptools makes a comeback
  - The distribute project abandons development and merges back into setuptools. setuptools is now the main packaging tool.
- distlib in the pipeline
  - Once complete, it may become part of the Python standard library, replacing distutils.
Currently, as a developer, you must use one of the following two options to package your modules:
- distutils:
  - Standard and part of the Python standard library, so always available, albeit with fewer features.
- setuptools:
  - The recommended tool to package your modules.
  - Lots of features, including support for pkg_resources and pip.
  - Active development.
  - Has adopted the wheel format.
How to publish your package
- At minimum, you need to write a setup.py Python script in which you prepare the blueprint of your package:
  - You define the source files (Python, or C/C++/Java etc. in the case of extension modules) that will be part of the package.
  - You define the metadata for your Python package, such as name, version, dependencies etc.
  - You define rules about how your package should be laid out on the target system.
  - You specify in the setup.py script whether you want to use setuptools or distutils to create the package.
- The Python packaging ecosystem supports a number of targets (or outputs) for your package:
  - You can run the sdist target, which produces a source distribution. This distribution can then be unpacked and built from source on the target system.
  - You can run bdist, which produces a binary distribution of your package. Remember that binary distributions containing compiled code are specific to particular platforms. bdist_wheel is becoming a popular installation format for the benefits described in the next section.
  - You can do a lot more, such as generating RPMs and various other formats.
- Once the package file has been generated locally, it can be uploaded to the PyPI server.
- Once the package is uploaded, the PyPI server creates an entry for it in its database, along with a web page describing the package.
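The metadata you declare in setup.py (name, version, dependencies) travels with the package and can be read back once the package is installed. Below is a minimal sketch using importlib.metadata from the standard library (Python 3.8+). The helper describe and the example distribution name pip are illustrative, and a distribution that is not installed simply yields None.

```python
from importlib import metadata
from typing import Optional


def describe(dist_name: str) -> Optional[dict]:
    """Return name, version and declared dependencies of an installed distribution."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None  # not installed in this environment
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        # Requirement strings from the metadata, or an empty list if none
        "requires": dist.requires or [],
    }


print(describe("pip"))
```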
Python Distribution Containers:
Python defines two standard container formats that allow you to pack metadata along with your modules:
- Egg format:
  - Proposed during the early years of Python packaging standard development; succeeded by the wheel format described next.
  - A simple zip archive of the project files and metadata about the project.
  - A package in .egg format can be imported in your code while it is still zipped.
  - Can include byte-compiled Python files (.pyc).
  - Once installed, it creates an egg-info directory that hosts all the relevant metadata for the package.
  - You could also distribute packages through the PyPI server as eggs.
- Wheel format:
  - A newer container format for packaging.
  - It is a built (binary) installation format. This means you do not need to run setup.py on your target system.
  - Installation is very fast, because no Python extension modules need to be built or compiled. All that is needed is to copy the prebuilt modules for the target into place.
  - It allows caching, which makes quick installs in virtualenv setups possible.
  - It complies with the newer PEP standard for metadata: it installs the metadata into a dist-info directory inside the site-packages directory.
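The platform specificity of built distributions shows up in wheel file names. PEP 427 defines them as {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl. The sketch below splits a name into those fields. split_wheel_name is an illustrative helper, not a pip or wheel API, and the file names in the example are hypothetical.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def split_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    # Dashes inside names are escaped as "_", so "-" only separates fields
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts  # optional build tag
    else:
        raise ValueError(f"unexpected number of fields in {filename}")
    return WheelName(dist, version, build, py, abi, plat)


# A pure-Python wheel: any Python 3, no ABI dependency, any platform.
print(split_wheel_name("example_pkg-1.0-py3-none-any.whl"))
# A wheel with compiled code: CPython 3.11 only, 64-bit Linux only.
print(split_wheel_name("example_ext-1.0-cp311-cp311-linux_x86_64.whl"))
```

The first name shows why pure Python wheels install everywhere. The second can only be installed where every tag matches.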
Looking Ahead
The existence of so many different utilities for managing Python packages has been a source of confusion for everyone. Considering the features of the tools available today, it is a safe bet to use setuptools or distutils for all your packaging needs. For installing and uninstalling packages from different sources, including the Cheese Shop (a.k.a. PyPI), it is recommended that you use pip.
You are free to choose the distribution format that fits your requirements, but the wheel format is recommended if it suits your deployment needs.
setup.py still remains our gateway for packaging Python modules. setup.cfg, introduced in distutils2, is a great way of decoupling metadata from the setup script, but it is still undergoing improvement as the standard evolves.
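Here is a minimal sketch of what that decoupling looks like: a declarative setup.cfg with [metadata] and [options] sections, in the style setuptools understands. The project name and dependency are hypothetical placeholders. setuptools parses this file itself when it builds the package. The sketch reads it with the standard-library configparser only to show that the metadata is plain data, separate from any Python code. With such a file in place, setup.py can shrink to a bare setup() call.

```python
import configparser

# Hypothetical setup.cfg contents for an example project.
SETUP_CFG = """\
[metadata]
name = example_pkg
version = 0.1.0
description = A hypothetical example package

[options]
packages = find:
install_requires =
    requests>=2.0
"""

config = configparser.ConfigParser()
config.read_string(SETUP_CFG)

name = config["metadata"]["name"]
version = config["metadata"]["version"]
# Multi-line values come back as one string; split them into a list.
requires = config["options"]["install_requires"].strip().splitlines()

print(name, version, requires)
```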