Packaging
In the open source world, most packages can be installed freely (no payment or sign-up required), so it is common to have tools that make it easy for users to install any package on their system.
The best-known examples are Linux distributions, where a package can be installed by typing something like dnf install <some-package>, apt-get install <some-package> or similar in the terminal. Graphical package managers are also common.
The Python community has developed several packaging systems over the years; the most popular one today is pip. Python is a multiplatform programming language, and its package ecosystem is huge. For these reasons it was helpful for Python to have its own packaging system, instead of packaging separately for every Linux distribution and for proprietary operating systems like macOS (brew) or Windows.
This does not mean that Python packages can't be provided as .rpm, .deb, .msi... for those platforms. Popular projects usually exist as packages for those systems. But for smaller projects, it is usually enough to have a package that can be installed with pip.
While pip has served the Python community well in recent years, it has a major limitation: it is a packaging system for Python packages. This means that if our Python project depends on a non-Python library, we can only package the Python part with pip, not the non-Python dependency. Take Pillow, the most common Python package for image manipulation. It can depend on libraries like libjpeg to perform operations on .jpeg files. In these cases, pip works as expected if libjpeg is available on the system, but the installation simply fails if it is not.
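As a rough illustration of this limitation, a project could check at runtime whether a non-Python library is visible on the system. The sketch below uses ctypes.util.find_library from the standard library. The helper name missing_system_libraries and the library list are assumptions made for this example and are not part of pip or Pillow. The lookup is platform-dependent, so a library may be reported as missing even when it is installed in a non-standard location.

```python
from ctypes.util import find_library


def missing_system_libraries(names):
    """Return the names in `names` that find_library cannot locate.

    Hypothetical helper for illustration only. Names are given without
    the "lib" prefix or file extension, e.g. "jpeg" for libjpeg.
    """
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    # "jpeg" stands in for the libjpeg dependency mentioned above.
    missing = missing_system_libraries(["jpeg"])
    if missing:
        print("Missing system libraries:", ", ".join(missing))
    else:
        print("All listed system libraries were found.")
```

A check like this only reports the problem; installing the missing library is still left to the system package manager.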
While most of the Python ecosystem is not strongly affected by this, in the data world most projects have non-Python dependencies (think of numpy, scipy, pandas...). For this reason, the PyData community has been moving from pip to conda, the package manager of the Anaconda distribution, which was designed with this problem in mind.
In practice, this means that if we are developing a Python project related to data engineering, data science... and we want users to be able to install it easily, we should provide packages for both pip and Anaconda.