diff --git a/content/dependencies.rst b/content/dependencies.rst
index 0081116f..9097e571 100644
--- a/content/dependencies.rst
+++ b/content/dependencies.rst
@@ -6,7 +6,7 @@ Dependency management
.. questions::
- Do you expect your code to work in one year? Five? What if it
- uses ``numpy`` or ``tensorflow`` or ``random-github-package`` ?
+ uses ``numpy`` or ``pytorch`` or ``random-github-package`` ?
- How can my collaborators easily install my code with all the necessary dependencies?
- How can I make it easy for my others (and me in future) to reproduce my results?
- How can I work on two (or more) projects with different and conflicting dependencies?
@@ -19,8 +19,8 @@ Dependency management
- Simplify the use and reuse of scripts and projects
-How do you track dependencies of your project?
-----------------------------------------------
+What even is a dependency?
+--------------------------
* **Dependency**: Reliance on a external component. In this case, a
separately installed software package such as ``numpy``.
@@ -52,26 +52,24 @@ PyPI (The Python Package Index) and conda ecosystem
PyPI (The Python Package Index) and conda are popular packaging/dependency
management tools:
-- When you run ``pip install`` you typically install from `PyPI
- `__, but you can also ``pip install`` from a GitHub
- repository and similar.
+- When you use ``pip`` or ``uv`` you typically install from `PyPI
+ `__, but you can also install packages from
+ source code hosted in repositories, for example on GitHub.
-- When you run ``conda install`` you typically install from `Anaconda Cloud
+- When you use ``conda`` you typically install from `Anaconda Cloud
`__ where there are conda channels maintained
by Anaconda Inc. and by various communities.
-
Why are there two ecosystems?
.. admonition:: PyPI
- - **Installation tool:** ``pip``
+ - **Installation tool:** `pip `__, `uv `__
- **Summary:** PyPI is traditionally used for Python-only packages or
for Python interfaces to external libraries. There are also packages
that have bundled external libraries (such as numpy).
- - **Amount of packages:** Huge number. Old versions are supported for
- a long time.
+ - **Amount of packages:** Huge number.
- **How libraries are handled:** If your code depends on external
libraries or tools, these things need to be either included in the
pip-package or provided via some other installation system (like
@@ -84,14 +82,13 @@ Why are there two ecosystems?
.. admonition:: Conda
- - **Installation tool:** ``conda`` or ``mamba``
+ - **Installation tool:** `conda `__, `mamba `__
- **Summary:** Conda aims to be a more general package distribution tool
- and it tries to provide not only the Python packages, but also libraries
- and tools needed by the Python packages. Most scientific software written
- in Python uses external libraries to speed up calculations and installing
- these libraries can often become complicated without conda.
- - **Amount of packages:** Curated list of packages in defaults-channel, huge
- number in community managed channels. Other packages can be installed via pip.
+ and it tries to provide not only the Python packages, but also other
+ libraries and tools needed by the Python packages.
+ - **Amount of packages:** Huge number in conda-forge and in other community
+ channels. Curated versions in licensed channels. Other packages can be
+ installed via pip.
- **How libraries are handled:** Required libraries are installed as separate
conda packages.
- **Pros:**
@@ -104,27 +101,39 @@ Why are there two ecosystems?
Conda ecosystem explained
-------------------------
-.. warning::
+.. figure:: img/dependencies/conda-ecosystem.png
+ :alt: Figure that shows non-free and free parts of the conda ecosystem. Non-free side has Anaconda Inc., Anaconda repository, Anaconda channels, Miniconda and Anaconda distribution. Free side has Community, conda-forge and Miniforge.
+
+ Figure 1: Conda ecosystem visualized
- Anaconda has recently changed its licensing terms, which affects its
- use in a professional setting. This caused uproar among academia
- and Anaconda modified their position in
- `this article `__.
+.. admonition:: Licensing and conda
- Main points of the article are:
+ Conda was originally created by Anaconda Inc., which provides its
+ own licensed packages. At the same time, a large open-source community
+ provides most of the packages. It is therefore good to know which parts
+ are free and open source and which require a license.
- - conda (installation tool) and community channels (e.g. conda-forge)
- are free to use.
- - Anaconda repository and **Anaconda's channels in the community repository**
- are free for universities and companies with fewer than 200 employees.
- Non-university research institutions and national laboratories need
- licenses.
- - Miniconda is free, when it does not download Anaconda's packages.
- - Miniforge is not related to Anaconda, so it is free.
+ It is highly recommended to use Miniforge to create the environments
+ and to use conda-forge as the main channel for software. You can add
+ ``nodefaults`` to the channel list to disable Anaconda's repositories.
- For ease of use on sharing environment files, we recommend using
- Miniforge to create the environments and using conda-forge as the main
- channel that provides software.
+ **Free:**
+ - conda and mamba (installation tools)
+ - community channels (e.g. conda-forge)
+ - Miniforge
+ **Licensed:**
+ - Anaconda distribution
+ - Anaconda repository (`repo.anaconda.com `__)
+ - Anaconda's channels in the community repository (anaconda.org)
+ - Miniconda, which is free only as long as it does not download
+ Anaconda's packages
+
+ All of these are licensed by Anaconda Inc. and are free to use only in some cases. See
+ `Academic Policy `__,
+ `Terms of Service `__
+ and
+ `Non-profit & Research Policy `__
+ for more information.
- Package repositories:
@@ -162,13 +171,10 @@ Conda ecosystem explained
- Package managers:
- - `conda `__ is a package and environment management system
+ - `conda `__ is a package and environment management system
used by Anaconda. It is an open source project maintained by Anaconda Inc..
- - `mamba `__ is a drop in
- replacement for conda. It used be much faster than conda due to better
- dependency solver but nowadays conda
- `also uses the same solver `__.
- It still has some UI improvements.
+ - `mamba `__ is a drop-in
+ replacement for conda with additional UI features.
Exercise 2
----------
@@ -202,16 +208,17 @@ An **environment** is a basically a folder that contains a Python
interpreter and other Python packages in a folder structure similar
to the operating system's folder structure.
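To make the "environment is a folder" idea concrete, the following minimal sketch prints where the running interpreter and its environment live. The function name ``describe_environment`` is a hypothetical helper introduced here for illustration. The ``in_venv`` check only detects ``venv``/``virtualenv`` style environments; a conda environment usually reports identical prefixes.

```python
import sys


def describe_environment():
    """Return basic facts about the Python environment running this code."""
    return {
        # Path of the interpreter binary that is executing this script.
        "interpreter": sys.executable,
        # Root folder of the currently active environment.
        "prefix": sys.prefix,
        # Root folder of the installation the environment was created from.
        "base_prefix": sys.base_prefix,
        # venv/virtualenv environments have a prefix that differs from the base.
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this once outside and once inside an activated environment should show the interpreter and prefix paths changing to point into the environment folder.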
-These environments can be created by the
-`venv-module `__ in base
-Python, by a pip package called
-`virtualenv `_
-or by conda/mamba.
+These environments can be created by:
+
+- the `venv `__ module in base Python
+- `uv `__
+- `conda `__ / `mamba `__
+- the pip package `virtualenv `__
Using these environments is highly recommended because they solve the
following problems:
-- Installing environments won't modify system packages.
+- Installing packages in environments won't modify system packages.
- You can install specific versions of packages into them.
@@ -268,6 +275,12 @@ Creating Python environments
$ conda activate my-environment
+ or
+
+ .. code-block:: console
+
+ $ source activate my-environment
+
.. callout:: conda activate versus source activate
``conda activate`` will only work if you have run ``conda init``
@@ -318,11 +331,16 @@ Creating Python environments
``activate``.
- **Linux/Mac OSX**:
+
.. code-block:: console
$ source my-environment/bin/activate
- - **Windows**: most likely you can find it in the Scripts folder.
+ - **Windows**:
+
+ .. code-block:: console
+
+ $ my-environment\Scripts\activate
Now the environment should be active. You can then install packages
listed in ``requirements.txt`` with
@@ -576,6 +594,106 @@ Exercise 4
Export the environment you previously created.
+How to communicate the dependencies as part of a report/thesis/publication
+--------------------------------------------------------------------------
+
+Each notebook, script, or project which depends on libraries should come with
+either a ``requirements.txt`` or an ``environment.yml``, unless you are creating
+and distributing this project as a Python package (see the next section).
+
+- Attach a ``requirements.txt`` or an ``environment.yml`` to your thesis.
+- Even better: put ``requirements.txt`` or ``environment.yml`` in your Git repository alongside your code.
+- Even better: also binderize your analysis pipeline (more about that in a later session).
+
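As a rough illustration of what goes into a ``requirements.txt``, the sketch below looks up the installed versions of a few packages with the standard library module ``importlib.metadata`` and formats them as pinned lines. The helper name ``pinned_requirements`` and the package list are assumptions made for this example; in practice you would more likely export the environment with the tools shown earlier in this lesson.

```python
from importlib import metadata


def pinned_requirements(package_names):
    """Return (pinned, missing) for the given distribution names.

    pinned  -- list of "name==version" strings for installed packages
    missing -- list of names that are not installed in this environment
    """
    pinned, missing = [], []
    for name in package_names:
        try:
            pinned.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return pinned, missing


if __name__ == "__main__":
    # Hypothetical list of the direct dependencies of an analysis script.
    lines, not_installed = pinned_requirements(["numpy", "pandas", "matplotlib"])
    print("\n".join(lines))
    if not_installed:
        print("# not installed:", ", ".join(not_installed))
```

The printed lines can be pasted into a ``requirements.txt``; packages that are not installed are reported in a comment instead of being guessed.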
+
+.. _version_pinning:
+
+Version pinning for package creators
+------------------------------------
+
+We will talk about packaging in a different session, but when you create a library
+and package it, you express its dependencies either in ``pyproject.toml`` (or ``setup.py``)
+for PyPI or in ``meta.yaml`` for conda.
+
+These dependencies will then be used either by other libraries (which in turn
+write their own ``setup.py`` or ``pyproject.toml`` or ``meta.yaml``) or by
+people directly (filling out ``requirements.txt`` or an ``environment.yml``).
+
+Now as a library creator you have a difficult choice. You can either pin versions very
+narrowly like here (example taken from ``setup.py``):
+
+.. code-block:: python
+ :emphasize-lines: 3-6
+
+ # ...
+ install_requires=[
+ 'numpy==1.19.2',
+ 'matplotlib==3.3.2',
+ 'pandas==1.1.2',
+ 'scipy==1.5.2',
+ ]
+ # ...
+
+or you can define a range or keep them undefined like here (example taken from
+``setup.py``):
+
+.. code-block:: python
+ :emphasize-lines: 3-6
+
+ # ...
+ install_requires=[
+ 'numpy',
+ 'matplotlib',
+ 'pandas',
+ 'scipy',
+ ]
+ # ...
+
+Should we pin the versions here or not?
+
+- Pinning versions here would be good for reproducibility.
+
+- However, pinning versions may make it difficult for this library to be used in a project alongside other
+ libraries with conflicting version dependencies.
+
+- Therefore **as a library creator make the version requirements as wide as possible**.
+
+ - Set a minimum version when you know of a reason: ``>=2.1``
+
+ - Sometimes set a maximum version below the next major version (``<4``
+ when you currently use ``3.x.y``) if you expect issues with the next
+ major version.
+
+- As the "end consumer" of libraries, define your dependencies as narrowly as possible.
+
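The trade-off above can be sketched with a small, simplified model of version requirements. This is not how ``pip`` or ``conda`` resolve dependencies (real tools follow the full version specifier rules); the helpers ``parse``, ``allowed`` and ``common_versions`` and the candidate version list are assumptions made for illustration. The point is that two libraries with exact pins can leave no version that satisfies both, while ranges usually overlap.

```python
def parse(version):
    """Turn a plain release string such as '1.19.2' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, minimum=None, below=None, exact=None):
    """Check a version against a simplified requirement.

    exact   -- behaves like '==exact'
    minimum -- behaves like '>=minimum'
    below   -- behaves like '<below'
    """
    v = parse(version)
    if exact is not None:
        return v == parse(exact)
    if minimum is not None and v < parse(minimum):
        return False
    if below is not None and v >= parse(below):
        return False
    return True


def common_versions(candidates, *requirements):
    """Return the candidates that satisfy every requirement."""
    return [c for c in candidates if all(allowed(c, **r) for r in requirements)]


# Hypothetical set of available releases of one dependency.
candidates = ["1.19.2", "1.26.4", "2.1.0"]

# Two libraries that each pin a different exact version: nothing fits both.
pinned = common_versions(candidates, {"exact": "1.19.2"}, {"exact": "1.26.4"})
print(pinned)  # → []

# The same two libraries with wide ranges: several versions fit both.
ranged = common_versions(
    candidates,
    {"minimum": "1.19"},
    {"minimum": "1.24", "below": "3"},
)
print(ranged)  # → ['1.26.4', '2.1.0']
```
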
+
+Common issues
+-------------
+
+Here are a couple of common issues that arise for new users of environments.
+
+1. **Global installs:** Installing packages with ``pip install --user``.
+ This installs packages to your home directory and makes them available
+ outside of any environment. This can cause major problems because these
+ packages can take precedence over packages installed in an environment.
+2. **Environments using lots of storage space:** Python packages that contain
+ libraries can take a lot of space, which can cause quota problems when
+ you're installing Python environments in systems with limited storage
+ space. By default packages are cached to your home folder
+ (see the documentation for `pip `__,
+ `conda `__,
+ and `uv `__).
+ Conda and uv reuse packages across multiple environments (if you create
+ another environment with the same packages, it won't take more space).
+ For these tools it is important that the cache and environments are
+ stored in the same filesystem. Pip only caches downloads and self-built
+ packages; it does not reuse them across environments.
+3. **Environments creating huge numbers of files:** Python environments
+ can have huge numbers of files. Some systems (like shared HPC systems) do
+ not cope well with lots of small files in the storage system. You can
+ use `containers `__ to put
+ the environment into a single file to solve these problems.
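To check whether these issues affect you, a small sketch like the one below can help. It uses only the standard library: ``site`` reports whether the per-user package folder is enabled and where it lives, and ``os.walk`` counts files and bytes under an environment folder. The function names are hypothetical, and the size is an upper bound, since files that tools share between environments are counted once per environment.

```python
import os
import site


def user_site_info():
    """Report whether the per-user site-packages folder is in use, and its path."""
    return {
        # ENABLE_USER_SITE can be True, False or None; normalise to a bool.
        "enabled": bool(site.ENABLE_USER_SITE),
        # Folder that ``pip install --user`` installs into.
        "path": site.getusersitepackages(),
    }


def environment_footprint(env_path):
    """Count regular files and sum their sizes in bytes under *env_path*."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(env_path):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # symbolic links take (almost) no space of their own
            file_count += 1
            total_bytes += os.path.getsize(path)
    return file_count, total_bytes
```

For example, ``environment_footprint("my-environment")`` returns the number of files and the total size of the environment folder used in the examples above, which you can compare against your file and storage quotas.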
+
Additional tips and tricks
--------------------------
@@ -645,7 +763,7 @@ Additional tips and tricks
Packages available in GitHub or other repositorios
can be given as a URL in ``requirements.txt``.
- For example, to install a development version of the
+ For example, to install a development version of the
`black code formatter `__, one can
write the following ``requirement.txt``.
@@ -663,96 +781,26 @@ Additional tips and tricks
download the zip archive of the repository.
-How to communicate the dependencies as part of a report/thesis/publication
---------------------------------------------------------------------------
-
-Each notebook or script or project which depends on libraries should come with
-either a ``requirements.txt`` or a ``environment.yml``, unless you are creating
-and distributing this project as Python package (see next section).
-
-- Attach a ``requirements.txt`` or a ``environment.yml`` to your thesis.
-- Even better: put ``requirements.txt`` or a ``environment.yml`` in your Git repository along your code.
-- Even better: also binderize your analysis pipeline (more about that in a later session).
-
-
-.. _version_pinning:
-
-Version pinning for package creators
-------------------------------------
-
-We will talk about packaging in a different session but when you create a library and package
-projects, you express dependencies either in ``pyproject.toml`` (or ``setup.py``)
-(PyPI) or ``meta.yaml`` (conda).
-
-These dependencies will then be used by either other libraries (who in turn
-write their own ``setup.py`` or ``pyproject.toml`` or ``meta.yaml``) or by
-people directly (filling out ``requirements.txt`` or a ``environment.yml``).
-
-Now as a library creator you have a difficult choice. You can either pin versions very
-narrowly like here (example taken from ``setup.py``):
-
-.. code-block:: python
- :emphasize-lines: 3-6
-
- # ...
- install_requires=[
- 'numpy==1.19.2',
- 'matplotlib==3.3.2'
- 'pandas==1.1.2'
- 'scipy==1.5.2'
- ]
- # ...
-
-or you can define a range or keep them undefined like here (example taken from
-``setup.py``):
-
-.. code-block:: python
- :emphasize-lines: 3-6
-
- # ...
- install_requires=[
- 'numpy',
- 'matplotlib'
- 'pandas'
- 'scipy'
- ]
- # ...
-
-Should we pin the versions here or not?
-
-- Pinning versions here would be good for reproducibility.
-
-- However pinning versions may make it difficult for this library to be used in a project alongside other
- libraries with conflicting version dependencies.
-
-- Therefore **as library creator make the version requirements as wide as possible**.
-
- - Set minimum version when you know of a reason: ``>=2.1``
-
- - Sometimes set maximum version to next major version (``<4``) (when
- you currently use ``3.x.y``) when you expect issues with next
- major version.
-
-- As the "end consumer" of libraries, define your dependencies as narrowly as possible.
-
See also
--------
Other tools for dependency management:
-- `Poetry `__: dependency management and packaging
-- `Pipenv `__: dependency management, alternative to Poetry
-- `pyenv `__: if you need different Python versions for different projects
-- `micropipenv `__: lightweight tool to "rule them all"
-- `mamba `__: a drop in replacement for
- conda that does installations faster.
-- `miniforge `__: Miniconda alternative with
- conda-forge as the default channel and optionally mamba as the default installer.
+- `uv `__: Tool for managing multiple Python
+ versions and environments.
+- `Poetry `__: Environment and package creation
+ tool.
+- `Pipenv `__: Environment creation tool.
+- `pyenv `__: Tool for installing multiple
+ different Python versions.
- `micromamba `__:
- tiny version of Mamba as a static C++ executable. Does not need base environment or
- Python for installing an environment.
-- `pixi `__: a package management tool which builds upon the foundation of the conda ecosystem.
+ tiny version of Mamba as a static C++ executable. Does not need base
+ environment or Python for installing an environment.
+- `micropipenv `__: Small tool
+ that can install dependencies from multiple different environment formats.
+- `pixi `__ & `prefix.dev `__: A package
+ ecosystem that installs all sorts of packages using the conda ecosystem.
Other resources:
diff --git a/content/img/dependencies/conda-ecosystem.graphml b/content/img/dependencies/conda-ecosystem.graphml
new file mode 100644
index 00000000..ea0c1758
--- /dev/null
+++ b/content/img/dependencies/conda-ecosystem.graphml
@@ -0,0 +1,207 @@
+ [GraphML source of the conda ecosystem figure; markup not recoverable. Node labels: Non-free, Free, Anaconda distribution, Anaconda repository, Anaconda channels, conda-forge, Anaconda Inc., Community, Miniforge, Miniconda]
diff --git a/content/img/dependencies/conda-ecosystem.png b/content/img/dependencies/conda-ecosystem.png
new file mode 100644
index 00000000..c30774ef
Binary files /dev/null and b/content/img/dependencies/conda-ecosystem.png differ