Custom Conda Environment on HPC Without Using Containers (Worked Example)
This guide provides a worked example of how to set up a custom conda environment on the HPC (High-Performance Computing) system, without relying on containers or your $HOME directory. The example uses Miniforge, a minimal installer for conda, to create an isolated environment for software development or installation. This setup is particularly useful for managing dependencies and ensuring that your software does not interfere with other installations on the HPC.
This approach is especially useful for:
- Installing packages not available through the default Conda channels.
- Managing your own software stack.
- Maintaining clean, isolated environments for different projects.
We will use Miniforge for this example.
Note
ℹ️ Why Miniforge?
Miniforge is a minimal Conda installer that supports community-driven packaging and defaults to conda-forge, a reliable, open-source package repository.
→ What’s the difference between Anaconda, Conda, Miniconda, Mamba, Mambaforge, Micromamba?
I have added two sections to this guide:
1. Custom Conda Environment on HPC Without Using Containers (Miniforge Example):
- This section provides a step-by-step guide to setting up a custom conda environment on the HPC without using containers. It uses Miniforge as the base installer and demonstrates how to install packages, configure channels, and create a wrapper script for easy access.
2. Test the Miniforge Installation to Install a Package from GitHub:
- This section tests the Miniforge installation by installing a package from GitHub. It demonstrates how to use the conda environment for software development and installation without affecting your $HOME directory or other installations on the HPC. It also shows how to create a wrapper script for the installed package, allowing for easy access and management of the software.
From your workstation, log into the HPC head node:

```shell
ssh $(whoami)@hpc.nbi.ac.uk
```

From the HPC head node, connect to the software node by typing either `software` or `ssh software`. If prompted, enter your password.

```shell
software
```

The worked example is installed in the /ei/software/testing/ tree. You may pick a scratch space or project directory where you want Conda to be installed. This will ensure that the installation does not interfere with your $HOME directory or other installations on the HPC.
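Before settling on a location, you may want a quick pre-flight check that the target directory is writable and on a filesystem with enough free space. This is a hypothetical helper, not part of the official HPC procedure; the free-space threshold is an assumed value you should adjust:

```shell
# Hypothetical pre-flight check (not part of the official HPC procedure).
# Confirms the chosen install location can be created, is writable, and its
# filesystem has at least the requested free space in GB (assumed threshold).
check_install_location() {
    local dir="$1" need_gb="${2:-5}"
    mkdir -p "$dir" || { echo "cannot create $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "$dir is not writable" >&2; return 1; }
    # df -P reports available space in 1K blocks in column 4
    local avail_kb
    avail_kb=$(df -P "$dir" | awk 'NR==2 {print $4}')
    if [ "$avail_kb" -lt $((need_gb * 1024 * 1024)) ]; then
        echo "less than ${need_gb}G free under $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

For example, `check_install_location /ei/software/testing/python_miniforge 10` would fail early rather than letting the installer run out of space partway through.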
Create a bash variable for the base Miniforge installation directory. This will be used throughout the guide to refer to the installation path.

```shell
install_base=/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example
```

Install Miniforge in the specified directory. This will create a new directory structure under /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64. The installation takes about 3 minutes to complete.
```shell
# create the directory and navigate to it
mkdir -p ${install_base}/src
cd ${install_base}/src

# Download Miniforge for Linux and install it to the base (~3 minutes)
wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p ${install_base}/x86_64
```

After the installation is complete, activate the conda base environment. This sets up the environment variables and paths needed to use conda and its packages.
```shell
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
$ eval "$(${install_base}/x86_64/bin/conda shell.bash hook)"

# check that mamba and conda are on the PATH
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ which mamba
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/mamba
(base) $ whereis conda
conda: /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/conda /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/condabin/conda
```

Here we will check the conda details, update the channels to include the Pixi package manager, and ensure that the conda configuration is set up correctly.
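The `which`/`whereis` check above can also be scripted, which is handy if another conda installation appears earlier on your PATH. A hypothetical helper (an illustration, not part of the guide) that confirms a command resolves under the expected install base:

```shell
# Hypothetical sanity check (an assumption, not from the official guide):
# confirm that a command resolves to a binary under the expected install base,
# guarding against another conda/mamba appearing earlier on PATH.
resolves_under() {
    local cmd="$1" base="$2" path
    path=$(command -v "$cmd") || { echo "$cmd not found" >&2; return 1; }
    case "$path" in
        "$base"/*) echo "$cmd -> $path" ;;
        *) echo "$cmd resolves outside $base: $path" >&2; return 1 ;;
    esac
}
```

Usage here would be `resolves_under mamba "${install_base}/x86_64"`, which fails loudly if the wrong mamba would be picked up.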
```shell
# check conda info
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda info
active environment : base
active env location : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64
shell level : 1
user config file : /hpc-home/kaithakg/.condarc
populated config files : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc
conda version : 25.3.0
conda-build version : not installed
python version : 3.12.10.final.0
solver : libmamba (default)
virtual packages : __archspec=1=icelake
__conda=25.3.0=0
__glibc=2.34=0
__linux=5.14.0=0
__unix=0=0
base environment : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64 (writable)
conda av data dir : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
package cache : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/pkgs
/hpc-home/kaithakg/.conda/pkgs
envs directories : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/envs
/hpc-home/kaithakg/.conda/envs
platform : linux-64
user-agent : conda/25.3.0 requests/2.32.3 CPython/3.12.10 Linux/5.14.0-503.26.1.el9_5.x86_64 almalinux/9.6 glibc/2.34 solver/libmamba conda-libmamba-solver/25.3.0 libmambapy/2.1.1
UID:GID : 9404:3658
netrc file : None
offline mode : False
```

Important

As you can see below, I do not have a .condarc in my $HOME even though conda info reports a user config file. I do not want a .condarc in my $HOME directory, as I want to keep the conda configuration in the installation base directory. This is good practice to avoid conflicts with other conda installations or configurations.
```shell
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ cat /hpc-home/kaithakg/.condarc
cat: /hpc-home/kaithakg/.condarc: No such file or directory
```

Check the current conda channels; Miniforge ships with the conda-forge channel by default.
```shell
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show channels
channels:
  - conda-forge

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ cat ${install_base}/x86_64/.condarc
channels:
  - conda-forge
```

Add the bioconda channel to the conda config:
```shell
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config -p ${install_base}/x86_64 --add channels bioconda
(base) $ cat ${install_base}/x86_64/.condarc
channels:
  - bioconda
  - conda-forge

# you can add others - nvidia and pytorch, like so
(base) $ conda config -p ${install_base}/x86_64 --add channels nvidia
(base) $ conda config -p ${install_base}/x86_64 --add channels pytorch

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show channels
channels:
  - bioconda
  - conda-forge
```

Great! We have added the bioconda channel to our conda configuration.
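If you script this setup, you may want an automated check that the expected channels actually landed in the installation's .condarc. A hypothetical helper, assuming conda's one-`- <name>`-per-line list format shown above:

```shell
# Hypothetical helper (assumes conda's ".condarc" list format, with one
# "- <name>" entry per line): check that every expected channel is listed.
channels_present() {
    local condarc="$1"; shift
    local ch
    for ch in "$@"; do
        grep -q -- "- ${ch}\$" "$condarc" || { echo "missing channel: $ch" >&2; return 1; }
    done
    echo "all channels present"
}
```

For example, `channels_present ${install_base}/x86_64/.condarc bioconda conda-forge` returns non-zero if a channel is missing, so an install script can stop early.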
Important

The Pixi package manager is a new package manager that is currently used to manage packages on the HPC, replacing the Anaconda channels that were used previously. Pixi is developed by Prefix, the organisation behind the package manager. You can find more information about Prefix and Pixi in the RC Documentation.
```shell
# Before adding channel_alias
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show-sources
==> /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc <==
channels:
  - bioconda
  - conda-forge

/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ vim ${install_base}/x86_64/.condarc
(base) $ cat ${install_base}/x86_64/.condarc
channels:
  - bioconda
  - conda-forge
channel_alias:
  https://repo.prefix.dev

# After adding channel_alias
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda config --show-sources
==> /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc <==
channel_alias: https://repo.prefix.dev
channels:
  - bioconda
  - conda-forge
```

We have now added the Pixi package manager channel alias to our conda configuration. This will allow us to install packages from the Pixi package manager. We can see that the channel alias is set to https://repo.prefix.dev, which is the Pixi package repository URL.
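Editing .condarc in vim works but is hard to automate. `conda config -p ${install_base}/x86_64 --set channel_alias https://repo.prefix.dev` should achieve the same non-interactively (channel_alias is a standard conda config key), and as a conda-free illustration, a hypothetical pure-shell setter:

```shell
# Hypothetical pure-shell alternative to editing .condarc in vim:
# set (or replace) the channel_alias key in a .condarc file, creating
# the file if it does not yet exist.
set_channel_alias() {
    local condarc="$1" url="$2"
    if grep -q '^channel_alias:' "$condarc" 2>/dev/null; then
        # replace the existing value in place (GNU sed -i)
        sed -i "s|^channel_alias:.*|channel_alias: ${url}|" "$condarc"
    else
        echo "channel_alias: ${url}" >> "$condarc"
    fi
}
```

Calling it twice is safe: the second call replaces the value instead of appending a duplicate key.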
```shell
# Now check conda info
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ conda info
active environment : base
active env location : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64
shell level : 1
user config file : /hpc-home/kaithakg/.condarc
populated config files : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/.condarc
conda version : 25.3.0
conda-build version : not installed
python version : 3.12.10.final.0
solver : libmamba (default)
virtual packages : __archspec=1=icelake
__conda=25.3.0=0
__glibc=2.34=0
__linux=5.14.0=0
__unix=0=0
base environment : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64 (writable)
conda av data dir : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/etc/conda
conda av metadata url : None
channel URLs : https://repo.prefix.dev/bioconda/linux-64
https://repo.prefix.dev/bioconda/noarch
https://repo.prefix.dev/conda-forge/linux-64
https://repo.prefix.dev/conda-forge/noarch
package cache : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/pkgs
/hpc-home/kaithakg/.conda/pkgs
envs directories : /ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/envs
/hpc-home/kaithakg/.conda/envs
platform : linux-64
user-agent : conda/25.3.0 requests/2.32.3 CPython/3.12.10 Linux/5.14.0-503.26.1.el9_5.x86_64 almalinux/9.6 glibc/2.34 solver/libmamba conda-libmamba-solver/25.3.0 libmambapy/2.1.1
UID:GID : 9404:3658
netrc file : None
offline mode : False
```

Now that we have set up the conda environment and added the Pixi package manager channel alias, we can install packages using mamba, a faster drop-in replacement for conda.
For this example, we will install some common packages that are often used in bioinformatics.
Important
If you are installing custom software, from GitHub for example, this is where you would install all of its dependencies. You can generally find the dependencies listed in the setup.py, requirements.txt, environment.yml, or pyproject.toml file in the software's GitHub repository.
It takes about 5 minutes to install the packages listed below.
```shell
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/src
(base) $ which mamba
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/mamba
(base) $ mamba install -y git tabulate numpy pandas
...
...
Transaction finished
```
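To keep the environment reproducible, it helps to record what you install in a plain-text spec that can be replayed later (for example with `mamba install -y $(cat spec.txt)`). A hypothetical record helper, an assumed workflow rather than part of the guide:

```shell
# Hypothetical record/replay sketch (assumed workflow, not from the guide):
# append package names to a spec file, keeping it sorted and de-duplicated,
# so the environment can be rebuilt with: mamba install -y $(cat "$spec")
record_packages() {
    local spec="$1"; shift
    printf '%s\n' "$@" >> "$spec"
    sort -u -o "$spec" "$spec"
}
```

For a full snapshot including exact versions, `conda list` output saved alongside the spec is a useful complement.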
Create a wrapper script that sources the conda environment and sets the PATH variable, so you can activate the environment without typing the full path each time. The wrapper script will be placed in the /ei/software/testing/bin directory. It can also be placed in your project directory, if you have one, or any other directory that is on your PATH.
```shell
cd /ei/software/testing/bin
# write the wrapper (a quoted heredoc keeps ${...} literal in the script)
(base) $ cat > python_miniforge-25.3.0-3_py3.12_example << 'EOF'
#!/bin/bash
tool="python_miniforge/25.3.0-3_py3.12_example"
location="/ei/software/testing"
echo "${tool} is sourced from ${location} location"
export PATH="${location}/${tool}/x86_64/bin:$PATH"
EOF
```

We can now source this script to activate the conda environment and set the PATH variable, letting us use the installed packages without typing the full path each time.
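If you maintain several tools this way, the wrapper pattern can be generated rather than hand-written. A hypothetical generator, assuming the `<location>/<tool>/x86_64/bin` layout used throughout this guide:

```shell
# Hypothetical generator (an assumption, not from the guide) for wrapper
# scripts of the form shown above, assuming the <location>/<tool>/x86_64/bin
# layout. Expansions are escaped so ${tool}/${location} stay literal in the
# generated file and are resolved when the wrapper is sourced.
make_wrapper() {
    local tool="$1" location="$2" out="$3"
    cat > "$out" <<EOF
#!/bin/bash
tool="${tool}"
location="${location}"
echo "\${tool} is sourced from \${location} location"
export PATH="\${location}/\${tool}/x86_64/bin:\$PATH"
EOF
}
```

For instance, `make_wrapper "python_miniforge/25.3.0-3_py3.12_example" /ei/software/testing /ei/software/testing/bin/python_miniforge-25.3.0-3_py3.12_example` would reproduce the script above.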
The following section shows how to source the wrapper script and activate the conda environment to install a package from GitHub.
Now that we have set up the conda environment and created a wrapper script, we can test the installation by installing a package from GitHub. This will demonstrate how to use the conda environment for software development and installation without affecting your $HOME directory.
We will install a custom package using the conda environment. This package can be anything you want, but for this example, we will install a simple Python vizgen_data_transfer package, which is a Python wrapper for managing data transfer processes related to Vizgen projects.
This package will be installed in the /ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64 directory, which is a custom location for the package.
This allows you to manage your software installations without cluttering your $HOME directory or interfering with other installations on the HPC. It also allows you to easily share your software with others by simply sharing the installation directory.
```shell
# Activate the conda environment
$ source /ei/software/testing/bin/python_miniforge-25.3.0-3_py3.12_example
python_miniforge/25.3.0-3_py3.12_example is sourced from /ei/software/testing location
```

Here we will clone the vizgen_data_transfer repository from GitHub, build a wheel package, and install it into the custom location we specified earlier. This allows us to use the package without installing it into our $HOME directory or the base conda environment.
```shell
mkdir -p /ei/software/testing/vizgen_data_transfer/0.1.0_example/src && \
cd /ei/software/testing/vizgen_data_transfer/0.1.0_example/src && \
git clone https://github.com/EI-CoreBioinformatics/vizgen_data_transfer.git

# Change to the cloned directory, build a wheel, and install it to the custom location
cd vizgen_data_transfer

# Check the pip location; it should point to the Miniforge installation
which pip
/ei/software/testing/python_miniforge/25.3.0-3_py3.12_example/x86_64/bin/pip

version=0.1.0_example && \
pip wheel -w dist . && \
pip install dist/*.whl --prefix=/ei/software/testing/vizgen_data_transfer/${version}/x86_64
```
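After a `pip install --prefix` like the one above, it is worth confirming that the console script and the site-packages tree actually exist under the prefix before writing the wrapper. A hypothetical check, assuming the `bin/` and `lib/pythonX.Y/site-packages` layout that `--prefix` installs produce:

```shell
# Hypothetical post-install check (assumes the bin/ and
# lib/pythonX.Y/site-packages layout produced by pip install --prefix):
# confirm the console script and site-packages directory exist.
verify_prefix_install() {
    local prefix="$1" script="$2" pyver="${3:-3.12}"
    [ -x "${prefix}/bin/${script}" ] || { echo "missing ${prefix}/bin/${script}" >&2; return 1; }
    [ -d "${prefix}/lib/python${pyver}/site-packages" ] || { echo "missing site-packages under ${prefix}" >&2; return 1; }
    echo "install looks good under ${prefix}"
}
```

Here that would be `verify_prefix_install /ei/software/testing/vizgen_data_transfer/${version}/x86_64 vizgen_data_transfer`.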
As before, we will create a wrapper script to easily source/activate the vizgen_data_transfer package. This wrapper will set the necessary environment variables and paths for the package to work correctly. The script will be placed in the /ei/software/testing/bin directory, similar to the Miniforge wrapper script.
```shell
# Create a wrapper script for the vizgen_data_transfer package
cd /ei/software/testing/bin
cat > vizgen_data_transfer-0.1.0_example << 'EOF'
#!/bin/bash
source /ei/software/testing/bin/python_miniforge-25.3.0-3_py3.12_example
export PATH=/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/bin:$PATH
export PYTHONPATH=/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/lib/python3.12/site-packages
echo "vizgen_data_transfer/0.1.0_example is sourced from /ei/software/testing location"
EOF
```
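One caveat: the `export PYTHONPATH=...` line in this wrapper overwrites any PYTHONPATH that was already set, which matters if you stack several prefix-installed packages. A hypothetical prepend helper (an assumption, not part of the guide) avoids clobbering:

```shell
# Hypothetical helper (an assumption, not from the guide): prepend a
# directory to PYTHONPATH without discarding whatever was already there.
prepend_pythonpath() {
    local dir="$1"
    if [ -z "${PYTHONPATH:-}" ]; then
        export PYTHONPATH="$dir"
    else
        export PYTHONPATH="$dir:$PYTHONPATH"
    fi
}
```

A wrapper could call `prepend_pythonpath /ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/lib/python3.12/site-packages` instead of assigning PYTHONPATH directly.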
Now that we have created the wrapper script for the vizgen_data_transfer package, we can test it by sourcing the script and running the package command.
```shell
# Open a new terminal and source the vizgen_data_transfer package
$ source /ei/software/testing/bin/vizgen_data_transfer-0.1.0_example
python_miniforge/25.3.0-3_py3.12_example is sourced from /ei/software/testing location
vizgen_data_transfer/0.1.0_example is sourced from /ei/software/testing location

$ which vizgen_data_transfer
/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/bin/vizgen_data_transfer

$ vizgen_data_transfer -h
usage: vizgen_data_transfer [-h] [--copy_type COPY_TYPE [COPY_TYPE ...]] [--threads THREADS] [--disk] [--vizgen_config VIZGEN_CONFIG] [--debug] run_id

Script for Vizgen data transfer

positional arguments:
  run_id                Provide run name, for example: 202310261058_VZGEN1_VMSC10202

options:
  -h, --help            show this help message and exit
  --copy_type COPY_TYPE [COPY_TYPE ...]
                        Provide copy type, for example: raw_data, analysis, output (default: ['raw_data', 'analysis', 'output'])
  --threads THREADS     Number of threads to use for copying (default: 8)
  --disk                Enable this option if run has to be copied from the Windows external Hard disk 'G:\Vizgen data Z drive' instead of the default Z: Drive on the analysis machine [default:False]
  --vizgen_config VIZGEN_CONFIG
                        Path to vizgen config file [default:/ei/software/testing/vizgen_data_transfer/0.1.0_example/x86_64/lib/python3.12/site-packages/vizgen_data_transfer/etc/.vizgen_config.toml]
  --debug               Enable this option for debugging [default:False]

Contact: Gemy George Kaithakottil ([email protected])
```

This confirms that the vizgen_data_transfer package is installed and working correctly.
In this guide, we have successfully set up a custom conda environment using Miniforge on the HPC. We have installed packages from the Pixi package manager and created a wrapper script to easily activate the environment. Additionally, we demonstrated how to install a package from GitHub and create a wrapper script for it. This setup allows for efficient software development and installation on the HPC without affecting your $HOME directory.