| 1 | +[](){#ref-software-ml} |
| 2 | +# Machine learning applications and frameworks |
| 3 | + |
| 4 | +CSCS supports a wide range of machine learning (ML) applications and frameworks on its systems. |
| 5 | +Most ML workloads are containerized to ensure portability, reproducibility, and ease of use across environments. |
| 6 | + |
| 7 | +Users can choose between running containers, using provided uenv software stacks, or building custom Python environments tailored to their needs. |
| 8 | + |
| 9 | +## Running machine learning applications with containers |
| 10 | + |
| 11 | +Containerization is the recommended approach for ML workloads on Alps, as it simplifies software management and maximizes compatibility with other systems. |
| 12 | + |
| 13 | +* Users are encouraged to build their own containers, starting from popular sources such as the [NVIDIA NGC Catalog](https://catalog.ngc.nvidia.com/containers), which offers a variety of pre-built images optimized for HPC and ML workloads. |
| 14 | +Examples include: |
| 15 | + * [PyTorch NGC container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) |
| 16 | + * [TensorFlow NGC container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow) |
| 17 | +* For dependencies that change frequently, consider keeping them in a virtual environment (venv) that is mounted into the container, so that the image does not need to be rebuilt for every change. |
| 18 | + |
| 19 | +Helpful references: |
| 20 | + |
| 21 | +* Running containers on Alps: [Container Engine Guide][ref-container-engine] |
| 22 | +* Building custom container images: [Container Build Guide][ref-build-containers] |
| 23 | + |
| 24 | +## Using provided uenv software stacks |
| 25 | + |
| 26 | +Alternatively, CSCS provides pre-configured software stacks ([uenvs][ref-uenv]) that can serve as a starting point for machine learning projects. |
| 27 | +These environments provide optimized compilers, libraries, and selected ML frameworks. |
| 28 | + |
| 29 | +Available ML-related uenvs: |
| 30 | + |
| 31 | +* [PyTorch][ref-uenv-pytorch] — available on [Clariden][ref-cluster-clariden] and [Daint][ref-cluster-daint] |
| 32 | + |
| 33 | +To extend these environments with additional Python packages, we recommend creating a Python virtual environment (venv) on top of the uenv. |
| 34 | +See this [PyTorch venv example][ref-uenv-pytorch-venv] for details. |
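
The venv step can be sketched with the Python standard library alone. This is a minimal illustration, not the documented CSCS workflow. It assumes the venv should keep access to the packages already installed in the base environment, which is what the `system_site_packages` option does. The function name `create_overlay_venv` is hypothetical, and the linked example remains the reference for the actual procedure.

```python
import venv
from pathlib import Path


def create_overlay_venv(path, with_pip=True):
    """Create a venv that can still import the base environment's packages.

    Assumption: the interpreter running this code is the one provided by
    the uenv (or container) that the venv should extend.
    """
    builder = venv.EnvBuilder(
        system_site_packages=True,  # keep packages of the base environment visible
        with_pip=with_pip,          # bootstrap pip so extra packages can be installed
    )
    builder.create(path)
    # pyvenv.cfg records the settings the venv was created with.
    return Path(path) / "pyvenv.cfg"
```

Calling `create_overlay_venv("./venv-ml")` (a hypothetical path) from inside the started uenv would create the environment. Packages installed afterwards with the venv's own `pip` land in the venv and take precedence over the base environment.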
| 35 | + |
| 36 | +!!! note |
| 37 | + While many Python packages provide pre-built binaries for common architectures, some may require building from source. |
| 38 | + |
| 39 | +## Building custom Python environments |
| 40 | + |
| 41 | +Users may also choose to build entirely custom software stacks using Python package managers such as `uv` or `conda`. |
| 42 | +Most ML libraries are available via the [Python Package Index (PyPI)](https://pypi.org/). |
| 43 | + |
| 44 | +To ensure optimal performance on CSCS systems, we recommend starting from an environment that already includes: |
| 45 | + |
| 46 | +* CUDA, cuDNN |
| 47 | +* MPI, NCCL |
| 48 | +* C/C++ compilers |
| 49 | + |
| 50 | +This can be achieved either by: |
| 51 | + |
| 52 | +* building a [custom container image][ref-build-containers] based on a suitable ML-ready base image, |
| 53 | +* or starting from a provided uenv (e.g., [PrgEnv GNU][ref-uenv-prgenv-gnu] or [PyTorch uenv][ref-uenv-pytorch]), |
| 54 | + |
| 55 | +and extending it with a virtual environment. |
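
Before creating the virtual environment, it can help to check that the chosen base environment actually exposes the components listed above. The following is a rough sketch under stated assumptions. The executable names (`nvcc`, `cc`, `c++`, `mpicc`) and library names (`cudnn`, `nccl`) are common defaults, not names confirmed for any particular CSCS image or uenv, and `check_base_environment` is a hypothetical helper.

```python
import ctypes.util
import shutil


def check_base_environment():
    """Report which recommended components are discoverable from Python.

    Executables are looked up on PATH; shared libraries are looked up with
    the platform's usual library search. A component reported as missing
    may still be installed in a location this search does not cover.
    """
    found = {
        "CUDA compiler (nvcc)": shutil.which("nvcc"),
        "C compiler (cc)": shutil.which("cc"),
        "C++ compiler (c++)": shutil.which("c++"),
        "MPI compiler wrapper (mpicc)": shutil.which("mpicc"),
        "cuDNN library": ctypes.util.find_library("cudnn"),
        "NCCL library": ctypes.util.find_library("nccl"),
    }
    # Reduce each lookup result to a simple found / not found flag.
    return {name: location is not None for name, location in found.items()}


if __name__ == "__main__":
    for name, ok in check_base_environment().items():
        print(f"{name}: {'found' if ok else 'not found'}")
```

A missing entry is a hint to pick a different base image or uenv, not proof that the component is absent, since the lookup only covers standard search paths.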
| 56 | + |