|
1 | 1 | [](){#ref-software-ml} |
2 | 2 | # Machine Learning Applications and Frameworks |
3 | 3 |
|
4 | | -## Containerized Machine Learning Applications |
| 4 | +## Overview |
5 | 5 |
|
6 | | -CSCS supports a variety of machine learning applications and frameworks on its |
7 | | -systems. Typically, machine learning applications are containerized to ensure |
8 | | -compatibility and ease of use across different environments. |
| 6 | +CSCS supports a wide range of machine learning (ML) applications and frameworks |
| 7 | +on its systems. Most ML workloads are containerized to ensure portability, |
| 8 | +reproducibility, and ease of use across environments. |
9 | 9 |
|
10 | | -CSCS does not provide any specific machine learning container images, but users |
11 | | -can create their own containers using popular base container registries, such |
12 | | -as [Nvidia's NGC Catalog](https://catalog.ngc.nvidia.com/containers). These |
13 | | -containers can be run on Alps, allowing users to leverage the |
14 | | -high-performance computing resources available for their machine learning |
15 | | -tasks. |
| 10 | +Users can choose between running containers, using provided uenv software |
| 11 | +stacks, or building custom Python environments tailored to their needs. |
16 | 12 |
|
17 | | -* Jobs using containers can be easily set up and submitted using the [container |
18 | | - engine][ref-container-engine]. |
19 | | -* To build images, see the [guide to building container images on |
20 | | - Alps][ref-build-containers]. |
| 13 | +## Running Machine Learning Applications with Containers |
21 | 14 |
|
22 | | -## Uenv Software Stacks |
| 15 | +Containerization is the recommended approach for ML workloads on Alps: it
| 16 | +simplifies software management and makes workloads portable to other systems.
| 17 | + |
| 18 | +* CSCS does not provide ready-to-use ML container images.
| 19 | +* Users are encouraged to build their own containers, starting from popular
| 20 | + sources such as the [NVIDIA NGC
| 21 | + Catalog](https://catalog.ngc.nvidia.com/containers).
| 22 | + |
| 23 | +Helpful references: |
| 24 | + |
| 25 | +* Running containers on Alps: [Container Engine Guide][ref-container-engine] |
| 26 | +* Building custom container images: [Container Build |
| 27 | + Guide][ref-build-containers] |
| 28 | + |
| 29 | +## Using Provided uenv Software Stacks |
| 30 | + |
| 31 | +Alternatively, CSCS provides pre-configured software stacks ([uenvs][ref-uenv]) |
| 32 | +that can serve as a starting point for machine learning projects. These |
| 33 | +environments include optimized compilers, libraries, and selected ML
| 34 | +frameworks. |
| 35 | + |
| 36 | +Available ML-related uenvs: |
| 37 | + |
| 38 | +* [PyTorch][ref-uenv-pytorch]: available on [Clariden][ref-cluster-clariden]
| 39 | + and [Daint][ref-cluster-daint] |
| 40 | + |
| 41 | +To extend these environments with additional Python packages, we recommend
| 42 | +creating a Python virtual environment (venv) on top of the uenv. See the
| 43 | +[PyTorch venv example][ref-uenv-pytorch-venv] for details.
| 44 | + |
| 45 | +!!! note |
| 46 | + While many Python packages provide pre-built binaries (wheels) for common
| 47 | + architectures, some may need to be built from source with the compilers and libraries available in the environment.
| 48 | + |
| 49 | +## Building Custom Python Environments |
| 50 | + |
| 51 | +Users may also choose to build entirely custom software stacks using Python |
| 52 | +package managers such as `pip` or `conda`. Most ML libraries are available via |
| 53 | +the [Python Package Index (PyPI)](https://pypi.org/). |
| 54 | + |
| 55 | +For the best performance on CSCS systems, we recommend starting from an
| 56 | +environment that already includes: |
| 57 | + |
| 58 | +* CUDA, cuDNN |
| 59 | +* MPI, NCCL |
| 60 | +* C/C++ compilers
| 61 | + |
| 62 | +This can be achieved either by: |
| 63 | + |
| 64 | +* Building a [custom container image][ref-build-containers] based on a suitable |
| 65 | + ML-ready base image. |
| 66 | +* Starting from a provided uenv (e.g., [PrgEnv GNU][ref-uenv-prgenv-gnu] or |
| 67 | + [PyTorch uenv][ref-uenv-pytorch]) and extending it with a virtual |
| 68 | + environment. |
23 | 69 |
|
24 | | -CSCS provides a base [PyTorch uenv][ref-uenv-pytorch] that is available on the |
25 | | -[Clariden][ref-cluster-clariden] and [Daint][ref-cluster-daint] cluster. |
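The uenv section recommends extending a provided uenv with a Python virtual environment. A minimal sketch of that step using Python's standard-library `venv` module follows. It assumes the venv should still see the packages already installed in the base environment (for example, PyTorch from the uenv), which is what `system_site_packages=True` provides. The function names `create_overlay_venv` and `venv_python` and the directory name `venv-ml` are hypothetical; the linked PyTorch venv example remains the reference for the exact CSCS workflow.

```python
import os
import sys
import venv
from pathlib import Path


def create_overlay_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a venv layered on top of the current Python environment.

    With system_site_packages=True, packages already installed for the
    interpreter running this function (e.g. a uenv's PyTorch) remain
    importable inside the venv, while newly installed packages go into the venv.
    """
    builder = venv.EnvBuilder(
        system_site_packages=True,   # expose the base environment's packages
        symlinks=(os.name != "nt"),  # mirror `python -m venv` defaults on POSIX
        with_pip=with_pip,           # bootstrap pip via ensurepip when True
    )
    builder.create(env_dir)
    return Path(env_dir)


def venv_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside the venv."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Hypothetical default directory name. Run this script with the uenv's
    # Python so that the uenv's packages are the ones exposed to the venv.
    env = create_overlay_venv(sys.argv[1] if len(sys.argv) > 1 else "venv-ml")
    print(f"Interpreter: {venv_python(env)}")
```

Running `python -m pip install <package>` with the returned interpreter then typically installs only the additional packages into the venv, reusing what the uenv already provides. Passing `with_pip=False` skips the `ensurepip` step, which can be useful where the base interpreter ships without it.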
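The custom-environment section lists CUDA, cuDNN, MPI, NCCL, and C/C++ compilers as components a starting environment should already include. As a rough, hypothetical check (not a CSCS tool), the script below looks for a few common command names on `PATH` with `shutil.which` and asks `ctypes.util.find_library` for the cuDNN and NCCL shared libraries. The probed names (`nvcc`, `mpicc`, `cc`, `c++`, `cudnn`, `nccl`) are assumptions. A "missing" result can be a false negative, for example when a library is only reachable through an RPATH or when an environment exposes its compilers under different wrapper names.

```python
import ctypes.util
import shutil

# Hypothetical probes: common command names for each component.
# Actual names can differ between containers and uenvs.
EXECUTABLES = {
    "CUDA (nvcc)": "nvcc",
    "MPI (mpicc)": "mpicc",
    "C compiler (cc)": "cc",
    "C++ compiler (c++)": "c++",
}
# Library base names as accepted by ctypes.util.find_library ("lib" and ".so" omitted).
LIBRARIES = {
    "cuDNN": "cudnn",
    "NCCL": "nccl",
}


def check_environment() -> dict[str, bool]:
    """Report which of the recommended components appear to be available."""
    found: dict[str, bool] = {}
    for label, exe in EXECUTABLES.items():
        # shutil.which searches PATH and returns None if the command is absent.
        found[label] = shutil.which(exe) is not None
    for label, lib in LIBRARIES.items():
        # find_library uses heuristics (ldconfig, compiler, linker search paths)
        # and returns None when it cannot locate the library.
        found[label] = ctypes.util.find_library(lib) is not None
    return found


if __name__ == "__main__":
    for component, ok in check_environment().items():
        print(f"{component:<20} {'found' if ok else 'missing'}")
```

Running it inside a candidate container or uenv gives a quick first impression; the authoritative check is still whether the ML framework itself reports working CUDA, NCCL, and MPI support.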
|