Commit cfe1733

reworked introduction
1 parent: e373d06

2 files changed (+62, -17 lines)

docs/software/ml/index.md

Lines changed: 61 additions & 17 deletions
@@ -1,25 +1,69 @@
 [](){#ref-software-ml}
 # Machine Learning Applications and Frameworks
 
-## Containerized Machine Learning Applications
+## Overview
 
-CSCS supports a variety of machine learning applications and frameworks on its
-systems. Typically, machine learning applications are containerized to ensure
-compatibility and ease of use across different environments.
+CSCS supports a wide range of machine learning (ML) applications and frameworks
+on its systems. Most ML workloads are containerized to ensure portability,
+reproducibility, and ease of use across environments.
 
-CSCS does not provide any specific machine learning container images, but users
-can create their own containers using popular base container registries, such
-as [Nvidia's NGC Catalog](https://catalog.ngc.nvidia.com/containers). These
-containers can be run on Alps, allowing users to leverage the
-high-performance computing resources available for their machine learning
-tasks.
+Users can choose between running containers, using provided uenv software
+stacks, or building custom Python environments tailored to their needs.
 
-* Jobs using containers can be easily set up and submitted using the [container
-engine][ref-container-engine].
-* To build images, see the [guide to building container images on
-Alps][ref-build-containers].
+## Running Machine Learning Applications with Containers
 
-## Uenv Software Stacks
+Containerization is the recommended approach for ML workloads on Alps, as it
+simplifies software management and maximizes compatibility with other systems.
+
+* CSCS does not provide ready-to-use ML container images
+* Users are encouraged to build their own containers, starting from popular
+sources such as the [Nvidia NGC
+Catalog](https://catalog.ngc.nvidia.com/containers)
+
+Helpful references:
+
+* Running containers on Alps: [Container Engine Guide][ref-container-engine]
+* Building custom container images: [Container Build
+Guide][ref-build-containers]
+
+## Using Provided uenv Software Stacks
+
+Alternatively, CSCS provides pre-configured software stacks ([uenvs][ref-uenv])
+that can serve as a starting point for machine learning projects. These
+environments provide optimized compilers, libraries, and selected ML
+frameworks.
+
+Available ML-related uenvs:
+
+* [PyTorch][ref-uenv-pytorch] — available on [Clariden][ref-cluster-clariden]
+and [Daint][ref-cluster-daint]
+
+To extend these environments with additional Python packages, it is recommended
+to create a Python Virtual Environment (venv). See this [PyTorch venv
+example][ref-uenv-pytorch-venv] for details.
+
+!!! note
+While many Python packages provide pre-built binaries for common
+architectures, some may require building from source.
+
+## Building Custom Python Environments
+
+Users may also choose to build entirely custom software stacks using Python
+package managers such as `pip` or `conda`. Most ML libraries are available via
+the [Python Package Index (PyPI)](https://pypi.org/).
+
+To ensure optimal performance on CSCS systems, we recommend starting from an
+environment that already includes:
+
+* CUDA, cuDNN
+* MPI, NCCL
+* c/c++ compilers
+
+This can be achieved either by:
+
+* Building a [custom container image][ref-build-containers] based on a suitable
+ML-ready base image.
+* Starting from a provided uenv (e.g., [PrgEnv GNU][ref-uenv-prgenv-gnu] or
+[PyTorch uenv][ref-uenv-pytorch]) and extending it with a virtual
+environment.
 
-CSCS provides a base [PyTorch uenv][ref-uenv-pytorch] that is available on the
-[Clariden][ref-cluster-clariden] and [Daint][ref-cluster-daint] cluster.
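
The note added to the uenv section says that some Python packages may need to be built from source. A common cause is that PyPI publishes no pre-built wheel for the node's CPU architecture, so `pip` falls back to a source distribution when one exists. The Python sketch below is not part of any CSCS tooling and its helper names are hypothetical: it reads the platform tag from a wheel filename, assuming the standard layout `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, and compares it with `platform.machine()`.

```python
"""Hypothetical helper: does a wheel file match this machine's architecture?"""
from __future__ import annotations

import platform


def wheel_platform_tags(filename: str) -> list[str]:
    """Return the platform tags encoded in a wheel filename.

    Wheel names follow {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl;
    the name field uses '_' instead of '-', so splitting on '-' is safe.
    The platform field may hold several tags joined by '.'.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    return parts[-1].split(".")


def wheel_matches_machine(filename: str, machine: str | None = None) -> bool:
    """True if the wheel is pure Python ('any') or built for `machine`."""
    machine = machine or platform.machine()  # e.g. 'aarch64' or 'x86_64'
    return any(
        tag == "any" or tag.endswith("_" + machine)
        for tag in wheel_platform_tags(filename)
    )


if __name__ == "__main__":
    for name in (
        "example_pkg-1.0-py3-none-any.whl",
        "example_pkg-1.0-cp312-cp312-manylinux_2_17_x86_64.whl",
        "example_pkg-1.0-cp312-cp312-manylinux_2_17_aarch64.whl",
    ):
        print(name, wheel_matches_machine(name, machine="aarch64"))
```

Passing `machine="aarch64"` explicitly makes the check independent of the host it runs on; for the three example filenames it reports `True`, `False` and `True` respectively. A wheel that fails this check is one `pip` cannot use, which is when a source build becomes necessary.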

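The new "Building Custom Python Environments" section recommends starting from an environment that already provides CUDA, cuDNN, MPI, NCCL and C/C++ compilers. The Python sketch below is one hypothetical way to see what a given container, uenv or shell actually exposes: it looks up executables on `PATH` with `shutil.which` and asks `ctypes.util.find_library` whether common shared libraries can be located. The executable and library names (`gcc`, `g++`, `nvcc`, `mpicc`, `cudart`, `cudnn`, `nccl`) are assumptions about typical installations, and a "not found" result only means the component is not on the searched paths.

```python
"""Hypothetical pre-flight check for an ML build environment."""
import shutil
from ctypes.util import find_library

# Assumed executable names for compilers and the MPI compiler wrapper.
EXECUTABLES = {
    "C compiler": "gcc",
    "C++ compiler": "g++",
    "CUDA compiler": "nvcc",
    "MPI C wrapper": "mpicc",
}

# Assumed shared-library base names (find_library takes names without 'lib').
LIBRARIES = {
    "CUDA runtime": "cudart",
    "cuDNN": "cudnn",
    "NCCL": "nccl",
}


def check_ml_toolchain() -> dict:
    """Return {component: bool} for each expected executable and library."""
    report = {}
    for label, exe in EXECUTABLES.items():
        # shutil.which returns the full path, or None if not on PATH.
        report[label] = shutil.which(exe) is not None
    for label, lib in LIBRARIES.items():
        # find_library returns a library name/path, or None if not found.
        report[label] = find_library(lib) is not None
    return report


if __name__ == "__main__":
    for component, found in check_ml_toolchain().items():
        print(f"{component:15s} {'found' if found else 'not found'}")
```

Running the script once inside the chosen container or uenv, before installing anything with `pip` or `conda`, gives a quick hint as to whether packages that compile against CUDA or MPI will find what they need.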
docs/software/ml/pytorch.md

Lines changed: 1 addition & 0 deletions
@@ -284,6 +284,7 @@ There are two ways to access the software provided by the uenv, once it has been
 
 [Check out the guide for using Spack with uenv][ref-building-uenv-spack].
 
+[](){#ref-uenv-pytorch-venv}
 ## Adding Python packages on top of the uenv
 
 Uenvs are read-only, and cannot be modified. However, it is possible to add Python packages on top of the uenv using virtual environments.
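
The new `ref-uenv-pytorch-venv` anchor points at the section on adding Python packages on top of the read-only uenv. The essential detail is that the virtual environment must still see the packages shipped in the uenv, which the standard library's `venv` module supports via `system_site_packages=True` (equivalent to `python -m venv --system-site-packages`). The sketch below assumes it is run with the uenv's Python interpreter and takes the target directory as its first argument; the function names are hypothetical.

```python
"""Sketch: create a venv that layers on top of a read-only base environment."""
import sys
import venv
from pathlib import Path


def create_layered_venv(env_dir: str, with_pip: bool = True) -> Path:
    """Create a venv at env_dir that can still import the base environment's packages.

    system_site_packages=True keeps the interpreter's site-packages (for
    example, the packages provided by the uenv) importable, while anything
    installed later lands in env_dir only.
    """
    builder = venv.EnvBuilder(system_site_packages=True, with_pip=with_pip)
    builder.create(env_dir)
    return Path(env_dir)


def uses_system_site_packages(env_dir: str) -> bool:
    """Read pyvenv.cfg to confirm the base site-packages are included."""
    cfg = Path(env_dir) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. a hypothetical "venv-on-uenv" directory
    create_layered_venv(target)
    print(target, "includes base packages:", uses_system_site_packages(target))
```

After activating the result (for example `source venv-on-uenv/bin/activate`), `pip install` writes only into the venv directory, so the uenv image itself is never modified.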

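Because packages can come either from the read-only uenv or from the venv layered on top of it, it can help to check where a given import actually resolves. The following sketch uses only `sys.prefix`, `sys.base_prefix` and `importlib.util.find_spec`; the function names and the returned labels are hypothetical.

```python
"""Sketch: report whether a module comes from the active venv or the base environment."""
import importlib.util
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """A venv is active when sys.prefix differs from the base interpreter prefix."""
    return sys.prefix != sys.base_prefix


def module_origin(name: str) -> str:
    """Classify a top-level module as 'venv', 'base', 'builtin', 'unknown' or 'missing'."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    if spec.origin in ("built-in", "frozen"):
        return "builtin"
    if spec.origin is None:
        return "unknown"  # e.g. namespace packages have no single origin file
    origin = Path(spec.origin).resolve()
    # Resolve both sides so a symlinked venv directory still compares correctly.
    if in_virtualenv() and Path(sys.prefix).resolve() in origin.parents:
        return "venv"
    return "base"


if __name__ == "__main__":
    for mod in sys.argv[1:] or ["json"]:
        print(f"{mod}: {module_origin(mod)}")
```

With the venv activated, a package installed via `pip` should report `venv`, while one provided by the uenv should report `base`.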