From 7d84318086478d165ce25648244e1f8b8248e25c Mon Sep 17 00:00:00 2001 From: Mikael Simberg Date: Wed, 9 Jul 2025 09:09:10 +0200 Subject: [PATCH 1/6] Fix more spelling, add more words to whitelist --- .github/actions/spelling/allow.txt | 5 +++++ .github/actions/spelling/block-delimiters.list | 4 ++++ .github/actions/spelling/patterns.txt | 17 ++++++++++++----- docs/clusters/eiger.md | 2 +- docs/guides/storage.md | 6 +++--- docs/services/cicd.md | 7 +++---- docs/software/ml/pytorch.md | 2 +- docs/storage/filesystems.md | 2 +- 8 files changed, 30 insertions(+), 15 deletions(-) diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index 6b920419..8f42dbd9 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -109,12 +109,14 @@ cuda customised dcomex diagonalisation +dimms dockerhub dotenv eiger epyc filesystems fontawesome +gdrcopy gitlab gpu groundstate @@ -122,15 +124,18 @@ ijulia inodes iopsstor jfrog +jupyter lexer libfabric miniconda mpi mps multitenancy +nanotron netrc nsight numa +nvdashboard nvidia octicons oom diff --git a/.github/actions/spelling/block-delimiters.list b/.github/actions/spelling/block-delimiters.list index 9a0a87ca..a4d751d5 100644 --- a/.github/actions/spelling/block-delimiters.list +++ b/.github/actions/spelling/block-delimiters.list @@ -5,3 +5,7 @@ # ignore code blocks ``` ``` + +# ignore indented code blocks + ``` + ``` diff --git a/.github/actions/spelling/patterns.txt b/.github/actions/spelling/patterns.txt index 6f65d370..9352f85a 100644 --- a/.github/actions/spelling/patterns.txt +++ b/.github/actions/spelling/patterns.txt @@ -1,18 +1,25 @@ -# Recognized as "Firec" and "REST" with the regular rules, so in patterns.txt -# instead of allow.txt +# Recognized as separate words (e.g. "Firec" and "REST") with the regular rules, +# so in patterns.txt instead of allow.txt FirecREST RESTful +IPyParallel # markdown figure -^!\[.*\]\(.*\)$ + ^!\[.*\]\(.*\)$ # Most obvious URLs https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*) -# Markdown references (definition and use) +# Markdown references and URLs (definition and use) ^\[\]\(\){#[a-z-]+}$ -\]\(#[a-z-]+\) +\]\([^\s]+\) \]\[[a-z-]+\] +# Markdown URLs + # Inline code \`[^\`]+\` + +# kebab-case and snake_case words +[a-z]+-[a-z-]+ +[a-z]+_[a-z_]+ diff --git a/docs/clusters/eiger.md b/docs/clusters/eiger.md index cb8d239a..2b5e5bed 100644 --- a/docs/clusters/eiger.md +++ b/docs/clusters/eiger.md @@ -37,7 +37,7 @@ Eiger is an Alps cluster that provides compute nodes and file systems designed t Eiger consists of multicore [AMD Epyc Rome][ref-alps-zen2-node] compute nodes: please note that the total number of available compute nodes on the system might vary over time. See the [Slurm documentation][ref-slurm-partitions-nodecount] for information on how to check the number of nodes. -Additionally, there are four login nodes with hostnames `eiger-ln00[1-4]`. +Additionally, there are four login nodes with host names `eiger-ln00[1-4]`. ### Storage and file systems diff --git a/docs/guides/storage.md b/docs/guides/storage.md index 17048328..34cc6e06 100644 --- a/docs/guides/storage.md +++ b/docs/guides/storage.md @@ -124,12 +124,12 @@ Its performance is roughly the same on [Capstor][ref-alps-capstor] and [Iopsstor This data is globally synchronized, which means Lustre is not well suited to handling many small files, see the discussion on [how to handle many small files][ref-guides-storage-small-files]. The data itself is subdivided in blocks of size `` and is stored by Object Storage Servers (OSS) in one or more Object Storage Targets (OST). -The blocksize and number of OSTs to use is defined by the striping settings, which are applied to a path, with new files and directories ihneriting them from their parent directory. +The block size and number of OSTs to use is defined by the striping settings, which are applied to a path, with new files and directories ihneriting them from their parent directory. The `lfs getstripe ` command can be used to get information on the stripe settings of a path. For directories and empty files `lfs setstripe --stripe-count --stripe-size ` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout -!!! tip "A blocksize of 4MB gives good throughput, without being overly big..." +!!! tip "A block size of 4MB gives good throughput, without being overly big..." ... so it is a good choice when reading a file sequentially or in large chunks, but if one reads shorter chunks in random order it might be better to reduce the size, the performance will be smaller, but the performance of your application might actually increase. See the [Lustre documentation](https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace) for more information. @@ -149,7 +149,7 @@ With it it is possible to create a Progressive file layout switching `--stripe-c ### Iopsstor vs Capstor [Iopsstor][ref-alps-iopsstor] uses SSD as OST, thus random access is quick, and the performance of the single OST is high. -[Capstor][ref-alps-capstor] on another hand uses harddisks, it has a larger capacity, and it also have many more OSS, thus the total bandwidth is larger. +[Capstor][ref-alps-capstor] on another hand uses hard disks, it has a larger capacity, and it also have many more OSS, thus the total bandwidth is larger. See for example the [ML filesystem guide][ref-mlp-storage-suitability]. [](){#ref-guides-storage-small-files} diff --git a/docs/services/cicd.md b/docs/services/cicd.md index 7a684395..0932d418 100644 --- a/docs/services/cicd.md +++ b/docs/services/cicd.md @@ -994,7 +994,7 @@ The default is `none`, and you must explicitly set it to `fetch` or `clone` to ##### `CSCS_CUDA_MPS` Optional variable, default is `NO` -Enable running with nvidia-mps-server, which allows multiple ranks sharing the same GPU. +Enable running with `nvidia-mps-server`, which allows multiple ranks sharing the same GPU. ##### `USE_MPI` Optional variable, default is `AUTO` @@ -1202,7 +1202,7 @@ Loads the view of a uenv. ##### `CSCS_CUDA_MPS` Optional variable, default is `NO` -Enable running with nvidia-mps-server, which allows multiple ranks sharing the same GPU. +Enable running with `nvidia-mps-server`, which allows multiple ranks sharing the same GPU. #### Example jobs ```yaml @@ -1405,8 +1405,7 @@ A couple of projects which use this CI setup. Please have a look there for more advanced usage: * [dcomex-framework](https://github.com/DComEX/dcomex-framework): entry point is `ci/prototype.yml` -* [mars](https://bitbucket.org/zulianp/mars/src/development/): two pipelines, with entry points `ci/gitlab/cscs/gpu/gitlab- -daint.yml` and `ci/gitlab/cscs/mc/gitlab-daint.yml` +* [mars](https://bitbucket.org/zulianp/mars/src/development/): two pipelines, with entry points `ci/gitlab/cscs/gpu/gitlab-daint.yml` and `ci/gitlab/cscs/mc/gitlab-daint.yml` * [sparse_accumulation](https://github.com/lab-cosmo/sparse_accumulation): entry point is `ci/pipeline.yml` * [gt4py](https://github.com/GridTools/gt4py): entry point is `ci/cscs-ci.yml` * [SIRIUS](https://github.com/electronic-structure/SIRIUS): entry point is `ci/cscs-daint.yml` diff --git a/docs/software/ml/pytorch.md b/docs/software/ml/pytorch.md index c9e4d390..a641182a 100644 --- a/docs/software/ml/pytorch.md +++ b/docs/software/ml/pytorch.md @@ -383,7 +383,7 @@ srun bash -c " 6. Disable GPU support in MPICH, as it [can lead to deadlocks](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html#inter-gpu-communication-with-cuda-aware-mpi) when using together with nccl. 7. Avoid writing JITed binaries to the (distributed) file system, which could lead to performance issues. 8. These variables should always be set for correctness and optimal performance when using NCCL, see [the detailed explanation][ref-communication-nccl]. -9. `RANK` and `LOCAL_RANK` are set per-process by the Slurmjob launcher. +9. `RANK` and `LOCAL_RANK` are set per-process by the Slurm job launcher. 10. Activate the virtual environment created on top of the uenv (if any). Please follow the guidelines for [python virtual environments with uenv][ref-guides-storage-venv] to enhance scalability and reduce load times. diff --git a/docs/storage/filesystems.md b/docs/storage/filesystems.md index 36ebc7cb..57ed49cc 100644 --- a/docs/storage/filesystems.md +++ b/docs/storage/filesystems.md @@ -124,7 +124,7 @@ Please ensure that you move important data to a file system with backups, for ex ## Store Store is a large, medium-performance, storage on the [Capstor][ref-alps-capstor] Lustre file system for sharing data within a project, and for medium term data storage. -See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best preformance out of the filesystem. +See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best performance out of the filesystem. Space on Store is allocated per-project, with a path created for each project. To accomodate the different customers and projects on Alps, the project paths are organised as follows: From ebc32b0db92a862f91863fdfb7452212e6119bb6 Mon Sep 17 00:00:00 2001 From: Mikael Simberg Date: Wed, 9 Jul 2025 09:12:47 +0200 Subject: [PATCH 2/6] Wrap package names in backticks for monospace --- docs/software/ml/pytorch.md | 420 ++++++++++++++++++------------------ 1 file changed, 210 insertions(+), 210 deletions(-) diff --git a/docs/software/ml/pytorch.md b/docs/software/ml/pytorch.md index a641182a..04ac1054 100644 --- a/docs/software/ml/pytorch.md +++ b/docs/software/ml/pytorch.md @@ -23,221 +23,221 @@ The PyTorch uenv is versioned according to the PyTorch version it provides. | Package | Version | |---------------------|------------------| - | abseil-cpp | 20240722.0 | - | alsa-lib | 1.2.3.2 | - | autoconf | 2.72 | - | automake | 1.16.5 | - | aws-ofi-nccl | 1.14.0 | - | berkeley-db | 18.1.40 | - | bison | 3.8.2 | - | boost | 1.86.0 | - | bzip2 | 1.0.8 | - | ca-certificates-mozilla | 2023-05-30 | - | cmake | 3.30.5 | - | cpuinfo | 2024-09-26 | - | cray-gtl | 8.1.32 | - | cray-mpich | 8.1.32 | - | cray-pals | 1.3.2 | - | cray-pmi | 6.1.15 | - | cuda | 12.6.0 | - | cudnn | 9.2.0.82-12 | - | curl | 8.10.1 | - | cutensor | 2.0.1.2 | - | diffutils | 3.10 | - | eigen | 3.4.0 | - | elfutils | 0.191 | - | expat | 2.6.4 | - | faiss | 1.8.0 | - | ffmpeg | 5.1.4 | - | fftw | 3.3.10 | - | findutils | 4.9.0 | - | flac | 1.4.3 | - | fmt | 11.0.2 | - | fp16 | 2020-05-14 | - | fxdiv | 2020-04-17 | - | gawk | 4.2.1 | - | gcc | 13.3.0 | - | gcc-runtime | 13.3.0 | - | gdb | 15.2 | - | gdbm | 1.23 | - | gettext | 0.22.5 | - | git | 2.47.0 | - | glibc | 2.31 | - | gloo | 2023-12-03 | - | gmake | 4.4.1 | - | gmp | 6.3.0 | - | gmp | 6.3.0 | - | gnuconfig | 2024-07-27 | - | googletest | 1.12.1 | - | gperftools | 2.16 | - | hdf5 | 1.14.5 | - | hwloc | 2.11.1 | - | hydra | 4.2.1 | - | krb5 | 1.21.3 | - | libaio | 0.3.113 | - | libarchive | 3.7.6 | - | libbsd | 0.12.2 | - | libedit | 3.1-20240808 | - | libfabric | 1.15.2.0 | - | libffi | 3.4.6 | - | libgit2 | 1.8.0 | - | libiconv | 1.17 | - | libidn2 | 2.3.7 | - | libjpeg-turbo | 3.0.3 | - | libmd | 1.0.4 | - | libmicrohttpd | 0.9.50 | - | libogg | 1.3.5 | - | libpciaccess | 0.17 | - | libpng | 1.6.39 | - | libsigsegv | 2.14 | - | libssh2 | 1.11.1 | - | libtool | 2.4.6 | - | libtool | 2.4.7 | - | libunistring | 1.2 | - | libuv | 1.48.0 | - | libvorbis | 1.3.7 | - | libxcrypt | 4.4.35 | - | libxml2 | 2.13.4 | - | libyaml | 0.2.5 | - | lz4 | 1.10.0 | - | lzo | 2.10 | - | m4 | 1.4.19 | - | magma | master | - | meson | 1.5.1 | - | mpc | 1.3.1 | - | mpfr | 4.2.1 | - | nasm | 2.16.03 | - | nccl | 2.26.2-1 | - | nccl-tests | 2.13.6 | - | ncurses | 6.5 | - | nghttp2 | 1.63.0 | - | ninja | 1.12.1 | - | numactl | 2.0.18 | - | nvtx | 3.1.0 | - | openblas | 0.3.28 | - | openssh | 9.9p1 | - | openssl | 3.4.0 | - | opus | 1.5.2 | - | osu-micro-benchmarks | 7.5 | - | patchelf | 0.17.2 | - | pcre | 8.45 | - | pcre2 | 10.44 | - | perl | 5.40.0 | - | pigz | 2.8 | - | pkgconf | 2.2.0 | - | protobuf | 3.28.2 | - | psimd | 2020-05-17 | - | pthreadpool | 2023-08-29 | - | python | 3.13.0 | - | python-venv | 1.0 | - | rdma-core | 31.0 | - | re2c | 3.1 | - | readline | 8.2 | - | rust | 1.81.0 | - | rust-bootstrap | 1.81.0 | - | sentencepiece | 0.1.99 | - | sleef | 3.6.0_2024-03-20 | - | sox | 14.4.2 | - | sqlite | 3.46.0 | - | swig | 4.1.1 | - | tar | 1.34 | - | texinfo | 7.1 | - | util-linux-uuid | 2.40.2 | - | util-macros | 1.20.1 | - | valgrind | 3.23.0 | - | xpmem | 2.9.6 | - | xz | 5.4.6 | - | yasm | 1.3.0 | - | zlib-ng | 2.2.1 | - | zstd | 1.5.6 | + | `abseil-cpp` | 20240722.0 | + | `alsa-lib` | 1.2.3.2 | + | `autoconf` | 2.72 | + | `automake` | 1.16.5 | + | `aws-ofi-nccl` | 1.14.0 | + | `berkeley-db` | 18.1.40 | + | `bison` | 3.8.2 | + | `boost` | 1.86.0 | + | `bzip2` | 1.0.8 | + | `ca-certificates-mozilla` | 2023-05-30 | + | `cmake` | 3.30.5 | + | `cpuinfo` | 2024-09-26 | + | `cray-gtl` | 8.1.32 | + | `cray-mpich` | 8.1.32 | + | `cray-pals` | 1.3.2 | + | `cray-pmi` | 6.1.15 | + | `cuda` | 12.6.0 | + | `cudnn` | 9.2.0.82-12 | + | `curl` | 8.10.1 | + | `cutensor` | 2.0.1.2 | + | `diffutils` | 3.10 | + | `eigen` | 3.4.0 | + | `elfutils` | 0.191 | + | `expat` | 2.6.4 | + | `faiss` | 1.8.0 | + | `ffmpeg` | 5.1.4 | + | `fftw` | 3.3.10 | + | `findutils` | 4.9.0 | + | `flac` | 1.4.3 | + | `fmt` | 11.0.2 | + | `fp16` | 2020-05-14 | + | `fxdiv` | 2020-04-17 | + | `gawk` | 4.2.1 | + | `gcc` | 13.3.0 | + | `gcc-runtime` | 13.3.0 | + | `gdb` | 15.2 | + | `gdbm` | 1.23 | + | `gettext` | 0.22.5 | + | `git` | 2.47.0 | + | `glibc` | 2.31 | + | `gloo` | 2023-12-03 | + | `gmake` | 4.4.1 | + | `gmp` | 6.3.0 | + | `gmp` | 6.3.0 | + | `gnuconfig` | 2024-07-27 | + | `googletest` | 1.12.1 | + | `gperftools` | 2.16 | + | `hdf5` | 1.14.5 | + | `hwloc` | 2.11.1 | + | `hydra` | 4.2.1 | + | `krb5` | 1.21.3 | + | `libaio` | 0.3.113 | + | `libarchive` | 3.7.6 | + | `libbsd` | 0.12.2 | + | `libedit` | 3.1-20240808 | + | `libfabric` | 1.15.2.0 | + | `libffi` | 3.4.6 | + | `libgit2` | 1.8.0 | + | `libiconv` | 1.17 | + | `libidn2` | 2.3.7 | + | `libjpeg-turbo` | 3.0.3 | + | `libmd` | 1.0.4 | + | `libmicrohttpd` | 0.9.50 | + | `libogg` | 1.3.5 | + | `libpciaccess` | 0.17 | + | `libpng` | 1.6.39 | + | `libsigsegv` | 2.14 | + | `libssh2` | 1.11.1 | + | `libtool` | 2.4.6 | + | `libtool` | 2.4.7 | + | `libunistring` | 1.2 | + | `libuv` | 1.48.0 | + | `libvorbis` | 1.3.7 | + | `libxcrypt` | 4.4.35 | + | `libxml2` | 2.13.4 | + | `libyaml` | 0.2.5 | + | `lz4` | 1.10.0 | + | `lzo` | 2.10 | + | `m`4 | 1.4.19 | + | `magma` | master | + | `meson` | 1.5.1 | + | `mpc` | 1.3.1 | + | `mpfr` | 4.2.1 | + | `nasm` | 2.16.03 | + | `nccl` | 2.26.2-1 | + | `nccl-tests` | 2.13.6 | + | `ncurses` | 6.5 | + | `nghttp2` | 1.63.0 | + | `ninja` | 1.12.1 | + | `numactl` | 2.0.18 | + | `nvtx` | 3.1.0 | + | `openblas` | 0.3.28 | + | `openssh` | 9.9p1 | + | `openssl` | 3.4.0 | + | `opus` | 1.5.2 | + | `osu-micro-benchmarks` | 7.5 | + | `patchelf` | 0.17.2 | + | `pcre` | 8.45 | + | `pcre2` | 10.44 | + | `perl` | 5.40.0 | + | `pigz` | 2.8 | + | `pkgconf` | 2.2.0 | + | `protobuf` | 3.28.2 | + | `psimd` | 2020-05-17 | + | `pthreadpool` | 2023-08-29 | + | `python` | 3.13.0 | + | `python-venv` | 1.0 | + | `rdma-core` | 31.0 | + | `re2c` | 3.1 | + | `readline` | 8.2 | + | `rust` | 1.81.0 | + | `rust-bootstrap` | 1.81.0 | + | `sentencepiece` | 0.1.99 | + | `sleef` | 3.6.0_2024-03-20 | + | `sox` | 14.4.2 | + | `sqlite` | 3.46.0 | + | `swig` | 4.1.1 | + | `tar` | 1.34 | + | `texinfo` | 7.1 | + | `util-linux-uuid` | 2.40.2 | + | `util-macros` | 1.20.1 | + | `valgrind` | 3.23.0 | + | `xpmem` | 2.9.6 | + | `x`z | 5.4.6 | + | `yasm` | 1.3.0 | + | `zlib-ng` | 2.2.1 | + | `zstd` | 1.5.6 | ??? info "Python packages exposed via the `default` view" | Package | Version | |---------------------|------------------| - | aniso8601 | 9.0.1 | - | annotated-types | 0.7.0 | - | apex | 0.1 | - | appdirs | 1.4.4 | - | astunparse | 1.6.3 | - | blinker | 1.6.2 | - | certifi | 2023.7.22 | - | charset-normalizer | 3.3.0 | - | click | 8.1.7 | - | coverage | 7.2.6 | - | Cython | 3.0.11 | - | docker-pycreds | 0.4.1 | - | donfig | 0.8.1.post1 | - | einops | 0.8.0 | - | faiss | 1.8.0 | - | filelock | 3.12.4 | - | flash_attn | 2.6.3 | - | Flask | 2.3.2 | - | Flask-RESTful | 0.3.9 | - | fsspec | 2024.5.0 | - | gitdb | 4.0.9 | - | GitPython | 3.1.40 | - | huggingface_hub | 0.26.2 | - | idna | 3.4 | - | importlib_metadata | 7.0.1 | - | iniconfig | 2.0.0 | - | itsdangerous | 2.1.2 | - | Jinja2 | 3.1.4 | - | joblib | 1.2.0 | - | lightning-utilities | 0.11.2 | - | MarkupSafe | 2.1.3 | - | mpmath | 1.3.0 | - | networkx | 3.1 | - | nltk | 3.9.1 | - | numcodecs | 0.15.0 | - | numpy | 2.1.2 | - | nvtx | 0.2.5 | - | packaging | 24.1 | - | pillow | 11.0.0 | - | pip | 23.1.2 | - | platformdirs | 3.10.0 | - | pluggy | 1.5.0 | - | protobuf | 5.28.2 | - | psutil | 7.0.0 | - | pybind11 | 2.13.6 | - | pydantic | 2.10.1 | - | pydantic_core | 2.27.1 | - | pytest | 8.2.1 | - | pytest-asyncio | 0.23.5 | - | pytest-cov | 4.0.0 | - | pytest-mock | 3.10.0 | - | pytest-random-order | 1.0.4 | - | pytz | 2023.3 | - | PyYAML | 6.0.2 | - | regex | 2022.8.17 | - | requests | 2.32.3 | - | safetensors | 0.4.5 | - | sentencepiece | 0.1.99 | - | sentry-sdk | 2.22.0 | - | setproctitle | 1.1.10 | - | setuptools | 69.2.0 | - | six | 1.16.0 | - | smmap | 5.0.0 | - | sympy | 1.13.1 | - | tiktoken | 0.4.0 | - | tokenizers | 0.21.0 | - | torch | 2.6.0 | - | torchaudio | 2.6.0a0+d883142 | - | torchmetrics | 1.5.2 | - | torchvision | 0.21.0 | - | tqdm | 4.66.3 | - | transformer_engine | 2.3.0.dev0+dd4c17d | - | transformers | 4.48.3 | - | triton | 3.2.0+gitc802bb4f | - | typing_extensions | 4.12.2 | - | urllib3 | 2.1.0 | - | versioneer | 0.29 | - | wandb | 0.19.9 | - | Werkzeug | 3.0.4 | - | wheel | 0.41.2 | - | wrapt | 1.15.0 | - | zarr | 3.0.1 | - | zipp | 3.17.0 | + | `aniso8601` | 9.0.1 | + | `annotated-types` | 0.7.0 | + | `apex` | 0.1 | + | `appdirs` | 1.4.4 | + | `astunparse` | 1.6.3 | + | `blinker` | 1.6.2 | + | `certifi` | 2023.7.22 | + | `charset-normalizer` | 3.3.0 | + | `click` | 8.1.7 | + | `coverage` | 7.2.6 | + | `Cython` | 3.0.11 | + | `docker-pycreds` | 0.4.1 | + | `donfig` | 0.8.1.post1 | + | `einops` | 0.8.0 | + | `faiss` | 1.8.0 | + | `filelock` | 3.12.4 | + | `flash_attn` | 2.6.3 | + | `Flask` | 2.3.2 | + | `Flask-RESTful` | 0.3.9 | + | `fsspec` | 2024.5.0 | + | `gitdb` | 4.0.9 | + | `GitPython` | 3.1.40 | + | `huggingface_hub` | 0.26.2 | + | `idna` | 3.4 | + | `importlib_metadata` | 7.0.1 | + | `iniconfig` | 2.0.0 | + | `itsdangerous` | 2.1.2 | + | `Jinja2` | 3.1.4 | + | `joblib` | 1.2.0 | + | `lightning-utilities` | 0.11.2 | + | `MarkupSafe` | 2.1.3 | + | `mpmath` | 1.3.0 | + | `networkx` | 3.1 | + | `nltk` | 3.9.1 | + | `numcodecs` | 0.15.0 | + | `numpy` | 2.1.2 | + | `nvtx` | 0.2.5 | + | `packaging` | 24.1 | + | `pillow` | 11.0.0 | + | `pip` | 23.1.2 | + | `platformdirs` | 3.10.0 | + | `pluggy` | 1.5.0 | + | `protobuf` | 5.28.2 | + | `psutil` | 7.0.0 | + | `pybind11` | 2.13.6 | + | `pydantic` | 2.10.1 | + | `pydantic_core` | 2.27.1 | + | `pytest` | 8.2.1 | + | `pytest-asyncio` | 0.23.5 | + | `pytest-cov` | 4.0.0 | + | `pytest-mock` | 3.10.0 | + | `pytest-random-order` | 1.0.4 | + | `pytz` | 2023.3 | + | `PyYAML` | 6.0.2 | + | `regex` | 2022.8.17 | + | `requests` | 2.32.3 | + | `safetensors` | 0.4.5 | + | `sentencepiece` | 0.1.99 | + | `sentry-sdk` | 2.22.0 | + | `setproctitle` | 1.1.10 | + | `setuptools` | 69.2.0 | + | `six` | 1.16.0 | + | `smmap` | 5.0.0 | + | `sympy` | 1.13.1 | + | `tiktoken` | 0.4.0 | + | `tokenizers` | 0.21.0 | + | `torch` | 2.6.0 | + | `torchaudio` | 2.6.0a0+d883142 | + | `torchmetrics` | 1.5.2 | + | `torchvision` | 0.21.0 | + | `tqdm` | 4.66.3 | + | `transformer_engine` | 2.3.0.dev0+dd4c17d | + | `transformers` | 4.48.3 | + | `triton` | 3.2.0+gitc802bb4f | + | `typing_extensions` | 4.12.2 | + | `urllib3` | 2.1.0 | + | `versioneer` | 0.29 | + | `wandb` | 0.19.9 | + | `Werkzeug` | 3.0.4 | + | `wheel` | 0.41.2 | + | `wrapt` | 1.15.0 | + | `zarr` | 3.0.1 | + | `zipp` | 3.17.0 | [](){#ref-uenv-pytorch-how-to-use} From c64ed68be868943a7ab7cf301454fdeb37dbe8a8 Mon Sep 17 00:00:00 2001 From: Mikael Simberg Date: Wed, 9 Jul 2025 09:18:58 +0200 Subject: [PATCH 3/6] More typos and whitelist --- .github/actions/spelling/allow.txt | 7 +++++++ .github/actions/spelling/patterns.txt | 3 +++ docs/guides/storage.md | 4 ++-- docs/software/ml/pytorch.md | 4 ++-- docs/storage/filesystems.md | 4 ++-- 5 files changed, 16 insertions(+), 6 deletions(-) diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index 8f42dbd9..c4ecf700 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -121,6 +121,12 @@ gitlab gpu groundstate ijulia +julia +linalg +linux +nccl +osts +quantumespresso inodes iopsstor jfrog @@ -202,3 +208,4 @@ xattr xattrs youtube zstd +hdf diff --git a/.github/actions/spelling/patterns.txt b/.github/actions/spelling/patterns.txt index 9352f85a..24de08a9 100644 --- a/.github/actions/spelling/patterns.txt +++ b/.github/actions/spelling/patterns.txt @@ -23,3 +23,6 @@ https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0 # kebab-case and snake_case words [a-z]+-[a-z-]+ [a-z]+_[a-z_]+ + +# versions +[0-9]+\.[.0-9]+(\+[0-9a-z]+)? diff --git a/docs/guides/storage.md b/docs/guides/storage.md index 34cc6e06..5a0b4ee2 100644 --- a/docs/guides/storage.md +++ b/docs/guides/storage.md @@ -111,7 +111,7 @@ To set up a default so all newly created folders and dirs inside or your desired ``` !!! info - For more information read the setfacl man page: `man setfacl`. + For more information read the `setfacl` man page: `man setfacl`. [](){#ref-guides-storage-lustre} ## Lustre tuning @@ -124,7 +124,7 @@ Its performance is roughly the same on [Capstor][ref-alps-capstor] and [Iopsstor This data is globally synchronized, which means Lustre is not well suited to handling many small files, see the discussion on [how to handle many small files][ref-guides-storage-small-files]. The data itself is subdivided in blocks of size `` and is stored by Object Storage Servers (OSS) in one or more Object Storage Targets (OST). -The block size and number of OSTs to use is defined by the striping settings, which are applied to a path, with new files and directories ihneriting them from their parent directory. +The block size and number of OSTs to use is defined by the striping settings, which are applied to a path, with new files and directories inheriting them from their parent directory. The `lfs getstripe ` command can be used to get information on the stripe settings of a path. For directories and empty files `lfs setstripe --stripe-count --stripe-size ` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout diff --git a/docs/software/ml/pytorch.md b/docs/software/ml/pytorch.md index 04ac1054..84aa68db 100644 --- a/docs/software/ml/pytorch.md +++ b/docs/software/ml/pytorch.md @@ -146,7 +146,7 @@ The PyTorch uenv is versioned according to the PyTorch version it provides. | `util-macros` | 1.20.1 | | `valgrind` | 3.23.0 | | `xpmem` | 2.9.6 | - | `x`z | 5.4.6 | + | `xz` | 5.4.6 | | `yasm` | 1.3.0 | | `zlib-ng` | 2.2.1 | | `zstd` | 1.5.6 | @@ -378,7 +378,7 @@ srun bash -c " The `MASTER_ADDR`, `MASTER_PORT` and `WORLD_SIZE` variables are used to determine the address and port of the master node. Additionally we also need `RANK` and `LOCAL_RANK` but these must be set per-process, see below. 4. Enable more graceful exception handling, see [PyTorch documentation](https://pytorch.org/docs/stable/torch_nccl_environment_variables.html) -5. Set the Trition home to a local path (e.g. `/dev/shm`) to avoid writing to the (distributed) file system. +5. Set the Triton home to a local path (e.g. `/dev/shm`) to avoid writing to the (distributed) file system. This is important for performance, as writing to the Lustre file system can be slow due to the amount of small files and potentially many processes accessing it. 6. Disable GPU support in MPICH, as it [can lead to deadlocks](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html#inter-gpu-communication-with-cuda-aware-mpi) when using together with nccl. 7. Avoid writing JITed binaries to the (distributed) file system, which could lead to performance issues. diff --git a/docs/storage/filesystems.md b/docs/storage/filesystems.md index 57ed49cc..88358157 100644 --- a/docs/storage/filesystems.md +++ b/docs/storage/filesystems.md @@ -89,7 +89,7 @@ See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get t All users on Alps get their own Scratch path, `/capstor/scratch/cscs/$USER`, which is pointed to by the variable `$SCRATCH` on the [HPC Platform][ref-platform-hpcp] and [Climate and Weather Platform][ref-platform-cwp] clusters Eiger, Daint and Santis. !!! info "`$SCRATCH` on MLP points to Iopsstor" - On the machine learning platform (MLP) systems [clariden][ref-cluster-clariden] and [bristen][ref-cluster-bristen] the `$SCRATCH` variable points to storage on [Iopstore][ref-alps-iopsstor]. + On the machine learning platform (MLP) systems [clariden][ref-cluster-clariden] and [bristen][ref-cluster-bristen] the `$SCRATCH` variable points to storage on [Iopsstor][ref-alps-iopsstor]. See the [MLP docs][ref-mlp-storage] for more information. ### Cleanup and expiration @@ -337,7 +337,7 @@ In addition to the automatic deletion of old files, if occupancy exceeds 60% the ??? question "What do messages like `mkdir: cannot create directory 'test': Disk quota exceeded` mean?" You have run out of quota on the target file system. Consider deleting unneeded files, or moving data to a different file system. - Specifcially, if you see this message when using [Home][ref-storage-home], which has a relatively small 50 GB limit, consider moving the data to your project's [Store][ref-storage-store] path. + Specifically, if you see this message when using [Home][ref-storage-home], which has a relatively small 50 GB limit, consider moving the data to your project's [Store][ref-storage-store] path. !!! todo FAQ question: [writing with specific group access](https://confluence.cscs.ch/spaces/KB/pages/276955350/Writing+on+project+if+you+belong+to+more+than+one+group) From 199856ddfe410182f95ab68aaff6b2fac941b233 Mon Sep 17 00:00:00 2001 From: Mikael Simberg Date: Wed, 9 Jul 2025 12:11:06 +0200 Subject: [PATCH 4/6] More typos and whitelist --- .github/actions/spelling/allow.txt | 89 ++++++++++++++++--- .../actions/spelling/block-delimiters.list | 4 + .github/actions/spelling/patterns.txt | 3 +- docs/accounts/account-create.md | 2 +- docs/alps/hardware.md | 2 +- docs/guides/mlp_tutorials/index.md | 2 +- docs/guides/mlp_tutorials/llm-finetuning.md | 10 +-- docs/guides/mlp_tutorials/llm-inference.md | 14 +-- .../mlp_tutorials/llm-nanotron-training.md | 8 +- docs/platforms/mlp/index.md | 6 +- docs/software/container-engine/edf.md | 6 +- .../container-engine/resource-hook.md | 2 +- docs/software/sciapps/lammps.md | 2 +- docs/software/uenv/deploy.md | 4 +- docs/storage/transfer.md | 8 +- mkdocs.yml | 2 +- 16 files changed, 115 insertions(+), 49 deletions(-) diff --git a/.github/actions/spelling/allow.txt b/.github/actions/spelling/allow.txt index c4ecf700..26e4d59a 100644 --- a/.github/actions/spelling/allow.txt +++ b/.github/actions/spelling/allow.txt @@ -1,9 +1,9 @@ ACLs ACR AMD -AWS Alpstein Balfrin +Besard Broyden CFLAGS CHARMM @@ -17,17 +17,16 @@ Ceph Containerfile DNS Dockerfiles -EDF -EDFs -EDFs +Dufourspitze EMPA ETHZ Ehrenfest Errigal FFT +Fawzi Fock +Foket GAPW -GCC GGA GPFS GPG @@ -39,29 +38,41 @@ GTL Gaussian Google HDD +HDDs HPC HPCP HPE HSN Hartree +Invernizzi Jax Jira Keycloak +Kwasniewski LAMMPS +LAPACK LDA +LLM +LLMs LOCALID LUMI Libc Linaro Linux +MDS +MDSs MFA MLP MNDO MPICH +Malvoisin MeteoSwiss NAMD NICs NVMe +Nordend +OSS +OSSs OTP OTPs PASC @@ -71,8 +82,10 @@ PID PMPI POSIX Parrinello +Pintarelli Piz Plesset +Podladchikov Pulay RCCL RDMA @@ -83,22 +96,25 @@ Roothaan SSHService STMV Scopi +Signalkuppe TOTP UANs UserLab -VASP -Waldur Wannier XDG +Zumsteinspitz aarch aarch64 acl +artifactory autodetection +aws baremetal biomolecular bristen bytecode capstor +chatbot clariden concretise concretizer @@ -112,47 +128,79 @@ diagonalisation dimms dockerhub dotenv +dropbear +edf +edfs eiger epyc +fftw filesystems fontawesome +gcc gdrcopy +github gitlab +gpt gpu groundstate +gsl +hdf +huggingface +hwloc +iframe ijulia -julia -linalg -linux -nccl -osts -quantumespresso inodes iopsstor jfrog +jobreport +juhpc +julia +juliaup jupyter +kokkos lexer libfabric +linalg +linux +matlab +meteo miniconda +mkl mpi mps multitenancy nanotron +nccl +netlib netrc nsight numa +nvcr nvdashboard nvidia +nwp octicons +ofi +omlin +omp oom +osts +osu +papi +pme +pmi podman preinstalled +prerelease +prereleases prgenv prioritisation +prioritise prioritised proactively pyfirecrest pytorch +quantumespresso quickstart rocm runtime @@ -162,6 +210,7 @@ sbatch screenshot slurm smartphone +sourced sphericart squashfs srun @@ -188,24 +237,36 @@ torchaudio torchvision treesitter trilinos +trl uarch uenv uenvs uids +utkin vCluster vClusters +valgrind +vasp +vboost venv versioned versioning +waldur +wandb webhooks webinar webpage website wikipedia +wikitext +wlcg workaround workflows xattr xattrs +xcb +xfer +xname +xpmem youtube zstd -hdf diff --git a/.github/actions/spelling/block-delimiters.list b/.github/actions/spelling/block-delimiters.list index a4d751d5..c8103a90 100644 --- a/.github/actions/spelling/block-delimiters.list +++ b/.github/actions/spelling/block-delimiters.list @@ -9,3 +9,7 @@ # ignore indented code blocks ``` ``` + +# ignore embedded iframes +