docs/clusters/eiger.md (1 addition, 1 deletion)
@@ -37,7 +37,7 @@ Eiger is an Alps cluster that provides compute nodes and file systems designed t
 Eiger consists of multicore [AMD Epyc Rome][ref-alps-zen2-node] compute nodes: please note that the total number of available compute nodes on the system might vary over time.
 See the [Slurm documentation][ref-slurm-partitions-nodecount] for information on how to check the number of nodes.

- Additionally, there are four login nodes with hostnames `eiger-ln00[1-4]`.
+ Additionally, there are four login nodes with host names `eiger-ln00[1-4]`.
docs/guides/storage.md (3 additions, 3 deletions)
@@ -124,12 +124,12 @@ Its performance is roughly the same on [Capstor][ref-alps-capstor] and [Iopsstor
 This data is globally synchronized, which means Lustre is not well suited to handling many small files; see the discussion on [how to handle many small files][ref-guides-storage-small-files].

 The data itself is subdivided in blocks of size `<blocksize>` and is stored by Object Storage Servers (OSS) in one or more Object Storage Targets (OST).
- The blocksize and number of OSTs to use is defined by the striping settings, which are applied to a path, with new files and directories ihneriting them from their parent directory.
+ The block size and number of OSTs to use is defined by the striping settings, which are applied to a path, with new files and directories inheriting them from their parent directory.
 The `lfs getstripe <path>` command can be used to get information on the stripe settings of a path.
 For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout.
 The simplest way to get the correct layout is to copy the file into a directory with the correct layout.

- !!! tip "A blocksize of 4MB gives good throughput, without being overly big..."
+ !!! tip "A block size of 4MB gives good throughput, without being overly big..."
     ... so it is a good choice when reading a file sequentially or in large chunks. If one reads shorter chunks in random order, it might be better to reduce the size: the raw throughput will be lower, but the performance of your application might actually increase.
     See the [Lustre documentation](https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace) for more information.
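To make the striping commands in this hunk concrete, here is a minimal sketch; the directory path and the 4 MB / 4-OST values are illustrative assumptions, not taken from the docs:

```bash
# Inspect the striping layout currently applied to a path.
lfs getstripe /capstor/scratch/cscs/$USER/results

# On an (empty) directory, request a 4 MB block size spread over 4 OSTs;
# new files and subdirectories created below it inherit this layout.
lfs setstripe --stripe-count 4 --stripe-size 4M /capstor/scratch/cscs/$USER/results

# Existing files keep their old layout, so copy them into the directory
# to rewrite the data with the new striping settings.
cp big_input.dat /capstor/scratch/cscs/$USER/results/
```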
@@ -149,7 +149,7 @@ With it it is possible to create a Progressive file layout switching `--stripe-c
 ### Iopsstor vs Capstor

 [Iopsstor][ref-alps-iopsstor] uses SSDs as OSTs, thus random access is quick and the performance of a single OST is high.
- [Capstor][ref-alps-capstor] on another hand uses harddisks, it has a larger capacity, and it also have many more OSS, thus the total bandwidth is larger.
+ [Capstor][ref-alps-capstor], on the other hand, uses hard disks; it has a larger capacity and also has many more OSSs, so the total bandwidth is larger.
 See for example the [ML filesystem guide][ref-mlp-storage-suitability].
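As a hedged illustration of that difference, `lfs df` lists the OSTs behind a Lustre mount; the mount points below are assumptions for illustration only:

```bash
# Per-OST capacity and usage; Capstor typically lists many more (hard-disk
# backed) OSTs than the SSD-backed Iopsstor.
lfs df -h /capstor
lfs df -h /iopsstor
```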
docs/services/cicd.md (3 additions, 4 deletions)
@@ -994,7 +994,7 @@ The default is `none`, and you must explicitly set it to `fetch` or `clone` to
 ##### `CSCS_CUDA_MPS`
 Optional variable, default is `NO`

- Enable running with nvidia-mps-server, which allows multiple ranks sharing the same GPU.
+ Enable running with `nvidia-mps-server`, which allows multiple ranks to share the same GPU.

 ##### `USE_MPI`
 Optional variable, default is `AUTO`
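For orientation, a hedged sketch of how one might confirm inside a job step that the MPS server is actually running when `CSCS_CUDA_MPS` is enabled; the process name `nvidia-cuda-mps-server` is the usual NVIDIA daemon name and is an assumption here, not taken from the docs:

```bash
# Illustrative check only: if an MPS server is up on the node, several ranks
# can submit work to the same GPU through it.
if pgrep -f nvidia-cuda-mps-server > /dev/null; then
    echo "MPS server running: ranks on this node share the GPU"
else
    echo "No MPS server found: ranks get exclusive GPU access"
fi
```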
@@ -1202,7 +1202,7 @@ Loads the view of a uenv.
 ##### `CSCS_CUDA_MPS`
 Optional variable, default is `NO`

- Enable running with nvidia-mps-server, which allows multiple ranks sharing the same GPU.
+ Enable running with `nvidia-mps-server`, which allows multiple ranks to share the same GPU.

 #### Example jobs
 ```yaml
@@ -1405,8 +1405,7 @@ A couple of projects which use this CI setup.
 Please have a look there for more advanced usage:

 * [dcomex-framework](https://github.com/DComEX/dcomex-framework): entry point is `ci/prototype.yml`
- * [mars](https://bitbucket.org/zulianp/mars/src/development/): two pipelines, with entry points `ci/gitlab/cscs/gpu/gitlab-
- daint.yml` and `ci/gitlab/cscs/mc/gitlab-daint.yml`
+ * [mars](https://bitbucket.org/zulianp/mars/src/development/): two pipelines, with entry points `ci/gitlab/cscs/gpu/gitlab-daint.yml` and `ci/gitlab/cscs/mc/gitlab-daint.yml`
 * [sparse_accumulation](https://github.com/lab-cosmo/sparse_accumulation): entry point is `ci/pipeline.yml`
 * [gt4py](https://github.com/GridTools/gt4py): entry point is `ci/cscs-ci.yml`
 * [SIRIUS](https://github.com/electronic-structure/SIRIUS): entry point is `ci/cscs-daint.yml`
docs/software/ml/pytorch.md (1 addition, 1 deletion)
@@ -383,7 +383,7 @@ srun bash -c "
 6. Disable GPU support in MPICH, as it [can lead to deadlocks](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html#inter-gpu-communication-with-cuda-aware-mpi) when used together with NCCL.
 7. Avoid writing JITed binaries to the (distributed) file system, which could lead to performance issues.
 8. These variables should always be set for correctness and optimal performance when using NCCL, see [the detailed explanation][ref-communication-nccl].
- 9. `RANK` and `LOCAL_RANK` are set per-process by the Slurmjob launcher.
+ 9. `RANK` and `LOCAL_RANK` are set per-process by the Slurm job launcher.
 10. Activate the virtual environment created on top of the uenv (if any).
     Please follow the guidelines for [python virtual environments with uenv][ref-guides-storage-venv] to enhance scalability and reduce load times.
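Points 9 and 10 refer to the surrounding `srun bash -c` block; a minimal sketch of that pattern, where the `train.py` script and the venv path are hypothetical placeholders:

```bash
# One process per GPU: Slurm sets per-task indices, which are re-exported
# under the names PyTorch's distributed launchers expect.
srun bash -c "
    export RANK=\$SLURM_PROCID         # global rank across all nodes
    export LOCAL_RANK=\$SLURM_LOCALID  # rank within the node
    source ./venv/bin/activate         # venv built on top of the uenv, if any
    python train.py
"
```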
docs/storage/filesystems.md (1 addition, 1 deletion)
@@ -124,7 +124,7 @@ Please ensure that you move important data to a file system with backups, for ex
 ## Store

 Store is a large, medium-performance storage area on the [Capstor][ref-alps-capstor] Lustre file system for sharing data within a project and for medium-term data storage.
- See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best preformance out of the filesystem.
+ See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best performance out of the filesystem.

 Space on Store is allocated per-project, with a path created for each project.
 To accommodate the different customers and projects on Alps, the project paths are organised as follows: