
Commit ffa006a

Integrated suggestions by @msimberg, cleanups
1 parent 8951a59 commit ffa006a

3 files changed: +16 −12 lines changed

docs/alps/storage.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ HPC storage is provided by independent clusters, composed of servers and physica
  Capstor and Iopsstor are on the same Slingshot network as Alps, while VAST is on the CSCS Ethernet network.

+ See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best performance out of the filesystem.
+
  The mounts, and how they are used for the Scratch, Store, and Home file systems mounted on clusters, are documented in the [file system docs][ref-storage-fs].

  [](){#ref-alps-capstor}

docs/guides/storage.md

Lines changed: 12 additions & 11 deletions
@@ -116,25 +116,28 @@ To set up a default so all newly created folders and dirs inside or your desired
  [](){#ref-guides-storage-lustre}
  ## Lustre Tuning
  [Capstor][ref-alps-capstor] and [Iopsstor][ref-alps-iopsstor] are both [Lustre](https://lustre.org) filesystems.
- Lustre is an open-source, parallel file system used in HPC systems.
+
  As shown in the schema below:

  ![Lustre architecture](/images/storage/lustre.png)

- Lustre uses *metadata* servers to store and query metadata which is basically what is shown by `ls`: directory structure, file permission, modification dates,..
- This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both Capstor and Iopsstor. The section below discusses [how to handle many small files][ref-guides-storage-small-files]
+ Lustre uses *metadata* servers to store and query metadata, which is basically what is shown by `ls`: directory structure, file permissions, modification dates, ...
+ Its performance is roughly the same on [Capstor][ref-alps-capstor] and [Iopsstor][ref-alps-iopsstor].
+ This data is globally synchronized, which means that handling many small files is not well suited to Lustre; see the discussion on [how to handle many small files][ref-guides-storage-small-files].

  The data itself is subdivided in blocks of size `<blocksize>` and is stored by Object Storage Servers (OSS) in one or more Object Storage Targets (OST).
- The blocksize and number of OSTs to use is defined by the striping settings. A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout
+ The blocksize and the number of OSTs to use are defined by the striping settings.
+ A new file or directory inherits them from its parent directory.
+ The `lfs getstripe <path>` command can be used to get information on the actual stripe settings.
+ For directories and empty files, `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to get the correct layout is to copy files into a directory that already has it.

- A blocksize of 4MB gives good throughput, without being overly big, so it is a good choice when reading a file sequentially or in large chuncks, but if one reads shorter chuncks in random order it might be better to reduce the size, the performance will be smaller, but the performance of your application might actually increase.
+ A blocksize of 4 MB gives good throughput without being overly big, so it is a good choice when reading a file sequentially or in large chunks. If one reads shorter chunks in random order, it might be better to reduce the size: raw throughput will be lower, but the performance of your application might actually increase.
  https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace

  !!! example "Settings for large files"
      ```console
      lfs setstripe --stripe-count -1 --stripe-size 4M <big_files_dir>
      ```
-
  Lustre also supports composite layouts, switching from one layout to another at a given size `--component-end` (`-E`).
  With it, it is possible to create a progressive file layout switching `--stripe-count` (`-c`) and `--stripe-size` (`-S`), so that fewer locks are required for smaller files, but load is distributed for larger files.
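A progressive file layout of this kind could be set with something like the sketch below; the directory name and the size thresholds are illustrative, not a CSCS recommendation, and `lfs getstripe` is shown first to inspect the existing layout:

```console
# Inspect the layout a directory (or file) currently has
lfs getstripe <mixed_size_dir>

# Illustrative PFL: one stripe for the first 4 MB, four stripes up to 64 MB,
# then all available OSTs (-c -1) for anything larger
lfs setstripe -E 4M -c 1 -E 64M -c 4 -S 4M -E -1 -c -1 -S 4M <mixed_size_dir>
```

Files created in the directory afterwards inherit this layout; small files stay on a single OST (fewer locks), while large files spread over all OSTs (more bandwidth).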

@@ -145,17 +148,15 @@ With it it is possible to create a Progressive file layout switching `--stripe-c
  ### Iopsstor vs Capstor

- [Iopsstor][ref-alps-iopsstor] uses SSD as OST, thus random access is quick, and the performance of the single OST is high. [Capstor][ref-alps-capstor] on another hand uses harddisks, it has a larger capacity, and it also have many more OSS, thus the total bandwidth is larger.
-
- !!! Note
-     ML model training normally has better performance if reading from iopsstor (random access, difficult to predict access pattern). Checkpoint can be done to capstor (very good for contiguous access).
+ [Iopsstor][ref-alps-iopsstor] uses SSDs as OSTs, so random access is quick and the performance of a single OST is high.
+ [Capstor][ref-alps-capstor], on the other hand, uses hard disks; it has a larger capacity and also many more OSSs, so the total bandwidth is larger. See for example the [ML filesystem suitability][ref-mlp-storage-suitability].

  [](){#ref-guides-storage-small-files}
  ## Many small files vs. HPC File Systems

  Workloads that read or create many small files are not well-suited to parallel file systems, which are designed for parallel and distributed I/O.

- In some cases, and if enough memory is available it might be worth to unpack/repack the small files to local in memory filesystems like `/dev/shmem/$USER` or `/tmp`, which are *much* faster, or to use a squashfs filesystem that is stored as a single large file on lustre.
+ In some cases, and if enough memory is available, it might be worth unpacking/repacking the small files into in-memory filesystems like `/dev/shm/$USER` or `/tmp`, which are *much* faster, or using a squashfs filesystem that is stored as a single large file on Lustre.
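A minimal sketch of the tar-based variant, assuming a POSIX shell and a writable `/dev/shm`; all paths are illustrative stand-ins (on the real systems the archive would live on Lustre, e.g. under `$SCRATCH`):

```shell
# Hypothetical stand-in for a dataset of many small files on Lustre.
SRC="${TMPDIR:-/tmp}/many_small_files"
STAGE="/dev/shm/${USER:-demo}"

mkdir -p "$SRC"
for i in 1 2 3; do echo "sample $i" > "$SRC/file_$i.txt"; done

# Repack into a single archive: one large sequential write,
# which is the access pattern Lustre handles well.
tar -cf "$SRC.tar" -C "$(dirname "$SRC")" "$(basename "$SRC")"

# Unpack into the in-memory filesystem: subsequent small reads hit RAM
# instead of the Lustre metadata and object servers.
mkdir -p "$STAGE"
tar -xf "$SRC.tar" -C "$STAGE"
ls "$STAGE/many_small_files"
```

Remember to remove the staged copy (e.g. `rm -rf "$STAGE"`) when the job finishes, since `/dev/shm` consumes node memory.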

  Workloads that do not play nicely with Lustre include:

docs/platforms/mlp/index.md

Lines changed: 2 additions & 1 deletion
@@ -52,13 +52,14 @@ Use scratch to store datasets that will be accessed by jobs, and for job output.
  Scratch is per user - each user gets a separate scratch path and quota.

  * The environment variable `SCRATCH=/iopsstor/scratch/cscs/$USER` is set automatically when you log into the system, and can be used as a shortcut to access scratch.
- * There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`.
+ * There is an additional scratch path mounted on [Capstor][ref-alps-capstor] at `/capstor/scratch/cscs/$USER`.

  !!! warning "scratch cleanup policy"
      Files that have not been accessed in 30 days are automatically deleted.

  **Scratch is not intended for permanent storage**: transfer files back to the Capstor project storage after job runs.

+ [](){#ref-mlp-storage-suitability}
  !!! note "file system suitability"
      The Capstor scratch filesystem is based on HDDs and is optimized for large, sequential read and write operations.
      We recommend using Capstor for storing **checkpoint files** and other **large, contiguous outputs** generated by your training runs.
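A job script following this advice might look like the sketch below; `train.py` and its flags are hypothetical placeholders for an actual training command, while the two scratch paths are the ones documented above:

```console
#!/bin/bash
# Read training data from the SSD-backed iopsstor scratch (good for random access)
DATA="$SCRATCH/dataset"

# Write large, sequential checkpoints to the HDD-backed capstor scratch
CKPT="/capstor/scratch/cscs/$USER/checkpoints"
mkdir -p "$CKPT"

# Placeholder training command; adapt to your workload
srun python train.py --data "$DATA" --checkpoint-dir "$CKPT"
```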

0 commit comments