Commit 8951a59

Added refs, integrated with storage guide
1 parent 1af02bc commit 8951a59

5 files changed

+43
-33
lines changed

docs/guides/lustre-tuning.md

Lines changed: 0 additions & 32 deletions
This file was deleted.

docs/guides/storage.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -113,10 +113,50 @@ To set up a default so all newly created folders and dirs inside or your desired
113113
!!! info
114114
For more information read the setfacl man page: `man setfacl`.
115115

116+
[](){#ref-guides-storage-lustre}
117+
## Lustre Tuning
118+
[Capstor][ref-alps-capstor] and [Iopsstor][ref-alps-iopsstor] are both [Lustre](https://lustre.org) filesystems.
119+
Lustre is an open-source, parallel file system used in HPC systems.
120+
The schema below shows the Lustre architecture.
121+
122+
![Lustre architecture](/images/storage/lustre.png)
123+
124+
Lustre uses *metadata* servers to store and query metadata, which is essentially what `ls` shows: directory structure, file permissions, modification dates, and so on.
125+
This data is globally synchronized, which means that Lustre is not especially well suited to handling many small files, and metadata performance is similar on both Capstor and Iopsstor. The section below discusses [how to handle many small files][ref-guides-storage-small-files].
126+
127+
The data itself is subdivided into blocks of size `<blocksize>` and is stored by Object Storage Servers (OSS) on one or more Object Storage Targets (OST).
128+
The blocksize and the number of OSTs to use are defined by the striping settings. A new file or directory inherits them from its parent directory. The `lfs getstripe <path>` command shows the current stripe settings. For directories and empty files, `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` sets the layout. The simplest way to give existing data the correct layout is to copy it into a directory that already has that layout.
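The copy-based workflow described above can be sketched as a small shell helper. This is a minimal sketch, not a site-provided tool: the function name `copy_with_layout` and the `LFS` override variable are hypothetical, and it uses only the `lfs setstripe` and `lfs getstripe` options named above.

```shell
# Hypothetical helper: copy a file into a directory that has the desired
# stripe layout, so that the new copy inherits it.
# LFS can be overridden (e.g. LFS="echo lfs") for a dry run on a machine
# without Lustre; by default the real `lfs` client tool is used.
LFS="${LFS:-lfs}"

copy_with_layout() {
    src="$1"; dest_dir="$2"; count="$3"; size="$4"

    mkdir -p "$dest_dir" || return 1
    # Set the default layout on the directory; files created inside inherit it.
    $LFS setstripe --stripe-count "$count" --stripe-size "$size" "$dest_dir" || return 1
    # cp creates a new file, which therefore picks up the directory's layout.
    cp "$src" "$dest_dir/" || return 1
    # Show the layout that the copy actually received.
    $LFS getstripe "$dest_dir/$(basename "$src")"
}
```

For example, `copy_with_layout input.dat "$SCRATCH/striped" 4 4M` (with `input.dat` and the `striped` directory as placeholder names) copies the file into a directory striped over 4 OSTs. A copy is used rather than `mv` because moving a file within the same filesystem only renames it and keeps its existing layout.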
129+
130+
A blocksize of 4MB gives good throughput without being overly large, so it is a good choice when reading a file sequentially or in large chunks. If an application reads shorter chunks in random order, it might be better to reduce the blocksize: the peak throughput will be lower, but the performance of your application might actually increase.
131+
See the [Lustre manual](https://doc.lustre.org/lustre_manual.xhtml#managingstripingfreespace) for more details on striping.
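To make the role of the blocksize more concrete, the sketch below works through the layout arithmetic, assuming plain round-robin (RAID-0) striping in which consecutive blocks of `stripe_size` bytes are placed on consecutive OSTs. The function names are hypothetical, and the numbers only illustrate how offsets map to blocks; they do not predict performance.

```shell
# Index (within the file's layout) of the OST object holding a byte offset,
# assuming round-robin striping.
stripe_index() {
    offset="$1"; stripe_size="$2"; stripe_count="$3"
    echo $(( (offset / stripe_size) % stripe_count ))
}

# Number of stripe-sized blocks touched by a read of `length` bytes at `offset`.
blocks_touched() {
    offset="$1"; length="$2"; stripe_size="$3"
    echo $(( (offset + length - 1) / stripe_size - offset / stripe_size + 1 ))
}

MiB=$((1024 * 1024))
stripe_index   $((10 * MiB)) $((4 * MiB)) 4             # prints 2
blocks_touched 0 $((16 * MiB)) $((4 * MiB))             # prints 4
blocks_touched $((3 * MiB)) $((2 * MiB)) $((4 * MiB))   # prints 2
```

A large sequential read spans several blocks and can therefore be served by several OSTs at once, whereas a short read usually falls inside a single block.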
132+
133+
!!! example "Settings for large files"
134+
```console
135+
lfs setstripe --stripe-count -1 --stripe-size 4M <big_files_dir>
136+
```
137+
138+
Lustre also supports composite layouts, which switch from one layout to another at a given size, set with `--component-end` (`-E`).
139+
This makes it possible to create a progressive file layout that changes `--stripe-count` (`-c`) and `--stripe-size` (`-S`) as a file grows, so that fewer locks are required for smaller files while the load is distributed for larger files.
140+
141+
!!! example "Good default settings"
142+
```console
143+
lfs setstripe -E 4M -c 1 -E 64M -c 4 -E -1 -c -1 -S 4M <base_dir>
144+
```
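As an illustration of how the composite layout `-E 4M -c 1 -E 64M -c 4 -E -1 -c -1` applies, the following sketch reports the stripe count used at a given byte offset. The function name is hypothetical, sizes are taken as binary units (4M = 4 MiB), and `all` stands for the `-c -1` setting, which stripes over all available OSTs.

```shell
MiB=$((1024 * 1024))

# Hypothetical helper: stripe count that applies at a given byte offset
# under the layout  -E 4M -c 1 -E 64M -c 4 -E -1 -c -1
pfl_stripe_count() {
    offset="$1"
    if [ "$offset" -lt $((4 * MiB)) ]; then
        echo 1        # first component: [0, 4M)
    elif [ "$offset" -lt $((64 * MiB)) ]; then
        echo 4        # second component: [4M, 64M)
    else
        echo all      # last component: [64M, end of file), -c -1
    fi
}

pfl_stripe_count $((1 * MiB))       # prints 1
pfl_stripe_count $((32 * MiB))      # prints 4
pfl_stripe_count $((1024 * MiB))    # prints all
```

A file smaller than 4 MiB therefore lives on a single OST, while a file that grows past 64 MiB has its later extents spread over all OSTs.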
145+
146+
### Iopsstor vs Capstor
147+
148+
[Iopsstor][ref-alps-iopsstor] uses SSDs as OSTs, so random access is quick and the performance of a single OST is high. [Capstor][ref-alps-capstor], on the other hand, uses hard disks: it has a larger capacity and many more OSSs, so its total bandwidth is larger.
149+
150+
!!! note
151+
ML model training normally performs better when reading from Iopsstor (random access, access pattern difficult to predict). Checkpoints can be written to Capstor (very good for contiguous access).
152+
153+
[](){#ref-guides-storage-small-files}
116154
## Many small files vs. HPC File Systems
117155

118156
Workloads that read or create many small files are not well-suited to parallel file systems, which are designed for parallel and distributed I/O.
119157

158+
In some cases, if enough memory is available, it might be worth unpacking or repacking the small files into local in-memory filesystems like `/dev/shm/$USER` or `/tmp`, which are *much* faster, or using a squashfs filesystem that is stored as a single large file on Lustre.
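The unpack/repack approach could look like the following minimal sketch, which keeps the small files in a single tar archive on Lustre and unpacks them into a fast local directory at the start of a job. The function names and the default target path are assumptions for illustration; adjust the target to the in-memory path available on your system.

```shell
# Hypothetical helper: pack a directory of small files into one archive.
# Usage: pack_dir <src_dir> <archive>
pack_dir() {
    tar -cf "$2" -C "$1" .
}

# Hypothetical helper: unpack an archive (stored as one large file on
# Lustre) into a fast local directory and print the resulting path.
# Usage: stage_in <archive> [target_dir]
stage_in() {
    archive="$1"
    # Default target is an assumption, not a documented site path.
    target="${2:-/dev/shm/$USER/dataset}"
    mkdir -p "$target" || return 1
    tar -xf "$archive" -C "$target" || return 1
    echo "$target"
}
```

A job could then run `data_dir="$(stage_in "$SCRATCH/dataset.tar")"` (with `dataset.tar` as a placeholder name) and read from `$data_dir`. Files staged into an in-memory filesystem consume node memory, so remove them once the job no longer needs them.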
159+
120160
Workloads that do not play nicely with Lustre include:
121161

122162
* Configuration and compiling applications.

docs/platforms/mlp/index.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@ Scratch is per user - each user gets separate scratch path and quota.
6363
The Capstor scratch filesystem is based on HDDs and is optimized for large, sequential read and write operations.
6464
We recommend using Capstor for storing **checkpoint files** and other **large, contiguous outputs** generated by your training runs.
6565
In contrast, Iopsstor uses high-performance NVMe drives, which excel at handling **IOPS-intensive workloads** involving frequent, random access. This makes it a better choice for storing **training datasets**, especially when accessed randomly during machine learning training.
66+
See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best performance out of the filesystem.
6667

6768
### Scratch Usage Recommendations
6869

docs/storage/filesystems.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,7 @@ Daily [snapshots][ref-storage-snapshots] for the last seven days are provided in
8484
## Scratch
8585

8686
The Scratch file system is a fast workspace tuned for use by parallel jobs, with an emphasis on performance over reliability, hosted on the [Capstor][ref-alps-capstor] Lustre filesystem.
87+
See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best performance out of the filesystem.
8788

8889
All users on Alps get their own Scratch path, `/capstor/scratch/cscs/$USER`, which is pointed to by the variable `$SCRATCH` on the [HPC Platform][ref-platform-hpcp] and [Climate and Weather Platform][ref-platform-cwp] clusters Eiger, Daint and Santis.
8990

@@ -123,6 +124,7 @@ Please ensure that you move important data to a file system with backups, for ex
123124
## Store
124125

125126
Store is a large, medium-performance, storage on the [Capstor][ref-alps-capstor] Lustre file system for sharing data within a project, and for medium term data storage.
127+
See the [Lustre guide][ref-guides-storage-lustre] for some hints on how to get the best performance out of the filesystem.
126128

127129
Space on Store is allocated per-project, with a path created for each project.
128130
To accommodate the different customers and projects on Alps, the project paths are organised as follows:

mkdocs.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -112,7 +112,6 @@ nav:
112112
- guides/index.md
113113
- 'Internet Access on Alps': guides/internet-access.md
114114
- 'Storage': guides/storage.md
115-
- 'Lustre tuning': guides/lustre-tuning.md
116115
- 'Using the terminal': guides/terminal.md
117116
- 'MLP Tutorials':
118117
- guides/mlp_tutorials/index.md
