
Conversation

@fawzi (Contributor) commented Jun 24, 2025

Added basic info and commands to improve Lustre performance.
I did not describe features that we do not use (like pools), or that normally should not be used (like index/offset).

@github-actions

preview available: https://docs.tds.cscs.ch/166

@fawzi (Contributor, Author) commented Jun 24, 2025

Probably we should link to it from MLP, and maybe from other places too.


@msimberg (Collaborator) left a comment

Thanks @fawzi, this is very nice to have.

It might be a good idea to at least cross-link (possibly in both directions) with https://docs.cscs.ch/guides/storage/, https://docs.cscs.ch/storage/filesystems/, and https://docs.cscs.ch/alps/storage/ for better discoverability. The platform pages (e.g. https://docs.cscs.ch/platforms/hpcp/#file-systems-and-storage) also have some storage-related content.

The info you're adding is unique and new, but we might have to optimize a bit globally how we group all the storage-related info.

Comment on lines 1 to 2
# Lustre Tuning
`/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem.

Suggested change
# Lustre Tuning
`/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem.
# Lustre tuning
[Capstor][ref-alps-capstor] and [Iopsstor][ref-alps-iopsstor] are both [Lustre](https://lustre.org) filesystems.

In general, do have a quick read of https://docs.cscs.ch/contributing/ for some guidelines on formatting etc.

@fawzi (Contributor, Author) replied:

Changed references, merged with the storage guide, and added some cross-references.

Lustre is an open-source, parallel file system used in HPC systems.
As shown in ![Lustre architecture](/images/storage/lustre.png) uses *metadata* servers to store and query metadata which is basically what is shown by `ls`: directory structure, file permission, modification dates,..
Collaborator:

This image doesn't show up in the rendered output. Could you make sure that it's correctly rendered?

@fawzi (Contributor, Author):

I think it is because it refers to the image at a path that will only exist after this is merged; when serving locally it works. Or should I add the image explicitly somewhere?

This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
Collaborator:

Suggested change
This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
This data is globally synchronized, which means that handling many small files is not especially suited for Lustre, and the perfomrance of that part is similar on both capstor and iopsstor.

here and elsewhere.

Collaborator:

Suggested change
This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
This data is globally synchronized, which means that handling many small files is not well suited for lustre.

Leave out the second part? Iopsstor should handle small files slightly better than Capstor, no? I may very well be mistaken, in which case it's obviously good to leave it in.


@fawzi (Contributor, Author):

The performance of the metadata part is roughly the same on iopsstor and capstor; it depends on the number of metadata servers and on the users adding load to them. For a while, backup was putting extra load on capstor. Writing the data of many small files might indeed be slightly better on iopsstor, because SSDs are faster.

With many small files, a local filesystems like `/dev/shmem/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option.
Collaborator:

Suggested change
With many small files, a local filesystems like `/dev/shmem/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option.
With many small files, an in-memory filesystem like `/dev/shm/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option.

?
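For illustration, a minimal sketch of this staging pattern, assuming a dataset packed as a tar archive (all paths and file names are hypothetical):

```console
# Pack the many small files into a single archive once, on Lustre
$ tar cf /capstor/scratch/$USER/dataset.tar dataset/

# At job start, unpack into node-local memory (this counts against the node's RAM)
$ mkdir -p /dev/shm/$USER
$ tar xf /capstor/scratch/$USER/dataset.tar -C /dev/shm/$USER

# Reads now hit local memory instead of the Lustre metadata servers
$ ls /dev/shm/$USER/dataset | head
```

The archive itself is read from Lustre as one large sequential stream, which is the access pattern Lustre handles best, so the unpacking work is quickly offset.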

Collaborator:

There's a lot of overlap with https://docs.cscs.ch/guides/storage/#many-small-files-vs-hpc-file-systems. Can we link or avoid some duplication of the motivation etc.?

Actually, do you think this guide could fit inside the existing storage guide?

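The squashed-filesystem alternative mentioned in the quoted paragraph could look roughly like this; a sketch assuming squashfuse is available for unprivileged mounting (image name and mount point are hypothetical):

```console
# Pack the directory tree into a single compressed, read-only image
$ mksquashfs dataset/ dataset.squashfs

# Mount it in user space and read the files as usual
$ mkdir -p /tmp/$USER/dataset-mnt
$ squashfuse dataset.squashfs /tmp/$USER/dataset-mnt
$ ls /tmp/$USER/dataset-mnt | head
```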

The data itself is subdivided in blocks of size `<blocksize>` and is stored by Object Storage Servers (OSS) in one or more Object Storage Targets (OST).
The blocksize and number of OSTs to use is defined by the striping settings. A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout
Collaborator:

Suggested change
The blocksize and number of OSTs to use is defined by the striping settings. A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout
The blocksize and number of OSTs to use is defined by the striping settings.
A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings.
For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout.
The simplest way to have the correct layout is to copy to a directory with the correct layout.
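As a concrete walk-through of the commands named here (the directory path is hypothetical):

```console
# Inspect the layout that a directory will give to new files
$ lfs getstripe /capstor/scratch/$USER/output

# New files in this directory: 4 MiB stripes spread over 4 OSTs
$ lfs setstripe --stripe-count 4 --stripe-size 4M /capstor/scratch/$USER/output

# Existing files keep their old layout; copying creates a new file
# that inherits the directory's settings
$ cp bigfile /capstor/scratch/$USER/output/
```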

Comment on lines 22 to 25
!!! example "Good default settings"
```console
lfs setstripe -E 4M -c 1 -E 64M -c 4 -E -1 -c -1 -S 4M <base_dir>
```
Collaborator:

General question: do we know the defaults on capstor/iopsstor? It might be useful for users to know when they should actually care about changing the settings. Or are these "Good default settings" actually our defaults?

Contributor:

I had the same question. I saw some commands like this before, but I honestly don't really get what happens to the file in detail, so if you/anyone tested it, it'd be great to explain it a bit, please? :)
E.g. does it create 64 MB stripes? Is this the optimal size? (I recall getting better results reading files a bit larger, but no idea how that translates.)

And as @msimberg says, if this is a good definition, can we please make it the default?
Does writing require different values vs. reading?

QPM might be a good time to talk to storage and find someone to run these tests?

Thanks, and thank you for submitting this too!

@fawzi (Contributor, Author):

The definition means: the first 4 MB use just one OST (fewer locks, less overhead); from 4 MB up to 64 MB, 4 OSTs; beyond that, all OSTs with a 4 MB stripe size. I tested it, but not heavily. I already discussed this with @mpasserini, but he is not keen on changing things, especially as capstor might be replaced for MLP soon. I am willing to push users that are willing to set things explicitly, and if we then get more test coverage, really change the default.
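Spelled out as a sketch, the composite (PFL) layout from the example above does the following (`<base_dir>` is the placeholder from the original command):

```console
# Composite (PFL) layout, component by component:
#   0-4 MiB    -> 1 OST      (-E 4M  -c 1): fewer locks, less overhead
#   4-64 MiB   -> 4 OSTs     (-E 64M -c 4)
#   64 MiB-EOF -> all OSTs   (-E -1  -c -1), 4 MiB stripe size (-S 4M)
$ lfs setstripe -E 4M -c 1 -E 64M -c 4 -E -1 -c -1 -S 4M <base_dir>

# Verify what new files under <base_dir> will get
$ lfs getstripe <base_dir>
```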

@fawzi requested a review from @mpasserini as a code owner, June 25, 2025.

@bcumming (Member) left a comment

Thanks Fawzi.

These are a good starting point for us to develop more detailed docs over time.

I made a few tweaks and pushed them, to speed up the process.

@bcumming merged commit e796cc0 into eth-cscs:main on Jun 25, 2025. 1 check passed.