Add lustre tuning guide #166
|
preview available: https://docs.tds.cscs.ch/166 |
|
probably we should link to it from MLP and maybe other places too |
msimberg
left a comment
Thanks @fawzi, this is very nice to have.
It might be a good idea to at least cross-link (i.e. possibly both directions) with https://docs.cscs.ch/guides/storage/, https://docs.cscs.ch/storage/filesystems/, and https://docs.cscs.ch/alps/storage/ for better discoverability? The platform pages (e.g. https://docs.cscs.ch/platforms/hpcp/#file-systems-and-storage) also have some storage related content.
The info that you're adding is unique and new, but we might have to globally optimize a bit how we group all storage-related info.
docs/guides/lustre-tuning.md
Outdated
| # Lustre Tuning | ||
| `/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem. |
| # Lustre Tuning | |
| `/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem. | |
| # Lustre tuning | |
| [Capstor][ref-alps-capstor] and [Iopsstor][ref-alps-iopsstor] are both [Lustre](https://lustre.org) filesystems. |
In general, do have a quick read of https://docs.cscs.ch/contributing/ for some guidelines on formatting etc.
Changed references, merged with storage guide, and added some cross references.
docs/guides/lustre-tuning.md
Outdated
| # Lustre Tuning | ||
| `/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem. | ||
| Lustre is an open-source, parallel file system used in HPC systems. | ||
| As shown in  uses *metadata* servers to store and query metadata which is basically what is shown by `ls`: directory structure, file permission, modification dates,.. |
This image doesn't show up in the rendered output. Could you make sure that it's correctly rendered?
I think it is because the link points to a location that will contain the image only after the PR is merged; it works when serving the docs locally. Or should I add the image explicitly somewhere?
docs/guides/lustre-tuning.md
Outdated
| `/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem. | ||
| Lustre is an open-source, parallel file system used in HPC systems. | ||
| As shown in  uses *metadata* servers to store and query metadata which is basically what is shown by `ls`: directory structure, file permission, modification dates,.. | ||
| This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor. |
| This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor. | |
| This data is globally synchronized, which means that handling many small files is not especially suited for Lustre, and the perfomrance of that part is similar on both capstor and iopsstor. |
here and elsewhere.
| This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor. | |
| This data is globally synchronized, which means that handling many small files is not well suited for lustre. |
Leave out the second part? Iopsstor should handle small files slightly better than Capstor, no? I may very well be mistaken, in which case it's obviously good to leave it in.
Some of this motivation overlaps with https://docs.cscs.ch/guides/storage/#many-small-files-vs-hpc-file-systems.
The performance of the metadata part is roughly the same on iopsstor and capstor; it depends on the number of metadata servers and on the load users put on them. For a while, backup was putting extra load on capstor. Writing the data of many small files might indeed be slightly better on iopsstor because SSDs are faster.
docs/guides/lustre-tuning.md
Outdated
| Lustre is an open-source, parallel file system used in HPC systems. | ||
| As shown in  uses *metadata* servers to store and query metadata which is basically what is shown by `ls`: directory structure, file permission, modification dates,.. | ||
| This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor. | ||
| With many small files, a local filesystems like `/dev/shmem/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option. |
| With many small files, a local filesystems like `/dev/shmem/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option. | |
| With many small files, an in-memory filesystem like `/dev/shm/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option. |
?
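To make the packing/unpacking idea in the line under review concrete, here is a minimal sketch, not a tested recommendation. The function names and all paths are hypothetical; it assumes a standard `tar` and enough free memory on the node for the unpacked data.

```shell
#!/bin/sh
# Sketch: keep many small files as ONE archive on Lustre, and unpack them
# into node-local memory at the start of a job.
# Function names and paths are illustrative assumptions, not CSCS tooling.

# pack_dir <src_dir> <archive>: store the contents of src_dir in one tar file.
# The archive must live outside src_dir.
pack_dir() {
    tar -cf "$2" -C "$1" .
}

# unpack_to_scratch <archive> <scratch_dir>: extract into a local directory,
# e.g. somewhere under /dev/shm/$USER.
unpack_to_scratch() {
    mkdir -p "$2" && tar -xf "$1" -C "$2"
}
```

A job script might call `pack_dir` once when preparing the dataset and `unpack_to_scratch` with a target such as `/dev/shm/$USER/dataset` on each node. Lustre then sees a single large file instead of many small ones.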
There's a lot of overlap with https://docs.cscs.ch/guides/storage/#many-small-files-vs-hpc-file-systems. Can we link or avoid some duplication of the motivation etc.?
Actually, do you think this guide could fit inside the existing storage guide?
docs/guides/lustre-tuning.md
Outdated
| With many small files, a local filesystems like `/dev/shmem/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option. | ||
|
|
||
| The data itself is subdivided in blocks of size `<blocksize>` and is stored by Object Storage Servers (OSS) in one or more Object Storage Targets (OST). | ||
| The blocksize and number of OSTs to use is defined by the striping settings. A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout |
| The blocksize and number of OSTs to use is defined by the striping settings. A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout | |
| The blocksize and number of OSTs to use is defined by the striping settings. | |
| A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. | |
| For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. | |
| The simplest way to have the correct layout is to copy to a directory with the correct layout. |
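The "copy to a directory with the correct layout" workflow from the suggestion could be sketched as follows. It uses only the `lfs getstripe` and `lfs setstripe --stripe-count/--stripe-size` forms quoted above. The helper names, the `DRY_RUN` switch, and the example values are assumptions for illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: give an existing file a new stripe layout by copying it into a
# directory that already carries that layout (new files inherit it).
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 on a Lustre mount.
DRY_RUN="${DRY_RUN:-1}"

# run <cmd...>: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# restripe_by_copy <file> <dest_dir> <stripe_count> <stripe_size>
restripe_by_copy() {
    src="$1"; dest_dir="$2"; count="$3"; size="$4"
    run mkdir -p "$dest_dir"
    # Set the layout on the (empty or new) directory.
    run lfs setstripe --stripe-count "$count" --stripe-size "$size" "$dest_dir"
    # The copy is a new file, so it is created with the directory's layout.
    run cp "$src" "$dest_dir/"
    # Show the layout the copy actually received.
    run lfs getstripe "$dest_dir/$(basename "$src")"
}
```

For example, `restripe_by_copy results.h5 striped/ 4 4M` prints the four commands it would run. The values `4` and `4M` are placeholders, not recommended settings.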
docs/guides/lustre-tuning.md
Outdated
| !!! example "Good default settings" | ||
| ```console | ||
| lfs setstripe -E 4M -c 1 -E 64M -c 4 -E -1 -c -1 -S 4M <base_dir> | ||
| ``` |
General question: do we know the defaults on capstor/iopsstor? It might be useful for users to know when they should actually care about changing the settings. Or are these "Good default settings" actually our defaults?
I had the same question. I have seen commands like this before, but I honestly don't understand in detail what happens to the file, so if you or anyone else has tested it, it would be great to explain it a bit, please? :)
e.g. does it create 64MB stripes? Is this the optimal size? (I recall getting better results when reading files that were a bit larger, but I have no idea how that translates.)
And as @msimberg says, if this is a good definition, can we please make it the default?
Does writing require different values than reading?
QPM might be a good time to talk to storage and find someone to run these tests.
Thanks, and thank you for submitting this too!
The definition means: the first 4M of a file uses just one OST (fewer locks, less overhead), the range up to 64M uses 4 OSTs, and everything beyond that uses all OSTs with a 4M stripe size. I tested it, but not heavily. I have already discussed it with @mpasserini, but he is not keen on changing things, especially as capstor might be replaced for MLP soon. I am happy to encourage users who are willing to set this themselves, and once we have more test coverage we can really change the default.
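As a small illustration of that reading of `-E 4M -c 1 -E 64M -c 4 -E -1 -c -1 -S 4M`, the sketch below maps a byte offset in a file to the stripe count of the component that would hold it. The function name is hypothetical, and it assumes that `M` means MiB and that each `-E` value is an exclusive end offset.

```shell
#!/bin/sh
# Sketch: which component of the layout
#   -E 4M -c 1 -E 64M -c 4 -E -1 -c -1 -S 4M
# holds a given byte offset? Prints that component's stripe count.
# Assumptions: M = MiB (1024*1024 bytes); component ends are exclusive.
stripe_count_for_offset() {
    offset="$1"
    mib=$((1024 * 1024))
    if [ "$offset" -lt $((4 * mib)) ]; then
        echo 1      # first component: a single OST
    elif [ "$offset" -lt $((64 * mib)) ]; then
        echo 4      # second component: 4 OSTs
    else
        echo all    # last component: -c -1, i.e. all available OSTs
    fi
}
```

Under these assumptions, a 1 MiB file stays on a single OST, while a 10 GiB file has its first 4 MiB on one OST, the next 60 MiB spread over four, and the remainder spread over all OSTs.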
bcumming
left a comment
Thanks Fawzi.
These are a good starting point for us to develop more detailed docs over time.
I made a few tweaks and pushed them, to speed up the process.
Added the basic info and commands to improve Lustre performance.
I did not describe things that we do not use (like pools), or that normally should not be used, like index/offset.