Add lustre tuning guide #166
@@ -0,0 +1,32 @@

# Lustre Tuning

`/capstor/` and `/iopsstor` are both [lustre](https://lustre.org) filesystem.
Lustre is an open-source, parallel file system used in HPC systems.
fawzi marked this conversation as resolved.
As shown in [figure], lustre uses *metadata* servers to store and query metadata, which is basically what is shown by `ls`: directory structure, file permission, modification dates, ..

This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
Suggested change:
- This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
+ This data is globally synchronized, which means that handling many small files is not especially suited for Lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
here and elsewhere.
Suggested change:
- This data is globally synchronized, which means that handling many small files is not especially suited for lustre, and the perfomrance of that part is similar on both capstor and iopsstor.
+ This data is globally synchronized, which means that handling many small files is not well suited for lustre.
Leave out the second part? Iopsstor should handle small files slightly better than Capstor, no? I may very well also be mistaken, in which case it's obviously good to leave it in.
Some of this motivation overlaps with https://docs.cscs.ch/guides/storage/#many-small-files-vs-hpc-file-systems.
The performance of the metadata part is roughly the same on iopsstor and capstor; it depends on the number of metadata servers and the users adding load to them. For a while the backup was putting extra load on capstor. Writing many small files might indeed be slightly better on iopsstor because SSDs are faster.
Suggested change:
- With many small files, a local filesystems like `/dev/shmem/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option.
+ With many small files, an in-memory filesystem like `/dev/shm/$USER` or "/tmp", if enough memory can be spared for it, can be *much* faster, and offset the packing/unpacking work. Alternatively using a squashed filesystems can be a good option.
?
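For illustration, a minimal sketch of what staging many small files into an in-memory filesystem could look like; the archive name and paths below are hypothetical, not taken from the guide:

```bash
# Hypothetical staging of a tarball of many small files into node-local
# memory (tmpfs), so the per-file metadata traffic never touches Lustre.
STAGE_DIR="/dev/shm/$USER/dataset"              # hypothetical staging directory
mkdir -p "$STAGE_DIR"
tar -xf "$SCRATCH/dataset.tar" -C "$STAGE_DIR"  # hypothetical archive on Lustre

# ... run the workload against "$STAGE_DIR" ...

# Free the memory again when done.
rm -rf "$STAGE_DIR"
```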
There's a lot of overlap with https://docs.cscs.ch/guides/storage/#many-small-files-vs-hpc-file-systems. Can we link or avoid some duplication of the motivation etc.?
Actually, do you think this guide could fit inside the existing storage guide?
Suggested change:
- The blocksize and number of OSTs to use is defined by the striping settings. A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings. For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout. The simplest way to have the correct layout is to copy to a directory with the correct layout
+ The blocksize and number of OSTs to use is defined by the striping settings.
+ A new file or directory ihnerits them from its parent directory. The `lfs getstripe <path>` command can be used to get information on the actual stripe settings.
+ For directories and empty files `lfs setstripe --stripe-count <count> --stripe-size <size> <directory/file>` can be used to set the layout.
+ The simplest way to have the correct layout is to copy to a directory with the correct layout.
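For context, a sketch of how these commands are typically used; the directory and the stripe values below are illustrative, not the guide's recommended defaults:

```bash
# Show the current striping of a directory (or file).
lfs getstripe "$SCRATCH/mydata"

# Set a plain layout on a directory: new files created inside inherit it.
# 4 stripes of 4 MiB each -- illustrative values only.
lfs setstripe --stripe-count 4 --stripe-size 4M "$SCRATCH/mydata"

# Existing files keep their old layout; copying them into the directory
# creates new files that pick up the new one.
cp bigfile.dat "$SCRATCH/mydata/"
```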
fawzi marked this conversation as resolved.
General question: do we know the defaults on capstor/iopsstor? It might be useful for users to know when they should actually care about changing the settings. Or are these "Good default settings" actually our defaults?
I had the same question. I saw some commands like this before, but I honestly don't really get what will happen to the file in detail, so if you/anyone tested it, it'd be great to explain it a bit, please? :)
e.g. does it create 64MB stripes? Is this the optimal size? (I recall getting better results reading files a bit larger, but no idea how that translates)
And, as @msimberg says, if this is a good definition, can we please make it the default?
Does writing require different values vs. reading?
QPM might be a good time to talk to storage and find someone to run these tests?
Thanks and thank you for submitting this too!
The definition means the first 4M use just one OST (fewer locks, less overhead), up to 64M use 4 OSTs, and beyond that all OSTs with a 4MB blocksize. I tested it, but not heavily. I already discussed it with @mpasserini, but he is not keen on changing things, especially as capstor might be replaced for MLP soon. I am willing to push users that are willing to set things, and if we then have more test coverage, really change the default.
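If I read this correctly, it describes a progressive file layout (PFL); a sketch of how such a layout could be expressed with `lfs setstripe`, using the numbers from the comment above (the target directory is hypothetical and this is not necessarily the exact command from the guide):

```bash
# Hypothetical PFL matching the description above:
#   first 4 MiB on a single OST, up to 64 MiB on 4 OSTs,
#   and the rest striped over all OSTs with a 4 MiB stripe size.
lfs setstripe \
  -E 4M  -c 1 \
  -E 64M -c 4 \
  -E -1  -c -1 -S 4M \
  "$SCRATCH/large_files"
```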
In general, do have a quick read of https://docs.cscs.ch/contributing/ for some guidelines on formatting etc.
Changed references, merged with storage guide, and added some cross references.