Commit ebd3a83

Merge pull request #791 from AaltoSciComp/rkdarst/lustre
triton/usage/lustre: Major update
2 parents cc5c4a1 + b203339 commit ebd3a83

File tree

1 file changed: +44 −64 lines changed

triton/usage/lustre.rst

Lines changed: 44 additions & 64 deletions
@@ -1,4 +1,3 @@
-
 =========================
 Storage: Lustre (scratch)
 =========================

@@ -9,42 +8,40 @@ Storage: Lustre (scratch)
 Lustre is scalable high performance file system created for HPC. It
 allows MPI-IO but mainly it provides large storage capacity and high
 sequential throughput for cluster applications. Currently the total
-capacity is 2PB. The basic idea in Lustre is to spread data in each file
-over multiple storage servers. With large (larger than 1GB) files Lustre
-will significantly boost the performance.
-
-Working with small files
-~~~~~~~~~~~~~~~~~~~~~~~~
+capacity is 5PB.

-As Lustre is meant for large files, the performance with small (smaller
-than 10MB) files will not be optimal. If possible, try to avoid working
-with multiple small files.
+As you might expect, making a storage system capable of storing
+petabytes accessed at tens of gigabytes per second across hundreds of
+nodes and users simultaneously is quite a challenge. It works well,
+but there are tradeoffs. The basic idea in Lustre is to spread data
+in large files over multiple storage servers. Small files can be a
+problem, but Triton's scratch is adjusted to mitigate it somewhat.
+With large (larger than 1GB) files Lustre will significantly boost the
+performance.

-**Note: Triton has a default stripe of 1 already, so it is by default
-optimized for small files (but it's still not that great). If you use
-large files, see below.**
+.. important::

-If small files are needed (i.e. source codes) you can tell Lustre not to
-spread data over all the nodes. This will help in performance.
+   More often than not, when "Triton is down", it's Lustre (scratch)
+   being down. If you do normal-sized work this usually isn't a
+   problem. If you do big data-intensive work, please pay attention
+   to storage and ask for help early.

-To see the striping for any given file or directory you can use
-following command to check status

-::
-
-   lfs getstripe -d /scratch/path/to/dir
+Working with small files
+------------------------

-You can not change the striping of an existing file, but you can change
-the striping of new files created in a directory, then copy the file to
-a new name in that directory.
+As Lustre is meant for large files, the performance with small (smaller
+than 10MB) files will not be optimal. If possible, try to avoid working
+with large numbers of small files. Large numbers is greater than
+thousands or tens of thousands.

-::
+.. seealso::

-   lfs setstripe -c 1 /scratch/path/to/dir
-   cp somefile /scratch/path/to/dir/newfile
+   * :doc:`smallfiles`, a dedicated page on handling small files.
+   * :doc:`localstorage`, a page explaining how to use compute node
+     local drives to unpack archives with many small files to get
+     better performance.

-Working with lots of small files
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 Large datasets which consist mostly of small (<1MB) files can be slow to
 process because of network overhead associated with individual files. If

@@ -53,49 +50,32 @@ drives <localstorage>` page, see the ``tar`` example
 over there or find some other way to compact your files together into
 one.

-Working with large files
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default Lustre on Triton is configured to stripe a single file over a
-single OST. This provides the best performance for small files, serial
-programs, parallel programs where only one process is doing I/O, and
-parallel programs using a file-per-process file I/O pattern. However,
-when working with large files (>> 10 GB), particularly if they are
-accessed in parallel from multiple processes in a parallel application,
-it can be advantageous to stripe over several OST's. In this case the
-easiest way is to create a directory for the large file(s), and set the
-striping parameters for any files subsequently created in that
-directory:
-
-::
-
-   cd $WRKDIR
-   mkdir large_file
-   lfs setstripe -c 4 large_file

-The above creates a directory ``large_file`` and specifies that files
-created inside that directory will be striped over 4 OST's. For really
-really large files (hundreds of GB's) accessed in parallel from very
-large MPI runs, set the stripe count to "-1" which tells the system to
-stripe over all the available OST's.
+Working with large files
+------------------------

-To reset back to the default settings, run
+By default Lustre on Triton is configured so that as files grow
+larger, they get `striped
+<https://en.wikipedia.org/wiki/Data_striping>`__ (split) over more
+storage servers. This way, small files only require one server to
+serve the file (reducing latency), while large files can be streamed
+over multiple disks.

-::
+This page previously had instructions for how to adjust the striping
+of files yourself, but it is now automatic.

-   lfs setstripe -d path/to/directory

 Lustre: common recommendations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-- Minimize use of ``ls -l`` and ``ls --color`` when possible
+------------------------------

-Several excellent recommendations are at
+Triton's Lustre is much better than it was 10 years ago, but it's
+still worth thinking about the following things:

-- https://www.nas.nasa.gov/hecc/support/kb/Lustre-Best-Practices_226.html
-- http://www.nics.tennessee.edu/computing-resources/file-systems/io-lustre-tips.
+Minimize use of ``ls -l`` and ``ls --color`` when possible.

-They are fully applicable to our case.
+Several excellent recommendations are at
+https://www.nas.nasa.gov/hecc/support/kb/Lustre-Best-Practices_226.html
+, they are fully applicable to our case.

 Be aware, that being a high performance filesystem Lustre still has its
 own bottlenecks, and even non-proper a usage by a single user can get

@@ -104,5 +84,5 @@ avoid those potential situations. Common Lustre troublemakers are
 ``ls -lR``, creating many small files, ``rm -rf``, small random i/o,
 heavy bulk i/o.

-For advanced user, these slides can be interesting:
-https://www.eofs.eu/fileadmin/lad2012/06_Daniel_Kobras_S_C_Lustre_FS_Bottleneck.pdf
+For advanced user, these slides (from 2012) can be interesting:
+https://www.eofs.eu/wp-content/uploads/2024/02/06_daniel_kobras_s_c_lustre_fs_bottleneck.pdf
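
A minimal sketch of how the stripe layout described in the new "Working with
large files" section can be inspected, using the same ``lfs getstripe`` command
that appeared in the removed text; the paths are placeholders, not real
directories::

   # Show the stripe layout Lustre chose for one file (placeholder path)
   lfs getstripe /scratch/path/to/some/large/file

   # With -d, show a directory's default layout instead of recursing
   # into every file it contains
   lfs getstripe -d /scratch/path/to/dir

And a minimal sketch of the "compact your files together into one" advice for
datasets of many small files, assuming a hypothetical ``dataset/`` directory
under ``$WRKDIR`` and node-local ``/tmp`` as the unpack target (the actual
local-drive path to use is documented on the ``localstorage`` page)::

   # Pack the small files into a single archive on scratch, once
   tar czf $WRKDIR/dataset.tar.gz dataset/

   # Inside a job: unpack to node-local disk and process the files there
   mkdir -p /tmp/$USER
   tar xzf $WRKDIR/dataset.tar.gz -C /tmp/$USER/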

0 commit comments
