Storage: Lustre (scratch)
=========================

Lustre is a scalable, high-performance file system created for HPC. It
supports MPI-IO, but mainly it provides large storage capacity and high
sequential throughput for cluster applications. Currently the total
capacity is 5PB.

As you might expect, making a storage system capable of storing
petabytes, accessed at tens of gigabytes per second across hundreds of
nodes and users simultaneously, is quite a challenge. It works well,
but there are tradeoffs. The basic idea in Lustre is to spread the data
in large files over multiple storage servers. Small files can be a
problem, but Triton's scratch is tuned to mitigate this somewhat. With
large (larger than 1GB) files, Lustre significantly boosts performance.

.. important::

   More often than not, when "Triton is down", it's Lustre (scratch)
   being down. If you do normal-sized work, this usually isn't a
   problem. If you do big data-intensive work, please pay attention
   to storage and ask for help early.

Working with small files
------------------------

As Lustre is meant for large files, the performance with small (smaller
than 10MB) files will not be optimal. If possible, try to avoid working
with large numbers of small files; here, "large numbers" means
thousands or tens of thousands of files.

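One way to tell whether this applies to your data is simply to count the
files before putting them on scratch. A minimal sketch (the directory
name ``mydir`` is a hypothetical example created here for the demo):

```shell
# "mydir" is a hypothetical example directory built for this sketch.
mkdir -p mydir/sub
touch mydir/a.txt mydir/sub/b.txt mydir/sub/c.txt

# Count regular files in the whole tree (3 in this sketch);
# thousands or more is where Lustre performance starts to suffer.
find mydir -type f | wc -l
```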
.. seealso::

   * :doc:`smallfiles`, a dedicated page on handling small files.
   * :doc:`localstorage`, a page explaining how to use compute node
     local drives to unpack archives with many small files to get
     better performance.

Large datasets which consist mostly of small (<1MB) files can be slow to
process because of network overhead associated with individual files. If
this is your case, check the :doc:`compute node local
drives <localstorage>` page, see the ``tar`` example
over there, or find some other way to compact your files together into
one.

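The packing step can be sketched with plain ``tar``; all paths here
(``mydataset``, ``local_copy``) are hypothetical stand-ins for your
scratch directory and a node-local working directory:

```shell
# Sketch: many small files -> one archive (one large sequential file,
# which Lustre handles well). All paths are hypothetical examples.
mkdir -p mydataset
for i in $(seq 1 100); do
    echo "sample $i" > "mydataset/file_$i.txt"
done

# One big archive on scratch instead of 100 tiny files
tar -czf mydataset.tar.gz mydataset

# Later (e.g. on a compute node's local disk), unpack and work there
mkdir -p local_copy
tar -xzf mydataset.tar.gz -C local_copy
find local_copy/mydataset -type f | wc -l
```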
Working with large files
------------------------

By default, Lustre on Triton is configured so that as files grow
larger, they get `striped
<https://en.wikipedia.org/wiki/Data_striping>`__ (split) over more
storage servers. This way, small files only require one server to
serve the file (reducing latency), while large files can be streamed
from multiple disks.

This page previously had instructions for adjusting the striping of
files yourself, but it is now automatic.

Lustre: common recommendations
------------------------------

Triton's Lustre is much better than it was 10 years ago, but it's
still worth thinking about the following things:

Minimize use of ``ls -l`` and ``ls --color`` when possible.

Several excellent recommendations, fully applicable to our case, can be
found at
https://www.nas.nasa.gov/hecc/support/kb/Lustre-Best-Practices_226.html

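The reason behind the ``ls`` advice: ``ls -l`` and ``ls --color`` must
``stat()`` every entry, which on Lustre means one metadata-server
request per file, while a plain ``ls`` only reads the directory
listing. A minimal illustration (``demo`` is a hypothetical directory
created just for the sketch):

```shell
# "demo" is a hypothetical directory created only for this sketch.
mkdir -p demo
touch demo/a.txt demo/b.txt demo/c.txt

# Cheap: one directory read, no per-file stat() round trips
ls demo
```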
Be aware that, being a high-performance filesystem, Lustre still has its
own bottlenecks, and even improper usage by a single user can get the
whole filesystem stuck. Please follow the hints above to
avoid those potential situations. Common Lustre troublemakers are
``ls -lR``, creating many small files, ``rm -rf``, small random I/O, and
heavy bulk I/O.

For advanced users, these slides (from 2012) can be interesting:
https://www.eofs.eu/wp-content/uploads/2024/02/06_daniel_kobras_s_c_lustre_fs_bottleneck.pdf