Commit 102eb14

Merge pull request #5701 from garlick/kvs_archive
flux-archive: add new command for file broadcast
2 parents 21f5d30 + 61cae34 commit 102eb14

28 files changed

+1858
-1283
lines changed

doc/Makefile.am

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ MAN1_FILES_PRIMARY = \
2626
man1/flux-getattr.1 \
2727
man1/flux-dmesg.1 \
2828
man1/flux-dump.1 \
29-
man1/flux-filemap.1 \
29+
man1/flux-archive.1 \
3030
man1/flux-content.1 \
3131
man1/flux-config.1 \
3232
man1/flux-proxy.1 \

doc/man1/common/job-shell-options.rst

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@
2323
- Set PMI service(s) for launched programs (:option:`off|simple|LIST`)
2424

2525
* - :option:`stage-in`
26-
- Copy files previously mapped with :man1:`flux-filemap` to
26+
- Copy files previously mapped with :man1:`flux-archive` to
2727
:envvar:`FLUX_JOB_TMPDIR`.
2828

2929
* - :option:`pty.interactive`

doc/man1/flux-archive.rst

Lines changed: 317 additions & 0 deletions
@@ -0,0 +1,317 @@
1+
===============
2+
flux-archive(1)
3+
===============
4+
5+
6+
SYNOPSIS
7+
========
8+
9+
| **flux** **archive** **create** [*-n NAME*] [*-C DIR*] *PATH* ...
10+
| **flux** **archive** **list** [*-n NAME*] [*--long*] [*PATTERN*]
11+
| **flux** **archive** **extract** [*-n NAME*] [*-C DIR*] [*PATTERN*]
12+
| **flux** **archive** **remove** [*-n NAME*] [*-f*]
13+
14+
15+
DESCRIPTION
16+
===========
17+
18+
.. program:: flux archive
19+
20+
:program:`flux archive` stores multiple files in an RFC 37 archive
21+
under a single KVS key. The archive can be efficiently extracted in
22+
parallel across the Flux instance, leveraging the scalability properties
23+
of the KVS and its content addressable data store.
24+
25+
Sparse files such as file system images for virtual machines are archived
26+
efficiently.
27+
28+
File discretionary access permissions are preserved, but file attributes,
29+
ACLs, and group ownership are not.
30+
31+
The ``stage-in`` shell plugin described in :man1:`flux-shell` may be used to
32+
extract previously archived files into the directory referred to by
33+
:envvar:`FLUX_JOB_TMPDIR` or another directory.
34+
35+
Due to the potential impact on Flux's storage footprint on rank 0, this
36+
command is limited to instance owners, e.g. it works in batch jobs and
37+
allocations but not at the system level.
38+
39+
40+
COMMANDS
41+
========
42+
43+
create
44+
------
45+
46+
.. program:: flux archive create
47+
48+
:program:`flux archive create` archives one or more file *PATH* arguments.
49+
If a *PATH* refers to a directory, the directory is recursively archived.
50+
If a file is encountered that is not readable, or has a type other than
51+
regular file, directory, or symbolic link, a fatal error occurs.
52+
53+
.. option:: -n, --name=NAME
54+
55+
Specify the archive name. If a name is not specified, ``main`` is used.
56+
57+
The archive will be committed to the KVS as ``archive.NAME``.
58+
59+
.. option:: --overwrite
60+
61+
Unlink ``archive.NAME`` and ``archive.NAME_blobs`` from the KVS, and
62+
unmap all files associated with *NAME* that were previously archived
63+
with :option:`--mmap` before creating the archive.
64+
65+
Without :option:`--overwrite` or :option:`--append`, it is a fatal error
66+
if *NAME* exists.
67+
68+
.. option:: --append
69+
70+
If *NAME* exists, append new content to the existing archive.
71+
Otherwise create it from scratch.
72+
73+
Due to the way the KVS handles key changes, appending :math:`M` bytes to
74+
a key of size :math:`N` consumes roughly :math:`2N + M` bytes in the
75+
backing store, while separate keys consume :math:`N + M`. As a consequence,
76+
creating multiple archives may be cheaper than building one iteratively
77+
with :option:`--append`.
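
The cost comparison above can be sketched with a small model. This is an illustration only, not Flux code: the function names are hypothetical, and it assumes that every earlier value of a key stays in the backing store, which is what the :math:`2N + M` figure implies.

```python
def store_cost_append(sizes):
    """Bytes consumed when one archive is built by successive appends.

    Assumption: each append stores the whole, larger value again while
    the previous value remains in the backing store.
    """
    total = 0
    current = 0
    for size in sizes:
        current += size   # archive size after this append
        total += current  # the new value is stored in full
    return total


def store_cost_separate(sizes):
    """Bytes consumed when each piece is stored under its own key."""
    return sum(sizes)


n, m = 100, 10
print(store_cost_append([n, m]))    # 210, i.e. 2N + M
print(store_cost_separate([n, m]))  # 110, i.e. N + M
```

Under this model the gap widens with every additional append, which is why several separately named archives may be cheaper than one archive grown in steps.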
78+
79+
.. option:: -C, --directory=DIR
80+
81+
Change to the specified directory before performing the operation.
82+
83+
.. option:: --no-force-primary
84+
85+
Create the archive in the default KVS namespace, honoring
86+
:envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS
87+
namespace is used.
88+
89+
.. option:: --preserve
90+
91+
Write additional KVS metadata so that the archive remains intact across
92+
a Flux restart with garbage collection.
93+
94+
The metadata will be committed to the KVS as ``archive.NAME_blobs``.
95+
96+
.. option:: -v, --verbose=[LEVEL]
97+
98+
List files on standard error as the archive is created.
99+
100+
.. option:: --chunksize=N
101+
102+
Limit the content blob size to N bytes. Set to 0 for unlimited.
103+
N may be specified as a floating point number with multiplicative suffix
104+
k,K=1024, M=1024\*1024, or G=1024\*1024\*1024 up to ``INT_MAX``.
105+
The default is 1M.
106+
107+
.. option:: --small-file-threshold=N
108+
109+
Set the threshold in bytes for a small file. A small file is represented
110+
directly in the archive, as opposed to the content store. Set to 0 to
111+
always use the content store. N may be specified as a floating point
112+
number with multiplicative suffix k,K=1024, M=1024\*1024, or
113+
G=1024\*1024\*1024 up to ``INT_MAX``. The default is 1K.
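
The size syntax accepted by :option:`--chunksize` and :option:`--small-file-threshold` can be illustrated with a short parser. This is a minimal sketch, not the parser Flux uses: ``parse_size`` is a hypothetical name, ``INT_MAX`` is assumed to be that of a 32-bit ``int``, and only the suffixes listed above are recognized.

```python
INT_MAX = 2**31 - 1  # assumption: 32-bit int

# Multiplicative suffixes as listed in the option descriptions.
_SUFFIXES = {"k": 1024, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a string such as '1M' or '1.5k' to a byte count."""
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    multiplier = 1
    if text[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[text[-1]]
        text = text[:-1]
    value = float(text) * multiplier  # raises ValueError on bad input
    if value < 0 or value > INT_MAX:
        raise ValueError("size out of range")
    return int(value)  # truncates any fractional byte


print(parse_size("1M"))    # 1048576
print(parse_size("1.5k"))  # 1536
print(parse_size("0"))     # 0
```

Note that in this sketch ``2G`` is rejected, because 2147483648 exceeds the assumed ``INT_MAX`` by one.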
114+
115+
.. option:: --mmap
116+
117+
For large files, use :linux:man2:`mmap` to map file data into the content
118+
store rather than copying it. This only works on rank 0, and does not work
119+
with :option:`--preserve` or :option:`--no-force-primary`. Furthermore,
120+
the files must remain available and unchanged while the archive exists.
121+
This is most appropriate for truly large files such as VM images.
122+
123+
.. warning::
124+
125+
The rank 0 Flux broker may die with a SIGBUS error if a mapped file is
126+
removed or truncated, and subsequently accessed, since that renders
127+
pages mapped into the broker's address space invalid.
128+
129+
If mapped file content changes, access may fail if the original data
130+
is not cached, but under no circumstances will the new content be
131+
returned.
132+
133+
list
134+
----
135+
136+
.. program:: flux archive list
137+
138+
:program:`flux archive list` shows the archive contents on standard output.
139+
If *PATTERN* is specified, only the files that match the :linux:man7:`glob`
140+
pattern are listed.
141+
142+
.. option:: -l, --long
143+
144+
List the archive in long form, including file type, mode, and size.
145+
146+
.. option:: --raw
147+
148+
List the RFC 37 file objects in JSON form, without decoding.
149+
150+
.. option:: -n, --name=NAME
151+
152+
Specify the archive name. If a name is not specified, ``main`` is used.
153+
154+
.. option:: --no-force-primary
155+
156+
List the archive in the default KVS namespace, honoring
157+
:envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS
158+
namespace is used.
159+
160+
remove
161+
------
162+
163+
.. program:: flux archive remove
164+
165+
:program:`flux archive remove` expunges an archive. The archive's KVS keys
166+
are unlinked, and any files previously mapped with :option:`--mmap` are
167+
unmapped.
168+
169+
.. option:: -n, --name=NAME
170+
171+
Specify the archive name. If a name is not specified, ``main`` is used.
172+
173+
.. option:: --no-force-primary
174+
175+
Remove the archive in the default KVS namespace, honoring
176+
:envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS
177+
namespace is used.
178+
179+
.. option:: -f, --force
180+
181+
Don't fail if the archive does not exist.
182+
183+
184+
extract
185+
-------
186+
187+
.. program:: flux archive extract
188+
189+
:program:`flux archive extract` extracts files from a KVS archive.
190+
If *PATTERN* is specified, only the files that match the :linux:man7:`glob`
191+
pattern are extracted.
192+
193+
.. option:: -t, --list-only
194+
195+
List files in the archive without extracting.
196+
197+
.. option:: -n, --name=NAME
198+
199+
Specify the archive name. If a name is not specified, ``main`` is used.
200+
201+
.. option:: -C, --directory=DIR
202+
203+
Change to the specified directory before performing the operation.
204+
205+
When extracting files in parallel, take care when specifying *DIR*:
206+
207+
- It should have enough space to hold the extracted files.
208+
209+
- It should not be a fragile network file system such that parallel
210+
writes could cause a distributed denial of service.
211+
212+
- It should not already be shared among the nodes of your job.
213+
214+
.. option:: -v, --verbose=[LEVEL]
215+
216+
List files on standard error as the archive is extracted.
217+
218+
.. option:: --overwrite
219+
220+
Overwrite existing files when extracting. :program:`flux archive extract`
221+
normally refuses to do this and treats it as a fatal error.
222+
223+
.. option:: --waitcreate[=FSD]
224+
225+
Wait for the archive key to appear in the KVS if it doesn't exist.
226+
This may be necessary in some circumstances as noted in `CAVEATS`_
227+
below.
228+
229+
If *FSD* is specified, it is interpreted as a timeout value in RFC 23
230+
Flux Standard Duration format.
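
As an illustration of the timeout format, the sketch below converts a duration string to seconds. It is not Flux's parser: ``parse_fsd`` is a hypothetical helper, and it assumes the RFC 23 form of a floating point number followed by an optional ``ms``, ``s``, ``m``, ``h``, or ``d`` suffix, with seconds as the default unit. Non-finite and negative values are rejected here for simplicity.

```python
import math

# Unit sizes in milliseconds (assumed suffix set; see RFC 23).
_UNITS_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def parse_fsd(text):
    """Convert a duration such as '30s' or '1.5h' to seconds."""
    text = text.strip()
    number, unit_ms = text, 1000  # no suffix: assume seconds
    # Check 'ms' first so that '5ms' is not read as '5m' plus 's'.
    for suffix in ("ms", "s", "m", "h", "d"):
        if text.endswith(suffix):
            number = text[: -len(suffix)]
            unit_ms = _UNITS_MS[suffix]
            break
    value = float(number)  # raises ValueError on bad input
    if not math.isfinite(value) or value < 0:
        raise ValueError("invalid duration")
    return value * unit_ms / 1000.0


print(parse_fsd("30s"))   # 30.0
print(parse_fsd("1.5h"))  # 5400.0
print(parse_fsd("2m"))    # 120.0
```

Under these assumptions, a value such as ``--waitcreate=30s`` would wait up to 30 seconds for the archive key to appear.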
231+
232+
.. option:: --no-force-primary
233+
234+
Extract from the archive in the default KVS namespace, honoring
235+
:envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS
236+
namespace is used.
237+
238+
CAVEATS
239+
=======
240+
241+
The KVS employs an "eventually consistent" cache update model, which
242+
means one has to be careful when writing a key on one broker rank and
243+
reading it on other broker ranks. Without some form of synchronization,
244+
there is a short period of time where the KVS cache on the other ranks
245+
may not yet have the new data.
246+
247+
This is not an issue for Example 2 below, where a batch script creates
248+
an archive, then submits jobs that read the archive, because job
249+
submission and execution already include KVS synchronization.
250+
In other situations such as Example 1, it is advisable to use
251+
:option:`--waitcreate` or to explicitly synchronize between writing
252+
the archive and reading it, e.g.
253+
254+
.. code-block:: console
255+
256+
flux exec -r all flux kvs wait $(flux kvs version)
257+
258+
259+
EXAMPLES
260+
========
261+
262+
Example 1: a batch script that archives data from ``/project/dataset1``, then
263+
replicates it in a temporary directory on each node of the batch allocation
264+
where it can be used by multiple jobs.
265+
266+
.. code-block:: console
267+
268+
#!/bin/bash
269+
270+
flux archive create -C /project dataset1
271+
flux exec -r all mkdir -p /tmp/project
272+
flux exec -r all flux archive extract --waitcreate -C /tmp/project
273+
274+
# app1 and app2 have access to local copy of dataset1
275+
flux run -N1024 app1 --input=/tmp/project/dataset1
276+
flux run -N1024 app2 --input=/tmp/project/dataset1
277+
278+
# clean up
279+
flux exec -r all rm -rf /tmp/project
280+
flux archive remove
281+
282+
Example 2: a batch script that archives a large executable and a data set,
283+
then uses the ``stage-in`` shell plugin to copy them to
284+
:envvar:`FLUX_JOB_TMPDIR` which is automatically cleaned up after each job.
285+
286+
.. code-block:: console
287+
288+
#!/bin/bash
289+
290+
flux archive create --name=dataset1 -C /project dataset1
291+
flux archive create --name=app --mmap -C /home/fred app
292+
293+
flux run -N1024 -o stage-in.names=app,dataset1 \
294+
{{tmpdir}}/app --input={{tmpdir}}/dataset1
295+
296+
# clean up
297+
flux archive remove --name=dataset1
298+
flux archive remove --name=app
299+
300+
301+
RESOURCES
302+
=========
303+
304+
.. include:: common/resources.rst
305+
306+
FLUX RFC
307+
========
308+
309+
| :doc:`rfc:spec_16`
310+
| :doc:`rfc:spec_23`
311+
| :doc:`rfc:spec_37`
312+
313+
314+
SEE ALSO
315+
========
316+
317+
:man1:`flux-shell`, :man1:`flux-kvs`, :man1:`flux-exec`
