|
| 1 | +=============== |
| 2 | +flux-archive(1) |
| 3 | +=============== |
| 4 | + |
| 5 | + |
| 6 | +SYNOPSIS |
| 7 | +======== |
| 8 | + |
| 9 | +| **flux** **archive** **create** [*-n NAME*] [*-C DIR*] *PATH* ... |
| 10 | +| **flux** **archive** **list** [*-n NAME*] [*--long*] [*PATTERN*] |
| 11 | +| **flux** **archive** **extract** [*-n NAME*] [*-C DIR*] [*PATTERN*] |
| 12 | +| **flux** **archive** **remove** [*-n NAME*] [*-f*] |
| 13 | +
|
| 14 | + |
| 15 | +DESCRIPTION |
| 16 | +=========== |
| 17 | + |
| 18 | +.. program:: flux archive |
| 19 | + |
| 20 | +:program:`flux archive` stores multiple files in an RFC 37 archive |
| 21 | +under a single KVS key. The archive can be efficiently extracted in |
| 22 | +parallel across the Flux instance, leveraging the scalability properties |
| 23 | +of the KVS and its content addressable data store. |
| 24 | + |
| 25 | +Sparse files such as file system images for virtual machines are archived |
| 26 | +efficiently. |
| 27 | + |
| 28 | +File discretionary access permissions are preserved, but file attributes, |
| 29 | +ACLs, and group ownership are not. |
| 30 | + |
| 31 | +The ``stage-in`` shell plugin described in :man1:`flux-shell` may be used to |
| 32 | +extract previously archived files into the directory referred to by |
| 33 | +:envvar:`FLUX_JOB_TMPDIR` or another directory. |
| 34 | + |
| 35 | +Due to the potential impact on Flux's storage footprint on rank 0, this |
| 36 | + command is limited to instance owners: it works in batch jobs and |
| 37 | +allocations but not at the system level. |
| 38 | + |
| 39 | + |
| 40 | +COMMANDS |
| 41 | +======== |
| 42 | + |
| 43 | +create |
| 44 | +------ |
| 45 | + |
| 46 | +.. program:: flux archive create |
| 47 | + |
| 48 | +:program:`flux archive create` archives one or more file *PATH* arguments. |
| 49 | +If a *PATH* refers to a directory, the directory is recursively archived. |
| 50 | +If a file is encountered that is not readable, or has a type other than |
| 51 | +regular file, directory, or symbolic link, a fatal error occurs. |
| 52 | + |
| 53 | +.. option:: -n, --name=NAME |
| 54 | + |
| 55 | + Specify the archive name. If a name is not specified, ``main`` is used. |
| 56 | + |
| 57 | + The archive will be committed to the KVS as ``archive.NAME``. |
| 58 | + |
| 59 | +.. option:: --overwrite |
| 60 | + |
| 61 | + Unlink ``archive.NAME`` and ``archive.NAME_blobs`` from the KVS, and |
| 62 | + unmap all files associated with *NAME* that were previously archived |
| 63 | + with :option:`--mmap` before creating the archive. |
| 64 | + |
| 65 | + Without :option:`--overwrite` or :option:`--append`, it is a fatal error |
| 66 | + if *NAME* exists. |
| 67 | + |
| 68 | +.. option:: --append |
| 69 | + |
| 70 | + If *NAME* exists, append new content to the existing archive. |
| 71 | + Otherwise create it from scratch. |
| 72 | + |
| 73 | + Due to the way the KVS handles key changes, appending :math:`M` bytes to |
| 74 | + a key of size :math:`N` consumes roughly :math:`2N + M` bytes in the |
| 75 | + backing store, while separate keys consume :math:`N + M`. As a consequence, |
| 76 | + creating multiple archives may be cheaper than building one iteratively |
| 77 | + with :option:`--append`. |
| 78 | + |
| 79 | +.. option:: -C, --directory=DIR |
| 80 | + |
| 81 | + Change to the specified directory before performing the operation. |
| 82 | + |
| 83 | +.. option:: --no-force-primary |
| 84 | + |
| 85 | + Create the archive in the default KVS namespace, honoring |
| 86 | + :envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS |
| 87 | + namespace is used. |
| 88 | + |
| 89 | +.. option:: --preserve |
| 90 | + |
| 91 | + Write additional KVS metadata so that the archive remains intact across |
| 92 | + a Flux restart with garbage collection. |
| 93 | + |
| 94 | + The metadata will be committed to the KVS as ``archive.NAME_blobs``. |
| 95 | + |
| 96 | +.. option:: -v, --verbose=[LEVEL] |
| 97 | + |
| 98 | + List files on standard error as the archive is created. |
| 99 | + |
| 100 | +.. option:: --chunksize=N |
| 101 | + |
| 102 | + Limit the content blob size to N bytes. Set to 0 for unlimited. |
| 103 | + N may be specified as a floating point number with multiplicative suffix |
| 104 | + k,K=1024, M=1024\*1024, or G=1024\*1024\*1024 up to ``INT_MAX``. |
| 105 | + The default is 1M. |
| 106 | + |
| 107 | +.. option:: --small-file-threshold=N |
| 108 | + |
| 109 | + Set the threshold in bytes for a small file. A small file is represented |
| 110 | + directly in the archive, as opposed to the content store. Set to 0 to |
| 111 | + always use the content store. N may be specified as a floating point |
| 112 | + number with multiplicative suffix k,K=1024, M=1024\*1024, or |
| 113 | + G=1024\*1024\*1024 up to ``INT_MAX``. The default is 1K. |
| 114 | + |
| 115 | +.. option:: --mmap |
| 116 | + |
| 117 | + For large files, use :linux:man2:`mmap` to map file data into the content |
| 118 | + store rather than copying it. This only works on rank 0, and does not work |
| 119 | + with :option:`--preserve` or :option:`--no-force-primary`. Furthermore, |
| 120 | + the files must remain available and unchanged while the archive exists. |
| 121 | + This is most appropriate for truly large files such as VM images. |
| 122 | + |
| 123 | + .. warning:: |
| 124 | + |
| 125 | + The rank 0 Flux broker may die with a SIGBUS error if a mapped file is |
| 126 | + removed or truncated, and subsequently accessed, since that renders |
| 127 | + pages mapped into the broker's address space invalid. |
| 128 | + |
| 129 | + If mapped file content changes, access may fail if the original data |
| 130 | + is not cached, but under no circumstances will the new content be |
| 131 | + returned. |
| 132 | + |
| 133 | +list |
| 134 | +---- |
| 135 | + |
| 136 | +.. program:: flux archive list |
| 137 | + |
| 138 | +:program:`flux archive list` shows the archive contents on standard output. |
| 139 | +If *PATTERN* is specified, only the files that match the :linux:man7:`glob` |
| 140 | +pattern are listed. |
| 141 | + |
| 142 | +.. option:: -l, --long |
| 143 | + |
| 144 | + List the archive in long form, including file type, mode, and size. |
| 145 | + |
| 146 | +.. option:: --raw |
| 147 | + |
| 148 | + List the RFC 37 file objects in JSON form, without decoding. |
| 149 | + |
| 150 | +.. option:: -n, --name=NAME |
| 151 | + |
| 152 | + Specify the archive name. If a name is not specified, ``main`` is used. |
| 153 | + |
| 154 | +.. option:: --no-force-primary |
| 155 | + |
| 156 | + List the archive in the default KVS namespace, honoring |
| 157 | + :envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS |
| 158 | + namespace is used. |
| 159 | + |
| 160 | +remove |
| 161 | +------ |
| 162 | + |
| 163 | +.. program:: flux archive remove |
| 164 | + |
| 165 | +:program:`flux archive remove` expunges an archive. The archive's KVS keys |
| 166 | +are unlinked, and any files previously mapped with :option:`--mmap` are |
| 167 | +unmapped. |
| 168 | + |
| 169 | +.. option:: -n, --name=NAME |
| 170 | + |
| 171 | + Specify the archive name. If a name is not specified, ``main`` is used. |
| 172 | + |
| 173 | +.. option:: --no-force-primary |
| 174 | + |
| 175 | + Remove the archive in the default KVS namespace, honoring |
| 176 | + :envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS |
| 177 | + namespace is used. |
| 178 | + |
| 179 | +.. option:: -f, --force |
| 180 | + |
| 181 | + Don't fail if the archive does not exist. |
| 182 | + |
| 183 | + |
| 184 | +extract |
| 185 | +------- |
| 186 | + |
| 187 | +.. program:: flux archive extract |
| 188 | + |
| 189 | +:program:`flux archive extract` extracts files from a KVS archive. |
| 190 | +If *PATTERN* is specified, only the files that match the :linux:man7:`glob` |
| 191 | +pattern are extracted. |
| 192 | + |
| 193 | +.. option:: -t, --list-only |
| 194 | + |
| 195 | + List files in the archive without extracting. |
| 196 | + |
| 197 | +.. option:: -n, --name=NAME |
| 198 | + |
| 199 | + Specify the archive name. If a name is not specified, ``main`` is used. |
| 200 | + |
| 201 | +.. option:: -C, --directory=DIR |
| 202 | + |
| 203 | + Change to the specified directory before performing the operation. |
| 204 | + |
| 205 | + When extracting files in parallel, take care when specifying *DIR*: |
| 206 | + |
| 207 | + - It should have enough space to hold the extracted files. |
| 208 | + |
| 209 | + - It should not be a fragile network file system such that parallel |
| 210 | + writes could cause a distributed denial of service. |
| 211 | + |
| 212 | + - It should not already be shared among the nodes of your job. |
| 213 | + |
| 214 | +.. option:: -v, --verbose=[LEVEL] |
| 215 | + |
| 216 | + List files on standard error as the archive is extracted. |
| 217 | + |
| 218 | +.. option:: --overwrite |
| 219 | + |
| 220 | + Overwrite existing files when extracting. :program:`flux archive extract` |
| 221 | + normally refuses to do this and treats it as a fatal error. |
| 222 | + |
| 223 | +.. option:: --waitcreate[=FSD] |
| 224 | + |
| 225 | + Wait for the archive key to appear in the KVS if it doesn't exist. |
| 226 | + This may be necessary in some circumstances as noted in `CAVEATS`_ |
| 227 | + below. |
| 228 | + |
| 229 | + If *FSD* is specified, it is interpreted as a timeout value in RFC 23 |
| 230 | + Flux Standard Duration format. |
| 231 | + |
| 232 | +.. option:: --no-force-primary |
| 233 | + |
| 234 | + Extract from the archive in the default KVS namespace, honoring |
| 235 | + :envvar:`FLUX_KVS_NAMESPACE`, if set. By default, the primary KVS |
| 236 | + namespace is used. |
| 237 | + |
| 238 | +CAVEATS |
| 239 | +======= |
| 240 | + |
| 241 | +The KVS employs an "eventually consistent" cache update model, which |
| 242 | +means one has to be careful when writing a key on one broker rank and |
| 243 | +reading it on other broker ranks. Without some form of synchronization, |
| 244 | +there is a short period of time where the KVS cache on the other ranks |
| 245 | +may not yet have the new data. |
| 246 | + |
| 247 | +This is not an issue for Example 2 below, where a batch script creates |
| 248 | + an archive, then submits jobs that read the archive, because job |
| 249 | +submission and execution already include KVS synchronization. |
| 250 | +In other situations such as Example 1, it is advisable to use |
| 251 | +:option:`--waitcreate` or to explicitly synchronize between writing |
| 252 | +the archive and reading it, e.g. |
| 253 | + |
| 254 | +.. code-block:: console |
| 255 | +
|
| 256 | + flux exec -r all flux kvs wait $(flux kvs version) |
| 257 | +
|
| 258 | +
|
| 259 | +EXAMPLES |
| 260 | +======== |
| 261 | + |
| 262 | +Example 1: a batch script that archives data from ``/project/dataset1``, then |
| 263 | +replicates it in a temporary directory on each node of the batch allocation |
| 264 | +where it can be used by multiple jobs. |
| 265 | + |
| 266 | +.. code-block:: console |
| 267 | +
|
| 268 | + #!/bin/bash |
| 269 | +
|
| 270 | + flux archive create -C /project dataset1 |
| 271 | + flux exec -r all mkdir -p /tmp/project |
| 272 | + flux exec -r all flux archive extract --waitcreate -C /tmp/project |
| 273 | +
|
| 274 | + # app1 and app2 have access to local copy of dataset1 |
| 275 | + flux run -N1024 app1 --input=/tmp/project/dataset1 |
| 276 | + flux run -N1024 app2 --input=/tmp/project/dataset1 |
| 277 | +
|
| 278 | + # clean up |
| 279 | + flux exec -r all rm -rf /tmp/project |
| 280 | + flux archive remove |
| 281 | +
|
| 282 | +Example 2: a batch script that archives a large executable and a data set, |
| 283 | +then uses the ``stage-in`` shell plugin to copy them to |
| 284 | +:envvar:`FLUX_JOB_TMPDIR` which is automatically cleaned up after each job. |
| 285 | + |
| 286 | +.. code-block:: console |
| 287 | +
|
| 288 | + #!/bin/bash |
| 289 | +
|
| 290 | + flux archive create --name=dataset1 -C /project dataset1 |
| 291 | + flux archive create --name=app --mmap -C /home/fred app |
| 292 | +
|
| 293 | + flux run -N1024 -o stage-in.names=app,dataset1 \ |
| 294 | + {{tmpdir}}/app --input={{tmpdir}}/dataset1 |
| 295 | +
|
| 296 | + # clean up |
| 297 | + flux archive remove --name=dataset1 |
| 298 | + flux archive remove --name=app |
| 299 | +
|
| 300 | +
|
| 301 | +RESOURCES |
| 302 | +========= |
| 303 | + |
| 304 | +.. include:: common/resources.rst |
| 305 | + |
| 306 | +FLUX RFC |
| 307 | +======== |
| 308 | + |
| 309 | +| :doc:`rfc:spec_16` |
| 310 | +| :doc:`rfc:spec_23` |
| 311 | +| :doc:`rfc:spec_37` |
| 312 | +
|
| 313 | + |
| 314 | +SEE ALSO |
| 315 | +======== |
| 316 | + |
| 317 | +:man1:`flux-shell`, :man1:`flux-kvs`, :man1:`flux-exec` |
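
The storage estimate given for the ``--append`` option of ``flux archive create`` can be made concrete with a small worked calculation. This is only a sketch: the sizes ``N`` and ``M`` are hypothetical, and the formulas are the rough ``2N + M`` and ``N + M`` approximations stated in the option description, not measurements of a real KVS.

```shell
#!/bin/bash
# Hypothetical sizes in MiB; not taken from any real archive.
N=100   # size of the existing archive key
M=10    # size of the content being added

# Rough backing-store cost of appending M to an existing key of size N.
append_cost=$(( 2 * N + M ))

# Rough backing-store cost of storing the new content as a separate archive.
separate_cost=$(( N + M ))

echo "append:   ${append_cost} MiB"
echo "separate: ${separate_cost} MiB"
```

With these assumed sizes the appended archive costs roughly 210 MiB against 110 MiB for two separate archives, which is why several named archives may be cheaper than one archive grown with ``--append``.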