|
| 1 | +.. github display |
| 2 | + GitHub is NOT the preferred viewer for this file. Please visit |
| 3 | + https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_37.html |
| 4 | +
|
| 5 | +###################### |
| 6 | +37/File Archive Format |
| 7 | +###################### |
| 8 | + |
| 9 | +The File Archive Format defines a JSON representation of a set or list |
| 10 | +of file system objects. |
| 11 | + |
| 12 | +- Name: github.com/flux-framework/rfc/spec_37.rst |
| 13 | + |
| 14 | +- Editor: Jim Garlick < [email protected]> |
| 15 | + |
| 16 | +- State: raw |
| 17 | + |
| 18 | +******** |
| 19 | +Language |
| 20 | +******** |
| 21 | + |
| 22 | +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", |
| 23 | +"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to |
| 24 | +be interpreted as described in `RFC 2119 <https://tools.ietf.org/html/rfc2119>`__. |
| 25 | + |
| 26 | +***************** |
| 27 | +Related Standards |
| 28 | +***************** |
| 29 | + |
| 30 | +- :doc:`10/Content Storage <spec_10>` |
| 31 | + |
| 32 | +- :doc:`14/Canonical Job Specification <spec_14>` |
| 33 | + |
| 34 | +********** |
| 35 | +Background |
| 36 | +********** |
| 37 | + |
| 38 | +The File Archive Format is a container of file system objects envisioned for: |
| 39 | + |
| 40 | +- Batch scripts, configuration files, and user defined input files embedded |
| 41 | + in RFC 14 jobspec. |
| 42 | + |
| 43 | +- Stage-in and stage-out of job data sets. |
| 44 | + |
| 45 | +***** |
| 46 | +Goals |
| 47 | +***** |
| 48 | + |
| 49 | +- Allow an archive to be defined as either a *set* or a *list*, as appropriate |
| 50 | + for the circumstances. |
| 51 | + |
| 52 | +- Represent regular files, directories, and symbolic links. |
| 53 | + |
| 54 | +- Support self-contained representation of file content. |
| 55 | + |
| 56 | +- Support JSON file content without encoding. |
| 57 | + |
| 58 | +- Avoid requiring metadata that is not essential for reconstructing the file |
| 59 | + system object. |
| 60 | + |
| 61 | +- Enable the distributed associative cache described in RFC 10 to be leveraged |
| 62 | + for efficient file broadcast. |
| 63 | + |
| 64 | +- Enable efficient representation of sparse files. |
| 65 | + |
| 66 | +************** |
| 67 | +Implementation |
| 68 | +************** |
| 69 | + |
| 70 | +An archive SHALL consist of either a JSON array or a JSON dictionary of |
| 71 | +file system objects. File system objects MAY represent regular files, |
| 72 | +directories, or symbolic links. |
| 73 | + |
| 74 | +The following keys are REQUIRED in file system objects: |
| 75 | + |
| 76 | +mode |
| 77 | + (integer) The file type and permissions encoded as defined for the |
| 78 | + ``st_mode`` member of the POSIX ``stat`` structure [#f1]_. |
| 79 | + |
| 80 | +path |
| 81 | + (string) The UNIX path to the file system object. Paths SHALL NOT contain |
| 82 | + ``.`` or ``..`` path components and SHALL NOT begin with a ``/``. When the |
| 83 | + archive container is a dictionary, the path is derived from the dictionary |
| 84 | + key and SHALL NOT appear in the file system object. |
| 85 | + |
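The path rules and the two container forms can be sketched in Python as follows. This is only an illustration: ``validate_path`` and ``to_list`` are hypothetical helper names, not part of this specification, and the sketch assumes the ``.`` and ``..`` restriction applies to whole path components, so that names such as ``data.csv`` remain valid.

```python
def validate_path(path: str) -> None:
    """Raise ValueError if the path breaks the rules for the path key."""
    if path.startswith("/"):
        raise ValueError(f"path is absolute: {path!r}")
    # Assumption: the restriction is on components, not on the '.' character.
    if any(part in (".", "..") for part in path.split("/")):
        raise ValueError(f"path has a '.' or '..' component: {path!r}")


def to_list(archive):
    """Return the archive as a list of objects that each carry a path key."""
    if isinstance(archive, dict):
        entries = []
        for path, obj in archive.items():
            # In a dictionary archive the key is the path.
            if "path" in obj:
                raise ValueError("path must not appear in a dictionary entry")
            entries.append({"path": path, **obj})
    else:
        entries = [dict(obj) for obj in archive]
    for entry in entries:
        validate_path(entry["path"])
    return entries
```

Normalizing both forms to a list lets later processing treat *set* and *list* archives the same way.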
| 86 | +The following keys are OPTIONAL in file system objects: |
| 87 | + |
| 88 | +mtime |
| 89 | + (integer) The last data modification timestamp in seconds since the Epoch. |
| 90 | + |
| 91 | +ctime |
| 92 | + (integer) The last file status change timestamp in seconds since the Epoch. |
| 93 | + |
| 94 | +size |
| 95 | + (integer) The size of the regular file in bytes. |
| 96 | + |
| 97 | +encoding |
| 98 | + (string) The encoding type used in **data** for regular files. Possible |
| 99 | + values are ``utf-8``, ``base64``, ``blobvec``. |
| 100 | + |
| 101 | +data |
| 102 | + Regular file content (see below) or symbolic link target (string). |
| 103 | + |
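Because **mode** carries the file type as well as the permissions, a reader can classify an object from that one integer. The sketch below uses Python's standard ``stat`` module; ``object_type`` is a hypothetical helper name, and the numeric values assume the conventional file type bits used by the examples in this document.

```python
import stat


def object_type(mode: int) -> str:
    """Classify a file system object by the type bits in its mode."""
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "regular"
    raise ValueError(f"unsupported file type in mode {mode:o}")


# 16893 is 0o40775: a directory with permissions rwxrwxr-x.
print(object_type(16893), stat.filemode(16893))
# 41471 is 0o120777: a symbolic link.
print(object_type(41471), oct(stat.S_IMODE(41471)))
```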
| 104 | +Directories |
| 105 | +=========== |
| 106 | + |
| 107 | +A directory object SHALL be represented as *empty* in the archive, thus |
| 108 | +the **size**, **data**, and **encoding** fields SHALL NOT be present in |
| 109 | +a directory object. |
| 110 | + |
| 111 | +Example: |
| 112 | + |
| 113 | +.. code:: json |
| 114 | +
|
| 115 | + { |
| 116 | + "path":"appdata/phase1", |
| 117 | + "mode":16893, |
| 118 | + "mtime":1677604007, |
| 119 | + "ctime":1677604007 |
| 120 | + } |
| 121 | +
|
| 122 | +
|
| 123 | +Symbolic Links |
| 124 | +============== |
| 125 | + |
| 126 | +A symbolic link object SHALL store the link target in the **data** field |
| 127 | +as a UTF-8 string. The **size** and **encoding** fields SHALL NOT be |
| 128 | +present in a symbolic link object. |
| 129 | + |
| 130 | +Example: |
| 131 | + |
| 132 | +.. code:: json |
| 133 | +
|
| 134 | + { |
| 135 | + "path":"src", |
| 136 | + "mode":41471, |
| 137 | + "data":"/users/fred/work/project" |
| 138 | + } |
| 139 | +
|
| 140 | +Regular Files |
| 141 | +============= |
| 142 | + |
| 143 | +Regular files are represented as follows. |
| 144 | + |
| 145 | +Empty Files |
| 146 | +^^^^^^^^^^^ |
| 147 | + |
| 148 | +An empty regular file (zero length or sparse with no data) SHALL be |
| 149 | +represented with **size** set to the file size and no **encoding** or |
| 150 | +**data** fields. |
| 151 | + |
| 152 | +Example: |
| 153 | + |
| 154 | +.. code:: json |
| 155 | +
|
| 156 | + { |
| 157 | + "path":"data/empty", |
| 158 | + "mode":33204, |
| 159 | + "size":0, |
| 160 | + "mtime":1677604909, |
| 161 | + "ctime":1677604909 |
| 162 | + } |
| 163 | +
|
| 164 | +JSON Content |
| 165 | +^^^^^^^^^^^^ |
| 166 | + |
| 167 | +A regular file with JSON content MAY be represented without encoding. |
| 168 | +In this case, **size** and **encoding** SHALL NOT be set and **data** SHALL |
| 169 | +be set to any JSON value, array, or object. When such a file is unarchived, |
| 170 | +its content SHALL be a faithful JSON encoding but MAY vary in other ways |
| 171 | +including file size. |
| 172 | + |
| 173 | +Example: |
| 174 | + |
| 175 | +.. code:: json |
| 176 | +
|
| 177 | + { |
| 178 | + "path":"config.json", |
| 179 | + "mode":33204, |
| 180 | + "data":{ |
| 181 | + "resource":{ |
| 182 | + "exclude":"node42" |
| 183 | + } |
| 184 | + } |
| 185 | + } |
| 186 | +
|
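One way to unarchive JSON content is to serialize **data** again with any JSON encoder. The sketch below is illustrative only; ``json_content_bytes`` is a hypothetical name. It shows why the resulting file size may differ from the original: whitespace and key layout are chosen by the encoder, while the decoded value stays the same.

```python
import json


def json_content_bytes(obj: dict) -> bytes:
    """Serialize the data of an unencoded JSON file object to file content."""
    if "encoding" in obj or "size" in obj:
        raise ValueError("JSON content must not set encoding or size")
    return json.dumps(obj["data"]).encode("utf-8")


entry = {
    "path": "config.json",
    "mode": 33204,
    "data": {"resource": {"exclude": "node42"}},
}
content = json_content_bytes(entry)
# The bytes written may not match the original file, but they decode
# to the same JSON value.
assert json.loads(content) == entry["data"]
```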
| 187 | +Text Content |
| 188 | +^^^^^^^^^^^^ |
| 189 | + |
| 190 | +A regular file containing text MAY be represented with UTF-8 encoding. |
| 191 | +In this case, **size** SHALL be set to the file size, **encoding** SHALL be |
| 192 | +set to ``utf-8``, and **data** SHALL be set to a UTF-8 string. |
| 193 | + |
| 194 | +Example: |
| 195 | + |
| 196 | +.. code:: json |
| 197 | +
|
| 198 | + { |
| 199 | + "path":"data.csv", |
| 200 | + "mode":33204, |
| 201 | + "encoding":"utf-8", |
| 202 | + "data":"iteration,density\n1,35435.555\n2,356655.332\n3,5454545.500\n", |
| 203 | + "size":57 |
| 204 | + } |
| 205 | +
|
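Since **size** is a byte count, it is the length of the UTF-8 encoded content, not the number of characters in the string. A minimal sketch, with ``utf8_entry`` as a hypothetical helper name:

```python
def utf8_entry(path: str, mode: int, text: str) -> dict:
    """Build a regular file object that stores text with utf-8 encoding."""
    return {
        "path": path,
        "mode": mode,
        "encoding": "utf-8",
        "data": text,
        # Size is measured in bytes after UTF-8 encoding.
        "size": len(text.encode("utf-8")),
    }


csv_text = "iteration,density\n1,35435.555\n2,356655.332\n3,5454545.500\n"
entry = utf8_entry("data.csv", 33204, csv_text)
```

For the CSV text in the example, the computed size is 57, matching the object shown. For text with non-ASCII characters the byte count exceeds the character count.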
| 206 | +Literal Binary Content |
| 207 | +^^^^^^^^^^^^^^^^^^^^^^ |
| 208 | + |
| 209 | +A regular file that requires a self-contained representation in the archive |
| 210 | +and whose content is unknown SHALL be represented with base64 encoding. |
| 211 | +In this case, **size** SHALL be set to the file size, **encoding** SHALL be |
| 212 | +set to ``base64``, and **data** SHALL be set to a base64 string. |
| 213 | + |
| 214 | +Example: |
| 215 | + |
| 216 | +.. code:: json |
| 217 | +
|
| 218 | + { |
| 219 | + "path":"vectors.dat", |
| 220 | + "mode":33204, |
| 221 | + "encoding":"base64", |
| 222 | + "data":"MzU0MzUuNTU1CjIsMzU2NjU1LjMzMgozLDU0NTQ1NDUuNTAwCg==", |
| 223 | + "size":37 |
| 224 | + } |
| 225 | +
|
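Self-contained content can be recovered from **data** and checked against **size**. The following is a sketch rather than a reference decoder; ``literal_content`` is a hypothetical name, and treating a size mismatch as an error is an assumption, since this document does not state how a reader should react.

```python
import base64


def literal_content(obj: dict) -> bytes:
    """Decode utf-8 or base64 data of a regular file object to bytes."""
    encoding = obj["encoding"]
    if encoding == "utf-8":
        content = obj["data"].encode("utf-8")
    elif encoding == "base64":
        content = base64.b64decode(obj["data"], validate=True)
    else:
        raise ValueError(f"not a self-contained encoding: {encoding!r}")
    # Assumption: a mismatch between size and decoded length is an error.
    if len(content) != obj["size"]:
        raise ValueError("size does not match decoded content length")
    return content


entry = {
    "path": "vectors.dat",
    "mode": 33204,
    "encoding": "base64",
    "data": "MzU0MzUuNTU1CjIsMzU2NjU1LjMzMgozLDU0NTQ1NDUuNTAwCg==",
    "size": 37,
}
content = literal_content(entry)
```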
| 226 | +Referenced Binary Content |
| 227 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 228 | + |
| 229 | +A regular file that requires content to be referenced to the associative cache |
| 230 | +described in RFC 10 SHALL be represented with blobvec encoding. In this case, |
| 231 | +**size** SHALL be set to the file size, **encoding** SHALL be set to ``blobvec``, and |
| 232 | +**data** SHALL be set to an array of 3-tuples representing file regions. |
| 233 | +Each region is an array of three REQUIRED values: |
| 234 | + |
| 235 | +offset |
| 236 | + (integer) region starting byte |
| 237 | + |
| 238 | +size |
| 239 | + (integer) size of the region in bytes |
| 240 | + |
| 241 | +blobref |
| 242 | + (string) RFC 10 blobref string |
| 243 | + |
| 244 | +Example: |
| 245 | + |
| 246 | +.. code:: json |
| 247 | +
|
| 248 | + { |
| 249 | + "path": "kernel8.img", |
| 250 | + "size": 8194604, |
| 251 | + "mtime": 1674520056, |
| 252 | + "ctime": 1674520057, |
| 253 | + "mode": 33261, |
| 254 | + "encoding":"blobvec", |
| 255 | + "data": [ |
| 256 | + [0, 1048576, "sha1-d4a09c5dd5a0d2d570066b6f13e465c73c3f9944"], |
| 257 | + [1048576, 1048576, "sha1-3eb8716208bc606a28948e2cf2fcce113e22b202"], |
| 258 | + [2097152, 1048576, "sha1-d7cc175e14044e9d9c02d908e4df4bcf80788bc9"], |
| 259 | + [3145728, 1048576, "sha1-34ce5050ff615ee4e2712a1f1e5b3d3df5ae6072"], |
| 260 | + [4194304, 1048576, "sha1-d79525827b6f326ac3d731764ee2d088bc2e5fec"], |
| 261 | + [5242880, 1048576, "sha1-ae1c6b3cb8eba86241fc4a761ee393dd22b833a7"], |
| 262 | + [6291456, 1048576, "sha1-289585f4d0c26db7ae98ecb36c04393ff32cabeb"], |
| 263 | + [7340032, 854572, "sha1-649d3449aa52ac46e19dc894360409d6abbeb882"] |
| 264 | + ] |
| 265 | + } |
| 266 | +
|
| 267 | +.. note:: |
| 268 | + Only blobvec encoding is capable of representing non-empty sparse files. |
| 269 | + |
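The relationship between **size** and the blobvec regions can be checked with a short sketch. ``blobvec_holes`` is a hypothetical helper, and two points are assumptions not stated in this document: regions do not overlap, and any byte range not covered by a region is a hole in a sparse file.

```python
def blobvec_holes(size: int, regions: list) -> list:
    """Return [offset, length] pairs for byte ranges not covered by a region."""
    holes = []
    pos = 0
    for offset, length, _blobref in sorted(regions, key=lambda r: r[0]):
        if offset < pos:
            raise ValueError("regions overlap")
        if offset > pos:
            holes.append([pos, offset - pos])
        pos = offset + length
    if pos > size:
        raise ValueError("regions extend past the file size")
    if pos < size:
        holes.append([pos, size - pos])
    return holes


# Placeholder blobref for illustration only, not a real content hash.
ref = "sha1-" + "0" * 40
sparse = blobvec_holes(4096, [[1024, 512, ref]])
```

For the ``kernel8.img`` example the regions are contiguous and sum to 8194604 bytes, so the function reports no holes. In the sparse case above it reports one hole before the region and one after it.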
| 270 | +.. [#f1] `sys/stat.h - data returned by the stat() function <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html>`__; The Open Group Base Specifications Issue 7, 2018 edition IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008) |