Commit e83e8c7

Merge pull request #369 from garlick/rfc37
rfc37: add File Archive Format RFC
2 parents 6416c18 + e66482c commit e83e8c7

4 files changed: +282 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ Table of Contents
 - [34/Flux Task Map](spec_34.rst)
 - [35/Constraint Query Syntax](spec_35.rst)
 - [36/Batch Script Directives](spec_36.rst)
+- [37/File Archive Format](spec_37.rst)

 Build Instructions
 ------------------

index.rst

Lines changed: 7 additions & 0 deletions
@@ -251,6 +251,12 @@ JSON objects in the format described in RFC 31.
 This specification defines a method for embedding job submission options
 and other directives in files.

+:doc:`37/File Archive Format <spec_37>`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The File Archive Format defines a JSON representation of a set or list
+of file system objects.
+
 .. Each file must appear in a toctree
 .. toctree::
    :hidden:
@@ -290,3 +296,4 @@ and other directives in files.
    spec_34
    spec_35
    spec_36
+   spec_37

spec_37.rst

Lines changed: 270 additions & 0 deletions
@@ -0,0 +1,270 @@

.. github display
   GitHub is NOT the preferred viewer for this file. Please visit
   https://flux-framework.rtfd.io/projects/flux-rfc/en/latest/spec_37.html

######################
37/File Archive Format
######################

The File Archive Format defines a JSON representation of a set or list
of file system objects.

- Name: github.com/flux-framework/rfc/spec_37.rst

- Editor: Jim Garlick <[email protected]>

- State: raw

********
Language
********

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to
be interpreted as described in `RFC 2119 <https://tools.ietf.org/html/rfc2119>`__.

*****************
Related Standards
*****************

- :doc:`10/Content Storage <spec_10>`

- :doc:`14/Canonical Job Specification <spec_14>`

**********
Background
**********

The File Archive Format is a container of file system objects envisioned for:

- Batch scripts, configuration files, and user-defined input files embedded
  in RFC 14 jobspec.

- Stage-in and stage-out of job data sets.

*****
Goals
*****

- Allow an archive to be defined as either a *set* or a *list*, as appropriate
  for the circumstances.

- Represent regular files, directories, and symbolic links.

- Support self-contained representation of file content.

- Support JSON file content without encoding.

- Avoid requiring metadata that is not essential for reconstructing the file
  system object.

- Enable the distributed associative cache described in RFC 10 to be leveraged
  for efficient file broadcast.

- Enable efficient representation of sparse files.

**************
Implementation
**************

An archive SHALL consist of either a JSON array or a JSON dictionary of
file system objects. File system objects MAY represent regular files,
directories, or symbolic links.

The following keys are REQUIRED in file system objects:

mode
  (integer) The file type and permissions encoded as defined for the
  ``st_mode`` member of the POSIX ``stat`` structure [#f1]_.

path
  (string) The UNIX path to the file system object. Paths SHALL NOT contain
  ``.`` or ``..`` path components and SHALL NOT begin with a ``/``. When the
  archive container is a dictionary, the path is derived from the dictionary
  key and SHALL NOT appear in the file system object.

The following keys are OPTIONAL in file system objects:

mtime
  (integer) The last data modification timestamp in seconds since the Epoch.

ctime
  (integer) The last file status change timestamp in seconds since the Epoch.

size
  (integer) The size of the regular file in bytes.

encoding
  (string) The encoding type used in **data** for regular files. Possible
  values are ``utf-8``, ``base64``, and ``blobvec``.

data
  Regular file content (see below) or symbolic link target (string).
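
As a non-normative illustration, the two container forms and the path rules
above might be handled as in the following Python sketch. The helper names
``check_path`` and ``iter_objects`` are hypothetical and not part of this RFC,
and the sketch assumes the ``.`` and ``..`` restriction applies to whole path
components rather than to individual characters.

.. code:: python

   def check_path(path):
       """Raise ValueError if path violates the rules for the path key."""
       if path.startswith("/"):
           raise ValueError(f"path must not begin with '/': {path!r}")
       # Assumption: '.' and '..' are rejected as components, so that a
       # name such as 'config.json' remains valid.
       if any(part in (".", "..") for part in path.split("/")):
           raise ValueError(f"path contains '.' or '..': {path!r}")


   def iter_objects(archive):
       """Yield (path, obj) pairs from an array-form or dictionary-form archive."""
       if isinstance(archive, dict):
           for path, obj in archive.items():
               if "path" in obj:
                   raise ValueError("path SHALL NOT appear in dictionary members")
               pairs = (path, obj)
               check_path(path)
               yield pairs
       elif isinstance(archive, list):
           for obj in archive:
               check_path(obj["path"])
               yield obj["path"], obj
       else:
           raise TypeError("archive must be a JSON array or dictionary")

Either container then yields the same ``(path, obj)`` pairs, so later
processing does not need to know which form was used.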

Directories
===========

A directory object SHALL be represented as *empty* in the archive; thus
the **size**, **data**, and **encoding** fields SHALL NOT be present in
a directory object.

Example:

.. code:: json

   {
     "path":"appdata/phase1",
     "mode":16893,
     "mtime":1677604007,
     "ctime":1677604007
   }


Symbolic Links
==============

A symbolic link object SHALL store the link target in the **data** field
as a UTF-8 string. The **size** and **encoding** fields SHALL NOT be
present in a symbolic link object.

Example:

.. code:: json

   {
     "path":"src",
     "mode":41471,
     "data":"/users/fred/work/project"
   }
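
The following non-normative Python sketch shows one way the **mode** value
could be used to tell object types apart and to enforce the field
restrictions stated for directories and symbolic links. The names ``kind``,
``FORBIDDEN``, and ``check_fields`` are hypothetical; only the standard
library ``stat`` module is used.

.. code:: python

   import stat


   def kind(mode):
       """Classify a mode value using the POSIX file type bits."""
       if stat.S_ISDIR(mode):
           return "directory"
       if stat.S_ISLNK(mode):
           return "symlink"
       if stat.S_ISREG(mode):
           return "regular"
       raise ValueError(f"unsupported file type in mode {mode:#o}")


   # Keys that SHALL NOT be present, by object kind.
   FORBIDDEN = {
       "directory": {"size", "data", "encoding"},
       "symlink": {"size", "encoding"},
   }


   def check_fields(obj):
       """Return the object kind, or raise ValueError on forbidden keys."""
       k = kind(obj["mode"])
       extra = FORBIDDEN.get(k, set()).intersection(obj)
       if extra:
           raise ValueError(f"{k} object has forbidden keys: {sorted(extra)}")
       if k == "symlink" and not isinstance(obj.get("data"), str):
           raise ValueError("symlink target must be a string in data")
       return k

For the examples above, ``kind(16893)`` is ``"directory"`` and
``kind(41471)`` is ``"symlink"``.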

Regular Files
=============

Regular files are represented as follows.

Empty Files
^^^^^^^^^^^

An empty regular file (zero length or sparse with no data) SHALL be
represented with **size** set to the file size and no **encoding** or
**data** fields.

Example:

.. code:: json

   {
     "path":"data/empty",
     "mode":33204,
     "size":0,
     "mtime":1677604909,
     "ctime":1677604909
   }
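
A minimal, non-normative sketch of how an unarchiver might recreate such a
file follows. The function name ``unarchive_empty`` is hypothetical. It
assumes that extending a new file with ``truncate`` is an acceptable way to
produce a file of the requested size; whether the result is stored sparsely
depends on the file system.

.. code:: python

   import os
   import tempfile


   def unarchive_empty(obj, root):
       """Create the empty regular file described by obj beneath root."""
       dest = os.path.join(root, obj["path"])
       os.makedirs(os.path.dirname(dest), exist_ok=True)
       with open(dest, "wb") as f:
           # No data is written; the file is only extended to the given size.
           f.truncate(obj["size"])
       # Apply the permission bits only; the file type bits are not settable.
       os.chmod(dest, obj["mode"] & 0o7777)
       return dest


   with tempfile.TemporaryDirectory() as root:
       obj = {"path": "data/empty", "mode": 33204, "size": 4096}
       created_size = os.path.getsize(unarchive_empty(obj, root))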

JSON Content
^^^^^^^^^^^^

A regular file with JSON content MAY be represented without encoding.
In this case, **size** and **encoding** SHALL NOT be set and **data** SHALL
be set to any JSON value, including an array or object. When such a file is
unarchived, its content SHALL be a faithful JSON encoding of **data** but MAY
differ from the original file in other ways, including file size.

Example:

.. code:: json

   {
     "path":"config.json",
     "mode":33204,
     "data":{
       "resource":{
         "exclude":"node42"
       }
     }
   }
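
The non-normative Python sketch below illustrates why the unarchived file
may differ in size from the original: the value in **data** is serialized
again on output, so only the JSON value is preserved, not the original byte
layout. The function name ``json_file_bytes`` is hypothetical.

.. code:: python

   import json

   obj = {
       "path": "config.json",
       "mode": 33204,
       "data": {"resource": {"exclude": "node42"}},
   }


   def json_file_bytes(obj):
       """Serialize unencoded JSON data to the bytes written on unarchive."""
       if "encoding" in obj or "size" in obj:
           raise ValueError("size and encoding SHALL NOT be set for JSON content")
       return json.dumps(obj["data"]).encode("utf-8")


   content = json_file_bytes(obj)
   # Parsing the written bytes recovers the same JSON value even though
   # whitespace and layout may not match the file that was archived.
   roundtrip = json.loads(content)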

Text Content
^^^^^^^^^^^^

A regular file containing text MAY be represented with UTF-8 encoding.
In this case, **size** SHALL be set to the file size, **encoding** SHALL be
set to ``utf-8``, and **data** SHALL be set to a UTF-8 string.

Example:

.. code:: json

   {
     "path":"data.csv",
     "mode":33204,
     "encoding":"utf-8",
     "data":"iteration,density\n1,35435.555\n2,356655.332\n3,5454545.500\n",
     "size":57
   }
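
As a non-normative sketch, a text object could be built as shown below. The
helper name ``text_object`` is hypothetical. Because **size** is defined as
the file size in bytes, the sketch measures the UTF-8 encoded length rather
than the number of characters.

.. code:: python

   def text_object(path, mode, text):
       """Build a file system object for a UTF-8 text file."""
       raw = text.encode("utf-8")
       return {
           "path": path,
           "mode": mode,
           "encoding": "utf-8",
           "data": text,
           "size": len(raw),  # bytes, not characters
       }


   csv = "iteration,density\n1,35435.555\n2,356655.332\n3,5454545.500\n"
   obj = text_object("data.csv", 33204, csv)

The two counts differ as soon as the text contains non-ASCII characters: a
one-character string such as ``"é"`` occupies two bytes in UTF-8.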

Literal Binary Content
^^^^^^^^^^^^^^^^^^^^^^

A regular file that requires a self-contained representation in the archive
and whose content is unknown SHALL be represented with base64 encoding.
In this case, **size** SHALL be set to the file size, **encoding** SHALL be
set to ``base64``, and **data** SHALL be set to a base64 string.

Example:

.. code:: json

   {
     "path":"vectors.dat",
     "mode":33204,
     "encoding":"base64",
     "data":"MzU0MzUuNTU1CjIsMzU2NjU1LjMzMgozLDU0NTQ1NDUuNTAwCg==",
     "size":37
   }
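
The following non-normative Python sketch encodes and decodes base64 content
with the standard library ``base64`` module. The helper names
``base64_object`` and ``base64_content`` are hypothetical, and checking the
decoded length against **size** is an extra consistency check assumed by the
sketch rather than a requirement of this RFC.

.. code:: python

   import base64


   def base64_object(path, mode, raw):
       """Build a file system object carrying raw bytes as base64."""
       return {
           "path": path,
           "mode": mode,
           "encoding": "base64",
           "data": base64.b64encode(raw).decode("ascii"),
           "size": len(raw),
       }


   def base64_content(obj):
       """Decode the data field and check it against the size field."""
       raw = base64.b64decode(obj["data"], validate=True)
       if len(raw) != obj["size"]:
           raise ValueError("decoded length does not match size")
       return raw


   example = {
       "path": "vectors.dat",
       "mode": 33204,
       "encoding": "base64",
       "data": "MzU0MzUuNTU1CjIsMzU2NjU1LjMzMgozLDU0NTQ1NDUuNTAwCg==",
       "size": 37,
   }
   content = base64_content(example)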

Referenced Binary Content
^^^^^^^^^^^^^^^^^^^^^^^^^

A regular file whose content is to be referenced in the associative cache
described in RFC 10 SHALL be represented with blobvec encoding. In this case,
**size** SHALL be set to the file size, **encoding** SHALL be set to
``blobvec``, and **data** SHALL be set to an array of 3-tuples representing
file regions. Each region is an array of three REQUIRED values:

offset
  (integer) region starting byte

size
  (integer) size of the region in bytes

blobref
  (string) RFC 10 blobref string

Example:

.. code:: json

   {
     "path": "kernel8.img",
     "size": 8194604,
     "mtime": 1674520056,
     "ctime": 1674520057,
     "mode": 33261,
     "encoding":"blobvec",
     "data": [
       [0, 1048576, "sha1-d4a09c5dd5a0d2d570066b6f13e465c73c3f9944"],
       [1048576, 1048576, "sha1-3eb8716208bc606a28948e2cf2fcce113e22b202"],
       [2097152, 1048576, "sha1-d7cc175e14044e9d9c02d908e4df4bcf80788bc9"],
       [3145728, 1048576, "sha1-34ce5050ff615ee4e2712a1f1e5b3d3df5ae6072"],
       [4194304, 1048576, "sha1-d79525827b6f326ac3d731764ee2d088bc2e5fec"],
       [5242880, 1048576, "sha1-ae1c6b3cb8eba86241fc4a761ee393dd22b833a7"],
       [6291456, 1048576, "sha1-289585f4d0c26db7ae98ecb36c04393ff32cabeb"],
       [7340032, 854572, "sha1-649d3449aa52ac46e19dc894360409d6abbeb882"]
     ]
   }

.. note::
   Only blobvec encoding is capable of representing non-empty sparse files.
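
To illustrate how a blobvec can describe a sparse file, the non-normative
Python sketch below computes the byte ranges of a file that are not covered
by any region. The function name ``blobvec_holes`` is hypothetical. The
sketch assumes regions are listed in ascending offset order and do not
overlap, which this RFC does not state explicitly.

.. code:: python

   def blobvec_holes(size, regions):
       """Return (offset, length) ranges not covered by any region.

       Assumes regions are sorted by offset and non-overlapping.
       """
       holes = []
       pos = 0
       for offset, length, _blobref in regions:
           if offset < pos or length <= 0 or offset + length > size:
               raise ValueError(f"bad region at offset {offset}")
           if offset > pos:
               holes.append((pos, offset - pos))
           pos = offset + length
       if pos < size:
           holes.append((pos, size - pos))
       return holes

In the ``kernel8.img`` example the eight regions are contiguous and sum to
the file size, so there are no holes. A 4096-byte file with a single
1024-byte region at offset 1024 would instead have holes before and after
that region.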

.. [#f1] `sys/stat.h - data returned by the stat() function <https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html>`__; The Open Group Base Specifications Issue 7, 2018 edition, IEEE Std 1003.1-2017 (Revision of IEEE Std 1003.1-2008)

spell.en.pws

Lines changed: 4 additions & 0 deletions
@@ -464,3 +464,7 @@ llsubmit
 multiline
 Lua
 docstring
+blobvec
+mtime
+ctime
+unarchived
