
Commit 981ca71

Merge pull request #316 from garlick/rfc10_update
update content/KVS RFCs
2 parents e503d58 + 366dd78 commit 981ca71

2 files changed: 29 additions, 71 deletions

spec_10.rst

Lines changed: 17 additions & 66 deletions
@@ -28,29 +28,28 @@ Related Standards
 
 - :doc:`3/Flux Message Protocol <spec_3>`
 
+- :doc:`11/Key Value Store Tree Object Format v1 <spec_11>`
+
 
 Goals
 -----
 
-The Flux content storage service is available for general purpose
-data storage within a Flux instance. The goals of the content storage
-service are:
+The Flux content storage service is the storage layer for the Flux Key Value
+Store (KVS). The goals of the content storage service are:
 
 - Provide storage for opaque binary blobs.
 
-- Stored content remains available for the lifetime of the Flux instance.
+- Once stored from any broker rank, content is available to all broker ranks.
 
 - Stored content is immutable.
 
-- Stored content is available from any broker rank within an instance.
+- Content may not be removed while the Flux instance is running.
 
 - Stored content is addressable by its message digest, computed using a
 cryptographic hash.
 
 - The cryptographic hash algorithm is configurable per instance.
 
-- Content may be shared between instances
-
 This kind of store has interesting and well-understood properties, as
 explored in Venti, Git, and Camlistore (see References below).
 
@@ -71,16 +70,17 @@ Rank 0 SHALL retain all content previously stored by the instance.
 Rank 0 MAY extend its cache with an OPTIONAL backing store, the details
 of which are beyond the scope of this RFC.
 
-Rank 0 MAY, as a last resort, attempt to satisfy load requests by making
-a transitive request to the enclosing instance, if any.
-
 
 Content
 ~~~~~~~
 
 Content SHALL consist of from zero to 1,048,576 bytes of data.
 Content SHALL NOT be interpreted by the content service.
 
+Note: The blob size limit was temporarily increased to one gigabyte to
+avoid failures resulting from extreme workloads. The original limit will
+be restored once KVS *hdir* objects are implemented.
+
 
 Blobref
 ~~~~~~~
@@ -155,66 +155,17 @@ A dropcache request SHALL cause the local content service to drop all
 non-essential entries from its cache.
 
 
-Foreign Content
-~~~~~~~~~~~~~~~
-
-If a load request cannot be satisfied by the instance’s content service,
-a load request MAY be sent to the enclosing instance, if applicable.
-
-The enclosing instance MAY have configured a different hash algorithm.
-The content service, therefore, SHALL NOT require that a blobref specified
-in a load request match the configured hash.
-
-
 Garbage Collection
 ~~~~~~~~~~~~~~~~~~
 
-References to content are unconstrained from the perspective of the
-content service, therefore content MUST persist for the lifetime of
-the instance.
-
-During instance shutdown, some content MAY be preserved by storing it
-in the enclosing instance when the instance is *reaped*. All other
-content SHALL be destroyed when the instance terminates.
-
-
-Message Definitions
-~~~~~~~~~~~~~~~~~~~
-
-Content service messages SHALL follow the Flux rules described
-in RFC 3 for requests and responses, and are described in detail by
-the following ABNF grammar:
-
-::
-
-CONTENT = C:store-req S:store-rep
-/ C:load-req S:load-rep
-/ C:flush-req S:flush-rep
-/ C:dropcache-req S:dropcache-rep
-
-; Multi-part ZeroMQ messages
-C:store-req = [routing] "content.store" [blob] PROTO
-S:store-rep = [routing] "content.store" blobref PROTO
-
-; Multi-part ZeroMQ messages
-C:load-req = [routing] "content.load" blobref PROTO
-S:load-rep = [routing] "content.load" [blob] PROTO
-
-; Multi-part ZeroMQ messages
-C:flush-req = [routing] "content.flush" PROTO
-S:flush-rep = [routing] "content.flush" PROTO
-
-; Multi-part ZeroMQ messages
-C:dropcache-req = [routing] "content.dropcache" PROTO
-S:dropcache-rep = [routing] "content.dropcache" PROTO
-
-blobref = hash-name "-" digest %x00
-hash-name = 1*(ALPHA / DIGIT)
-digest = 1*(HEXDIG)
-
-blob = 0*(OCTET)
+References to content are the responsibility of the Flux key Value Store.
+Content that the KVS no longer references MAY NOT be removed while the Flux
+instance is running.
 
-; PROTO and [routing] are as defined in RFC 3.
+A Flux instance that is configured to restart saves content before shutting
+down. The shutdown process, after the KVS service has been stopped, MAY choose
+to omit content that the final KVS root does not reference as a form of
+garbage collection.
 
 
 References
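
As an illustration of the properties listed in the revised spec_10 Goals (opaque blobs, immutability, addressing by message digest, a configurable hash), here is a minimal in-memory sketch in Python. The `ContentStore` class, its method names, and the `MAX_BLOB_SIZE` constant are hypothetical names chosen for this example and are not the Flux API. The blobref layout follows the `hash-name "-" digest` rule from the grammar removed above, and the size check uses the 1,048,576-byte figure from the Content section.

```python
import hashlib

# Size limit quoted in the Content section of spec_10 (assumed here;
# the commit notes a temporary increase to one gigabyte).
MAX_BLOB_SIZE = 1_048_576


class ContentStore:
    """Toy content-addressed store; hypothetical, not the Flux content service."""

    def __init__(self, hash_name="sha1"):
        # Fail early if hashlib does not know the configured algorithm.
        hashlib.new(hash_name)
        self.hash_name = hash_name
        self._blobs = {}

    def blobref(self, blob):
        # A blobref is the hash name, a dash, and the hex digest of the blob.
        digest = hashlib.new(self.hash_name, blob).hexdigest()
        return f"{self.hash_name}-{digest}"

    def store(self, blob):
        if len(blob) > MAX_BLOB_SIZE:
            raise ValueError("blob exceeds size limit")
        ref = self.blobref(blob)
        # Content is immutable: storing the same bytes again is a no-op.
        self._blobs.setdefault(ref, bytes(blob))
        return ref

    def load(self, ref):
        # Raises KeyError if the blobref is unknown.
        return self._blobs[ref]
```

Because the key is derived from the bytes themselves, storing identical content twice yields the same blobref, which is what makes the store naturally deduplicating. This sketch offers no removal method, in line with the goal that content may not be removed while the instance is running.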

spec_11.rst

Lines changed: 12 additions & 5 deletions
@@ -35,11 +35,7 @@ Related Standards
 Goals
 -----
 
-- Facilitate services implementing private KVS namespaces.
-
-- Users may directly walk a KVS namespace, starting with a dirref.
-
-- Tree objects can be exchanged between Flux instances.
+- Define KVS metadata compatible with the RFC 10 content storage service.
 
 - Tree objects can be parsed years after they were written (provenance).
 
@@ -115,6 +111,12 @@ A *val* represents opaque data directly, base64-encoded.
 "data":"NDIyCg==",
 }
 
+Short values that are not large enough to warrant a *valref* and independent
+blobs SHOULD be represented as a *val* when written to the content store.
+
+The *val* object MAY be used as part of the protocol for sending key-value
+tuples of any size to the KVS in the JSON payload of an RPC.
+
 
 Dirref
 ~~~~~~
@@ -129,6 +131,9 @@ stored in the content store.
 "data":["sha1-aaa...","sha1-bbb...",...],
 }
 
+Although the *dirref* definition supports an array of multiple blobrefs,
+at this time the array size is limited to one.
+
 
 Dir
 ~~~
@@ -178,6 +183,8 @@ Hash buckets MAY be sparsely populated. Each hash bucket contains a single
 }
 }
 
+At this time, *hdir* objects have not been implemented.
+
 
 Symlink
 ~~~~~~~
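
The *val* and *dirref* notes added in spec_11 can be illustrated with a short Python sketch. It covers only the `"data"` member, since that is the only member visible in the hunks above; the other members of these objects are not shown in this diff and are left out. The helper names `encode_val_data`, `decode_val_data`, and `check_dirref_data` are hypothetical and do not come from the Flux code base.

```python
import base64


def encode_val_data(raw: bytes) -> str:
    """Base64-encode opaque bytes for the "data" member of a val object."""
    return base64.b64encode(raw).decode("ascii")


def decode_val_data(data: str) -> bytes:
    """Recover the opaque bytes from a val object's "data" member."""
    return base64.b64decode(data, validate=True)


def check_dirref_data(blobrefs: list) -> str:
    """Enforce the current one-element limit on a dirref's blobref array."""
    if len(blobrefs) != 1:
        raise ValueError("dirref array size is currently limited to one")
    return blobrefs[0]


# The example value in the spec, "NDIyCg==", is the encoding of b"422\n".
print(encode_val_data(b"422\n"))    # NDIyCg==
print(decode_val_data("NDIyCg=="))  # b'422\n'
```

Keeping the one-element check in a single helper means the restriction can be lifted in one place if multi-blobref *dirref* arrays are supported later.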
