Alistair Miles ([@alimanfoo](https://github.com/alimanfoo))

====

### Design principles

1. Hackable
2. Parallel
3. Distributed

===

### Hackable

* Easy to implement
* Easy to extend with new functionality
* Easy to inspect and manipulate data with generic tools

===

### Parallel

* Think: "What happens if two workers do X at the same time?"
* Avoid race conditions

===

### Distributed

* Accommodate eventual consistency

====

### Modular spec architecture

* [Core protocol spec](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html)

### Node names - restrictions

* Node paths are used by users to access nodes and to explore and
  navigate a hierarchy.

* N.B., node paths are also used to form storage keys (see later).

* To try to ensure compatibility with a variety of storage systems,
  the core protocol currently imposes fairly heavy
  [restrictions](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#node-names)
  on node names.

    * Includes restriction to ASCII alphanumeric characters, "-", "_"
      and ".".
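
A regular expression can check a name against the allowed character set above. This is a minimal sketch. `is_valid_node_name` is a hypothetical helper, and it checks only the character set listed on this slide. Treating the empty string as invalid is an assumption. The linked spec section remains the authority on any further rules.

```python
import re

# Hypothetical check covering only the character restriction on this slide:
# ASCII letters, digits, "-", "_" and ".". Assumes names must be non-empty.
# The core protocol spec may impose further rules that are not checked here.
_NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.\-]+")


def is_valid_node_name(name: str) -> bool:
    """Return True if `name` is non-empty and uses only allowed characters."""
    return _NODE_NAME_PATTERN.fullmatch(name) is not None


for candidate in ["temperature", "run-01.v2", "données", "a b", ""]:
    print(repr(candidate), is_valid_node_name(candidate))
```

The explicit `A-Za-z0-9` ranges are deliberate. A class such as `\w` would also accept non-ASCII letters like "é", because Python's `re` is Unicode-aware by default.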

====

### [Core protocol - data types](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#id5)

* Boolean (single byte)

* Integer (signed or unsigned; 1, 2, 4, 8 bytes; little- or
  big-endian)

* Float (2, 4, 8 bytes; little- or big-endian)

* Any other data type can be defined via a protocol extension

    * E.g., [datetime data types](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/extensions/datetime-dtypes/v1.0.html)
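
The sketch below makes the byte-order part of the integer and float types concrete. It uses Python's standard `struct` module, chosen only for illustration (the spec does not reference it). It encodes the same 4-byte signed integer as little- and big-endian. It also checks the sizes of a boolean and of 2-, 4- and 8-byte floats.

```python
import struct

value = 258  # 0x00000102

# "<" = little-endian, ">" = big-endian, "i" = 4-byte signed integer.
little = struct.pack("<i", value)
big = struct.pack(">i", value)

print(little.hex())  # 02010000 (least significant byte first)
print(big.hex())     # 00000102 (most significant byte first)

# Decoding with the wrong byte order silently yields a different number,
# which is why byte order has to be part of the data type.
print(struct.unpack(">i", little)[0])  # 33619968

# Standard sizes: "?" boolean, "e" 2-byte float, "f" 4-byte, "d" 8-byte.
print([struct.calcsize("<" + code) for code in "?efd"])  # [1, 2, 4, 8]
```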

===

### Data types - identifiers

* Each data type needs an identifier for use in metadata documents.

    * E.g., "bool", "i1", "<i4", ">u8", "<f2", etc.
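
The example identifiers follow the same pattern as NumPy type strings. Each one has an optional byte-order prefix ("<" or ">"), a kind letter and a size in bytes. Below is a minimal parser for just the forms on these slides. `parse_dtype` and `DataTypeInfo` are hypothetical names. The parser assumes single-byte types never carry a byte-order prefix and multi-byte types always do, as in the examples. The real grammar is whatever the spec defines.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical parser for the core data type identifiers shown above.
# Assumes: "bool" is a 1-byte boolean; integers use "i"/"u" with 1, 2, 4 or 8
# bytes; floats use "f" with 2, 4 or 8 bytes; multi-byte types carry a
# "<" (little-endian) or ">" (big-endian) prefix, single-byte types do not.
_PATTERN = re.compile(r"([<>]?)([iuf])([1248])")
_KINDS = {"i": "signed integer", "u": "unsigned integer", "f": "float"}


@dataclass(frozen=True)
class DataTypeInfo:
    kind: str
    size: int                 # bytes per element
    byteorder: Optional[str]  # "little", "big", or None for 1-byte types


def parse_dtype(identifier: str) -> DataTypeInfo:
    if identifier == "bool":
        return DataTypeInfo("boolean", 1, None)
    match = _PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a core data type: {identifier!r}")
    prefix, kind, size_text = match.groups()
    size = int(size_text)
    if kind == "f" and size == 1:
        raise ValueError("floats must be 2, 4 or 8 bytes")
    # A prefix is required exactly when the type is wider than one byte.
    if (size == 1) != (prefix == ""):
        raise ValueError("byte order prefix required for multi-byte types only")
    byteorder = {"<": "little", ">": "big"}.get(prefix)
    return DataTypeInfo(_KINDS[kind], size, byteorder)


print(parse_dtype("<i4"))
print(parse_dtype(">u8"))
print(parse_dtype("i1"))
```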

====

### Core protocol - chunk grids

* A chunk grid defines a set of chunks which contain the elements of
  an array.

* The chunks of a grid form a tessellation of the array space, which
  is the space defined by the dimensionality and shape of the array.

    * => Every element of the array is a member of exactly one chunk,
      and there are no gaps or overlaps between chunks.
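
For small arrays, the "no gaps, no overlaps" property can be checked by brute force. The sketch below does not depend on any particular grid type. It assumes, hypothetically, that each chunk is described as a half-open box of start and stop indices. That works for rectangular chunks but is not how the spec represents chunks. Parts of a chunk lying outside the array space are ignored.

```python
from itertools import product
from typing import List, Sequence, Tuple

Box = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (start, stop), stop exclusive


def is_tessellation(shape: Sequence[int], chunks: List[Box]) -> bool:
    """True if every array element lies in exactly one chunk (brute force)."""
    counts = {index: 0 for index in product(*(range(n) for n in shape))}
    for start, stop in chunks:
        for index in product(*(range(a, b) for a, b in zip(start, stop))):
            if index not in counts:
                continue  # this part of the chunk lies outside the array space
            counts[index] += 1
    return all(count == 1 for count in counts.values())


shape = (3, 4)
# Chunks need not share a shape for this check: one 2x4 block plus two
# pieces covering the last row.
good = [((0, 0), (2, 4)), ((2, 0), (3, 1)), ((2, 1), (3, 4))]
overlapping = [((0, 0), (2, 4)), ((1, 0), (3, 4))]
print(is_tessellation(shape, good))         # True
print(is_tessellation(shape, overlapping))  # False
```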

===

### Grid types

* In general, several different types of grid are possible.

* The core protocol defines [regular
  grids](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#regular-grids).

* Other grid types can be defined via protocol extensions.

* Any grid type must define:

    * How the array space is divided into chunks.

    * A unique identifier for each chunk in the grid (used to form
      storage keys, see later).
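
The two requirements can be pictured as an abstract interface that each grid type implements. This is a hypothetical sketch. `ChunkGrid`, `chunk_of` and `chunk_id` are illustrative names, not part of the spec. The concrete class is a toy one-dimensional grid whose chunks have varying lengths, standing in for the kind of grid type an extension might define.

```python
from abc import ABC, abstractmethod
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


class ChunkGrid(ABC):
    """Hypothetical interface capturing what any grid type must define."""

    @abstractmethod
    def chunk_of(self, index: Tuple[int, ...]) -> Tuple[int, ...]:
        """How the array space is divided: which chunk holds this element."""

    @abstractmethod
    def chunk_id(self, chunk: Tuple[int, ...]) -> str:
        """A unique identifier for the chunk, later used in storage keys."""


class VariableLength1DGrid(ChunkGrid):
    """Toy 1-D grid whose chunks may differ in length, e.g. [3, 5, 2]."""

    def __init__(self, chunk_lengths: List[int]):
        # Exclusive end offset of each chunk, e.g. [3, 8, 10].
        self._ends = list(accumulate(chunk_lengths))

    def chunk_of(self, index: Tuple[int, ...]) -> Tuple[int, ...]:
        (i,) = index
        if not 0 <= i < self._ends[-1]:
            raise IndexError(i)
        # Position of the first chunk whose end offset is greater than i.
        return (bisect_right(self._ends, i),)

    def chunk_id(self, chunk: Tuple[int, ...]) -> str:
        return str(chunk[0])


grid = VariableLength1DGrid([3, 5, 2])
print([grid.chunk_id(grid.chunk_of((i,))) for i in range(10)])
```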

===

### [Regular grids](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#regular-grids)

* A grid type where every chunk is a (hyper)rectangle of the same
  shape.

    * I.e., the grid type used in HDF5 and Zarr v2.

* Each chunk has a grid index, which is a tuple of grid coordinates.

    * E.g., grid index (0, 3, 7) means the first chunk along the first
      dimension, the fourth chunk along the second dimension and the
      eighth chunk along the third dimension.
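
For a regular grid, the grid index of the chunk containing an element comes from integer division of the element's index by the chunk shape, one dimension at a time. A minimal sketch, where `grid_index` and `chunk_bounds` are hypothetical helper names:

```python
from typing import Tuple

Index = Tuple[int, ...]


def grid_index(element: Index, chunk_shape: Index) -> Index:
    """Grid coordinates of the chunk containing `element`."""
    return tuple(i // c for i, c in zip(element, chunk_shape))


def chunk_bounds(grid_idx: Index, chunk_shape: Index) -> Tuple[Index, Index]:
    """Half-open (start, stop) element range covered by a chunk.

    For edge chunks, `stop` may lie past the end of the array.
    """
    start = tuple(g * c for g, c in zip(grid_idx, chunk_shape))
    stop = tuple(s + c for s, c in zip(start, chunk_shape))
    return start, stop


chunk_shape = (10, 10, 10)
print(grid_index((5, 31, 70), chunk_shape))  # (0, 3, 7)
print(chunk_bounds((0, 3, 7), chunk_shape))  # ((0, 30, 70), (10, 40, 80))
```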

===

### Regular grids - chunk identifiers

* The chunk identifier is formed from the grid index.

    * E.g., the chunk at grid index (0, 3, 7) has identifier "0.3.7".

* The default separator is "." but can be changed (e.g., to "/") in
  the array metadata (see later).
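
Forming the identifier amounts to joining the grid coordinates with the configured separator. A minimal sketch, with `chunk_identifier` and `parse_chunk_identifier` as hypothetical names:

```python
from typing import Tuple


def chunk_identifier(grid_idx: Tuple[int, ...], separator: str = ".") -> str:
    """Join grid coordinates with the separator, e.g. (0, 3, 7) -> "0.3.7"."""
    return separator.join(str(g) for g in grid_idx)


def parse_chunk_identifier(identifier: str, separator: str = ".") -> Tuple[int, ...]:
    """Inverse of chunk_identifier."""
    return tuple(int(part) for part in identifier.split(separator))


print(chunk_identifier((0, 3, 7)))           # 0.3.7
print(chunk_identifier((0, 3, 7), "/"))      # 0/3/7
print(parse_chunk_identifier("0/3/7", "/"))  # (0, 3, 7)
```

Grid coordinates are non-negative integers, so neither "." nor "/" can occur inside a coordinate. The mapping is therefore unambiguous in both directions. With "/" the resulting storage keys look like nested paths, which some stores handle more naturally than one very large flat listing.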

===

### Regular grids - edge chunks

* All chunks have the same shape.

* If the length of any array dimension is not exactly divisible by
  the chunk length along that dimension, the grid will overhang the
  edge of the array space.

* The spec currently doesn't say any more about how to handle edge
  chunks; maybe it should?

    * E.g., it could suggest using the array fill value for the
      contents of edge chunks that lie beyond the array space.

* Other approaches (e.g., truncated edge chunks) could be defined as
  a different grid type via a protocol extension.
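
The number of chunks along each dimension is the ceiling of the array length divided by the chunk length. The overhang is whatever that grid covers beyond the array. The sketch also shows the fill-value idea suggested above for the part of an edge chunk outside the array. That is one possible convention, not something the spec currently mandates. The function names are hypothetical, and a plain 1-D list stands in for chunk contents.

```python
from typing import List, Sequence, Tuple


def grid_shape(shape: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunk_shape))


def overhang(shape: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """How far the grid extends past the array edge along each dimension."""
    grid = grid_shape(shape, chunk_shape)
    return tuple(g * c - n for g, c, n in zip(grid, chunk_shape, shape))


def pad_edge_chunk(values: List[float], chunk_length: int, fill_value: float) -> List[float]:
    """1-D illustration: fill the part of an edge chunk beyond the array."""
    return values + [fill_value] * (chunk_length - len(values))


shape, chunk_shape = (100, 95), (10, 20)
print(grid_shape(shape, chunk_shape))  # (10, 5)
print(overhang(shape, chunk_shape))    # (0, 5)
# Along a dimension of length 95 with chunk length 20, the last chunk holds
# elements 80..94: 15 real values followed by 5 fill values.
print(pad_edge_chunk([1.0] * 15, 20, 0.0)[-6:])  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```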

===

### Regular grids - resizing arrays

* A regular grid supports growing and shrinking an array along any
  dimension.

    * Growing requires only a change to the array metadata (update the
      array shape); no chunk data needs to be added or modified.

    * Shrinking requires a change to the array metadata (update the
      array shape), plus deleting any chunks now completely outside
      the array space.

* A regular grid does not support growing an array in the "negative"
  direction, i.e., prepending.

    * But a grid type that does support this could be defined via a
      protocol extension.
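
The chunks a shrink must delete follow directly from the grid. Any chunk with a grid coordinate, along some dimension, at or beyond the new grid shape lies completely outside the new array space. In the minimal sketch below, `chunks_to_delete` is a hypothetical name and the metadata update is not shown. The function enumerates every position in the old grid. Some of those chunks may never have been written, so a real implementation would delete only keys that exist in the store.

```python
from itertools import product
from typing import List, Sequence, Tuple


def grid_shape(shape: Sequence[int], chunk_shape: Sequence[int]) -> Tuple[int, ...]:
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(shape, chunk_shape))


def chunks_to_delete(old_shape: Sequence[int], new_shape: Sequence[int],
                     chunk_shape: Sequence[int]) -> List[Tuple[int, ...]]:
    """Grid indices of chunks lying entirely outside the resized array."""
    old_grid = grid_shape(old_shape, chunk_shape)
    new_grid = grid_shape(new_shape, chunk_shape)
    return [
        idx
        for idx in product(*(range(g) for g in old_grid))
        if any(i >= n for i, n in zip(idx, new_grid))
    ]


# Shrink a (40, 30) array with (10, 10) chunks to (25, 30). Chunks in grid
# row 3 (elements 30..39) are deleted; grid row 2 (elements 20..29) is kept
# because it still holds elements 20..24.
print(chunks_to_delete((40, 30), (25, 30), (10, 10)))  # [(3, 0), (3, 1), (3, 2)]
```

The kept edge chunks still contain the old data for elements 25..29. If the array were later grown again, those stale values could reappear unless edge-chunk handling (e.g., the fill-value convention) is pinned down.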

====