
Commit 944abf7

committed
add grids
1 parent 5bc282e commit 944abf7

2 files changed

+145
-6
lines changed


slides/v3-update-20190619.html

Lines changed: 3 additions & 0 deletions
@@ -39,6 +39,9 @@
3939
.reveal h1, .reveal h2, .reveal h3, .reveal h4, .reveal h5 {
4040
text-transform: none;
4141
}
42+
.reveal p, .reveal li {
43+
font-size: 0.9em;
44+
}
4245
</style>
4346

4447
</head>

slides/v3-update-20190619.md

Lines changed: 142 additions & 6 deletions
@@ -22,6 +22,35 @@ Alistair Miles ([@alimanfoo](https://github.com/alimanfoo))
2222

2323
====
2424

25+
### Design principles
26+
27+
1. Hackable
28+
2. Parallel
29+
3. Distributed
30+
31+
===
32+
33+
### Hackable
34+
35+
* Easy to implement
36+
* Easy to extend with new functionality
37+
* Easy to inspect and manipulate data with generic tools
38+
39+
===
40+
41+
### Parallel
42+
43+
* Think: "What happens if two workers do X at the same time?"
44+
* Avoid race conditions
45+
46+
===
47+
48+
### Distributed
49+
50+
* Accommodate eventual consistency
51+
52+
====
53+
2554
### Modular spec architecture
2655

2756
* [Core protocol spec](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html)
@@ -150,14 +179,15 @@ document. Store.*
150179

151180
### Node names - restrictions
152181

153-
* Node paths are used by users to access nodes and navigate a
182+
* Node paths are used by users to access nodes and explore/navigate a
154183
hierarchy.
155184

156185
* N.B., node paths are also used to form storage keys (see later).
157186

158187
* To try and ensure compatibility with a variety of storage systems,
159-
the core protocol currently states fairly heavy restrictions on node
160-
names.
188+
the core protocol currently states fairly heavy
189+
[restrictions](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#node-names)
190+
on node names.
161191

162192
* Includes restriction to ASCII alpha-numeric characters, "-", "_"
163193
and ".".
@@ -193,15 +223,121 @@ document. Store.*
193223

194224
====
195225

196-
### Core protocol - data types
226+
### [Core protocol - data types](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#id5)
197227

198-
@@TODO
228+
* Boolean (single byte)
229+
230+
* Integer (signed or unsigned; 1, 2, 4, 8 bytes; little- or
231+
big-endian)
232+
233+
* Float (2, 4, 8 bytes; little- or big-endian)
234+
235+
* Any other data type can be defined via a protocol extension
236+
237+
* E.g., [datetime data
238+
types](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/extensions/datetime-dtypes/v1.0.html)
239+
240+
===
241+
242+
### Data types - identifiers
243+
244+
* Each data type needs an identifier for use in metadata documents.
245+
246+
* E.g., "bool", "i1", "<i4", ">u8", "<f2", etc.
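The identifier scheme above can be sketched as a small parser. This is only an illustration: `parse_dtype_id` is a hypothetical helper, not part of the spec or of any Zarr library, and it assumes that multi-byte types always carry a byte-order prefix ("<" or ">") while single-byte types carry none.

```python
import re

# Hypothetical pattern: optional byte-order prefix, kind letter, size in bytes.
_DTYPE_RE = re.compile(r"([<>]?)([iuf])([1248])")

_KINDS = {"i": "int", "u": "uint", "f": "float"}
_ORDERS = {"<": "little", ">": "big", "": None}


def parse_dtype_id(identifier):
    """Return (kind, size_in_bytes, byte_order) for a core data type identifier."""
    if identifier == "bool":
        return ("bool", 1, None)
    match = _DTYPE_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unrecognised data type identifier: {identifier!r}")
    prefix, kind, size = match.group(1), match.group(2), int(match.group(3))
    if kind == "f" and size == 1:
        raise ValueError("floats are 2, 4 or 8 bytes")
    # Assumption (not stated on the slide): multi-byte types need a
    # byte-order prefix, single-byte types have none.
    if (size > 1) != bool(prefix):
        raise ValueError(f"unexpected byte-order prefix in {identifier!r}")
    return (_KINDS[kind], size, _ORDERS[prefix])
```

Under these assumptions, `parse_dtype_id(">u8")` gives an unsigned 8-byte big-endian integer, and any identifier outside the core set would be left to a protocol extension.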
199247

200248
====
201249

202250
### Core protocol - chunk grids
203251

204-
@@TODO
252+
* A chunk grid defines a set of chunks which contain the elements of an array.
253+
254+
* The chunks of a grid form a tessellation of the array space, which
255+
is a space defined by the dimensionality and shape of the array.
256+
257+
* => Every element of the array is a member of one chunk, and there
258+
are no gaps or overlaps between chunks.
259+
260+
===
261+
262+
### Grid types
263+
264+
* In general there are several different possible types of grid.
265+
266+
* The core protocol defines [regular
267+
grids](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#regular-grids).
268+
269+
* Other grid types can be defined via protocol extensions.
270+
271+
* Any grid type must define:
272+
273+
* How the array space is divided into chunks.
274+
275+
* A unique identifier for each chunk in the grid (used to form
276+
storage keys, see later).
277+
278+
===
279+
280+
### [Regular grids](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html#regular-grids)
281+
282+
* Grid type where each chunk is a (hyper)rectangle of the same shape.
283+
284+
* I.e., the grid type used in HDF5 and Zarr v2.
285+
286+
* Each chunk has a grid index, which is a tuple of grid coordinates.
287+
288+
* E.g., grid index (0, 3, 7) means first chunk along first
289+
dimension, fourth chunk along second dimension, eighth chunk along
290+
third dimension.
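As a rough sketch of how a grid index relates to array elements in a regular grid, the chunk holding a given element can be found by integer division of the element's coordinates by the chunk shape. The function name `grid_index` and the example shapes are made up for illustration.

```python
def grid_index(element_coords, chunk_shape):
    """Return the grid index of the chunk containing the given element."""
    if len(element_coords) != len(chunk_shape):
        raise ValueError("coordinates and chunk shape must have the same length")
    # Floor division maps each element coordinate to a chunk coordinate.
    return tuple(c // s for c, s in zip(element_coords, chunk_shape))
```

With a hypothetical chunk shape of (10, 10, 10), the element at (5, 35, 79) falls in the chunk with grid index (0, 3, 7).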
291+
292+
===
293+
294+
### Regular grids - chunk identifiers
295+
296+
* Chunk identifier is formed from grid index.
297+
298+
* E.g., chunk at grid index (0, 3, 7) has identifier "0.3.7".
299+
300+
* Default separator is "." but can be changed (e.g., to "/") in array
301+
metadata (see later).
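Forming the identifier from a grid index amounts to joining the coordinates with the separator. A minimal sketch, with `chunk_identifier` as a hypothetical helper name:

```python
def chunk_identifier(grid_index, separator="."):
    """Join the grid coordinates into a chunk identifier string."""
    return separator.join(str(i) for i in grid_index)
```

For example, `chunk_identifier((0, 3, 7))` yields "0.3.7", and passing `separator="/"` yields "0/3/7".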
302+
303+
===
304+
305+
### Regular grids - edge chunks
306+
307+
* All chunks have the same shape.
308+
309+
* If the length of any array dimension is not perfectly divisible by
310+
the chunk length along the same dimension, the grid will overhang
311+
the edge of the array space.
312+
313+
* The spec currently says nothing more about how to handle edge chunks;
314+
maybe it should?
315+
316+
* E.g., suggest using array fill value to fill contents of edge
317+
chunks beyond the array space.
318+
319+
* Other approaches (e.g., truncated edge chunks) could be defined as a
320+
different grid type via a protocol extension.
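To make the overhang concrete, the sketch below computes how many chunks a regular grid needs along each dimension and how far the last chunk extends past the array edge. The helper names and example shapes are illustrative only; the spec does not define them.

```python
def grid_shape(array_shape, chunk_shape):
    """Number of chunks along each dimension (ceiling division)."""
    return tuple(-(-n // c) for n, c in zip(array_shape, chunk_shape))


def overhang(array_shape, chunk_shape):
    """Elements by which the grid extends past the array edge, per dimension."""
    counts = grid_shape(array_shape, chunk_shape)
    return tuple(g * c - n for g, c, n in zip(counts, chunk_shape, array_shape))
```

For an array of shape (10, 25) with chunks of shape (4, 10), the grid is 3 x 3 chunks and overhangs the array by 2 and 5 elements respectively. Under the suggestion above, those overhanging positions would hold the array fill value.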
321+
322+
===
323+
324+
### Regular grids - resizing arrays
325+
326+
* Regular grid supports growing and shrinking an array along any
327+
dimension.
328+
329+
* Growing only requires a change to array metadata (update array
330+
shape); no chunk data needs to be added or modified.
331+
332+
* Shrinking requires a change to array metadata (update array shape)
333+
plus deletion of any chunks now completely outside the array space.
334+
335+
* Regular grid does not support growing an array in "negative"
336+
direction, i.e., prepending.
337+
338+
* But could define a grid type that does support this via a protocol
339+
extension.
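The shrinking rule can be sketched as follows: a chunk lies completely outside the new array space if, along at least one dimension, its first element is at or beyond the new length. `chunks_to_delete` is a hypothetical helper; it lists candidate grid indices from the old grid and assumes nothing about which chunks are actually present in a store.

```python
from itertools import product


def chunks_to_delete(old_shape, new_shape, chunk_shape):
    """Grid indices of the old grid that fall entirely outside the new shape."""
    # One range of chunk coordinates per dimension of the old grid.
    old_grid = [range(-(-n // c)) for n, c in zip(old_shape, chunk_shape)]
    return [
        index
        for index in product(*old_grid)
        # A chunk starts at i * c along each dimension.
        if any(i * c >= n for i, c, n in zip(index, chunk_shape, new_shape))
    ]
```

For example, shrinking a 1-D array from length 10 to length 5 with chunk length 4 leaves chunks 0 and 1 in place (chunk 1 still overlaps the array) and marks only chunk 2 for deletion. Growing an array returns an empty list, consistent with growth being a metadata-only change.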
340+
205341

206342
====
207343
