Commit 6d6565d

Merge pull request #99 from alimanfoo/fix-key-suffix-omissions-20201016
2 parents e0245f7 + 119d6f9 commit 6d6565d

1 file changed: 92 additions and 167 deletions

docs/protocol/core/v3.0.rst

Lines changed: 92 additions & 167 deletions
@@ -1129,50 +1129,20 @@ that an implementation of this specification could be modular and
11291129
allow for different store implementations to be used.
11301130

11311131
The store interface defines a set of operations involving `keys` and
1132-
`values`. In the context of this interface, a `key` is any
1133-
string containing only characters in the ranges ``a-z``, ``A-Z``,
1134-
``0-9``, or in the set ``/.-_``, and a `value` is any sequence of
1135-
bytes. It is assumed that the store holds (`key`, `value`) pairs, with
1136-
only one such pair for any given `key`. I.e., a store is a mapping
1137-
from keys to values. It is also assumed that keys are case sensitive,
1138-
i.e., the keys "foo" and "FOO" are different.
1132+
`values`. In the context of this interface, a `key` is any string
1133+
containing only characters in the ranges ``a-z``, ``A-Z``, ``0-9``, or
1134+
in the set ``/.-_``, where the final character is **not** a ``/``
1135+
character. A `value` is any sequence of bytes.
11391136

1140-
A store can make the following assumption on the structures of the keys it will receive:
1141-
1142-
- A key always:
1143-
- start with ``meta/``
1144-
- start with ``data/``
1145-
- is exactly ``zarr.json``.
1146-
1147-
- Most of the keys:
1148-
- start with ``meta/root``
1149-
- start with ``data/root``
1150-
1151-
1152-
- List operations ``list_dir`` will always be passed keys ending with a trailing
1153-
slash, that is to say it will only be asked to work with complete node names.
1154-
1155-
Store implementation can assume they will only be given trailing slashes, and
1156-
protocol implementation MUST pass trailing slashes to underlying stores.
1157-
1158-
For example, a store containing the following keys:
1159-
1160-
- ``meta/root/2018/.group``
1161-
- ``meta/root/2018-01/.group``
1162-
- ``meta/root/2018/bar/.array``
1163-
- ``data/root/2018/bar/0.0``
1164-
1165-
The following queries are invalid:
1166-
- ``list_dir('201')`` is invalid as ``"201"`` is not an existing node.
1167-
- ``list_dir('2018')`` is invalid queries as ``"2018"`` does not ends with a ``/``,
1168-
1169-
This is valid:
1170-
- ``list_dir('2018/')``
1171-
- ``list_dir('2018-01/')``
1172-
1173-
This allows store implementation to avoid having to check for trailing slashes,
1174-
and avoid issues like "list_dir('2018')" returning values likes ``-01``
1137+
It is assumed that the store holds (`key`, `value`) pairs, with only
1138+
one such pair for any given `key`. I.e., a store is a mapping from
1139+
keys to values. It is also assumed that keys are case sensitive, i.e.,
1140+
the keys "foo" and "FOO" are different.
11751141

1142+
The store interface also defines some operations involving
1143+
`prefixes`. In the context of this interface, a prefix is a string
1144+
containing only characters that are valid for use in `keys` and ending
1145+
with a trailing ``/`` character.
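
The key and prefix rules introduced in this hunk can be sketched as two small checks. This is only an illustration in Python; the function names ``is_valid_key`` and ``is_valid_prefix`` are hypothetical and are not part of the specification.

```python
import re

# Characters permitted by the text above: a-z, A-Z, 0-9 and the set "/.-_".
_ALLOWED = re.compile(r"[A-Za-z0-9/._-]+")


def is_valid_key(s: str) -> bool:
    """A key uses only allowed characters and does not end with '/'."""
    return bool(_ALLOWED.fullmatch(s)) and not s.endswith("/")


def is_valid_prefix(s: str) -> bool:
    """A prefix uses only allowed characters and ends with '/'."""
    return bool(_ALLOWED.fullmatch(s)) and s.endswith("/")
```

With these definitions, no string is both a key and a prefix, which is what lets ``list_dir`` return the two kinds of result without ambiguity.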
11761146

11771147
The store operations are grouped into three sets of capabilities:
11781148
**readable**, **writeable** and **listable**. It is not necessary for
@@ -1197,15 +1167,10 @@ A **writeable store** supports the following operations:
11971167
| Parameters: `key`
11981168
| Output: none
11991169
1200-
``delete_prefix`` - Delete all keys with the given prefix from the store, include the prefix itself if it exists as a key:
1201-
1202-
| Parameter: `key`
1203-
| Output: None
1204-
1170+
``delete_prefix`` - Delete all keys with the given prefix from the store:
12051171

1206-
Clients of delete_prefix should pay attention to pass a trailing slash on
1207-
the key to delete a complete dataset or group, otherwise the store may
1208-
delete similar keys.
1172+
| Parameter: `prefix`
1173+
| Output: none
12091174
12101175
A **listable store** supports any one or more of the following
12111176
operations:
@@ -1215,7 +1180,6 @@ operations:
12151180
| Parameters: none
12161181
| Output: set of `keys`
12171182
1218-
12191183
``list_prefix`` - Retrieve all keys with a given prefix.
12201184

12211185
| Parameters: `prefix`
@@ -1228,56 +1192,22 @@ operations:
12281192
with a trailing slash ``/`` and store can assume there is at least one key
12291193
that starts with prefix.
12301194

1231-
12321195
``list_dir`` - Retrieve all keys and prefixes with a given prefix and
12331196
which do not contain the character "/" after the given prefix.
12341197

1235-
| Parameters: `prefix`, ends with a trailing slash ``/``
1198+
| Parameters: `prefix`
12361199
| Output: set of `keys` and set of `prefixes`
12371200
12381201
For example, if a store contains the keys "a/b", "a/c", "a/d/e",
12391202
"a/f/g", then ``list_dir("a/")`` would return keys "a/b" and "a/c"
1240-
and prefixes "a/d/" and "a/f/".
1241-
1242-
On non-existing prefix, store may return the empty set.
1243-
1203+
and prefixes "a/d/" and "a/f/". ``list_dir("b/")`` would return
1204+
the empty set.
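
As a rough illustration of the ``list_dir`` semantics described here, the following sketch implements the operation over a plain Python ``dict`` standing in for a store. The dict-backed store and the tuple return shape are assumptions made for the example, not something the specification prescribes.

```python
def list_dir(store: dict, prefix: str):
    """Return (keys, prefixes) found directly under `prefix`."""
    keys, prefixes = set(), set()
    for key in store:
        if not key.startswith(prefix):
            continue
        # Inspect the part of the key that follows the given prefix.
        rest = key[len(prefix):]
        head, sep, _ = rest.partition("/")
        if sep:
            # A further "/" means the key lives deeper: report a prefix.
            prefixes.add(prefix + head + "/")
        else:
            keys.add(key)
    return keys, prefixes
```

Applied to the keys "a/b", "a/c", "a/d/e" and "a/f/g", ``list_dir(store, "a/")`` yields the keys "a/b" and "a/c" and the prefixes "a/d/" and "a/f/", while ``list_dir(store, "b/")`` yields two empty sets.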
12441205

1245-
Note: The requirement on trailing slashes is to avoid
1246-
search returning keys in the same hierarchy level but longer name, and
1247-
potentially expensive logic testing for the present of trailing slash on
1248-
each query. e.g:
12491206

1250-
- /meta/foo
1251-
- /meta/foo/dataset
1252-
- /meta/foobar
1253-
1254-
list_dir('/meta/foo') == '/meta/foo'&'/meta/foobar'
1255-
list_dir('/meta/foo/') == '/meta/foo/dataset'
1256-
1257-
1258-
Stores Must return trailing slashes in key responses when those
1259-
are prefix of other keys.
1260-
1261-
Like would ``list_dir('/meta/mydir')`` returns:
1262-
- ``/meta/path1``
1263-
- ``/meta/path2``
1264-
- ``/meta/path3/``
1265-
- ``/meta/path4/``
1266-
1267-
Thus we know that ``path1``, and ``path2`` are terminal objects with data,
1268-
and that ``/meta/path3`` and ``/meta/path4``.
1269-
1270-
1271-
This is similar to ``ls -p`` on Unix systems.
1272-
1273-
Note: In practice this means that this means most returned keys always ends in
1274-
``/``, ``.json``, ``.array``, ``.group``, they will otherwise be chunks
1275-
data.
1276-
1277-
Note that because keys are case sensitive, it is assumed that the operations
1278-
``set("foo", a)`` and ``set("FOO", b)`` will result in two separate (key, value)
1279-
pairs being stored. Subsequently ``get("foo")`` will return *a* and ``get("FOO")``
1280-
will return *b*.
1207+
Note that because keys are case sensitive, it is assumed that the
1208+
operations ``set("foo", a)`` and ``set("FOO", b)`` will result in two
1209+
separate (key, value) pairs being stored. Subsequently ``get("foo")``
1210+
will return *a* and ``get("FOO")`` will return *b*.
12811211

12821212

12831213
Store implementations
@@ -1351,30 +1281,31 @@ Storage keys
13511281
The entry point metadata document is stored under the key ``zarr.json``.
13521282

13531283
For a group at a non-root hierarchy path `P`, the metadata key for the
1354-
group metadata document is formed by concatenating ``meta/root``, `P`,
1355-
and ``.group``.
1284+
group metadata document is formed by concatenating "meta/root", `P`,
1285+
".group", and the metadata key suffix (which defaults to ".json").
13561286

13571287
For example, for a group at hierarchy path ``/foo/bar``, the
1358-
corresponding metadata key is ``meta/root/foo/bar.group``.
1288+
corresponding metadata key is "meta/root/foo/bar.group.json".
13591289

13601290
For an array at a non-root hierarchy path `P`, the metadata key for
1361-
the array metadata document is formed by concatenating "meta/root", `P`,
1362-
and ".array". The data key for array chunks is formed by concatenating
1363-
"data", `P`, "/", and the chunk identifier as defined by the chunk
1364-
grid layout.
1291+
the array metadata document is formed by concatenating "meta/root",
1292+
`P`, ".array", and the metadata key suffix.
13651293

1366-
To get the path ``P`` from a key, either remove the trailing ``.array`` or
1367-
``.group`` as well as the ``meta/root`` prefix.
1294+
The data key for array chunks is formed by concatenating "data/root", `P`,
1295+
"/", and the chunk identifier as defined by the chunk grid layout.
1296+
1297+
To get the path ``P`` from a metadata key, remove the trailing
1298+
".array.json" or ".group.json" and the "meta/root" prefix.
13681299

13691300
For example, for an array at hierarchy path "/foo/baz", the
1370-
corresponding metadata key is ``meta/root/foo/baz.array``. If the array
1371-
has two dimensions and a regular chunk grid, the data key for the
1372-
chunk with grid coordinates (0, 0) is "data/root/foo/baz/c0/0".
1301+
corresponding metadata key is "meta/root/foo/baz.array.json". If the
1302+
array has two dimensions and a regular chunk grid, the data key for
1303+
the chunk with grid coordinates (0, 0) is "data/root/foo/baz/c0/0".
13731304

1374-
If the root node is a group, the metadata key is ``meta/root.group``. If
1375-
the root node is an array, the metadata key is "meta/root.array", and
1376-
the data keys are formed by concatenating "data/root/" and the chunk
1377-
identifier.
1305+
If the root node is a group, the metadata key is
1306+
"meta/root.group.json". If the root node is an array, the metadata key
1307+
is "meta/root.array.json", and the data keys are formed by
1308+
concatenating "data/root/" and the chunk identifier.
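
The key construction rules in this hunk can be sketched as follows. The helper names mirror ``array_meta_key``, ``group_meta_key`` and ``data_key`` used later in the document, but their bodies here are an assumption, as is the chunk identifier format ("c" followed by "/"-separated coordinates), which is inferred from the single example "c0/0" above. ``path_from_meta_key`` is a hypothetical helper for the reverse direction.

```python
META_SUFFIX = ".json"  # default metadata key suffix, per the text above


def _node(path: str) -> str:
    # The root path "/" contributes nothing between "root" and the suffix.
    return "" if path == "/" else path


def group_meta_key(path: str, suffix: str = META_SUFFIX) -> str:
    return "meta/root" + _node(path) + ".group" + suffix


def array_meta_key(path: str, suffix: str = META_SUFFIX) -> str:
    return "meta/root" + _node(path) + ".array" + suffix


def data_key(path: str, *coords: int) -> str:
    # Assumed chunk identifier for a regular grid: "c" + "i/j/...".
    chunk_id = "c" + "/".join(str(c) for c in coords)
    return "data/root" + _node(path) + "/" + chunk_id


def path_from_meta_key(key: str, suffix: str = META_SUFFIX) -> str:
    """Recover the hierarchy path P from a metadata key."""
    for kind in (".array", ".group"):
        tail = kind + suffix
        if key.startswith("meta/root") and key.endswith(tail):
            return key[len("meta/root"):-len(tail)] or "/"
    raise ValueError("not a node metadata key: " + key)
```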
13781309

13791310

13801311
.. list-table:: Metadata Storage Key example
@@ -1388,25 +1319,22 @@ identifier.
13881319
- `zarr.json`
13891320
* - Array (Root)
13901321
- `/`
1391-
- `meta/root.array`
1322+
- `meta/root.array.json`
13921323
* - Group (Root)
13931324
- `/`
1394-
- `meta/root.group`
1325+
- `meta/root.group.json`
13951326
* - Group
13961327
- `/foo`
1397-
- `meta/root/foo.group`
1328+
- `meta/root/foo.group.json`
13981329
* - Array
13991330
- `/foo`
1400-
- `meta/root/foo.array`
1331+
- `meta/root/foo.array.json`
14011332
* - Group
14021333
- `/foo/bar`
1403-
- `meta/root/foo/bar.group`
1334+
- `meta/root/foo/bar.group.json`
14041335
* - Array
14051336
- `/foo/baz`
1406-
- `meta/root/foo/baz.array`
1407-
1408-
1409-
1337+
- `meta/root/foo/baz.array.json`
14101338

14111339

14121340
.. list-table:: Data Storage Key example
@@ -1464,35 +1392,37 @@ Let "+" be the string concatenation operator.
14641392

14651393
**Store element values in an array**
14661394

1467-
To store element in an array at path `P` and coordinate (`i`, `j`, ...)
1468-
perform ``set(data_key(P, i, j, ...), value)``, where `value` is the
1469-
serialisation of the corresponding chunk following the metadata that is
1470-
or will be stored in ``array_meta_key(P)``.
1395+
To store an element in an array at path `P` and coordinate (`i`, `j`,
1396+
...) perform ``set(data_key(P, i, j, ...), value)``, where
1397+
`value` is the serialisation of the corresponding chunk, encoded
1398+
according to the information in the array metadata stored under
1399+
the key ``array_meta_key(P)``.
14711400

14721401
**Retrieve element values in an array**
14731402

1474-
To retrieve element in an array at path `P` and coordinate (`i`, `j`, ...)
1475-
perform ``get(data_key(P, i, j, ...), value)``, where `value` is the
1476-
serialisation of the corresponding chunk following the metadata stored at
1477-
``array_meta_key(P)``.
1403+
To retrieve an element in an array at path `P` and coordinate (`i`,
1404+
`j`, ...) perform ``get(data_key(P, i, j, ...))``, which returns
1405+
`value`, the serialisation of the corresponding chunk, encoded
1406+
according to the array metadata stored at ``array_meta_key(P)``.
14781407

14791408
**Discover children of a group**
14801409

14811410
To discover the children of a group at hierarchy path `P`, perform
14821411
``list_dir("meta/root" + P + "/")``. Any returned key ending in
1483-
".array" indicates an array. Any returned key ending in
1484-
".group" indicates a group. Any returned prefix indicates a
1412+
".array.json" indicates an array. Any returned key ending in
1413+
".group.json" indicates a group. Any returned prefix indicates a
14851414
child group implied by some descendant.
14861415

14871416
For example, if a group is created at path "/foo/bar" and an array
14881417
is created at path "/foo/baz/qux", then the store will contain the
1489-
keys "meta/root/foo/bar.group" and "meta/root/foo/bar/baz/qux.array". Groups
1490-
at paths "/", "/foo" and "/foo/baz" have not been explicitly
1491-
created but are implied by their descendants. To list the children
1492-
of the group at path "/foo", perform ``list_dir("meta/root/foo/")``,
1493-
which will return the key "meta/root/foo/bar.group" and the prefix
1494-
"meta/root/foo/baz/". From this it can be inferred that child groups
1495-
"/foo/bar" and "/foo/baz" are present.
1418+
keys "meta/root/foo/bar.group.json" and
1419+
"meta/root/foo/baz/qux.array.json". Groups at paths "/",
1420+
"/foo" and "/foo/baz" have not been explicitly created but are
1421+
implied by their descendants. To list the children of the group at
1422+
path "/foo", perform ``list_dir("meta/root/foo/")``, which will
1423+
return the key "meta/root/foo/bar.group.json" and the prefix
1424+
"meta/root/foo/baz/". From this it can be inferred that child
1425+
groups "/foo/bar" and "/foo/baz" are present.
14961426

14971427
If a store does not support any of the list operations then
14981428
discovery of group children is not possible, and the contents of
@@ -1501,54 +1431,49 @@ Let "+" be the string concatenation operator.
15011431

15021432
**Discover all nodes in a hierarchy**
15031433

1504-
To discover all nodes in a hierarchy, one can call ``list("meta/")``.
1505-
- all keys represent either explicit group or arrays.
1506-
- all intermediate prefixes ending in a ``/`` are implicit groups.
1434+
To discover all nodes in a hierarchy, one can call
1435+
``list_prefix("meta/")``. All keys represent either explicit groups or
1436+
arrays. All intermediate prefixes ending in a ``/`` are implicit
1437+
groups.
15071438

15081439
**Delete a group or array**
15091440

1510-
To delete an array it is necessary to
1511-
- delete the metadata document for the array, (meta/P.array)
1512-
- delete all keys which prefix have path pointing to this to this array. (data/P/\*)
1441+
To delete an array at path `P`:
1442+
- delete the metadata document for the array, ``delete(array_meta_key(P))``
1443+
- delete all data keys whose prefix is the path to this array,
1444+
``delete_prefix("data/root" + P + "/")``
15131445

1514-
To delete a implicit group.
1515-
- delete all arrays under this group
1516-
- it should be sufficient to delete all the keys starting with prefix meta/P/ and data/P/
1446+
To delete an implicit group at path `P`:
1447+
- delete all nodes under this group - it should be sufficient to
1448+
perform ``delete_prefix("meta/root" + P + "/")`` and
1449+
``delete_prefix("data/root" + P + "/")``.
15171450

1518-
To delete an explicit group.
1519-
- delete all arrays under this group,
1520-
- delete all keys with meta/P/ prefix, meta/P/groups all keys with /data/P prefix,
1451+
To delete an explicit group at path `P`:
1452+
- delete the metadata document for the group, ``delete(group_meta_key(P))``
1453+
- delete all nodes under this group - it should be sufficient to
1454+
perform ``delete_prefix("meta/root" + P + "/")`` and
1455+
``delete_prefix("data/root" + P + "/")``.
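
The deletion steps above can be sketched against a dict-backed store. This is a minimal illustration that assumes a non-root path `P` and the default ".json" metadata key suffix; ``delete_array`` and ``delete_group`` are hypothetical helper names.

```python
def delete_prefix(store: dict, prefix: str) -> None:
    """Delete every key that starts with `prefix`."""
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]


def delete_array(store: dict, path: str) -> None:
    # Remove the array metadata document, then all of its chunks.
    store.pop("meta/root" + path + ".array.json", None)
    delete_prefix(store, "data/root" + path + "/")


def delete_group(store: dict, path: str) -> None:
    # For an implicit group there is no metadata document, so pop() is a no-op.
    store.pop("meta/root" + path + ".group.json", None)
    delete_prefix(store, "meta/root" + path + "/")
    delete_prefix(store, "data/root" + path + "/")
```

Because each prefix ends with "/", deleting the group "/foo" leaves a sibling node such as "/foobar" untouched.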
15211456

1522-
Note that store implementation may decide to reify implicit groups and thus
1523-
protocol implementation should attempt to delete the .meta/P/.group file if
1524-
they really wish to delete an empty implicit group.
1457+
Note that a store implementation may decide to reify implicit groups,
1458+
and thus a protocol implementation should attempt to delete the
1459+
group metadata file if it really wishes to delete an empty
1460+
implicit group. @@TODO clarify this
15251461

15261462
Store implementations are also allowed to delete any implicit parent of a
15271463
deleted implicit group, so a protocol implementation should make sure to
1528-
reify a parent group if they need to keep it. For example assuming the
1529-
following:
1530-
1531-
>>> z = new_dataset()
1532-
>>> z.create_array('/path/to/array')
1533-
1534-
>>> z.delete_array('/path/to/array')
1535-
1536-
This may not be sufficient to delete the group ``/path/to/``, as a store
1537-
implementation, and thus removing ``/path/to/`` may need an implmentation
1538-
to explicitly call
1539-
1540-
>>> z.delete_group('/path/to/')
1464+
reify a parent group if it needs to keep it. @@TODO clarify this
15411465

1542-
Even if an explicit group was not explicitly created.
15431466

15441467
**Determine if a node exists**
15451468

1546-
To determine if a node exists at path `P`, you need to check the existence
1547-
of one of ``get("meta/root"+P+".array")``, ``get("meta/root"+P+".group")``
1548-
or ``get("meta/root"+P+"/")``.
1469+
To determine if a node exists at path ``P``, try the following in
1470+
order: ``get(array_meta_key(P))`` (success implies an array at
1471+
``P``); ``get(group_meta_key(P))`` (success implies an explicit
1472+
group at ``P``); ``list_dir("meta/root" + P + "/")`` (non-empty
1473+
result set implies an implicit group at ``P``).
15491474

15501475
.. note::
1551-
For listable store, ``listdir(parent(P))`` can be an alternative.
1476+
For a listable store, ``list_dir(parent(P))`` can be an alternative.
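
The three-step existence check can be sketched as below, again over a dict-backed store and assuming the default ".json" suffix. The function name ``node_kind`` is hypothetical, and the prefix scan stands in for testing whether ``list_dir`` returns a non-empty result.

```python
def node_kind(store: dict, path: str):
    """Return 'array', 'explicit group', 'implicit group', or None."""
    node = "" if path == "/" else path
    # 1. An array metadata document means an array exists at the path.
    if "meta/root" + node + ".array.json" in store:
        return "array"
    # 2. A group metadata document means an explicit group.
    if "meta/root" + node + ".group.json" in store:
        return "explicit group"
    # 3. Any metadata key under the path's prefix implies an implicit group.
    prefix = "meta/root" + node + "/"
    if any(key.startswith(prefix) for key in store):
        return "implicit group"
    return None
```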
15521477

15531478

15541479
Protocol extensions
