Commit 4ec176f

New spec for xcube Multi-Resolution Datasets
1 parent d3fc869 commit 4ec176f

1 file changed

+87
-0
lines changed

docs/source/mldatasets.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
xcube Multi-Resolution Datasets
===============================

Definition
----------

An xcube _multi-resolution dataset_ is an N-D [image
pyramid](https://en.wikipedia.org/wiki/Pyramid_(image_processing)),
where an _image_ is a 2-D dataset with two spatial dimensions
in some horizontal coordinate system.

A multi-resolution dataset comprises a fixed number of
_levels_, which are regular datasets at different spatial resolutions.
Level zero represents the original resolution `res(L=0)`. The resolution
of each higher level decreases by a factor of two:
`res(L) = res(0) / 2^L`.

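As a minimal sketch of the relation `res(L) = res(0) / 2^L`, the following Python snippet lists the resolution of every level. The function name `level_resolutions` is hypothetical and not part of xcube; the snippet only applies the formula as written and makes no assumption about the unit of `res`.

```python
def level_resolutions(res0: float, num_levels: int) -> list:
    """Apply res(L) = res(0) / 2^L for L = 0 .. num_levels - 1."""
    return [res0 / 2 ** level for level in range(num_levels)]


# Resolutions of a three-level pyramid whose level zero has res(0) = 8.
print(level_resolutions(8.0, 3))
```

Each level halves the resolution of the previous one, so the list shrinks geometrically with the level index.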

Implementation in xcube
-----------------------

In xcube, multi-resolution datasets are represented by the abstract class
`xcube.core.mldataset.MultiLevelDataset`. The xcube data store framework
refers to this data type using the alias `mldataset`. The corresponding
default data format is the xcube `levels` format. Later, xcube will also
support Cloud Optimized GeoTIFF (COG) as a format for multi-resolution
datasets.

The xcube Levels Format
-----------------------

The xcube Levels format is basically a single top-level directory.
The filename extension of that directory should be `.levels`
by convention. The directory entries are Zarr datasets

1. that are representations of regular xarray datasets named after
   their zero-based level index, `{level}.zarr`;
2. that comply with the xcube dataset convention.

<div style="color: red;">
TODO (forman): link to xcube dataset convention
</div>


The following is a multi-resolution dataset with three levels:

    - test_pyramid.levels/
      - 0.zarr/
      - 1.zarr/
      - 2.zarr/

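The directory layout above could be inspected with a short, stdlib-only Python sketch like the one below. The helper `find_level_entries` is hypothetical and not an xcube function. It only checks the naming convention `{level}.zarr` and that the level indexes are contiguous from zero; it does not open the Zarr datasets or verify the xcube dataset convention.

```python
import re
from pathlib import Path

# Matches entries named "{level}.zarr" with a zero-based integer level.
_LEVEL_ENTRY = re.compile(r"^(\d+)\.zarr$")


def find_level_entries(levels_dir):
    """Return the level entries of a `.levels` directory, ordered by level."""
    entries = {}
    for entry in Path(levels_dir).iterdir():
        match = _LEVEL_ENTRY.match(entry.name)
        if match:
            entries[int(match.group(1))] = entry
    num_levels = len(entries)
    # Levels are expected to be contiguous: 0, 1, ..., num_levels - 1.
    missing = [level for level in range(num_levels) if level not in entries]
    if missing:
        raise ValueError(f"missing level(s) {missing} in {levels_dir}")
    return [entries[level] for level in range(num_levels)]
```

Calling `find_level_entries("test_pyramid.levels")` on the example above would return the paths of `0.zarr`, `1.zarr`, and `2.zarr`, in that order.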

An important use case is generating image pyramids from existing large
datasets without the need to create a copy of level zero.

To support this, the level zero dataset may be a link to an existing
Zarr dataset. The filename is then `0.link` rather than `0.zarr`.
The link file is a plain text file that contains the path to the actual
Zarr dataset to be used as level zero. It may be an absolute
path or a path relative to the top-level dataset.

    - test_pyramid.levels/
      - 0.link    # --> link to actual level zero dataset
      - 1.zarr/
      - 2.zarr/

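How a reader of the format might resolve level zero can be sketched as follows. This is a sketch under stated assumptions, not the xcube implementation: the function `resolve_level_zero` is hypothetical, the link file is assumed to hold a single path as plain text, and relative paths are resolved against the top-level `.levels` directory as described above.

```python
from pathlib import Path


def resolve_level_zero(levels_dir):
    """Return the path of the level zero Zarr dataset of a `.levels` directory."""
    levels_dir = Path(levels_dir)
    link_file = levels_dir / "0.link"
    if not link_file.is_file():
        # No link file: level zero is stored inside the directory itself.
        return levels_dir / "0.zarr"
    # Assumption: the link file contains just the target path as plain text.
    target = Path(link_file.read_text().strip())
    if not target.is_absolute():
        # Assumption: relative paths are relative to the top-level directory.
        target = levels_dir / target
    return target
```

The returned path is not normalized, so a link containing `../source.zarr` yields `test_pyramid.levels/../source.zarr`.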

Related reads
-------------

* [WIP: Multiscale use-case](https://github.com/zarr-developers/zarr-specs/issues/23)
  in zarr-developers / zarr-specs on GitHub


To be discussed
---------------

* Allow links for all levels?
* Add top-level metadata such as `num_levels` and links for each
  level?
* Make top-level directory a Zarr group (`.zgroup`)
  and encode level metadata in `.zattrs` (e.g. `num_levels`)?
* Link relative to link file?

To do
-----

* Currently, the FS data stores treat relative link paths as relative
  to the data store's `root`.
