@@ -19,26 +19,57 @@ Xarray ``Dataset`` objects.

Second, from Xarray's point of view, the key difference between
NetCDF and Zarr is that all NetCDF arrays have *dimension names* while Zarr
- arrays do not. Therefore, in order to store NetCDF data in Zarr, Xarray must
- somehow encode and decode the name of each array's dimensions.
-
- To accomplish this, Xarray developers decided to define a special Zarr array
- attribute: ``_ARRAY_DIMENSIONS``. The value of this attribute is a list of
- dimension names (strings), for example ``["time", "lon", "lat"]``. When writing
- data to Zarr, Xarray sets this attribute on all variables based on the variable
- dimensions. When reading a Zarr group, Xarray looks for this attribute on all
- arrays, raising an error if it can't be found. The attribute is used to define
- the variable dimension names and then removed from the attributes dictionary
- returned to the user.
-
- Because of these choices, Xarray cannot read arbitrary array data, but only
- Zarr data with valid ``_ARRAY_DIMENSIONS`` or
- `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_ attributes
- on each array (NCZarr dimension names are defined in the ``.zarray`` file).
-
- After decoding the ``_ARRAY_DIMENSIONS`` or NCZarr attribute and assigning the variable
- dimensions, Xarray proceeds to [optionally] decode each variable using its
- standard CF decoding machinery used for NetCDF data (see :py:func:`decode_cf`).
+ arrays do not. In Zarr V2, Xarray uses an ad-hoc convention to encode and decode
+ the name of each array's dimensions. Starting with Zarr V3, however, the
+ ``dimension_names`` metadata field provides a formal convention for storing the
+ NetCDF data model in Zarr.
+
+ Dimension Encoding in Zarr Formats
+ ----------------------------------
+
+ Xarray encodes array dimensions differently depending on the Zarr format version:
+
+ **Zarr V2 Format:**
+ Xarray uses a special Zarr array attribute: ``_ARRAY_DIMENSIONS``. The value of this
+ attribute is a list of dimension names (strings), for example ``["time", "lon", "lat"]``.
+ When writing data to Zarr V2, Xarray sets this attribute on every variable based on the
+ variable's dimensions. Because it is an ordinary attribute, it is visible when accessing
+ arrays directly with zarr-python.
+
+ **Zarr V3 Format:**
+ Xarray uses the native ``dimension_names`` field in the array metadata. This field is part
+ of the official Zarr V3 specification and is not stored as a regular attribute.
+ When accessing arrays with zarr-python, this information is available in the array's
+ metadata but not in its attributes dictionary.
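To make the difference concrete, here is a minimal sketch of where the dimension names end up in each format's JSON metadata. It assumes the V2 layout keeps user attributes in a ``.zattrs`` document and the V3 layout uses a single ``zarr.json`` document with ``attributes`` and ``dimension_names`` keys. The helper ``encode_dimensions`` is hypothetical (not part of Xarray's API), the inputs are illustrative, and all other array metadata is omitted.

```python
from typing import Any


def encode_dimensions(
    dims: tuple[str, ...], attrs: dict[str, Any], zarr_format: int
) -> dict[str, Any]:
    """Hypothetical helper: place dimension names where each Zarr format keeps them.

    Only dimension- and attribute-related keys are shown; real array metadata
    also records shape, dtype, chunking, and codecs.
    """
    if zarr_format == 2:
        # V2: dimension names are an ordinary attribute, stored in .zattrs
        return {".zattrs": {**attrs, "_ARRAY_DIMENSIONS": list(dims)}}
    if zarr_format == 3:
        # V3: dimension names are a first-class field of zarr.json,
        # separate from the user-facing "attributes" mapping
        return {
            "zarr.json": {
                "attributes": dict(attrs),
                "dimension_names": list(dims),
            }
        }
    raise ValueError(f"unsupported zarr_format: {zarr_format}")


# Illustrative inputs only, not taken from a real dataset
v2 = encode_dimensions(("time", "y", "x"), {"units": "C"}, zarr_format=2)
v3 = encode_dimensions(("time", "y", "x"), {"units": "C"}, zarr_format=3)
print(v2)
print(v3)
```

The key point is that in V2 the names share a namespace with user attributes, while in V3 they never appear in ``attributes`` at all.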
+
+ When reading a Zarr group, Xarray looks for dimension information in the location
+ that corresponds to the format version, raising an error if it can't be found. The
+ dimension information is used to define the variable dimension names; for Zarr V2,
+ the ``_ARRAY_DIMENSIONS`` attribute is then removed from the attributes dictionary
+ returned to the user.
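The reading side can be sketched the same way. This simplified, hypothetical helper (``decode_dimensions`` is not an Xarray function) works on already-parsed JSON documents rather than a live store, and it is an assumption-laden illustration of the lookup-then-hide behaviour described above, not Xarray's actual code.

```python
from typing import Any


def decode_dimensions(
    metadata: dict[str, Any], zarr_format: int, ndim: int
) -> tuple[tuple[str, ...], dict[str, Any]]:
    """Hypothetical helper: recover (dimension names, user attributes).

    For V2, ``metadata`` is the parsed .zattrs document; for V3 it is the
    parsed zarr.json document of the array.
    """
    if zarr_format == 2:
        attrs = dict(metadata)  # copy so the caller's dict is not mutated
        if "_ARRAY_DIMENSIONS" not in attrs:
            raise KeyError("missing the _ARRAY_DIMENSIONS attribute")
        # pop() reads the names and hides them from the returned attributes
        dims = attrs.pop("_ARRAY_DIMENSIONS")
    elif zarr_format == 3:
        attrs = dict(metadata.get("attributes", {}))
        # dimension_names is optional in V3, so it may be absent or null
        dims = metadata.get("dimension_names") or []
    else:
        raise ValueError(f"unsupported zarr_format: {zarr_format}")
    # This sketch requires one non-null name per array dimension
    if len(dims) != ndim or any(name is None for name in dims):
        raise KeyError("dimension names are missing or incomplete")
    return tuple(dims), attrs


dims, attrs = decode_dimensions(
    {"units": "C", "_ARRAY_DIMENSIONS": ["time", "y", "x"]}, zarr_format=2, ndim=3
)
print(dims, attrs)
```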
+
+ CF Conventions
+ --------------
+
+ Xarray uses its standard CF encoding/decoding functionality for handling metadata
+ (see :py:func:`decode_cf`). This covers concepts such as coordinates, missing-value
+ masking and scaling, and time encoding. The ``coordinates`` attribute, which holds a
+ space-separated list of coordinate variable names (e.g., ``"yc xc"`` for spatial
+ coordinates), is one part of the broader CF conventions used to describe metadata
+ in NetCDF and Zarr.
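As a small illustration of the ``coordinates`` attribute, the sketch below splits the space-separated string into variable names. The helper ``split_coordinates`` and the set of available variable names are hypothetical; the choice to ignore names that do not match an existing variable is a design decision of this sketch, and Xarray's own CF decoding handles more cases.

```python
from typing import Any


def split_coordinates(
    attrs: dict[str, Any], available: set[str]
) -> tuple[list[str], dict[str, Any]]:
    """Hypothetical helper: split a CF ``coordinates`` attribute into names.

    Returns the referenced names that exist in ``available`` and a copy of
    ``attrs`` without the ``coordinates`` entry.
    """
    remaining = dict(attrs)
    raw = remaining.pop("coordinates", "")
    # CF separates names with blanks; str.split() also tolerates repeated spaces
    names = [name for name in raw.split() if name in available]
    return names, remaining


names, rest = split_coordinates(
    {"units": "C", "coordinates": "yc xc"}, available={"Tair", "xc", "yc"}
)
print(names, rest)
```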
+
+ Compatibility and Reading
+ -------------------------
+
+ Because of these encoding choices, Xarray cannot read arbitrary Zarr arrays, but only
+ Zarr data with valid dimension metadata. Xarray supports:
+
+ - Zarr V2 arrays with ``_ARRAY_DIMENSIONS`` attributes
+ - Zarr V3 arrays with ``dimension_names`` metadata
+ - `NCZarr <https://docs.unidata.ucar.edu/nug/current/nczarr_head.html>`_ format
+   (dimension names are defined in the ``.zarray`` file)
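For the NCZarr case the dimension names come from the ``.zarray`` document instead of the attributes. The sketch below assumes they are stored as fully qualified paths in a ``dimrefs`` list under a ``_NCZARR_ARRAY`` key; that layout is an assumption not stated on this page, so check the linked NCZarr documentation before relying on it. ``nczarr_dimension_names`` is a hypothetical helper.

```python
import json
import posixpath


def nczarr_dimension_names(zarray_text: str) -> list[str]:
    """Hypothetical helper: read dimension names from an NCZarr .zarray file.

    Assumed layout: {"_NCZARR_ARRAY": {"dimrefs": ["/grp/time", ...]}}.
    """
    zarray = json.loads(zarray_text)
    try:
        dimrefs = zarray["_NCZARR_ARRAY"]["dimrefs"]
    except KeyError:
        raise KeyError("no NCZarr dimension references in .zarray") from None
    # dimrefs are fully qualified paths; keep only the final name component
    return [posixpath.basename(ref) for ref in dimrefs]


example = '{"shape": [2, 3], "_NCZARR_ARRAY": {"dimrefs": ["/time", "/grp/lat"]}}'
print(nczarr_dimension_names(example))
```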
+
+ After decoding the dimension information and assigning the variable dimensions,
+ Xarray (optionally) decodes each variable using the same CF decoding machinery
+ it uses for NetCDF data.

Finally, it's worth noting that Xarray writes (and attempts to read)
"consolidated metadata" by default (the ``.zmetadata`` file), which is another
@@ -49,34 +80,63 @@ warning about poor performance when reading non-consolidated stores unless they
explicitly set ``consolidated=False``. See :ref:`io.zarr.consolidated_metadata`
for more details.
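Consolidation helps because a reader can find every variable and its dimensions after reading one file. A minimal sketch, assuming the Zarr V2 ``.zmetadata`` layout of a top-level ``metadata`` mapping keyed by paths such as ``"Tair/.zattrs"``; it covers only that V2 file, ``dims_from_consolidated`` is hypothetical, and the JSON below is a hand-written toy rather than output from a real store.

```python
import json


def dims_from_consolidated(zmetadata_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: map each array path to its _ARRAY_DIMENSIONS."""
    entries = json.loads(zmetadata_text)["metadata"]
    dims = {}
    for key, doc in entries.items():
        # Only per-array attribute documents, e.g. "Tair/.zattrs"
        if key.endswith("/.zattrs") and "_ARRAY_DIMENSIONS" in doc:
            dims[key[: -len("/.zattrs")]] = doc["_ARRAY_DIMENSIONS"]
    return dims


# Toy consolidated document: one read yields metadata for every array
toy = json.dumps(
    {
        "zarr_consolidated_format": 1,
        "metadata": {
            ".zgroup": {"zarr_format": 2},
            "Tair/.zarray": {"shape": [2, 3, 4]},
            "Tair/.zattrs": {"_ARRAY_DIMENSIONS": ["time", "y", "x"]},
            "time/.zattrs": {"_ARRAY_DIMENSIONS": ["time"]},
        },
    }
)
print(dims_from_consolidated(toy))
```

Without consolidation, the same information requires opening one ``.zattrs`` document per array, which is what makes non-consolidated reads slower on high-latency stores.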

- As a concrete example, here we write a tutorial dataset to Zarr and then
- re-open it directly with Zarr:
+ Examples: Zarr Format Differences
+ ---------------------------------
+
+ The following examples demonstrate how dimension and coordinate encoding differs
+ between Zarr format versions. We write the same tutorial dataset in each format
+ and then open it directly with zarr-python to show what users will see there.
+
+ **Example 1: Zarr V2 Format**

.. jupyter-execute::

    import os
    import xarray as xr
    import zarr

+    # Load tutorial dataset and write as Zarr V2
    ds = xr.tutorial.load_dataset("rasm")
-    ds.to_zarr("rasm.zarr", mode="w", consolidated=False)
-    os.listdir("rasm.zarr")
+    ds.to_zarr("rasm_v2.zarr", mode="w", consolidated=False, zarr_format=2)
+
+    # Open with zarr-python and examine attributes
+    zgroup = zarr.open("rasm_v2.zarr")
+    print("Zarr V2 - Tair attributes:")
+    tair_attrs = dict(zgroup["Tair"].attrs)
+    for key, value in tair_attrs.items():
+        print(f" '{key}': {value!r}")

.. jupyter-execute::
+    :hide-code:

-    zgroup = zarr.open("rasm.zarr")
-    zgroup.tree()
+    import shutil
+    shutil.rmtree("rasm_v2.zarr")
+
+ **Example 2: Zarr V3 Format**

.. jupyter-execute::

-    dict(zgroup["Tair"].attrs)
+    # Write the same dataset as Zarr V3
+    ds.to_zarr("rasm_v3.zarr", mode="w", consolidated=False, zarr_format=3)
+
+    # Open with zarr-python and examine attributes
+    zgroup = zarr.open("rasm_v3.zarr")
+    print("Zarr V3 - Tair attributes:")
+    tair_attrs = dict(zgroup["Tair"].attrs)
+    for key, value in tair_attrs.items():
+        print(f" '{key}': {value!r}")
+
+    # For Zarr V3, dimension information is in metadata
+    tair_array = zgroup["Tair"]
+    print(f"\nZarr V3 - dimension_names in metadata: {tair_array.metadata.dimension_names}")

.. jupyter-execute::
    :hide-code:

    import shutil
+    shutil.rmtree("rasm_v3.zarr")

-    shutil.rmtree("rasm.zarr")


Chunk Key Encoding
------------------