Commit 7635f99

encode NaN as "NaN"
1 parent 3b2e04f commit 7635f99

4 files changed

+94
-42
lines changed

docs/spec/v2.rst

Lines changed: 47 additions & 38 deletions
@@ -40,7 +40,7 @@ Metadata
 
 Each array requires essential configuration metadata to be stored, enabling
 correct interpretation of the stored data. This metadata is encoded using JSON
-and stored as the value of the '.zarray' key within an array store.
+and stored as the value of the ".zarray" key within an array store.
 
 The metadata resource is a JSON object. The following keys MUST be present
 within the object:
@@ -66,9 +66,9 @@ fill_value
     A scalar value providing the default value to use for uninitialized
     portions of the array.
 order
-    Either 'C' or 'F', defining the layout of bytes within each chunk of the
-    array. 'C' means row-major order, i.e., the last dimension varies fastest;
-    'F' means column-major order, i.e., the first dimension varies fastest.
+    Either "C" or "F", defining the layout of bytes within each chunk of the
+    array. "C" means row-major order, i.e., the last dimension varies fastest;
+    "F" means column-major order, i.e., the first dimension varies fastest.
 
 Other keys MUST NOT be present within the metadata object.
 
@@ -116,8 +116,17 @@ Structured data types (i.e., with multiple named fields) are encoded as a list
 of two-element lists, following `NumPy array protocol type descriptions (descr)
 <http://docs.scipy.org/doc/numpy/reference/arrays.interface.html#>`_. For
 example, the JSON list ``[["r", "|u1"], ["g", "|u1"], ["b", "|u1"]]`` defines a
-data type composed of three single-byte unsigned integers labelled 'r', 'g' and
-'b'.
+data type composed of three single-byte unsigned integers labelled "r", "g" and
+"b".
+
+Fill value encoding
+~~~~~~~~~~~~~~~~~~~
+
+Not a Number (NaN) must be encoded as the JSON string "NaN" if used as the
+value of the "fill_value" field.
+
+When decoding the "fill_value" field, the JSON string "NaN" should be decoded
+as Not a Number (NaN) if the dtype basic type is floating point ("f").
 
 Chunks
 ~~~~~~
@@ -134,17 +143,17 @@ compressed data.
 The compressed sequence of bytes for each chunk is stored under a key formed
 from the index of the chunk within the grid of chunks representing the array.
 To form a string key for a chunk, the indices are converted to strings and
-concatenated with the period character ('.') separating each index. For
+concatenated with the period character (".") separating each index. For
 example, given an array with shape (10000, 10000) and chunk shape (1000, 1000)
 there will be 100 chunks laid out in a 10 by 10 grid. The chunk with indices
 (0, 0) provides data for rows 0-1000 and columns 0-1000 and is stored under the
-key '0.0'; the chunk with indices (2, 4) provides data for rows 2000-3000 and
-columns 4000-5000 and is stored under the key '2.4'; etc.
+key "0.0"; the chunk with indices (2, 4) provides data for rows 2000-3000 and
+columns 4000-5000 and is stored under the key "2.4"; etc.
 
 There is no need for all chunks to be present within an array store. If a chunk
 is not present then it is considered to be in an uninitialized state. An
 unitialized chunk MUST be treated as if it was uniformly filled with the value
-of the 'fill_value' field in the array metadata. If the 'fill_value' field is
+of the "fill_value" field in the array metadata. If the "fill_value" field is
 ``null`` then the contents of the chunk are undefined.
 
 Note that all chunks in an array have the same shape. If the length of any
@@ -161,30 +170,30 @@ Logical storage paths
 Multiple arrays can be stored in the same array store by associating each array
 with a different logical path. A logical path is simply an ASCII string. The
 logical path is used to form a prefix for keys used by the array. For example,
-if an array is stored at logical path 'foo/bar' then the array metadata will be
-stored under the key 'foo/bar/.zarray', the user-defined attributes will be
-stored under the key 'foo/bar/.zattrs', and the chunks will be stored under
-keys like 'foo/bar/0.0', 'foo/bar/0.1', etc.
+if an array is stored at logical path "foo/bar" then the array metadata will be
+stored under the key "foo/bar/.zarray", the user-defined attributes will be
+stored under the key "foo/bar/.zattrs", and the chunks will be stored under
+keys like "foo/bar/0.0", "foo/bar/0.1", etc.
 
 To ensure consistent behaviour across different storage systems, logical paths
 MUST be normalized as follows:
 
-* Replace all backward slash characters ('\\') with forward slash characters
-  ('/')
-* Strip any leading '/' characters
-* Strip any trailing '/' characters
-* Collapse any sequence of more than one '/' character into a single '/'
+* Replace all backward slash characters ("\\") with forward slash characters
+  ("/")
+* Strip any leading "/" characters
+* Strip any trailing "/" characters
+* Collapse any sequence of more than one "/" character into a single "/"
   character
 
-The key prefix is then obtained by appending a single '/' character to the
+The key prefix is then obtained by appending a single "/" character to the
 normalized logical path.
 
-After normalization, if splitting a logical path by the '/' character results
-in any path segment equal to the string '.' or the string '..' then an error
+After normalization, if splitting a logical path by the "/" character results
+in any path segment equal to the string "." or the string ".." then an error
 MUST be raised.
 
 N.B., how the underlying array store processes requests to store values under
-keys containing the '/' character is entirely up to the store implementation
+keys containing the "/" character is entirely up to the store implementation
 and is not constrained by this specification. E.g., an array store could simply
 treat all keys as opaque ASCII strings; equally, an array store could map
 logical paths onto some kind of hierarchical storage (e.g., directories on a
@@ -194,20 +203,20 @@ Groups
 ~~~~~~
 
 Arrays can be organized into groups which can also contain other groups. A
-group is created by storing group metadata under the '.zgroup' key under some
+group is created by storing group metadata under the ".zgroup" key under some
 logical path. E.g., a group exists at the root of an array store if the
-'.zgroup' key exists in the store, and a group exists at logical path 'foo/bar'
-if the 'foo/bar/.zgroup' key exists in the store.
+".zgroup" key exists in the store, and a group exists at logical path "foo/bar"
+if the "foo/bar/.zgroup" key exists in the store.
 
 If the user requests a group to be created under some logical path, then groups
 MUST also be created at all ancestor paths. E.g., if the user requests group
-creation at path 'foo/bar' then groups MUST be created at path 'foo' and the
+creation at path "foo/bar" then groups MUST be created at path "foo" and the
 root of the store, if they don't already exist.
 
 If the user requests an array to be created under some logical path, then
 groups MUST also be created at all ancestor paths. E.g., if the user requests
-array creation at path 'foo/bar/baz' then groups must be created at path
-'foo/bar', path 'foo', and the root of the store, if they don't already exist.
+array creation at path "foo/bar/baz" then groups must be created at path
+"foo/bar", path "foo", and the root of the store, if they don't already exist.
 
 The group metadata resource is a JSON object. The following keys MUST be present
 within the object:
@@ -220,20 +229,20 @@ Other keys MUST NOT be present within the metadata object.
 
 The members of a group are arrays and groups stored under logical paths that
 are direct children of the parent group's logical path. E.g., if a groups exist
-under the logical paths 'foo' and 'foo/bar' and an array exists at logical path
-'foo/baz' then the members of the group at path 'foo' are the group at path
-'foo/bar' and the array at path 'foo/baz'.
+under the logical paths "foo" and "foo/bar" and an array exists at logical path
+"foo/baz" then the members of the group at path "foo" are the group at path
+"foo/bar" and the array at path "foo/baz".
 
 Attributes
 ----------
 
 An array or group can be associated with custom attributes, which are simple
 key/value items with application-specific meaning. Custom attributes are
-encoded as a JSON object and stored under the '.zattrs' key within an array
+encoded as a JSON object and stored under the ".zattrs" key within an array
 store.
 
 For example, the JSON object below encodes three attributes named
-'foo', 'bar' and 'baz'::
+"foo", "bar" and "baz"::
 
     {
         "foo": 42,
@@ -258,7 +267,7 @@ Create an array::
     ...                 fill_value=42, compression='zlib', compression_opts=1,
     ...                 store=store, overwrite=True)
 
-No chunks are initialized yet, so only the '.zarray' and '.zattrs' keys
+No chunks are initialized yet, so only the ".zarray" and ".zattrs" keys
 have been set in the store::
 
     >>> import os
@@ -427,7 +436,7 @@ Changes in version 2
 
 * Added support for storing multiple arrays in the same store and organising
   arrays into hierarchies using groups.
-* Array metadata is now stored under the '.zarray' key instead of the 'meta'
+* Array metadata is now stored under the ".zarray" key instead of the "meta"
   key
-* Custom attributes are now stored under the '.zattrs' key instead of the
-  'attrs' key
+* Custom attributes are now stored under the ".zattrs" key instead of the
+  "attrs" key

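The path normalization and chunk key rules quoted in the spec diff above can be sketched as follows. This is a minimal illustration and not code from the zarr repository: `normalize_path`, `key_prefix` and `chunk_key` are hypothetical names, and returning an empty prefix for the root path is an assumption that the spec text does not state explicitly.

```python
import re


def normalize_path(path):
    """Normalize a logical path following the four steps listed in the spec."""
    path = path.replace('\\', '/')   # backward slashes become forward slashes
    path = path.strip('/')           # strip leading and trailing "/"
    path = re.sub(r'/+', '/', path)  # collapse runs of "/" into one
    # After normalization, "." and ".." segments are an error.
    if any(segment in ('.', '..') for segment in path.split('/')):
        raise ValueError('path contains "." or ".." segment: %r' % path)
    return path


def key_prefix(path):
    """Append a single "/" to the normalized path (assumed empty for the root)."""
    normalized = normalize_path(path)
    return normalized + '/' if normalized else ''


def chunk_key(indices):
    """Join chunk grid indices with ".", e.g. (2, 4) becomes "2.4"."""
    return '.'.join(str(i) for i in indices)


# Chunk (2, 4) of an array stored at logical path "foo/bar".
example_key = key_prefix('\\foo//bar/') + chunk_key((2, 4))
```

Under these assumptions, `example_key` is the same key the spec gives for an array at "foo/bar", with the chunk suffix appended to the prefix.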
zarr/meta.py

Lines changed: 24 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -21,14 +21,16 @@ def decode_array_metadata(s):
2121
if zarr_format != ZARR_FORMAT:
2222
raise MetadataError('unsupported zarr format: %s' % zarr_format)
2323
try:
24+
dtype = decode_dtype(meta['dtype'])
25+
fill_value = decode_fill_value(meta['fill_value'], dtype)
2426
meta = dict(
2527
zarr_format=meta['zarr_format'],
2628
shape=tuple(meta['shape']),
2729
chunks=tuple(meta['chunks']),
28-
dtype=decode_dtype(meta['dtype']),
30+
dtype=dtype,
2931
compression=meta['compression'],
3032
compression_opts=meta['compression_opts'],
31-
fill_value=meta['fill_value'],
33+
fill_value=fill_value,
3234
order=meta['order'],
3335
)
3436
except Exception as e:
@@ -45,7 +47,7 @@ def encode_array_metadata(meta):
4547
dtype=encode_dtype(meta['dtype']),
4648
compression=meta['compression'],
4749
compression_opts=meta['compression_opts'],
48-
fill_value=meta['fill_value'],
50+
fill_value=encode_fill_value(meta['fill_value']),
4951
order=meta['order'],
5052
)
5153
s = json.dumps(meta, indent=4, sort_keys=True, ensure_ascii=True)
@@ -98,3 +100,22 @@ def encode_group_metadata(meta=None):
98100
s = json.dumps(meta, indent=4, sort_keys=True, ensure_ascii=True)
99101
b = s.encode('ascii')
100102
return b
103+
104+
105+
def decode_fill_value(v, dtype):
106+
if v == 'NaN' and dtype.kind == 'f':
107+
return np.nan
108+
else:
109+
return v
110+
111+
112+
def encode_fill_value(v):
113+
try:
114+
isnan = np.isnan(v)
115+
except TypeError:
116+
return v
117+
else:
118+
if isnan:
119+
return 'NaN'
120+
else:
121+
return v
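
The reason for encoding NaN as a string can be seen with the standard library alone: strict JSON has no NaN literal, so a float NaN cannot be serialized when `allow_nan=False`. The sketch below mirrors the two helpers from the diff, with two substitutions that are assumptions made for illustration: `math.isnan` stands in for `np.isnan`, and the dtype is reduced to a one-character kind string (`'f'` for floating point) rather than a NumPy dtype object.

```python
import json
import math


def encode_fill_value(v):
    """Return a JSON-safe fill value, mapping float NaN to the string "NaN"."""
    try:
        isnan = math.isnan(v)
    except TypeError:
        # Non-numeric values (None, str, ...) pass through unchanged.
        return v
    return 'NaN' if isnan else v


def decode_fill_value(v, kind):
    """Map the string "NaN" back to float NaN, but only for floating-point kinds."""
    if v == 'NaN' and kind == 'f':
        return float('nan')
    return v


# Strict JSON rejects a raw float NaN.
try:
    json.dumps({'fill_value': float('nan')}, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

# With the string encoding, the round trip works under strict JSON.
encoded = json.dumps({'fill_value': encode_fill_value(float('nan'))}, allow_nan=False)
decoded = decode_fill_value(json.loads(encoded)['fill_value'], 'f')
```

Checking the kind on decode is what keeps a genuine string fill value of "NaN" intact for non-float dtypes, which is the case the `dtype='U3'` test in this commit exercises.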

zarr/tests/test_creation.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,14 @@ def test_full():
9393
eq((10,), z.chunks)
9494
assert_array_equal(np.full(100, fill_value=42, dtype='i4'), z[:])
9595

96+
# nan
97+
z = full(100, chunks=10, fill_value=np.nan, dtype='f8')
98+
assert np.all(np.isnan(z[:]))
99+
100+
# "NaN"
101+
z = full(100, chunks=10, fill_value='NaN', dtype='U3')
102+
assert np.all(z[:] == 'NaN')
103+
96104

97105
def test_open_array():
98106

zarr/tests/test_meta.py

Lines changed: 15 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -118,8 +118,22 @@ def test_encode_decode_array_nan_fill_value():
118118
order='C'
119119
)
120120

121-
# test fill value round trip
121+
meta_json = '''{
122+
"chunks": [10],
123+
"compression": "zlib",
124+
"compression_opts": 1,
125+
"dtype": "<f8",
126+
"fill_value": "NaN",
127+
"order": "C",
128+
"shape": [100],
129+
"zarr_format": %s
130+
}''' % ZARR_FORMAT
131+
132+
# test encoding
122133
meta_enc = encode_array_metadata(meta)
134+
assert_json_eq(meta_json, meta_enc)
135+
136+
# test decoding
123137
meta_dec = decode_array_metadata(meta_enc)
124138
actual = meta_dec['fill_value']
125139
assert np.isnan(actual)
