Commit a16e120

add draft of struct dtype extension
1 parent 177c552 commit a16e120

3 files changed

+130
-2
lines changed

docs/protocol/extensions.rst

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ Under construction.
1212
extensions/complex-dtypes/v1.0
1313
extensions/datetime-dtypes/v1.0
1414
extensions/string-dtypes/v1.0
15+
extensions/struct-dtypes/v1.0
1516

1617

1718
A number of other features might be included in the core protocol v3, but are currently considered as extensions.

docs/protocol/extensions/string-dtypes/v1.0.rst

Lines changed: 2 additions & 2 deletions
@@ -94,8 +94,8 @@ These are zero-terminated bytes and correspond to
9494
For example ``a = np.array(["a", "bcd", "efgh"], dtype="U4")`` creates an array where ``a.dtype.kind`` is 'U' and each element is a sequence of 4 characters where each character occupies 4 bytes (UTF-32).
9595

9696

97-
Units
98-
=====
97+
Data Types added by this extension
98+
==================================
9999

100100
.. list-table:: Data types
101101
:header-rows: 1

docs/protocol/extensions/struct-dtypes/v1.0.rst

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
1+
===================================
2+
Struct data types (version 1.0)
3+
===================================
4+
-----------------------------
5+
Editor's draft 2 March 2022
6+
-----------------------------
7+
8+
Specification URI:
9+
http://purl.org/zarr/spec/protocol/extensions/struct-dtypes/1.0
10+
Issue tracking:
11+
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/struct-dtypes-v1.0>`_
12+
Suggest an edit for this spec:
13+
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/core-protocol-v3.0-dev/docs/protocol/extension/struct-dtypes/v1.0.rst>`_
14+
15+
Copyright 2022 `Zarr core development
16+
team <https://github.com/orgs/zarr-developers/teams/core-devs>`_ (@@TODO
17+
list institutions?). This work is licensed under a `Creative Commons
18+
Attribution 3.0 Unported
19+
License <https://creativecommons.org/licenses/by/3.0/>`_.
20+
21+
----
22+
23+
24+
Abstract
25+
========
26+
27+
This specification is a Zarr protocol extension defining data types
28+
for structured arrays. It is an early draft and currently describes only the NumPy-style `structured arrays`_ that are already supported in
29+
zarr-python, but are not part of the core Zarr v3 spec.
30+
31+
32+
Status of this document
33+
=======================
34+
35+
This document is a **Work in Progress**. It may be updated, replaced
36+
or obsoleted by other documents at any time. It is inappropriate to
37+
cite this document as other than work in progress.
38+
39+
Comments, questions or contributions to this document are very
40+
welcome. Comments and questions should be raised via `GitHub issues
41+
<https://github.com/zarr-developers/zarr-specs/labels/struct-dtypes-v1.0>`_. When
42+
raising an issue, please add the label "struct-dtypes-v1.0".
43+
44+
This document was produced by the `Zarr core development team
45+
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
46+
47+
48+
Document conventions
49+
====================
50+
51+
Conformance requirements are expressed with a combination of
52+
descriptive assertions and [RFC2119]_ terminology. The key words
53+
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
54+
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
55+
parts of this document are to be interpreted as described in
56+
[RFC2119]_. However, for readability, these words do not appear in all
57+
uppercase letters in this specification.
58+
59+
All of the text of this specification is normative except sections
60+
explicitly marked as non-normative, examples, and notes. Examples in
61+
this specification are introduced with the words "for example".
62+
63+
64+
Extension data types
65+
====================
66+
67+
NumPy allows representation of `Structured Arrays`_ where each element of the
68+
array is actually some combination of fields, each of which may have its own
69+
unique data type. NumPy's Record Arrays (``numpy.recarray``) also use this data type. The actual data is stored as an opaque sequence of bytes
70+
(i.e. a structure), represented by ``numpy.void``, and thus the string
71+
representation of this dtype in NumPy is ``'|Vn'`` where ``n`` is some integer
72+
number of bytes. In order to be able to properly interpret data of this type
73+
it is necessary to store information on the fields.
74+
75+
A concrete example of such an array from the NumPy docs is::
76+
77+
dogs = np.array([('Rex', 9, 81.0), ('Fido', 3, 27.0)],
78+
dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f4')])
79+
80+
where ``dogs.dtype.kind`` is 'V' and ``dogs.dtype.str`` is ``'|V48'``,
81+
indicating that 48 bytes are needed to store each element (4 bytes each for
82+
``age`` and ``weight`` and 4 * 10 = 40 bytes for a 10-character UTF-32
83+
``name``). If we were to read such a sequence of bytes from a Zarr array, we
84+
would need the dtype description to know how to properly interpret this sequence of
85+
48 bytes. The NumPy dtype object has a ``descr`` attribute that describes this.
86+
In this case ``dogs.dtype.descr`` is ``[('name', '<U10'), ('age', '<i4'), ('weight', '<f4')]``.
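
The following is a minimal, non-normative sketch of why the field description is needed. It assumes only NumPy and the ``dogs`` array from the example above; the variable names ``raw``, ``opaque`` and ``decoded`` are illustrative and not part of this specification. The raw bytes are first viewed as opaque 48-byte records, then decoded again using ``dogs.dtype.descr``.

```python
import numpy as np

# The structured array from the example above.
dogs = np.array(
    [("Rex", 9, 81.0), ("Fido", 3, 27.0)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f4")],
)

# Each element is a 48-byte record: 40 (name) + 4 (age) + 4 (weight).
assert dogs.dtype.kind == "V"
assert dogs.dtype.itemsize == 48

# Serialise the array to a plain sequence of bytes (2 elements * 48 bytes).
raw = dogs.tobytes()

# Without the field description the bytes can only be viewed as opaque records.
opaque = np.frombuffer(raw, dtype="V48")

# With the field description the same bytes can be interpreted field by field.
decoded = np.frombuffer(raw, dtype=np.dtype(dogs.dtype.descr))
print(decoded["name"], decoded["age"], decoded["weight"])
```

Both ``opaque`` and ``decoded`` refer to the same 96 bytes; only the second view exposes the ``name``, ``age`` and ``weight`` fields.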
87+
88+
89+
Data Types added by this extension
90+
==================================
91+
92+
.. list-table:: Data types
93+
:header-rows: 1
94+
95+
* - Identifier
96+
- Numerical type
97+
- Size (no. bytes)
98+
- Byte order
99+
* - list of (<name>, <type>) tuples
100+
- structure with named fields, each with its own data type
101+
- sum of the sizes of the field dtypes in the identifier
102+
- None
103+
104+
Here <name> is the name of the struct field and <type> is any of the
105+
scalar dtypes supported by the core Zarr v3 spec or the available extensions.
106+
In the case of NumPy's structured arrays this identifier is simply the
107+
array's ``.dtype.descr`` attribute.
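
As a non-normative illustration, the identifier could be round-tripped through JSON metadata roughly as follows. This sketch assumes a packed dtype with only scalar fields and no padding; the use of ``json`` and the variable names are assumptions made for illustration, and this draft does not prescribe a particular encoding.

```python
import json

import numpy as np

dtype = np.dtype([("name", "U10"), ("age", "i4"), ("weight", "f4")])

# Assumed encoding: JSON has no tuples, so each (name, type) pair becomes a list.
encoded = json.dumps(dtype.descr)

# The JSON lists must be converted back to tuples before NumPy accepts them.
fields = [tuple(field) for field in json.loads(encoded)]
restored = np.dtype(fields)

# The element size is the sum of the sizes of the individual field dtypes.
total = sum(np.dtype(type_str).itemsize for _, type_str in fields)

assert restored == dtype
assert total == restored.itemsize == 48
```

The conversion back to tuples matters because NumPy distinguishes a list of ``(name, type)`` tuples from a list of lists when constructing a dtype.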
108+
109+
110+
References
111+
==========
112+
113+
.. [NumPy] NumPy Data type objects. NumPy version 1.22.0
114+
documentation. URL:
115+
https://numpy.org/doc/1.22/reference/arrays.dtypes.html
116+
117+
.. [NumPy Structured] Structured Arrays.
118+
NumPy version 1.22.0 documentation. URL:
119+
https://numpy.org/doc/1.22/user/basics.rec.html

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
Requirement Levels. March 1997. Best Current Practice. URL:
https://tools.ietf.org/html/rfc2119
120+
121+
Change log
122+
==========
123+
124+
@@TODO
125+
126+
127+
.. _structured arrays: https://numpy.org/doc/1.22/user/basics.rec.html