Commit b73be77

add draft of object dtype extension
1 parent a16e120 commit b73be77

2 files changed

+106
-0
lines changed

docs/protocol/extensions.rst

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ Under construction.
1111
extensions/filters/v1.0
1212
extensions/complex-dtypes/v1.0
1313
extensions/datetime-dtypes/v1.0
14+
extensions/object-dtypes/v1.0
1415
extensions/string-dtypes/v1.0
1516
extensions/struct-dtypes/v1.0
1617

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
1+
===================================
2+
Object data types (version 1.0)
3+
===================================
4+
-----------------------------
5+
Editor's draft 2 March 2022
6+
-----------------------------
7+
8+
Specification URI:
9+
http://purl.org/zarr/spec/protocol/extensions/object-dtypes/1.0
10+
Issue tracking:
11+
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/object-dtypes-v1.0>`_
12+
Suggest an edit for this spec:
13+
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/core-protocol-v3.0-dev/docs/protocol/extension/object-dtypes/v1.0.rst>`_
14+
15+
Copyright 2022 `Zarr core development
16+
team <https://github.com/orgs/zarr-developers/teams/core-devs>`_ (@@TODO
17+
list institutions?). This work is licensed under a `Creative Commons
18+
Attribution 3.0 Unported
19+
License <https://creativecommons.org/licenses/by/3.0/>`_.
20+
21+
----
22+
23+
24+
Abstract
25+
========
26+
27+
This specification is a Zarr protocol extension defining a data type where each
28+
element is an arbitrary Python object.
29+
30+
31+
Status of this document
32+
=======================
33+
34+
This document is a **Work in Progress**. It may be updated, replaced
35+
or obsoleted by other documents at any time. It is inappropriate to
36+
cite this document as other than work in progress.
37+
38+
Comments, questions or contributions to this document are very
39+
welcome. Comments and questions should be raised via `GitHub issues
40+
<https://github.com/zarr-developers/zarr-specs/labels/object-dtypes-v1.0>`_. When
41+
raising an issue, please add the label "object-dtypes-v1.0".
42+
43+
This document was produced by the `Zarr core development team
44+
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
45+
46+
47+
Document conventions
48+
====================
49+
50+
Conformance requirements are expressed with a combination of
51+
descriptive assertions and [RFC2119]_ terminology. The key words
52+
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
53+
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
54+
parts of this document are to be interpreted as described in
55+
[RFC2119]_. However, for readability, these words do not appear in all
56+
uppercase letters in this specification.
57+
58+
All of the text of this specification is normative except sections
59+
explicitly marked as non-normative, examples, and notes. Examples in
60+
this specification are introduced with the words "for example".
61+
62+
63+
Object data types
64+
=================
65+
NumPy's object arrays are arrays where each element is an arbitrary Python
66+
object. The array elements correspond to the
67+
`numpy.object_ <https://numpy.org/doc/1.22/reference/arrays.scalars.html#numpy.object_>`_
68+
type, which has character code ``'O'``. A common concrete use case for this type
69+
is to have an array where each element is another array (and each array can
70+
have a different length). Another use case is to store an array of variable-length
71+
strings. Note that such an array stores only references to the Python objects, not the objects themselves. Accessing
72+
an element of the array returns the Python object it refers to.
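
For example, the following non-normative sketch illustrates the behaviour described above using NumPy directly. The variable names (``ragged``, ``names``, ``refs``, ``payload``) are illustrative assumptions and are not part of this specification.

```python
import numpy as np

# Ragged data: each element is itself an array of a different length.
ragged = np.empty(3, dtype=object)
ragged[0] = np.array([1, 2, 3])
ragged[1] = np.array([4])
ragged[2] = np.array([5, 6])

# Variable-length strings stored as ordinary Python str objects.
names = np.array(["a", "bb", "a much longer string"], dtype=object)

# The array stores references only: both slots point at the same dict,
# and reading an element returns that very object, not a copy.
payload = {"key": "value"}
refs = np.empty(2, dtype=object)
refs[0] = payload
refs[1] = payload
same_object = refs[0] is payload and refs[0] is refs[1]
```

Because only references are stored, mutating ``payload`` after assignment is visible through every element that refers to it.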
73+
74+
Data types added by this extension
75+
==================================
76+
77+
.. list-table:: Data types
78+
:header-rows: 1
79+
80+
* - Identifier
81+
- Numerical type
82+
- Size (no. bytes)
83+
- Byte order
84+
* - ``O`` (uppercase letter o)
85+
- address of a Python object
85+
- 8 (TODO: I assume this is actually a hardware-dependent memory address size?)
87+
- None
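
For example, the in-memory properties listed in the table can be inspected with NumPy, as in the non-normative sketch below. It assumes that the in-memory item size of the object dtype equals the platform's C pointer size, which would make the value 8 specific to 64-bit platforms. The names ``object_dtype``, ``pointer_size`` and ``item_size`` are illustrative only.

```python
import struct

import numpy as np

object_dtype = np.dtype("O")

# Size of a C pointer (void *) on the running platform, in bytes.
pointer_size = struct.calcsize("P")

# Number of bytes NumPy reserves per element of an object array.
item_size = object_dtype.itemsize

# Byte order is reported as "|" (not applicable) for the object dtype.
byte_order = object_dtype.byteorder
```

Comparing ``item_size`` with ``pointer_size`` on a given machine is one way to check whether the size in the table is hardware-dependent.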
88+
89+
90+
References
91+
==========
92+
93+
.. [NumPy] NumPy Data type objects. NumPy version 1.22.0
94+
documentation. URL:
95+
https://numpy.org/doc/1.22/reference/arrays.dtypes.html
96+
97+
.. [H5Py variable length strings] Variable length strings
98+
documentation. URL:
99+
https://docs.h5py.org/en/stable/special.html#variable-length-strings
100+
101+
Change log
102+
==========
103+
104+
@@TODO
105+
