Commit 5d4b82f

Merge pull request #144 from astropenguin/astropenguin/issue143
Release v1.0.0
2 parents 23267b0 + 51ed3ff commit 5d4b82f

18 files changed, +272 -349 lines changed

LICENSE

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
MIT License
22

3-
Copyright (c) 2020-2021 Akio Taniguchi
3+
Copyright (c) 2020-2022 Akio Taniguchi
44

55
Permission is hereby granted, free of charge, to any person obtaining a copy
66
of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 98 additions & 107 deletions
@@ -2,7 +2,7 @@
22

33
[![PyPI](https://img.shields.io/pypi/v/xarray-dataclasses.svg?label=PyPI&style=flat-square)](https://pypi.org/project/xarray-dataclasses/)
44
[![Python](https://img.shields.io/pypi/pyversions/xarray-dataclasses.svg?label=Python&color=yellow&style=flat-square)](https://pypi.org/project/xarray-dataclasses/)
5-
[![Test](https://img.shields.io/github/workflow/status/astropenguin/xarray-dataclasses/Test?logo=github&label=Test&style=flat-square)](https://github.com/astropenguin/xarray-dataclasses/actions)
5+
[![Test](https://img.shields.io/github/workflow/status/astropenguin/xarray-dataclasses/Tests?logo=github&label=Test&style=flat-square)](https://github.com/astropenguin/xarray-dataclasses/actions)
66
[![License](https://img.shields.io/badge/license-MIT-blue.svg?label=License&style=flat-square)](LICENSE)
77
[![DOI](https://img.shields.io/badge/DOI-10.5281/zenodo.4624819-blue?style=flat-square)](https://doi.org/10.5281/zenodo.4624819)
88

@@ -11,7 +11,7 @@ xarray extension for typed DataArray and Dataset creation
1111

1212
## Overview
1313

14-
xarray-dataclasses is a Python package that makes it easy to create typed DataArray and Dataset objects of [xarray] using [the Python's dataclass].
14+
xarray-dataclasses is a Python package that makes it easy to create [xarray]'s DataArray and Dataset objects that are "typed" (i.e. fixed dimensions, data type, coordinates, attributes, and name) using [the Python's dataclass]:
1515

1616
```python
1717
from dataclasses import dataclass
@@ -25,96 +25,36 @@ Y = Literal["y"]
2525

2626
@dataclass
2727
class Image(AsDataArray):
28-
"""Specs for a monochromatic image."""
28+
"""2D image as DataArray."""
2929

3030
data: Data[tuple[X, Y], float]
3131
x: Coord[X, int] = 0
3232
y: Coord[Y, int] = 0
33-
34-
35-
# create an image as DataArray
36-
image = Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])
37-
38-
# create an image filled with ones
39-
ones = Image.ones((2, 2), x=[0, 1], y=[0, 1])
4033
```
4134

4235
### Features
4336

44-
- DataArray and Dataset objects with fixed dimensions, data type, and coordinates can easily be created.
45-
- NumPy-like special functions such as ``ones()`` are provided as class methods.
46-
- Compatible with [the Python's dataclass].
47-
- Compatible with static type check by [Pyright].
37+
- Typed DataArray or Dataset objects can easily be created:
38+
```python
39+
image = Image.new([[0, 1], [2, 3]], [0, 1], [0, 1])
40+
```
41+
- NumPy-like filled-data creation is also available:
42+
```python
43+
image = Image.zeros([2, 2], x=[0, 1], y=[0, 1])
44+
```
45+
- Support for features by [the Python's dataclass] (`field`, `__post_init__`, ...).
46+
- Support for static type check by [Pyright].
4847

4948
### Installation
5049

5150
```shell
52-
$ pip install xarray-dataclasses
51+
pip install xarray-dataclasses
5352
```
5453

55-
56-
## Background
57-
58-
[xarray] is useful for handling labeled multi-dimensional data, but it is a bit troublesome to create DataArray and Dataset objects with fixed dimensions, data type, or coordinates (typed DataArray and typed Dataset).
59-
For example, let us think about the following DataArray specifications for a monochromatic image.
60-
61-
- Dimensions of data must be `("x", "y")`.
62-
- Data type of data must be `float`.
63-
- Data type of dimensions must be `int`.
64-
- Default value of dimensions must be `0`.
65-
66-
Then a function to create a typed DataArray object is something like this.
67-
68-
```python
69-
import numpy as np
70-
import xarray as xr
71-
72-
73-
def create_image(data, x=0, y=0):
74-
"""Create a monochromatic image."""
75-
data = np.array(data)
76-
77-
if x == 0:
78-
x = np.full(data.shape[0], x)
79-
else:
80-
x = np.array(x)
81-
82-
if y == 0:
83-
y = np.full(data.shape[1], y)
84-
else:
85-
y = np.array(y)
86-
87-
return xr.DataArray(
88-
data=data.astype(float),
89-
dims=("x", "y"),
90-
coords={
91-
"x": ("x", x.astype(int)),
92-
"y": ("y", y.astype(int)),
93-
},
94-
)
95-
96-
97-
image = create_image([[0, 1], [2, 3]])
98-
```
99-
100-
The issues are
101-
102-
- It is not easy to figure out the specifications from the code.
103-
- It is not easy to reuse the code, for example, to add new coordinates.
104-
105-
xarray-dataclasses resolves them by defining the specifications as a dataclass.
106-
As shown in the code in the overview, the specifications become much easier to read.
107-
108-
- The type hints have complete information for DataArray creation.
109-
- The default values are given as class variables.
110-
- The mix-in class `AsDataArray` provides class methods such as `new()`.
111-
- The extension of the specifications is easy by class inheritance.
112-
11354
## Basic usage
11455

11556
xarray-dataclasses uses [the Python's dataclass].
116-
Please learn how to use it before proceeding.
117-
Data (or data variables), coordinates, attributes, and a name of a DataArray or a Dataset object are defined as dataclass fields with the following type hints.
57+
Data (or data variables), coordinates, attributes, and a name of DataArray or Dataset objects will be defined as dataclass fields by special type hints (`Data`, `Coord`, `Attr`, `Name`), respectively.
11858
Note that the examples below assume the following code.
11959

12060
```python
@@ -130,14 +70,15 @@ Y = Literal["y"]
13070

13171
### Data field
13272

133-
The data field is a field whose value will become the data of a DataArray object or a data variable of a Dataset object.
73+
Data field is a field whose value will become the data of a DataArray object or a data variable of a Dataset object.
13474
The type hint `Data[TDims, TDtype]` fixes the dimensions and the data type of the object.
13575
Here are some examples of how to specify them.
13676

13777
Type hint | Inferred dimensions
13878
--- | ---
139-
`Data[Literal[()], ...]` | `()`
79+
`Data[tuple[()], ...]` | `()`
14080
`Data[Literal["x"], ...]` | `("x",)`
81+
`Data[tuple[Literal["x"]], ...]` | `("x",)`
14182
`Data[tuple[Literal["x"], Literal["y"]], ...]` | `("x", "y")`
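
The table above maps type hints to dimension tuples. A minimal sketch of how such a mapping could be computed with the standard `typing` module is shown here. The helper `infer_dims` is hypothetical and written only for illustration; it is not the library's implementation, and it assumes Python 3.9 or later for the `tuple[...]` syntax.

```python
from typing import Literal, get_args, get_origin


def infer_dims(hint) -> tuple:
    """Collect dimension names from a Literal or tuple[...] type hint.

    Illustrative helper only; it is not a function of xarray-dataclasses.
    """
    origin = get_origin(hint)

    if origin is Literal:
        # Literal["x"] -> ("x",)
        return tuple(get_args(hint))

    if origin is tuple:
        dims = ()
        for arg in get_args(hint):
            if arg == ():
                # Python < 3.11 reports tuple[()] as ((),); skip the marker.
                continue
            dims += infer_dims(arg)
        return dims

    raise TypeError(f"Unsupported dimension hint: {hint!r}")


X = Literal["x"]
Y = Literal["y"]

print(infer_dims(tuple[()]))    # ()
print(infer_dims(X))            # ('x',)
print(infer_dims(tuple[X]))     # ('x',)
print(infer_dims(tuple[X, Y]))  # ('x', 'y')
```

The four calls mirror the four rows of the table, so the printed tuples can be compared against the "Inferred dimensions" column.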
14283

14384
Type hint | Inferred data type
@@ -146,33 +87,33 @@ Type hint | Inferred data type
14687
`Data[..., None]` | `None`
14788
`Data[..., float]` | `numpy.dtype("float64")`
14889
`Data[..., numpy.float128]` | `numpy.dtype("float128")`
149-
| `Data[..., Literal["datetime64[ns]"]]` | `numpy.dtype("<M8[ns]")`
90+
`Data[..., Literal["datetime64[ns]"]]` | `numpy.dtype("<M8[ns]")`
15091

15192
### Coordinate field
15293

153-
The coordinate field is a field whose value will become a coordinate of a DataArray or a Dataset object.
94+
Coordinate field is a field whose value will become a coordinate of a DataArray or a Dataset object.
15495
The type hint `Coord[TDims, TDtype]` fixes the dimensions and the data type of the object.
15596

15697
### Attribute field
15798

158-
The attribute field is a field whose value will become an attribute of a DataArray or a Dataset object.
159-
The type hint `Attr[T]` specifies the type of the value, which is used only for static type check.
99+
Attribute field is a field whose value will become an attribute of a DataArray or a Dataset object.
100+
The type hint `Attr[TAttr]` specifies the type of the value, which is used only for static type check.
160101

161102
### Name field
162103

163-
The name field is a field whose value will become the name of a DataArray object.
164-
The type hint `Name[T]` specifies the type of the value, which is used only for static type check.
104+
Name field is a field whose value will become the name of a DataArray object.
105+
The type hint `Name[TName]` specifies the type of the value, which is used only for static type check.
165106

166107
### DataArray class
167108

168-
The DataArray class is a dataclass that defines typed DataArray specifications.
109+
DataArray class is a dataclass that defines typed DataArray specifications.
169110
Exactly one data field is allowed in a DataArray class.
170111
The second and subsequent data fields are just ignored in DataArray creation.
171112

172113
```python
173114
@dataclass
174115
class Image(AsDataArray):
175-
"""Specs for a monochromatic image."""
116+
"""2D image as DataArray."""
176117

177118
data: Data[tuple[X, Y], float]
178119
x: Coord[X, int] = 0
@@ -181,7 +122,7 @@ class Image(AsDataArray):
181122
name: Name[str] = "luminance"
182123
```
183124

184-
A DataArray object is created by the shorthand method `new()`.
125+
A DataArray object will be created by a class method `new()`:
185126

186127
```python
187128
Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])
@@ -196,7 +137,7 @@ Attributes:
196137
units: cd / m^2
197138
```
198139

199-
NumPy-like `empty()`, `zeros()`, `ones()`, `full()` methods are available.
140+
NumPy-like class methods (`zeros()`, `ones()`, ...) are also available:
200141

201142
```python
202143
Image.ones((3, 3))
@@ -214,13 +155,13 @@ Attributes:
214155

215156
### Dataset class
216157

217-
The Dataset class is a dataclass that defines typed Dataset specifications.
158+
Dataset class is a dataclass that defines typed Dataset specifications.
218159
Multiple data fields are allowed to define the data variables of the object.
219160

220161
```python
221162
@dataclass
222163
class ColorImage(AsDataset):
223-
"""Specs for a color image."""
164+
"""2D color image as Dataset."""
224165

225166
red: Data[tuple[X, Y], float]
226167
green: Data[tuple[X, Y], float]
@@ -230,7 +171,7 @@ class ColorImage(AsDataset):
230171
units: Attr[str] = "cd / m^2"
231172
```
232173

233-
A Dataset object is created by the shorthand method `new()`.
174+
A Dataset object will be created by a class method `new()`:
234175

235176
```python
236177
ColorImage.new(
@@ -256,42 +197,90 @@ Attributes:
256197

257198
### Coordof and Dataof type hints
258199

259-
xarray-dataclasses provides advanced type hints, `Coordof[T]` and `Dataof[T]`.
200+
xarray-dataclasses provides advanced type hints, `Coordof` and `Dataof`.
260201
Unlike `Data` and `Coord`, they specify a dataclass that defines a DataArray class.
261-
This is useful, for example, when users want to add metadata to dimensions for [plotting].
202+
This is useful when users want to add metadata to dimensions for [plotting].
203+
For example:
262204

263205
```python
264206
from xarray_dataclasses import Coordof
265207

266208

267209
@dataclass
268210
class XAxis:
269-
"""Specs for the x axis."""
270-
271211
data: Data[X, int]
272212
long_name: Attr[str] = "x axis"
273213
units: Attr[str] = "pixel"
274214

275215

276216
@dataclass
277217
class YAxis:
278-
"""Specs for the y axis."""
279-
280218
data: Data[Y, int]
281219
long_name: Attr[str] = "y axis"
282220
units: Attr[str] = "pixel"
283221

284222

285223
@dataclass
286224
class Image(AsDataArray):
287-
"""Specs for a monochromatic image."""
225+
"""2D image as DataArray."""
288226

289227
data: Data[tuple[X, Y], float]
290228
x: Coordof[XAxis] = 0
291229
y: Coordof[YAxis] = 0
292230
```
293231

294-
### Options for DataArray and Dataset creation
232+
### General data variable names in Dataset creation
233+
234+
Due to the limitation of Python's parameter names, it is not possible to define data variable names that contain white spaces, for example.
235+
In such cases, please define DataArray classes of each data variable so that they have name fields and specify them by `Dataof` in a Dataset class.
236+
Then the values of the name fields will be used as data variable names.
237+
For example:
238+
239+
```python
240+
@dataclass
241+
class Red:
242+
data: Data[tuple[X, Y], float]
243+
name: Name[str] = "Red image"
244+
245+
246+
@dataclass
247+
class Green:
248+
data: Data[tuple[X, Y], float]
249+
name: Name[str] = "Green image"
250+
251+
252+
@dataclass
253+
class Blue:
254+
data: Data[tuple[X, Y], float]
255+
name: Name[str] = "Blue image"
256+
257+
258+
@dataclass
259+
class ColorImage(AsDataset):
260+
"""2D color image as Dataset."""
261+
262+
red: Dataof[Red]
263+
green: Dataof[Green]
264+
blue: Dataof[Blue]
265+
```
266+
267+
```python
268+
ColorImage.new(
269+
[[0, 0], [0, 0]],
270+
[[1, 1], [1, 1]],
271+
[[2, 2], [2, 2]],
272+
)
273+
274+
<xarray.Dataset>
275+
Dimensions: (x: 2, y: 2)
276+
Dimensions without coordinates: x, y
277+
Data variables:
278+
Red image (x, y) float64 0.0 0.0 0.0 0.0
279+
Green image (x, y) float64 1.0 1.0 1.0 1.0
280+
Blue image (x, y) float64 2.0 2.0 2.0 2.0
281+
```
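
The parameter-name limitation mentioned in this section comes from Python itself: a dataclass field name must be a valid identifier, so a name such as `Red image` cannot be a field. The following sketch, which uses only the standard library and is independent of xarray-dataclasses, shows one way to check this. The helper name `is_valid_field_name` is an assumption made for illustration.

```python
import keyword


def is_valid_field_name(name: str) -> bool:
    """Return True if `name` can be used as a dataclass field name."""
    # A field name must be an identifier and must not be a reserved keyword.
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_valid_field_name("red"))        # True
print(is_valid_field_name("Red image"))  # False (contains a space)
print(is_valid_field_name("class"))      # False (reserved keyword)
```

Names that fail this check are the ones that would need a name field and `Dataof`, as described above.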
282+
283+
### Customization of DataArray or Dataset creation
295284

296285
For customization, users can add a special class attribute, `__dataoptions__`, to a DataArray or Dataset class.
297286
Only a custom factory for DataArray or Dataset creation is supported in the current implementation.
@@ -307,45 +296,47 @@ class Custom(xr.DataArray):
307296

308297
__slots__ = ()
309298

310-
def custom_method(self) -> None:
311-
print("Custom method!")
299+
def custom_method(self) -> bool:
300+
"""Custom method."""
301+
return True
312302

313303

314304
@dataclass
315305
class Image(AsDataArray):
316-
"""Specs for a monochromatic image."""
317-
318-
__dataoptions__ = DataOptions(Custom)
306+
"""2D image as DataArray."""
319307

320308
data: Data[tuple[X, Y], float]
321309
x: Coord[X, int] = 0
322310
y: Coord[Y, int] = 0
323311

312+
__dataoptions__ = DataOptions(Custom)
313+
324314

325315
image = Image.ones([3, 3])
326316
isinstance(image, Custom) # True
327-
image.custom_method() # Custom method!
317+
image.custom_method() # True
328318
```
329319

330320
### DataArray and Dataset creation without shorthands
331321

332322
xarray-dataclasses provides functions, `asdataarray` and `asdataset`.
333-
This is useful, for example, users do not want to inherit the mix-in class (`AsDataArray` or `AsDataset`) in a DataArray or Dataset dataclass.
323+
This is useful when users do not want to inherit the mix-in class (`AsDataArray` or `AsDataset`) in a DataArray or Dataset dataclass.
324+
For example:
334325

335326
```python
336327
from xarray_dataclasses import asdataarray
337328

338329

339330
@dataclass
340331
class Image:
341-
"""Specifications of images."""
332+
"""2D image as DataArray."""
342333

343334
data: Data[tuple[X, Y], float]
344335
x: Coord[X, int] = 0
345336
y: Coord[Y, int] = 0
346337

347338

348-
image = asdataarray(Image([[0, 1], [2, 3]], x=[0, 1], y=[0, 1]))
339+
image = asdataarray(Image([[0, 1], [2, 3]], [0, 1], [0, 1]))
349340
```
350341

351342
