Commit b5f45e4

#143 Update README
1 parent 3dc9a1e commit b5f45e4

1 file changed: +95 -104 lines changed

README.md

Lines changed: 95 additions & 104 deletions
@@ -11,7 +11,7 @@ xarray extension for typed DataArray and Dataset creation

 ## Overview

-xarray-dataclasses is a Python package that makes it easy to create typed DataArray and Dataset objects of [xarray] using [the Python's dataclass].
+xarray-dataclasses is a Python package that makes it easy to create [xarray]'s DataArray or Dataset objects that are "typed" (i.e. fixed dimensions, data type, coordinates, attributes, and name) using [the Python's dataclass]:

 ```python
 from dataclasses import dataclass
@@ -25,95 +25,36 @@ Y = Literal["y"]

 @dataclass
 class Image(AsDataArray):
-    """Specs for a monochromatic image."""
+    """2D image as DataArray."""

     data: Data[tuple[X, Y], float]
     x: Coord[X, int] = 0
     y: Coord[Y, int] = 0
-
-
-# create an image as DataArray
-image = Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])
-
-# create an image filled with ones
-ones = Image.ones((2, 2), x=[0, 1], y=[0, 1])
 ```

 ### Features

-- DataArray and Dataset objects with fixed dimensions, data type, and coordinates can easily be created.
-- NumPy-like special functions such as ``ones()`` are provided as class methods.
-- Compatible with [the Python's dataclass].
-- Compatible with static type check by [Pyright].
+- Typed DataArray or Dataset objects can easily be created:
+  ```python
+  image = Image.new([[0, 1], [2, 3]], [0, 1], [0, 1])
+  ```
+- NumPy-like filled-data creation is also available:
+  ```python
+  image = Image.zeros([2, 2], x=[0, 1], y=[0, 1])
+  ```
+- Support for features by [the Python's dataclass] (`field`, `__post_init__`, ...).
+- Support for static type check by [Pyright].
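The dataclass features named in the list above (`field`, `__post_init__`) come from the Python standard library. As a reminder of what they do, here is a minimal sketch with a plain dataclass. `Pixel` and its fields are hypothetical names used only for illustration, and the sketch does not use xarray-dataclasses itself:

```python
from dataclasses import dataclass, field


@dataclass
class Pixel:
    """Hypothetical plain dataclass (not an xarray-dataclasses class)."""

    # default_factory gives every instance its own list
    values: list = field(default_factory=list)
    scale: float = 1.0
    # init=False: computed in __post_init__, not an __init__ parameter
    scaled: list = field(init=False)

    def __post_init__(self) -> None:
        # Runs right after the generated __init__
        self.scaled = [value * self.scale for value in self.values]


pixel = Pixel([1, 2], scale=2.0)
print(pixel.scaled)  # [2.0, 4.0]
```

How each of these features interacts with the `AsDataArray` and `AsDataset` mix-ins is not shown here.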

 ### Installation

 ```shell
 pip install xarray-dataclasses
 ```

-## Background
-
-[xarray] is useful for handling labeled multi-dimensional data, but it is a bit troublesome to create DataArray and Dataset objects with fixed dimensions, data type, or coordinates (typed DataArray and typed Dataset).
-For example, let us think about the following DataArray specifications for a monochromatic image.
-
-- Dimensions of data must be `("x", "y")`.
-- Data type of data must be `float`.
-- Data type of dimensions must be `int`.
-- Default value of dimensions must be `0`.
-
-Then a function to create a typed DataArray object is something like this.
-
-```python
-import numpy as np
-import xarray as xr
-
-
-def create_image(data, x=0, y=0):
-    """Create a monochromatic image."""
-    data = np.array(data)
-
-    if x == 0:
-        x = np.full(data.shape[0], x)
-    else:
-        x = np.array(x)
-
-    if y == 0:
-        y = np.full(data.shape[1], y)
-    else:
-        y = np.array(y)
-
-    return xr.DataArray(
-        data=data.astype(float),
-        dims=("x", "y"),
-        coords={
-            "x": ("x", x.astype(int)),
-            "y": ("y", y.astype(int)),
-        },
-    )
-
-
-image = create_image([[0, 1], [2, 3]])
-```
-
-The issues are
-
-- It is not easy to figure out the specifications from the code.
-- It is not easy to reuse the code, for example, to add new coordinates.
-
-xarray-dataclasses resolves them by defining the specifications as a dataclass.
-As shown in the code in the overview, the specifications become much easier to read.
-
-- The type hints have complete information for DataArray creation.
-- The default values are given as class variables.
-- The mix-in class `AsDataArray` provides class methods such as `new()`.
-- The extension of the specifications is easy by class inheritance.
-
 ## Basic usage

 xarray-dataclasses uses [the Python's dataclass].
-Please learn how to use it before proceeding.
-Data (or data variables), coordinates, attributes, and a name of a DataArray or a Dataset object are defined as dataclass fields with the following type hints.
+Data (or data variables), coordinates, attributes, and a name of DataArray or Dataset objects will be defined as dataclass fields by special type hints (`Data`, `Coord`, `Attr`, `Name`), respectively.
 Note that the following code is supposed in the examples below.

 ```python
@@ -129,14 +70,15 @@ Y = Literal["y"]

 ### Data field

-The data field is a field whose value will become the data of a DataArray object or a data variable of a Dataset object.
+Data field is a field whose value will become the data of a DataArray object or a data variable of a Dataset object.
 The type hint `Data[TDims, TDtype]` fixes the dimensions and the data type of the object.
 Here are some examples of how to specify them.

 Type hint | Inferred dimensions
 --- | ---
-`Data[Literal[()], ...]` | `()`
+`Data[tuple[()], ...]` | `()`
 `Data[Literal["x"], ...]` | `("x",)`
+`Data[tuple[Literal["x"]], ...]` | `("x",)`
 `Data[tuple[Literal["x"], Literal["y"]], ...]` | `("x", "y")`
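The mapping in the dimensions table above can be reproduced with the standard `typing` helpers. The following is only a sketch of the idea, not the library's implementation: `infer_dims` is a hypothetical helper name, and it takes the `TDims` part of the hint directly rather than a full `Data[...]` hint.

```python
from typing import Literal, get_args, get_origin


def infer_dims(tp) -> tuple:
    """Return the dimension names encoded in a dims type hint."""
    origin = get_origin(tp)

    if origin is Literal:
        # Literal["x"] -> ("x",)
        return tuple(str(arg) for arg in get_args(tp))

    if origin is tuple:
        dims = ()
        for arg in get_args(tp):
            if arg == ():
                # Some Python versions report tuple[()] as ((),)
                continue
            dims += infer_dims(arg)
        return dims

    raise TypeError(f"Unsupported dims type: {tp!r}")


print(infer_dims(tuple[()]))                            # ()
print(infer_dims(Literal["x"]))                         # ('x',)
print(infer_dims(tuple[Literal["x"]]))                  # ('x',)
print(infer_dims(tuple[Literal["x"], Literal["y"]]))    # ('x', 'y')
```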

 Type hint | Inferred data type
@@ -145,33 +87,33 @@ Type hint | Inferred data type
 `Data[..., None]` | `None`
 `Data[..., float]` | `numpy.dtype("float64")`
 `Data[..., numpy.float128]` | `numpy.dtype("float128")`
-| `Data[..., Literal["datetime64[ns]"]]` | `numpy.dtype("<M8[ns]")`
+`Data[..., Literal["datetime64[ns]"]]` | `numpy.dtype("<M8[ns]")`

 ### Coordinate field

-The coordinate field is a field whose value will become a coordinate of a DataArray or a Dataset object.
+Coordinate field is a field whose value will become a coordinate of a DataArray or a Dataset object.
 The type hint `Coord[TDims, TDtype]` fixes the dimensions and the data type of the object.

 ### Attribute field

-The attribute field is a field whose value will become an attribute of a DataArray or a Dataset object.
-The type hint `Attr[T]` specifies the type of the value, which is used only for static type check.
+Attribute field is a field whose value will become an attribute of a DataArray or a Dataset object.
+The type hint `Attr[TAttr]` specifies the type of the value, which is used only for static type check.

 ### Name field

-The name field is a field whose value will become the name of a DataArray object.
-The type hint `Name[T]` specifies the type of the value, which is used only for static type check.
+Name field is a field whose value will become the name of a DataArray object.
+The type hint `Name[TName]` specifies the type of the value, which is used only for static type check.

 ### DataArray class

-The DataArray class is a dataclass that defines typed DataArray specifications.
+DataArray class is a dataclass that defines typed DataArray specifications.
 Exactly one data field is allowed in a DataArray class.
 The second and subsequent data fields are just ignored in DataArray creation.

 ```python
 @dataclass
 class Image(AsDataArray):
-    """Specs for a monochromatic image."""
+    """2D image as DataArray."""

     data: Data[tuple[X, Y], float]
     x: Coord[X, int] = 0
@@ -180,7 +122,7 @@ class Image(AsDataArray):
     name: Name[str] = "luminance"
 ```

-A DataArray object is created by the shorthand method `new()`.
+A DataArray object will be created by a class method `new()`:

 ```python
 Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])
@@ -195,7 +137,7 @@ Attributes:
     units: cd / m^2
 ```

-NumPy-like `empty()`, `zeros()`, `ones()`, `full()` methods are available.
+NumPy-like class methods (`zeros()`, `ones()`, ...) are also available:

 ```python
 Image.ones((3, 3))
@@ -213,13 +155,13 @@ Attributes:

 ### Dataset class

-The Dataset class is a dataclass that defines typed Dataset specifications.
+Dataset class is a dataclass that defines typed Dataset specifications.
 Multiple data fields are allowed to define the data variables of the object.

 ```python
 @dataclass
 class ColorImage(AsDataset):
-    """Specs for a color image."""
+    """2D color image as Dataset."""

     red: Data[tuple[X, Y], float]
     green: Data[tuple[X, Y], float]
@@ -229,7 +171,7 @@ class ColorImage(AsDataset):
     units: Attr[str] = "cd / m^2"
 ```

-A Dataset object is created by the shorthand method `new()`.
+A Dataset object will be created by a class method `new()`:

 ```python
 ColorImage.new(
@@ -255,42 +197,89 @@ Attributes:

 ### Coordof and Dataof type hints

-xarray-dataclasses provides advanced type hints, `Coordof[T]` and `Dataof[T]`.
+xarray-dataclasses provides advanced type hints, `Coordof` and `Dataof`.
 Unlike `Data` and `Coord`, they specify a dataclass that defines a DataArray class.
-This is useful, for example, when users want to add metadata to dimensions for [plotting].
+This is useful when users want to add metadata to dimensions for [plotting].
+For example:

 ```python
 from xarray_dataclasses import Coordof


 @dataclass
 class XAxis:
-    """Specs for the x axis."""
-
     data: Data[X, int]
     long_name: Attr[str] = "x axis"
     units: Attr[str] = "pixel"


 @dataclass
 class YAxis:
-    """Specs for the y axis."""
-
     data: Data[Y, int]
     long_name: Attr[str] = "y axis"
     units: Attr[str] = "pixel"


 @dataclass
 class Image(AsDataArray):
-    """Specs for a monochromatic image."""
+    """2D image as DataArray."""

     data: Data[tuple[X, Y], float]
     x: Coordof[XAxis] = 0
     y: Coordof[YAxis] = 0
 ```

-### Options for DataArray and Dataset creation
+### General data variable names in Dataset creation
+
+Due to the limitation of Python's parameter names, it is not possible to define data variable names that contain white spaces, for example.
+In such cases, please define DataArray classes of each data variable so that they have name fields and specify them by `Dataof` in a Dataset class.
+Then the values of the name fields will be used as data variable names.
+For example:
+
+```python
+@dataclass
+class Red:
+    data: Data[tuple[X, Y], float]
+    name: Name[str] = "Red image"
+
+
+@dataclass
+class Green:
+    data: Data[tuple[X, Y], float]
+    name: Name[str] = "Green image"
+
+
+@dataclass
+class Blue:
+    data: Data[tuple[X, Y], float]
+    name: Name[str] = "Blue image"
+
+
+@dataclass
+class ColorImage(AsDataset):
+    """2D color image as Dataset."""
+
+    red: Dataof[Red]
+    green: Dataof[Green]
+    blue: Dataof[Blue]
+
+
+ColorImage.new(
+    [[0, 0], [0, 0]],
+    [[1, 1], [1, 1]],
+    [[2, 2], [2, 2]],
+)
+
+<xarray.Dataset>
+Dimensions: (x: 2, y: 2)
+Dimensions without coordinates: x, y
+Data variables:
+    Red image (x, y) float64 0.0 0.0 0.0 0.0
+    Green image (x, y) float64 1.0 1.0 1.0 1.0
+    Blue image (x, y) float64 2.0 2.0 2.0 2.0
+```
+
+### Customization of DataArray or Dataset creation

 For customization, users can add a special class attribute, `__dataoptions__`, to a DataArray or Dataset class.
 A custom factory for DataArray or Dataset creation is only supported in the current implementation.
@@ -306,45 +295,47 @@ class Custom(xr.DataArray):

     __slots__ = ()

-    def custom_method(self) -> None:
-        print("Custom method!")
+    def custom_method(self) -> bool:
+        """Custom method."""
+        return True


 @dataclass
 class Image(AsDataArray):
-    """Specs for a monochromatic image."""
-
-    __dataoptions__ = DataOptions(Custom)
+    """2D image as DataArray."""

     data: Data[tuple[X, Y], float]
     x: Coord[X, int] = 0
     y: Coord[Y, int] = 0

+    __dataoptions__ = DataOptions(Custom)
+

 image = Image.ones([3, 3])
 isinstance(image, Custom) # True
-image.custom_method() # Custom method!
+image.custom_method() # True
 ```

 ### DataArray and Dataset creation without shorthands

 xarray-dataclasses provides functions, `asdataarray` and `asdataset`.
-This is useful, for example, users do not want to inherit the mix-in class (`AsDataArray` or `AsDataset`) in a DataArray or Dataset dataclass.
+This is useful when users do not want to inherit the mix-in class (`AsDataArray` or `AsDataset`) in a DataArray or Dataset dataclass.
+For example:

 ```python
 from xarray_dataclasses import asdataarray


 @dataclass
 class Image:
-    """Specifications of images."""
+    """2D image as DataArray."""

     data: Data[tuple[X, Y], float]
     x: Coord[X, int] = 0
     y: Coord[Y, int] = 0


-image = asdataarray(Image([[0, 1], [2, 3]], x=[0, 1], y=[0, 1]))
+image = asdataarray(Image([[0, 1], [2, 3]], [0, 1], [0, 1]))
 ```
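A function-style API such as `asdataarray` can work on a plain dataclass instance because the role of each field (data, coordinate, ...) is carried by its type hint rather than by the mix-in class. The following standard-library sketch illustrates that idea only. It is not the library's implementation: the `DATA` and `COORD` markers, the `Annotated`-based hints, `PlainImage`, and `split_roles` are all hypothetical stand-ins for the real `Data` and `Coord` hints.

```python
from dataclasses import dataclass, fields
from typing import Annotated, Any, get_args, get_origin, get_type_hints

# Hypothetical role markers (assumption: stand-ins for Data and Coord)
DATA, COORD = "data", "coord"


@dataclass
class PlainImage:
    """Plain dataclass without any mix-in class."""

    data: Annotated[Any, DATA]
    x: Annotated[Any, COORD] = 0
    y: Annotated[Any, COORD] = 0


def split_roles(obj) -> dict:
    """Group the field values of a dataclass instance by their role marker."""
    hints = get_type_hints(type(obj), include_extras=True)
    roles = {DATA: {}, COORD: {}}

    for f in fields(obj):
        hint = hints[f.name]
        if get_origin(hint) is Annotated:
            role = get_args(hint)[1]  # metadata attached to the hint
            roles[role][f.name] = getattr(obj, f.name)

    return roles


roles = split_roles(PlainImage([[0, 1], [2, 3]], [0, 1], [0, 1]))
print(roles["data"])   # {'data': [[0, 1], [2, 3]]}
print(roles["coord"])  # {'x': [0, 1], 'y': [0, 1]}
```

A real converter would then pass the grouped values to the `xarray.DataArray` constructor; that step is left out to keep the sketch free of third-party imports.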