Commit 70e739c

Merge pull request #81 from bcdev/forman-prep_v06: Preparing 0.6 release

2 parents f67c6eb + 8e55ed5

58 files changed: +378 −251 lines

CHANGES.md

Lines changed: 43 additions & 39 deletions
@@ -1,38 +1,42 @@
## Version 0.6.0 (from 2024-03-12)

### Enhancements

* Added configuration setting `force_new`, which forces creation of a new
  target dataset. An existing target dataset (and its lock) will be
  permanently deleted before appending of slice datasets begins. (#72)

* Chunk sizes can now be `null` for a given dimension. In this case the actual
  chunk size used is the size of the array's shape in that dimension. (#77)

### API Changes

* Simplified writing of custom slice sources for users. The configuration setting
  `slice_source` can now be a `SliceSource` class or any function that returns a
  _slice item_: a local file path or URI, an `xarray.Dataset`, or
  a `SliceSource` object.
  Dropped the concept of _slice factories_ entirely, including the functions
  `to_slice_factory()` and `to_slice_factories()`. (#78)

* Extracted the `Config` class out of `Context` and made it available via the new
  `Context.config: Config` property. The change concerns any usages of the
  `ctx: Context` argument passed to user slice factories. (#74)
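The `null` chunk-size rule from #77 can be illustrated with a small sketch. This is illustrative only, not zappend's actual implementation; the function name and shapes are hypothetical:

```python
def resolve_chunks(shape, chunks):
    """Resolve configured chunk sizes against an array shape.

    A chunk size of None (``null`` in JSON/YAML) resolves to the array's
    full size in that dimension, per the 0.6.0 behavior (#77).
    """
    return tuple(size if chunk is None else chunk
                 for size, chunk in zip(shape, chunks))


# e.g. chunk only along the append dimension, keep the others unchunked
print(resolve_chunks((365, 1800, 3600), (1, None, None)))
# → (1, 1800, 3600)
```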

## Version 0.5.1 (from 2024-02-23)

* Fixed rollback for situations where writing to Zarr fails shortly after the
  Zarr directory has been created. (#69)

  In this case the error message was
  `TypeError: Transaction._delete_dir() missing 1 required positional argument: 'target_path'`.
## Version 0.5.0 (from 2024-02-19)

### Enhancements

* The configuration setting `attrs` can now be used to define dynamically
  computed dataset attributes using the syntax `{{ expression }}`. (#60)

  Example:

@@ -44,16 +48,16 @@

* Introduced new configuration setting `attrs_update_mode` that controls
  how dataset attributes are updated. (#59)

* Simplified logging to console. You can now set the configuration setting
  `logging` to a log level, which will implicitly enable console logging with the
  given log level. (#64)

* Added a section in the notebook `examples/zappend-demo.ipynb`
  that demonstrates transaction rollbacks.

* Added CLI option `--traceback`. (#57)
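A minimal configuration sketch for the simplified console logging (#64); the value shown is a standard Python logging level name, and any other keys are omitted:

```yaml
# implicitly enables console logging at the given level
logging: INFO
```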
@@ -62,17 +66,17 @@

* Fixed an issue where a NetCDF package was missing to run the
  demo notebook `examples/zappend-demo.ipynb` in
  [Binder](https://mybinder.readthedocs.io/). (#47)

## Version 0.4.1 (from 2024-02-13)

### Fixes

* Global metadata attributes of the target dataset are no longer empty. (#56)

* If the target _parent_ directory did not exist, an exception was raised
  reporting that the lock file to be written does not exist. Changed this to
  report that the target parent directory does not exist. (#55)

### Enhancements
@@ -87,27 +91,27 @@
  the step sizes between the labels of a coordinate variable associated with
  the append dimension. Its value can be a number for numerical labels
  or a time delta value of the form `8h` (8 hours) or `2D` (two days) for
  date/time labels. The value can also be negative. (#21)

* The configuration setting `append_step` can take the special values
  `"+"` and `"-"`, which are used to verify that the labels are monotonically
  increasing and decreasing, respectively. (#20)
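Conceptually, the `append_step` check from #20 and #21 behaves like the following sketch. This is illustrative only, not zappend's actual implementation, and the function name is hypothetical:

```python
def check_append_step(labels, append_step):
    """Verify the step between consecutive labels along the append dimension.

    "+" / "-" only verify strict monotonic increase / decrease (#20);
    a number additionally requires that exact step size between labels (#21).
    """
    pairs = list(zip(labels, labels[1:]))
    if append_step == "+":
        return all(a < b for a, b in pairs)
    if append_step == "-":
        return all(a > b for a, b in pairs)
    return all(b - a == append_step for a, b in pairs)


print(check_append_step([1, 2, 3], "+"))   # → True
print(check_append_step([3, 2, 1], "-"))   # → True
print(check_append_step([0, 2, 4], 2))     # → True
print(check_append_step([0, 2, 5], 2))     # → False
```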

* It is now possible to reference environment variables
  in configuration files using the syntax `${ENV_VAR}`. (#36)
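Conceptually, `${ENV_VAR}` interpolation can be sketched with the standard library. This is illustrative only, not zappend's actual implementation, and the `MY_BUCKET` variable is hypothetical:

```python
import os
import re


def interpolate_env(text: str) -> str:
    """Replace every ${NAME} with the value of environment variable NAME."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: os.environ.get(m.group(1), ""),
                  text)


os.environ["MY_BUCKET"] = "mybucket"  # hypothetical variable for the demo
print(interpolate_env("target_dir: s3://${MY_BUCKET}/target.zarr"))
# → target_dir: s3://mybucket/target.zarr
```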

* Added a demo notebook `examples/zappend-demo.ipynb` and linked
  it by a Binder badge in README.md. (#47)

### Fixes

* When `slice_source` was given as a class or function and passed
  to the `zappend()` function either as a configuration entry or as a keyword
  argument, a `ValidationError` was accidentally raised. (#49)

* Fixed an issue where an absolute lock file path was computed if the target
  Zarr path was relative in the local filesystem and had no parent directory.
  (#45)

## Version 0.3.0 (from 2024-01-26)
@@ -119,22 +123,22 @@
  of `zappend.api.SliceSource`. If `slice_source` is given, slices passed to
  the `zappend` function or CLI command will be interpreted as parameter(s)
  passed to the constructor of the specified class or the factory function.
  (#27)

* It is now possible to configure runtime profiling of the `zappend`
  processing using the new configuration setting `profiling`. (#39)
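A configuration sketch for runtime profiling (#39). Whether the setting takes a plain boolean, a log level, or an output path is an assumption here; consult the configuration reference for the exact form:

```yaml
# enable runtime profiling of zappend processing
profiling: true
```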

* Added `--version` option to the CLI. (#42)

* Using the `sizes` instead of the `dims` attribute of `xarray.Dataset` in
  implementation code. (#25)

* Enhanced documentation, including docstrings of several Python API objects.

### Fixes

* Fixed a problem where the underlying I/O stream of a persistent slice dataset
  was closed immediately after opening the dataset. (#31)

* Now logging ignored encodings on level DEBUG instead of WARNING, because they
  very likely occur when processing NetCDF files.
@@ -146,17 +150,17 @@
* Introduced _slice factories_:
  - Allow passing slice object factories to the `zappend()` function.
    The main use case is to return instances of a custom `zappend.api.SliceSource`
    implemented by users. (#13)

  - The utility functions `to_slice_factories` and `to_slice_factory`
    exported by `zappend.api` ease passing inputs specific to a custom
    `SliceSource` or other callables that can produce a slice object. (#22)

* Introduced new configuration flag `persist_mem_slices`.
  If set, in-memory `xr.Dataset` instances will first be persisted to a
  temporary Zarr, then reopened, and then appended to the target dataset. (#11)

* Added initial documentation. (#17)

* Improved readability of the generated configuration documentation.
@@ -166,9 +170,9 @@

* Fixed a problem when passing slices opened from NetCDF files. The error was
  `TypeError: VariableEncoding.__init__() got an unexpected keyword argument 'chunksizes'`.
  (#14)

* Fixed a problem where info about closing a slice was logged twice. (#9)

## Version 0.1.1 (from 2024-01-10)

docs/guide.md

Lines changed: 13 additions & 10 deletions
@@ -676,7 +676,7 @@
for computed slice datasets, especially if the specified target dataset chunking is
different from the slice dataset chunking. This may cause Dask graphs to be
computed multiple times if the source chunking overlaps multiple target chunks,
potentially causing large resource overheads while recomputing and/or reloading the same
source chunks multiple times. In such cases it can help to "terminate"
computations for each slice by persisting the computed dataset first and then
reopening it. This can be specified using the `persist_mem_slices` setting:
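A minimal sketch of that boolean setting in JSON configuration, using the flag name `persist_mem_slices` as given in the changelog; all surrounding keys are omitted:

```json
{
  "persist_mem_slices": true
}
```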

@@ -693,7 +693,7 @@
at the cost of additional I/O. It therefore defaults to `false`.

#### Slice Sources

If you need some custom cleanup after a slice has been processed and appended to the
target dataset, you can use instances of `zappend.api.SliceSource` as slice items.
A `SliceSource` class requires you to implement two methods:

* `get_dataset()` to return the slice dataset of type `xarray.Dataset`, and
@@ -725,10 +725,11 @@

Instead of providing instances of `SliceSource` as slice items, it is often
easier to pass your `SliceSource` class and let `zappend` pass the slice item as
argument(s) to your `SliceSource`'s constructor. This can be achieved using
the `slice_source` configuration setting. If you need to access configuration
settings, it is even required to use the `slice_source` setting.
@@ -737,7 +738,8 @@

The `slice_source` setting can actually be **any Python function** that returns a
valid slice item as described above, such as a file path or URI, or
an `xarray.Dataset`.

If a slice source is configured, each slice item passed to `zappend` is passed as
an argument to your slice source.
@@ -754,13 +756,14 @@
- `dict`: keyword arguments only;
- Any other type is interpreted as a single positional argument.

In addition, your slice source function or class constructor specified by
`slice_source` may define a first positional or keyword argument
named `ctx`, which will receive the current processing context of type
`zappend.api.Context`. This can be useful if you need to read configuration
settings.

Here is a more advanced example of a slice source that opens datasets from a given
file path and averages the values along the time dimension:

docs/start.md

Lines changed: 21 additions & 2 deletions
@@ -74,5 +74,24 @@

Slice items can also be arguments passed to your custom _slice source_,
a function or class that provides the actual slice to be appended:

```python
import xarray as xr
from zappend.api import zappend


def get_dataset(path: str):
    ds = xr.open_dataset(path)
    return ds.drop_vars(["ndvi_min", "ndvi_max"])


zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
        slice_source=get_dataset,
        target_dir="target.zarr")
```

For the details, please refer to the section
[_Slice Sources_](guide.md#slice-sources) in the [User Guide](guide.md).
