Commit 4153ce4

Allow for passing custom slice sources via the configuration
1 parent 0d8b549 commit 4153ce4

9 files changed, +412 -72 lines changed

CHANGES.md

Lines changed: 11 additions & 2 deletions
@@ -1,7 +1,16 @@
 ## Version 0.2.1 (in development)
 
-* Using `sizes` instead of `dims` attribute of `xarray.Dataset` in implementation
-  code. [#25]
+* Allow for passing custom slice sources via the configuration.
+  The new configuration setting `slice_source` is the name of a class
+  derived from `zappend.api.SliceSource` or a function that creates an instance
+  of `zappend.api.SliceSource`. If `slice_source` is given, slices passed to
+  the zappend function or CLI command will be interpreted as parameter(s)
+  passed to the constructor of the specified class or the factory function.
+  [#27]
+
+* Using `sizes` instead of `dims` attribute of `xarray.Dataset` in
+  implementation code. [#25]
+
 * Enhanced documentation including docstrings of several Python API objects.
 
 

docs/config.md

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,11 @@ The URI or local path of the target Zarr dataset. Must be a directory.
 Type _object_.
 Options for the filesystem given by the URI of `target_dir`.
 
+### `slice_source`
+
+Type _string_.
+The fully qualified name of a class or function that provides a slice source for each slice item. If a class is given, it must be derived from `zappend.api.SliceSource`. If a function is given, it must return an instance of `zappend.api.SliceSource`. Refer to the user guide for more information.
+
 ### `slice_engine`
 
 Type _string_.
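
The `slice_source` setting above takes a fully qualified name, and the tests in this commit also use names that point at static and class methods (for example `tests.test_context.CustomSliceSource.new1`). The following is a minimal sketch of how such a name could be resolved with the standard library. The helpers `resolve_qualified_name` and `to_callable` are hypothetical names chosen for illustration; this is not zappend's actual implementation, which this diff does not show.

```python
import importlib
from typing import Any


def resolve_qualified_name(name: str) -> Any:
    """Resolve a dotted name such as "pkg.module.Class.method" to an object.

    Hypothetical helper for illustration only; not part of zappend.
    """
    parts = name.split(".")
    # Try the longest possible module path first, then shorter prefixes,
    # so that trailing parts can be attributes (classes, static methods, ...).
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ModuleNotFoundError:
            continue
        for attr in parts[split:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"cannot resolve {name!r}")


def to_callable(value: Any) -> Any:
    """Accept a callable or its fully qualified name and reject anything else."""
    if isinstance(value, str):
        value = resolve_qualified_name(value)
    if not callable(value):
        raise TypeError(
            "expected a callable or the fully qualified name of a callable"
        )
    return value
```

Trying the longest module prefix first keeps the lookup unambiguous when a package and an attribute share a name; a production version would probably also want to distinguish a missing module from an import error raised inside an existing one.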

docs/guide.md

Lines changed: 58 additions & 18 deletions
@@ -459,10 +459,53 @@ Or use default polling:
 
 ### Slice Sources
 
-Using the `zappend` command, slice dataset are provided as local filesystem paths
-or by paths into other filesystems in case the slice datasets are provided by a URI.
-This section describes additional options to pass slice datasets to the `slices`
-argument of the [`zappend`](api.md) Python function.
+A _slice source_ is an object that provides a slice dataset of type `xarray.Dataset`
+for given parameters of any type.
+
+The optional `slice_source` configuration setting is used to specify a custom
+slice source. If not specified, `zappend` selects the slice source based on the type
+of a given slice object. These types are described in the following subsections.
+
+If given, the value of the `slice_source` setting is a class derived from
+`zappend.api.SliceSource`, or a function that creates an instance of
+`zappend.api.SliceSource`, or the fully qualified name of the aforementioned.
+In the case `slice_source` is given, the items of the _slices_ argument passed to
+the CLI command or Python function become parameters to the specified class
+constructor or factory function.
+The individual slice items in the `SLICES` argument of the `zappend` CLI
+command are of type `str`, typically interpreted as file paths or URIs.
+The individual slice items passed in the `slices` argument of the
+`zappend.api.zappend()` function can be of any type, but the `tuple`, `list`,
+and `dict` types have a special meaning:
+
+* `tuple`: a pair of the form `(args, kwargs)`, where `args` is a list
+  or tuple of positional arguments and `kwargs` is a dictionary of keyword
+  arguments;
+* `list`: positional arguments only;
+* `dict`: keyword arguments only;
+* Any other type is interpreted as a single positional argument.
+
+In addition, your class constructor or factory function specified by `slice_source`
+may specify a positional or keyword argument named `ctx`, which will receive the
+current processing context of type `zappend.api.Context`.
+
+If the `slice_source` setting is _not_ specified, the slice items passed as `slices`
+argument to the [`zappend`](api.md) Python function can be one of the types described
+in the following subsections.
+
+#### `str` and `zappend.api.FileObj`
+
+A slice object of type `str` is interpreted as a local file path or as a URI, in the
+case the path has a protocol prefix, such as `s3://`.
+
+An alternative to providing the slice dataset as path or URI is using the `FileObj`
+class, which combines a URI with dedicated filesystem storage options.
+
+```python
+from zappend.api import FileObj
+
+slice_obj = FileObj(slice_uri, storage_options=dict(...))
+```
 
 #### `xarray.Dataset`
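
The `tuple` / `list` / `dict` rules that this hunk adds to the guide can be read as a small normalization step that turns every slice item into an `(args, kwargs)` pair before the slice source callable is invoked. The sketch below illustrates that reading. The names `to_args_kwargs` and `make_source` are hypothetical, and the error handling for malformed tuples is an assumption; the guide text does not say how zappend treats them.

```python
from typing import Any, Callable, Dict, Tuple


def to_args_kwargs(slice_item: Any) -> Tuple[tuple, Dict[str, Any]]:
    """Normalize a slice item into (args, kwargs).

    Hypothetical helper that mirrors the rules in the guide text;
    not zappend's actual code.
    """
    if isinstance(slice_item, tuple):
        # A tuple is a pair of the form (args, kwargs).
        # Rejecting other shapes is an assumption of this sketch.
        if len(slice_item) != 2:
            raise TypeError("tuple slice items must have the form (args, kwargs)")
        args, kwargs = slice_item
        if not isinstance(args, (list, tuple)) or not isinstance(kwargs, dict):
            raise TypeError("tuple slice items must have the form (args, kwargs)")
        return tuple(args), dict(kwargs)
    if isinstance(slice_item, list):
        # A list carries positional arguments only.
        return tuple(slice_item), {}
    if isinstance(slice_item, dict):
        # A dict carries keyword arguments only.
        return (), dict(slice_item)
    # Any other value is a single positional argument.
    return (slice_item,), {}


def make_source(factory: Callable[..., Any], slice_item: Any) -> Any:
    """Call the slice source class or factory with the normalized arguments."""
    args, kwargs = to_args_kwargs(slice_item)
    return factory(*args, **kwargs)
```

One consequence of these rules is that a slice source expecting a single list or dict as its only argument needs that value wrapped, for example as `[["a", "b"]]`, since a bare list would be unpacked into separate positional arguments.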

@@ -504,19 +547,10 @@ at the cost of additional i/o. It therefore defaults to `false`.
 Often you want to perform some custom cleanup after a slice has been processed and
 appended to the target dataset. In this case you can write your own
 `zappend.api.SliceSource` by implementing its `get_dataset()` and `dispose()`
-methods. Slice source instances are supposed to be created by _slice factories_,
-see below.
-
-#### `zappend.api.FileObj`
-
-An alternative to providing the slice dataset as path or URI is using the `FileObj`
-class, which combines a URI with dedicated filesystem storage options.
+methods.
 
-```python
-from zappend.api import FileObj
-
-slice_obj = FileObj(slice_uri, storage_options=dict(...))
-```
+Slice source instances are supposed to be created by _slice factories_, see
+the subsection below.
 
 #### `zappend.api.SliceFactory`
 
@@ -532,10 +566,8 @@ processed. Slice factories are created from the custom slice source and the slic
 using the utility function [to_slice_factories()][zappend.slice.factory.to_slice_factories]:
 
 ```python
-from typing import Iterable
 import numpy as np
 import xarray as xr
-from zappend.api import SliceFactory
 from zappend.api import SliceSource
 from zappend.api import to_slice_factories
 from zappend.api import zappend
@@ -578,6 +610,14 @@ zappend(to_slice_factories(MySliceSource, ["slice-1.nc", "slice-2.nc", "slice-3.
         target_dir="target.zarr")
 ```
 
+Note, the above example can be simplified by using the `slice_source` setting directly:
+
+```python
+zappend(["slice-1.nc", "slice-2.nc", "slice-3.nc"],
+        target_dir="target.zarr",
+        slice_source=MySliceSource)
+```
+
 ## Logging
 
 The `zappend` logging output is configured using the `logging` setting.

tests/test_config.py

Lines changed: 1 addition & 0 deletions
@@ -236,6 +236,7 @@ def test_get_config_schema(self):
 "persist_mem_slices",
 "slice_engine",
 "slice_polling",
+"slice_source",
 "slice_storage_options",
 "target_storage_options",
 "target_dir",

tests/test_context.py

Lines changed: 89 additions & 0 deletions
@@ -4,11 +4,13 @@
 
 import unittest
 
+import pytest
 import xarray as xr
 from zappend.api import zappend
 from zappend.context import Context
 from zappend.fsutil.fileobj import FileObj
 from zappend.metadata import DatasetMetadata
+from zappend.slice import SliceSource
 from .helpers import clear_memory_fs
 from .helpers import make_test_dataset
 
@@ -73,3 +75,90 @@ def test_dry_run(self):
         self.assertEqual(False, ctx.dry_run)
         ctx = Context({"target_dir": "memory://target.zarr", "dry_run": True})
         self.assertEqual(True, ctx.dry_run)
+
+    def test_slice_source_as_name(self):
+        ctx = Context(
+            {
+                "target_dir": "memory://target.zarr",
+                "slice_source": "tests.test_context.new_custom_slice_source",
+            }
+        )
+        self.assertEqual(new_custom_slice_source, ctx.slice_source)
+
+        ctx = Context(
+            {
+                "target_dir": "memory://target.zarr",
+                "slice_source": "tests.test_context.CustomSliceSource",
+            }
+        )
+        self.assertEqual(CustomSliceSource, ctx.slice_source)
+
+        # staticmethod
+        ctx = Context(
+            {
+                "target_dir": "memory://target.zarr",
+                "slice_source": "tests.test_context.CustomSliceSource.new1",
+            }
+        )
+        self.assertEqual(CustomSliceSource.new1, ctx.slice_source)
+
+        # classmethod
+        ctx = Context(
+            {
+                "target_dir": "memory://target.zarr",
+                "slice_source": "tests.test_context.CustomSliceSource.new2",
+            }
+        )
+        self.assertEqual(CustomSliceSource.new2, ctx.slice_source)
+
+    def test_slice_source_as_type(self):
+        ctx = Context(
+            {
+                "target_dir": "memory://target.zarr",
+                "slice_source": new_custom_slice_source,
+            }
+        )
+        self.assertIs(new_custom_slice_source, ctx.slice_source)
+
+        ctx = Context(
+            {
+                "target_dir": "memory://target.zarr",
+                "slice_source": CustomSliceSource,
+            }
+        )
+        self.assertIs(CustomSliceSource, ctx.slice_source)
+
+        with pytest.raises(
+            TypeError,
+            match=(
+                "slice_source must a callable"
+                " or the fully qualified name of a callable"
+            ),
+        ):
+            Context(
+                {
+                    "target_dir": "memory://target.zarr",
+                    "slice_source": 11,
+                }
+            )
+
+
+def new_custom_slice_source(ctx: Context, index: int):
+    return CustomSliceSource(ctx, index)
+
+
+class CustomSliceSource(SliceSource):
+    def __init__(self, ctx: Context, index: int):
+        super().__init__(ctx)
+        self.index = index
+
+    def get_dataset(self) -> xr.Dataset:
+        return make_test_dataset(index=self.index)
+
+    @staticmethod
+    def new1(ctx: Context, index: int):
+        return CustomSliceSource(ctx, index)
+
+    @classmethod
+    def new2(cls, ctx: Context, index: int):
+        return cls(ctx, index)
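
The constructor and factories in this test module all declare a `ctx` parameter, which according to the guide text in this commit receives the current processing context. How zappend performs that injection is not part of this diff, so the following is only a sketch of one possible approach based on `inspect.signature`. The helper name `call_with_optional_ctx` and its calling convention are hypothetical.

```python
import inspect
from typing import Any, Callable, Mapping, Optional, Sequence


def call_with_optional_ctx(
    factory: Callable[..., Any],
    ctx: Any,
    args: Sequence[Any] = (),
    kwargs: Optional[Mapping[str, Any]] = None,
) -> Any:
    """Call `factory`, passing `ctx` only if it declares a parameter of that name.

    Hypothetical helper for illustration; not zappend's implementation.
    """
    call_args = list(args)
    call_kwargs = dict(kwargs or {})
    # For classes and bound methods, inspect.signature() already omits
    # `self` / `cls`, so positions refer to the caller-visible parameters.
    params = list(inspect.signature(factory).parameters.values())
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    for position, param in enumerate(params):
        if param.name != "ctx":
            continue
        if param.kind in positional_kinds and position <= len(call_args):
            # Insert ctx at its declared position to avoid a
            # "multiple values for argument 'ctx'" error.
            call_args.insert(position, ctx)
        else:
            call_kwargs["ctx"] = ctx
        break
    return factory(*call_args, **call_kwargs)
```

Under these assumptions, a factory shaped like `new_custom_slice_source(ctx, index)` could be invoked as `call_with_optional_ctx(new_custom_slice_source, ctx, (3,))`, while a factory without a `ctx` parameter would be called with the slice arguments unchanged.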
