Commit d8d7f02

Categorized configuration settings

1 parent 539870b commit d8d7f02

5 files changed: +190 −156 lines changed

CHANGES.md

Lines changed: 6 additions & 0 deletions
@@ -10,6 +10,12 @@
 [contextlib.closing()](https://docs.python.org/3/library/contextlib.html#contextlib.closing)
 is applicable. Deprecated `SliceSource.dispose()`.

+* Improved the configuration reference: introduced configuration schema categories.
+
+* Introduced configuration setting `extra`, which is an arbitrary configuration that
+  is not validated by default. Intended use is by a `slice_source` that expects an
+  argument named `ctx` and therefore can access the configuration.
+
 ## Version 0.6.0 (from 2024-03-12)

 ### Enhancements
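The intended flow for the new `extra` setting can be sketched as follows. This is a minimal stand-in, not zappend's actual API: the `ctx` object is simulated with a plain `SimpleNamespace`, and the `scale_factor` key and `my_slice_source` function are made-up illustrations.

```python
from types import SimpleNamespace

# Hypothetical configuration: "extra" holds arbitrary, unvalidated settings.
config = {
    "target_dir": "memory://target.zarr",
    "extra": {"scale_factor": 0.5},
}

# Stand-in for the context object a slice_source may receive; the real
# zappend context may differ -- this only illustrates the intended flow.
ctx = SimpleNamespace(config=config)

def my_slice_source(ctx, slice_path):
    # The slice source can read the unvalidated "extra" settings.
    scale = ctx.config["extra"]["scale_factor"]
    return f"{slice_path} scaled by {scale}"

print(my_slice_source(ctx, "slice-1.nc"))  # → slice-1.nc scaled by 0.5
```

Because `extra` is not validated, any keys a slice source needs can be passed through without extending the schema.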

docs/config.md

Lines changed: 57 additions & 66 deletions
@@ -1,31 +1,15 @@
 # Configuration Reference

+In the following all possible configuration settings are described.

-## `version`
+## Target Outline

-Configuration schema version. Allows the schema to evolve while still preserving backwards compatibility.
-Its value is `1`.
-Defaults to `1`.
-
-## `zarr_version`
-
-The Zarr version to be used.
-Its value is `2`.
-Defaults to `2`.
-
-## `fixed_dims`
-
-Type _object_.
-Specifies the fixed dimensions of the target dataset. Keys are dimension names, values are dimension sizes.
-The object's values are of type _integer_.
-
-## `append_dim`
+### `append_dim`

 Type _string_.
 The name of the variadic append dimension.
 Defaults to `"time"`.
-
-## `append_step`
+### `append_step`

 If set, enforces a step size in the append dimension between two slices or just enforces a direction.
 Must be one of the following:
@@ -46,20 +30,22 @@ Must be one of the following:
 A positive or negative numerical delta value.

 Defaults to `null`.
+### `fixed_dims`

-## `included_variables`
+Type _object_.
+Specifies the fixed dimensions of the target dataset. Keys are dimension names, values are dimension sizes.
+The object's values are of type _integer_.
+### `included_variables`

 Type _array_.
 Specifies the names of variables to be included in the target dataset. Defaults to all variables found in the first contributing dataset.
 The items of the array are of type _string_.
-
-## `excluded_variables`
+### `excluded_variables`

 Type _array_.
 Specifies the names of individual variables to be excluded from all contributing datasets.
 The items of the array are of type _string_.
-
-## `variables`
+### `variables`

 Type _object_.
 Defines dimensions, encoding, and attributes for variables in the target dataset. Object property names refer to variable names. The special name `*` refers to all variables, which is useful for defining common values.
@@ -149,13 +135,11 @@ Variable metadata.
 * `attrs`:
   Type _object_.
   Arbitrary variable metadata attributes.
-
-## `attrs`
+### `attrs`

 Type _object_.
 Arbitrary dataset attributes. If `permit_eval` is set to `true`, string values may include Python expressions enclosed in `{{` and `}}` to dynamically compute attribute values; in the expression, the current dataset is named `ds`. Refer to the user guide for more information.
-
-## `attrs_update_mode`
+### `attrs_update_mode`

 The mode used to update target attributes from slice attributes. Independently of this setting, extra attributes configured by the `attrs` setting will finally be used to update the resulting target attributes.
 Must be one of the following:
@@ -173,39 +157,37 @@ Must be one of the following:
 Its value is `"ignore"`.

 Defaults to `"keep"`.
+### `zarr_version`

-## `permit_eval`
-
-Type _boolean_.
-Allow for dynamically computed values in dataset attributes `attrs` using the syntax `{{ expression }}`. Executing arbitrary Python expressions is a security risk, therefore this must be explicitly enabled. Refer to the user guide for more information.
-Defaults to `false`.
+The Zarr version to be used.
+Its value is `2`.
+Defaults to `2`.
+## Data I/O - Target

-## `target_dir`
+### `target_dir`

 Type _string_.
 The URI or local path of the target Zarr dataset. Must specify a directory whose parent directory must exist.
-
-## `target_storage_options`
+### `target_storage_options`

 Type _object_.
 Options for the filesystem given by the URI of `target_dir`.
+### `force_new`

-## `slice_source`
-
-Type _string_.
-The fully qualified name of a class or function that receives a slice item as argument(s) and provides the slice dataset. If a class is given, it must be derived from `zappend.api.SliceSource`. If the function is a context manager, it must yield an `xarray.Dataset`. If a plain function is given, it must return any valid slice item type. Refer to the user guide for more information.
-
-## `slice_engine`
-
-Type _string_.
-The name of the engine to be used for opening contributing datasets. Refer to the `engine` argument of the function `xarray.open_dataset()`.
+Type _boolean_.
+Force creation of a new target dataset. An existing target dataset (and its lock) will be permanently deleted before appending of slice datasets begins. WARNING: the deletion cannot be rolled back.
+Defaults to `false`.
+## Data I/O - Slices

-## `slice_storage_options`
+### `slice_storage_options`

 Type _object_.
 Options for the filesystem given by the protocol of the URIs of contributing datasets.
+### `slice_engine`

-## `slice_polling`
+Type _string_.
+The name of the engine to be used for opening contributing datasets. Refer to the `engine` argument of the function `xarray.open_dataset()`.
+### `slice_polling`

 Defines how to poll for contributing datasets.
 Must be one of the following:
@@ -230,36 +212,52 @@ Must be one of the following:
 Polling timeout in seconds.
 Defaults to `60`.

+### `slice_source`
+
+Type _string_.
+The fully qualified name of a class or function that receives a slice item as argument(s) and provides the slice dataset. If a class is given, it must be derived from `zappend.api.SliceSource`. If the function is a context manager, it must yield an `xarray.Dataset`. If a plain function is given, it must return any valid slice item type. Refer to the user guide for more information.
+### `slice_source_kwargs`

-## `persist_mem_slices`
+Type _object_.
+Extra keyword-arguments passed to a configured `slice_source` together with each slice item.
+### `persist_mem_slices`

 Type _boolean_.
 Persist in-memory slices and reopen from a temporary Zarr before appending them to the target dataset. This can prevent expensive re-computation of dask chunks at the cost of additional i/o.
 Defaults to `false`.
+## Data I/O - Transactions

-## `temp_dir`
+### `temp_dir`

 Type _string_.
 The URI or local path of the directory that will be used to temporarily store rollback information.
-
-## `temp_storage_options`
+### `temp_storage_options`

 Type _object_.
 Options for the filesystem given by the protocol of `temp_dir`.
-
-## `force_new`
+### `disable_rollback`

 Type _boolean_.
-Force creation of a new target dataset. An existing target dataset (and its lock) will be permanently deleted before appending of slice datasets begins. WARNING: the deletion cannot be rolled back.
+Disable rolling back dataset changes on failure. Effectively disables transactional dataset modifications, so use this setting with care.
 Defaults to `false`.
+## Misc.
+
+### `version`

-## `disable_rollback`
+Configuration schema version. Allows the schema to evolve while still preserving backwards compatibility.
+Its value is `1`.
+Defaults to `1`.
+### `dry_run`

 Type _boolean_.
-Disable rolling back dataset changes on failure. Effectively disables transactional dataset modifications, so use this setting with care.
+If `true`, log only what would have been done, but don't apply any changes.
 Defaults to `false`.
+### `permit_eval`

-## `profiling`
+Type _boolean_.
+Allow for dynamically computed values in dataset attributes `attrs` using the syntax `{{ expression }}`. Executing arbitrary Python expressions is a security risk, therefore this must be explicitly enabled. Refer to the user guide for more information.
+Defaults to `false`.
+### `profiling`

 Profiling configuration. Allows for runtime profiling of the processing.
 Must be one of the following:
@@ -307,8 +305,7 @@ Must be one of the following:
 Pattern-match the standard name that is printed.


-
-## `logging`
+### `logging`

 Logging configuration.
 Must be one of the following:
@@ -402,9 +399,3 @@ Must be one of the following:
 The items of the array are of type _string_.


-## `dry_run`
-
-Type _boolean_.
-If `true`, log only what would have been done, but don't apply any changes.
-Defaults to `false`.
-
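For orientation, a configuration touching several of the categories above might look like the following sketch. All values are made up for illustration, and `target.zarr` is a placeholder path; only the setting names come from the reference.

```python
import json

# Illustrative configuration exercising settings from several categories.
config = {
    "version": 1,                            # Misc.: schema version
    "zarr_version": 2,                       # Target Outline: Zarr format version
    "append_dim": "time",                    # Target Outline: append dimension
    "fixed_dims": {"lat": 180, "lon": 360},  # Target Outline: fixed dimension sizes
    "target_dir": "target.zarr",             # Data I/O - Target: placeholder path
    "dry_run": True,                         # Misc.: log only, apply nothing
}

# Per the reference, fixed_dims maps dimension names to integer sizes.
assert all(isinstance(size, int) for size in config["fixed_dims"].values())

print(json.dumps(config, indent=2))
```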
tests/config/test_schema.py

Lines changed: 7 additions & 2 deletions
@@ -11,7 +11,8 @@ class ConfigSchemaTest(unittest.TestCase):
     def test_get_config_schema(self):
         schema = get_config_schema()
         self.assertIn("properties", schema)
-        self.assertIsInstance(schema["properties"], dict)
+        properties = schema["properties"]
+        self.assertIsInstance(properties, dict)
         self.assertEqual(
             {
                 "append_dim",
@@ -41,8 +42,12 @@ def test_get_config_schema(self):
                 "version",
                 "zarr_version",
             },
-            set(schema["properties"].keys()),
+            set(properties.keys()),
         )
+        for k, v in properties.items():
+            self.assertIsInstance(v, dict)
+            self.assertIn("category", v, msg=k)
+            self.assertIn("description", v, msg=k)

     def test_get_config_schema_json(self):
         # Smoke test is sufficient here

zappend/config/markdown.py

Lines changed: 29 additions & 22 deletions
@@ -6,9 +6,29 @@
 from typing import Any


-def schema_to_markdown(schema: dict[str, Any]) -> str:
+def schema_to_markdown(config_schema: dict[str, Any]) -> str:
     lines = []
-    _schema_to_md(schema, [], lines)
+
+    settings = config_schema["properties"]
+    categories = {}
+    for setting_name, setting_schema in settings.items():
+        category_name = setting_schema["category"]
+        if category_name not in categories:
+            categories[category_name] = []
+        categories[category_name].append(setting_name)
+
+    lines.append("# Configuration Reference")
+    lines.append("")
+    lines.append("In the following all possible configuration settings are described.")
+    lines.append("")
+    for category_name, setting_names in categories.items():
+        lines.append(f"## {category_name}")
+        lines.append("")
+        for setting_name in setting_names:
+            lines.append(f"### `{setting_name}`")
+            lines.append("")
+            _schema_to_md(settings[setting_name], [setting_name], lines)
+
     return "\n".join(lines)


@@ -19,10 +39,9 @@ def _schema_to_md(
     sequence_name: str | None = None,
 ):
     undefined = object()
-    is_root = len(path) == 0

     _type = schema.get("type")
-    if _type and not is_root:
+    if _type:
         if isinstance(_type, str):
             _type = [_type]
         value = " | ".join([f"_{name}_" for name in _type])
@@ -31,12 +50,6 @@ def _schema_to_md(
         else:
             lines.append(f"Type {value}.")

-    title = schema.get("title")
-    if title:
-        prefix = "# " if is_root else ""
-        lines.append(prefix + title)
-        lines.append("")
-
     description = schema.get("description")
     if description:
         lines.append(description)
@@ -83,18 +96,12 @@ def _schema_to_md(
     properties = schema.get("properties")
     if properties:
         for name, property_schema in properties.items():
-            if is_root:
-                lines.append("")
-                lines.append(f"## `{name}`")
-                lines.append("")
-                _schema_to_md(property_schema, path + [name], lines)
-            else:
-                lines.append("")
-                lines.append(f" * `{name}`:")
-                sub_lines = []
-                _schema_to_md(property_schema, path + [name], sub_lines)
-                for sub_line in sub_lines:
-                    lines.append(" " + sub_line)
+            lines.append("")
+            lines.append(f" * `{name}`:")
+            sub_lines = []
+            _schema_to_md(property_schema, path + [name], sub_lines)
+            for sub_line in sub_lines:
+                lines.append(" " + sub_line)

     additional_properties = schema.get("additionalProperties")
     if isinstance(additional_properties, dict):
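The grouping step this commit adds to `schema_to_markdown` can be illustrated standalone: settings are bucketed by their `category` field in first-seen order (Python dicts preserve insertion order), then emitted as `##` category headings with `###` setting headings beneath. The toy `settings` dict below is an illustrative stand-in for the real schema's `properties`.

```python
from typing import Any

def group_by_category(settings: dict[str, Any]) -> dict[str, list[str]]:
    # Bucket setting names by their "category", preserving insertion order.
    categories: dict[str, list[str]] = {}
    for setting_name, setting_schema in settings.items():
        categories.setdefault(setting_schema["category"], []).append(setting_name)
    return categories

# Toy schema properties; names mirror real settings, categories the diff's.
settings = {
    "append_dim": {"category": "Target Outline"},
    "target_dir": {"category": "Data I/O - Target"},
    "fixed_dims": {"category": "Target Outline"},
    "dry_run": {"category": "Misc."},
}

lines = []
for category_name, setting_names in group_by_category(settings).items():
    lines.append(f"## {category_name}")
    for setting_name in setting_names:
        lines.append(f"### `{setting_name}`")

print("\n".join(lines))
```

This is why `fixed_dims` now appears under "Target Outline" alongside `append_dim` in the generated reference, even though the two are not adjacent in the schema.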
