Commit f31a5aa

Merge pull request #61 from bcdev/forman-60-custom_attrs
Allow for custom attributes
2 parents 29cab0b + b695007 commit f31a5aa

18 files changed (+1227, -168 lines)


CHANGES.md

Lines changed: 19 additions & 0 deletions
@@ -2,15 +2,34 @@
 
 ### Enhancements
 
+* The configuration setting `attrs` can now be used to define dynamically
+  computed dataset attributes using the syntax `{{ expression }}`. [#60]
+
+  Example:
+  ```yaml
+  permit_eval: true
+  attrs:
+    title: HROC Ocean Colour Monthly Composite
+    time_coverage_start: "{{ lower_bound(ds.time) }}"
+    time_coverage_end: "{{ upper_bound(ds.time) }}"
+  ```
+
+* Introduced new configuration setting `attrs_update_mode` that controls
+  how dataset attributes are updated. [#59]
+
 * Simplified logging to console. You can now set configuration setting `logging`
   to a log level which will implicitly enable console logging with given log
   level. [#64]
 
 * Added a section in the notebook `examples/zappend-demo.ipynb`
   that demonstrates transaction rollbacks.
 
+
 * Added CLI option `--traceback`. [#57]
 
+* Added a section in the notebook `examples/zappend-demo.ipynb`
+  that demonstrates transaction rollbacks.
+
 
 ## Version 0.4.1 (from 2024-02-13)
 
docs/config.md

Lines changed: 30 additions & 0 deletions
@@ -143,6 +143,36 @@ Variable metadata.
 Type _object_.
 Arbitrary variable metadata attributes.
 
+## `attrs`
+
+Type _object_.
+Arbitrary dataset attributes. If `permit_eval` is set to `true`, string values may include Python expressions enclosed in `{{` and `}}` to dynamically compute attribute values; in the expression, the current dataset is named `ds`. Refer to the user guide for more information.
+
+## `attrs_update_mode`
+
+The mode used to update target attributes from slice attributes. Independently of this setting, extra attributes configured by the `attrs` setting will finally be used to update the resulting target attributes.
+Must be one of the following:
+
+* Use attributes from first slice dataset and keep them.
+  Its value is `"keep"`.
+
+* Replace existing attributes by attributes of last slice dataset.
+  Its value is `"replace"`.
+
+* Update existing attributes by attributes of last slice dataset.
+  Its value is `"update"`.
+
+* Ignore attributes from slice datasets.
+  Its value is `"ignore"`.
+
+Defaults to `"keep"`.
+
+## `permit_eval`
+
+Type _boolean_.
+Allow for dynamically computed values in dataset attributes `attrs` using the syntax `{{ expression }}`. Executing arbitrary Python expressions is a security risk, therefore this must be explicitly enabled. Refer to the user guide for more information.
+Defaults to `false`.
+
 ## `target_dir`
 
 Type _string_.
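The four `attrs_update_mode` values documented above amount to a small merge rule followed by an unconditional update with the `attrs` setting. The Python sketch below models that rule as read from the docs. `merge_attrs` is a hypothetical helper, not part of zappend's API, and passing `None` as the current target attributes to mean "this is the first slice" is an assumption of this sketch; zappend's actual implementation may differ in details.

```python
from typing import Any, Dict, Literal, Optional

# Hypothetical sketch of the documented attrs_update_mode semantics;
# merge_attrs is not part of zappend's API.
UpdateMode = Literal["keep", "replace", "update", "ignore"]


def merge_attrs(
    target_attrs: Optional[Dict[str, Any]],
    slice_attrs: Dict[str, Any],
    mode: UpdateMode = "keep",
    extra_attrs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the target's global attributes after appending one slice.

    target_attrs is None when the slice is the first one, i.e. when the
    target dataset is about to be created (assumption of this sketch).
    """
    is_first = target_attrs is None
    current = {} if is_first else dict(target_attrs)
    if mode == "keep":
        # Take the first slice's attributes once, then never change them.
        result = dict(slice_attrs) if is_first else current
    elif mode == "replace":
        # Discard existing attributes in favor of the last slice's.
        result = dict(slice_attrs)
    elif mode == "update":
        # Merge the last slice's attributes into the existing ones.
        result = {**current, **slice_attrs}
    elif mode == "ignore":
        # Slice attributes never reach the target.
        result = current
    else:
        raise ValueError(f"invalid attrs_update_mode: {mode!r}")
    # Independently of the mode, configured extra attributes are applied last.
    result.update(extra_attrs or {})
    return result


if __name__ == "__main__":
    extra = {"Conventions": "CF-1.10"}
    target = merge_attrs(None, {"title": "A", "history": "1"}, "keep", extra)
    print(merge_attrs(target, {"title": "B"}, "update", extra))
    # {'title': 'B', 'history': '1', 'Conventions': 'CF-1.10'}
```

Under this reading, `"replace"` drops every attribute that is neither in the last slice nor in `attrs`, so attributes that must always be present are best configured via `attrs`.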

docs/guide.md

Lines changed: 82 additions & 1 deletion
@@ -84,7 +84,9 @@ This remainder of this guide explains the how to use the various `zappend`
 variables. A variable comprises the actual data array as well as metadata describing
 the data dimensions, units, and encoding, such as chunking and compression.
 
-## Dataset Outline
+## Dataset Metadata
+
+### Outline
 
 If no further configuration is supplied, then the target dataset's outline and data
 encoding is fully prescribed by the first slice dataset provided. By default, the
@@ -152,6 +154,85 @@ Often, it is easier to specify which variables should be excluded:
   "excluded_variables": ["GridCellId"]
 }
 ```
+### Attributes
+
+The target dataset should expose information about itself using global
+metadata attributes.
+There are four choices to update the global attributes of the target
+dataset from slices. The configuration setting `attrs_update_mode`
+controls how this is done:
+
+* `"keep"` - use attributes from first slice dataset and keep them (default);
+* `"replace"` - replace existing attributes by attributes of last slice dataset;
+* `"update"` - update existing attributes by attributes of last slice dataset;
+* `"ignore"` - ignore attributes from slice datasets.
+
+Extra attributes can be added using the optional configuration setting `attrs`:
+
+```json
+{
+  "attrs_update_mode": "keep",
+  "attrs": {
+    "Conventions": "CF-1.10",
+    "title": "SMOS Level 2C Soil Moisture 2-Days Composite"
+  }
+}
+```
+
+Independently of the `attrs_update_mode` setting, extra attributes configured
+by the `attrs` setting will always be used to update the resulting target
+attributes.
+
+Attribute values in the `attrs` setting may also be computed dynamically using
+the syntax `{{ expression }}`, where `expression` is an arbitrary Python
+expression. For this to work, the setting `permit_eval` must be explicitly
+set for security reasons:
+
+```json
+{
+  "permit_eval": true,
+  "attrs_update_mode": "keep",
+  "attrs": {
+    "time_coverage_start": "{{ ds.time[0] }}",
+    "time_coverage_end": "{{ ds.time[-1] }}"
+  }
+}
+```
+
+Currently, the only variable accessible from expressions is `ds` which is
+a reference to the current state of the target dataset after the last slice
+append. It is of type
+[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).
+
+!!! danger "Evil eval()"
+    The expressions in `{{ expression }}` are evaluated using the Python
+    [eval() function](https://docs.python.org/3/library/functions.html#eval).
+    This can pose a threat to your application and environment.
+    Although `zappend` does not allow you to directly access Python built-in
+    functions via expressions, this feature should be used judiciously and with extreme
+    caution if used as part of a web service where configuration is injected
+    from outside your network.
+
+The following utility functions can be used as well and are handy if you need
+to store the upper and lower bounds of coordinates as attribute values:
+
+* `lower_bound(array, ref: "lower"|"upper"|"center" = "lower")`:
+  Return the lower bound of a one-dimensional (coordinate) array `array`.
+* `upper_bound(array, ref: "lower"|"upper"|"center" = "lower")`:
+  Return the upper bound of a one-dimensional (coordinate) array `array`.
+
+The `ref` value specifies the reference within an array element that is used
+as a basis for the boundary computation. E.g., if coordinate labels refer to
+array element centers, pass `ref="center"`.
+
+```json
+{
+  "attrs": {
+    "time_coverage_start": "{{ lower_bound(ds.time, 'center') }}",
+    "time_coverage_end": "{{ upper_bound(ds.time, 'center') }}"
+  }
+}
+```
 
 ## Variable Metadata
 
tests/config/__init__.py

Whitespace-only changes.
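To make the `{{ expression }}` mechanism and the `lower_bound`/`upper_bound` helpers from `docs/guide.md` concrete, here is a self-contained Python sketch. Everything in it is an assumption for illustration, not zappend's implementation: `eval_attrs` is a hypothetical function that leaves templates untouched unless `permit_eval` is true; the two bound helpers assume regularly spaced labels and take the cell size from the first or last pair of labels; a plain object with a `time` list stands in for the `xarray.Dataset`; and evaluation uses `eval()` with emptied built-ins, in line with the guide's warning.

```python
import re
from types import SimpleNamespace
from typing import Any, Dict, Literal, Sequence

# Hypothetical sketch: not zappend's actual implementation.
Ref = Literal["lower", "upper", "center"]

# Matches "{{ expression }}"; the lazy group stops at the first "}}".
_TEMPLATE = re.compile(r"\{\{(.*?)\}\}")


def lower_bound(array: Sequence[Any], ref: Ref = "lower") -> Any:
    """Lower bound of a 1-D coordinate array, assuming regular spacing."""
    if len(array) < 2:
        return array[0]  # cell size cannot be inferred from one label
    step = array[1] - array[0]
    if ref == "lower":
        return array[0]
    if ref == "upper":
        return array[0] - step
    if ref == "center":
        return array[0] - step / 2
    raise ValueError(f"invalid ref: {ref!r}")


def upper_bound(array: Sequence[Any], ref: Ref = "lower") -> Any:
    """Upper bound of a 1-D coordinate array, assuming regular spacing."""
    if len(array) < 2:
        return array[-1]
    step = array[-1] - array[-2]
    if ref == "lower":
        return array[-1] + step
    if ref == "upper":
        return array[-1]
    if ref == "center":
        return array[-1] + step / 2
    raise ValueError(f"invalid ref: {ref!r}")


def _evaluate(expr: str, ds: Any) -> Any:
    # Only ds and the two helpers are visible; built-ins are emptied.
    env = {"__builtins__": {}, "ds": ds,
           "lower_bound": lower_bound, "upper_bound": upper_bound}
    return eval(expr.strip(), env)


def eval_attrs(attrs: Dict[str, Any], ds: Any,
               permit_eval: bool = False) -> Dict[str, Any]:
    """Return attrs with "{{ expression }}" parts evaluated against ds."""
    if not permit_eval:
        return dict(attrs)  # in this sketch, templates are left untouched
    result = {}
    for name, value in attrs.items():
        if isinstance(value, str):
            matches = list(_TEMPLATE.finditer(value))
            if len(matches) == 1 and matches[0].span() == (0, len(value)):
                # The whole value is one expression: keep the result's type.
                value = _evaluate(matches[0].group(1), ds)
            elif matches:
                # Mixed text: substitute each expression by its str() value.
                value = _TEMPLATE.sub(
                    lambda m: str(_evaluate(m.group(1), ds)), value)
        result[name] = value
    return result


if __name__ == "__main__":
    # Stand-in for the target dataset; labels refer to cell centers.
    ds = SimpleNamespace(time=[10.0, 20.0, 30.0])
    config_attrs = {
        "title": "Demo",
        "time_coverage_start": "{{ lower_bound(ds.time, 'center') }}",
        "time_coverage_end": "{{ upper_bound(ds.time, 'center') }}",
    }
    print(eval_attrs(config_attrs, ds, permit_eval=True))
    # {'title': 'Demo', 'time_coverage_start': 5.0, 'time_coverage_end': 35.0}
```

An empty `__builtins__` mapping is not a sandbox: expressions can still reach powerful objects through attribute access on the values they are given, which is why `permit_eval` has to be enabled explicitly.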
