
Commit a4866a9

permit_computed_attrs -> permit_eval;
new attrs_update_mode "ignore"; updated guide.
1 parent: 208c16b

File tree

10 files changed: +133 -29 lines changed


docs/config.md

Lines changed: 9 additions & 6 deletions
@@ -146,28 +146,31 @@ Variable metadata.
 ## `attrs`
 
 Type _object_.
-Arbitrary dataset attributes. If `permit_computed_attrs` is set to `true`, string values may include Python expressions enclosed in `{{` and `}}` to dynamically compute attribute values; in the expression, the current dataset is named `ds`. Refer to the user guide for more information.
+Arbitrary dataset attributes. If `permit_eval` is set to `true`, string values may include Python expressions enclosed in `{{` and `}}` to dynamically compute attribute values; in the expression, the current dataset is named `ds`. Refer to the user guide for more information.
 
 ## `attrs_update_mode`
 
 The mode used to update target attributes from slice attributes. Independently of this setting, extra attributes configured by the `attrs` setting will finally be used to update the resulting target attributes.
 Must be one of the following:
 
-* Keep attributes from first slice dataset.
+* Use attributes from first slice dataset and keep them.
   Its value is `"keep"`.
 
-* Replace attributes from last slice dataset.
+* Replace existing attributes by attributes of last slice dataset.
   Its value is `"replace"`.
 
-* Update attributes from last slice dataset.
+* Update existing attributes by attributes of last slice dataset.
   Its value is `"update"`.
 
+* Ignore attributes from slice datasets.
+  Its value is `"ignore"`.
+
 Defaults to `"keep"`.
 
-## `permit_computed_attrs`
+## `permit_eval`
 
 Type _boolean_.
-Allow for dynamically computed values in dataset attributes `attrs` using the syntax `{{ expression }}`. Executing arbitrary Python expressions is a security risk, therefore this must be explicitly enabled.
+Allow for dynamically computed values in dataset attributes `attrs` using the syntax `{{ expression }}`. Executing arbitrary Python expressions is a security risk, therefore this must be explicitly enabled. Refer to the user guide for more information.
 Defaults to `false`.
 
 ## `target_dir`
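
For quick orientation, here is a minimal sketch of the renamed setting and the new mode used together through the Python API. It mirrors the keyword-argument style of the tests further down; passing `attrs_update_mode` as a keyword argument and the tiny in-memory slices are illustrative assumptions, not part of this commit.

```python
# Illustrative sketch only; assumes config settings map to zappend() keyword
# arguments as in tests/test_api.py below.
import numpy as np
import xarray as xr
from zappend.api import zappend

# Two tiny in-memory slices along the default "time" append dimension.
slices = [
    xr.Dataset(
        {"chl": (("time", "y", "x"), np.zeros((1, 2, 2)))},
        coords={"time": [np.datetime64("2024-01-01") + np.timedelta64(i, "D")]},
        attrs={"source": f"slice-{i}"},
    )
    for i in range(2)
]

zappend(
    slices,
    target_dir="memory://target.zarr",
    permit_eval=True,                # renamed from permit_computed_attrs
    attrs_update_mode="ignore",      # new mode: drop slice attributes (assumed kwarg)
    attrs={"title": "Example target dataset"},
)
```

With `"ignore"`, the per-slice `source` attributes above would be dropped and only the configured `title` would end up on the target.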

docs/guide.md

Lines changed: 74 additions & 5 deletions
@@ -156,14 +156,83 @@ Often, it is easier to specify which variables should be excluded:
 ```
 
 ### Attributes
 
-_TODO: Write section_
+The target dataset should provide information about itself using global
+metadata attributes.
+There are four choices for updating the global attributes of the target
+dataset from slices. The configuration setting `attrs_update_mode`
+controls how this is done:
 
-Configuration settings:
+* `"keep"` - use attributes from first slice dataset and keep them (default);
+* `"replace"` - replace existing attributes by attributes of last slice dataset;
+* `"update"` - update existing attributes by attributes of last slice dataset;
+* `"ignore"` - ignore attributes from slice datasets.
 
-* `attrs`
-* `permit_dyn_attrs`
-* `attrs_update_mode`
+Extra attributes can be added using the optional configuration setting `attrs`:
 
+```json
+{
+  "attrs_update_mode": "keep",
+  "attrs": {
+    "Conventions": "CF-1.10",
+    "title": "SMOS Level 2C Soil Moisture 2-Days Composite"
+  }
+}
+```
+
+Independently of the `attrs_update_mode` setting, extra attributes configured
+by the `attrs` setting will always be used to update the resulting target
+attributes.
+
+Attribute values in the `attrs` setting may also be computed dynamically using
+the syntax `{{ expression }}`, where `expression` is an arbitrary Python
+expression. For this to work, the setting `permit_eval` must be explicitly
+set for security reasons:
+
+```json
+{
+  "permit_eval": true,
+  "attrs_update_mode": "keep",
+  "attrs": {
+    "time_coverage_start": "{{ ds.time[0] }}",
+    "time_coverage_end": "{{ ds.time[-1] }}"
+  }
+}
+```
+
+Currently, the only variable accessible from expressions is `ds`, which is
+a reference to the current state of the target dataset after the last slice
+append. It is of type
+[xarray.Dataset](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).
+
+!!! danger "Evil eval()"
+    The expressions in `{{ expression }}` are evaluated using the Python
+    [eval() function](https://docs.python.org/3/library/functions.html#eval).
+    This can pose a threat to your application and environment.
+    Although `zappend` does not allow you to directly access Python built-in
+    functions via expressions, it should be used judiciously and with extreme
+    caution if used as part of a web service where configuration is injected
+    from outside of your network.
+
+The following utility functions can be used as well and are handy if you need
+to store the upper and lower bounds of coordinates as attribute values:
+
+* `lower_bound(array, ref: "lower"|"upper"|"center" = "lower")`:
+  Return the lower bound of a one-dimensional (coordinate) array `array`.
+* `upper_bound(array, ref: "lower"|"upper"|"center" = "lower")`:
+  Return the upper bound of a one-dimensional (coordinate) array `array`.
+
+The `ref` value specifies the reference within an array element that is used
+as a basis for the boundary computation. E.g., if coordinate labels refer to
+array element centers, pass `ref="center"`.
+
+```json
+{
+  "attrs": {
+    "time_coverage_start": "{{ lower_bound(ds.time, 'center') }}",
+    "time_coverage_end": "{{ upper_bound(ds.time, 'center') }}"
+  }
+}
+```
 
 ## Variable Metadata
 
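
The `ref` argument in the guide text above can be made concrete with a tiny computation. The half-step extrapolation below is an assumption about how center-referenced bounds are commonly derived; it is not necessarily the exact implementation of `lower_bound`/`upper_bound` in zappend.

```python
import numpy as np

# Daily time labels that refer to cell centers (illustrative data only).
time = np.array(
    ["2024-01-01T12:00", "2024-01-02T12:00", "2024-01-03T12:00"],
    dtype="datetime64[h]",
)

# Assumed behaviour for ref="center": extend by half a step on each side.
step = time[1] - time[0]          # 24 hours
lower = time[0] - step // 2       # 2024-01-01T00
upper = time[-1] + step // 2      # 2024-01-04T00
print(lower, upper)
```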

tests/config/test_schema.py

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ def test_get_config_schema(self):
                 "fixed_dims",
                 "included_variables",
                 "logging",
-                "permit_computed_attrs",
+                "permit_eval",
                 "persist_mem_slices",
                 "profiling",
                 "slice_engine",

tests/test_api.py

Lines changed: 2 additions & 2 deletions
@@ -291,7 +291,7 @@ def test_dynamic_attrs_with_one_slice(self):
         zappend(
             slices,
             target_dir=target_dir,
-            permit_computed_attrs=True,
+            permit_eval=True,
             attrs={
                 "title": "HROC Ocean Colour Monthly Composite",
                 "time_coverage_start": "{{ ds.time[0] }}",
@@ -315,7 +315,7 @@ def test_dynamic_attrs_with_some_slices(self):
         zappend(
             slices,
             target_dir=target_dir,
-            permit_computed_attrs=True,
+            permit_eval=True,
             attrs={
                 "title": "HROC Ocean Colour Monthly Composite",
                 "time_coverage_start": "{{ ds.time[0] }}",

tests/test_context.py

Lines changed: 3 additions & 3 deletions
@@ -62,18 +62,18 @@ def test_attrs(self):
         ctx = Context({"target_dir": "memory://target.zarr"})
         self.assertEqual({}, ctx.attrs)
         self.assertEqual("keep", ctx.attrs_update_mode)
-        self.assertEqual(False, ctx.computed_attrs_permitted)
+        self.assertEqual(False, ctx.permit_eval)
         ctx = Context(
             {
                 "target_dir": "memory://target.zarr",
                 "attrs": {"title": "OCC 2024"},
                 "attrs_update_mode": "update",
-                "permit_computed_attrs": True,
+                "permit_eval": True,
             }
         )
         self.assertEqual({"title": "OCC 2024"}, ctx.attrs)
         self.assertEqual("update", ctx.attrs_update_mode)
-        self.assertEqual(True, ctx.computed_attrs_permitted)
+        self.assertEqual(True, ctx.permit_eval)
 
     def test_slice_polling(self):
         ctx = Context({"target_dir": "memory://target.zarr"})

tests/test_tailoring.py

Lines changed: 11 additions & 0 deletions
@@ -232,3 +232,14 @@ def test_it_updates_attrs_according_to_update_mode(self):
             },
             tailored_ds.attrs,
         )
+
+        tailored_ds = tailor_slice_dataset(
+            slice_ds, target_md, "time", "ignore", {"a": 12, "b": True}
+        )
+        self.assertEqual(
+            {
+                "a": 12,
+                "b": True,
+            },
+            tailored_ds.attrs,
+        )

zappend/config/schema.py

Lines changed: 18 additions & 5 deletions
@@ -524,7 +524,7 @@
         attrs={
             "description": (
                 "Arbitrary dataset attributes."
-                " If `permit_computed_attrs` is set to `true`,"
+                " If `permit_eval` is set to `true`,"
                 " string values may include Python expressions"
                 " enclosed in `{{` and `}}` to dynamically compute"
                 " attribute values; in the expression, the current dataset "
@@ -543,26 +543,39 @@
             ),
             "oneOf": [
                 {
-                    "description": "Keep attributes from first slice dataset.",
+                    "description": (
+                        "Use attributes from first slice dataset and keep them."
+                    ),
                     "const": "keep",
                 },
                 {
-                    "description": "Replace attributes from last slice dataset.",
+                    "description": (
+                        "Replace existing attributes by attributes"
+                        " of last slice dataset."
+                    ),
                     "const": "replace",
                 },
                 {
-                    "description": "Update attributes from last slice dataset.",
+                    "description": (
+                        "Update existing attributes by attributes"
+                        " of last slice dataset."
+                    ),
                     "const": "update",
                 },
+                {
+                    "description": "Ignore attributes from slice datasets.",
+                    "const": "ignore",
+                },
             ],
             "default": DEFAULT_ATTRS_UPDATE_MODE,
         },
-        permit_computed_attrs={
+        permit_eval={
            "description": (
                 "Allow for dynamically computed values in dataset attributes"
                 " `attrs` using the syntax `{{ expression }}`. "
                 " Executing arbitrary Python expressions is a security"
                 " risk, therefore this must be explicitly enabled."
+                " Refer to the user guide for more information."
             ),
             "type": "boolean",
             "default": False,

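As a quick, self-contained check of the `oneOf`/`const` structure introduced above, the following sketch validates values against a hand-copied, reduced fragment. It uses the generic `jsonschema` package and is not how zappend itself validates its configuration.

```python
import jsonschema

# Hand-copied, reduced fragment of the attrs_update_mode schema shown above;
# "default" is informational and ignored during validation.
attrs_update_mode_schema = {
    "oneOf": [
        {"const": "keep"},
        {"const": "replace"},
        {"const": "update"},
        {"const": "ignore"},
    ],
    "default": "keep",
}

jsonschema.validate("ignore", attrs_update_mode_schema)  # accepted, no error
try:
    jsonschema.validate("merge", attrs_update_mode_schema)
except jsonschema.ValidationError as error:
    print("rejected:", error.message)  # "merge" matches none of the allowed modes
```
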
zappend/context.py

Lines changed: 2 additions & 2 deletions
@@ -119,13 +119,13 @@ def last_append_label(self) -> Any | None:
         return self._last_append_label
 
     @property
-    def computed_attrs_permitted(self) -> bool:
+    def permit_eval(self) -> bool:
         """Check if dynamically computed values in dataset attributes `attrs`
         using the syntax `{{ expression }}` is permitted. Executing arbitrary
         Python expressions is a security risk, therefore this must be explicitly
         enabled.
         """
-        return bool(self._config.get("permit_computed_attrs"))
+        return bool(self._config.get("permit_eval"))
 
     @property
     def target_metadata(self) -> DatasetMetadata | None:

zappend/processor.py

Lines changed: 2 additions & 2 deletions
@@ -182,7 +182,7 @@ def post_create_target(ctx: Context, target_ds: xr.Dataset):
         target_ds: The target dataset.
     """
     target_attrs = target_ds.attrs
-    if ctx.computed_attrs_permitted and has_dyn_config_attrs(target_attrs):
+    if ctx.permit_eval and has_dyn_config_attrs(target_attrs):
         target_store = ctx.target_dir.fs.get_mapper(root=ctx.target_dir.path)
         resolve_target_attrs(target_store, target_ds, target_attrs)
 
@@ -197,7 +197,7 @@ def post_update_target(ctx: Context, target_store: RollbackStore, slice_ds: xr.D
         slice_ds: The current slice dataset that has already been appended.
     """
     target_attrs = slice_ds.attrs
-    if ctx.computed_attrs_permitted and has_dyn_config_attrs(target_attrs):
+    if ctx.permit_eval and has_dyn_config_attrs(target_attrs):
         with xr.open_zarr(target_store) as target_ds:
             resolve_target_attrs(target_store, target_ds, target_attrs)
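
The check above gates expression resolution on both `permit_eval` and a scan for `{{ ... }}` placeholders. The real `has_dyn_config_attrs` helper is not part of this diff, so the following is only a plausible sketch of such a scan, written here under that assumption with a different name.

```python
import re

# Illustrative only: the actual has_dyn_config_attrs() in zappend may differ.
_PLACEHOLDER = re.compile(r"\{\{.*?\}\}")

def looks_like_dyn_attrs(attrs: dict) -> bool:
    """Return True if any string attribute value contains a {{ ... }} placeholder."""
    return any(
        isinstance(value, str) and _PLACEHOLDER.search(value) is not None
        for value in attrs.values()
    )

print(looks_like_dyn_attrs({"title": "Static title"}))                    # False
print(looks_like_dyn_attrs({"time_coverage_start": "{{ ds.time[0] }}"}))  # True
```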

zappend/tailoring.py

Lines changed: 11 additions & 3 deletions
@@ -14,28 +14,33 @@
 from .metadata import DatasetMetadata
 
 
+# TODO: use ctx as only argument
 def tailor_target_dataset(
     dataset: xr.Dataset, target_metadata: DatasetMetadata
 ) -> xr.Dataset:
     dataset = _strip_dataset(dataset, target_metadata)
     dataset = _complete_dataset(dataset, target_metadata)
 
-    # Complement dataset attributes and set
-    # variable encoding and attributes
+    # TODO: use ctx.attrs_update_mode to set initial dataset attributes
+
+    # Set initial dataset attributes
     dataset.attrs = target_metadata.attrs
+
+    # Set variable encoding and attributes
     for var_name, var_metadata in target_metadata.variables.items():
         variable = dataset.variables[var_name]
         variable.encoding = var_metadata.encoding.to_dict()
         variable.attrs = var_metadata.attrs
     return dataset
 
 
+# TODO: use ctx and slice_ds as only arguments
 def tailor_slice_dataset(
     slice_ds: xr.Dataset,
     target_metadata: DatasetMetadata,
     append_dim: str = DEFAULT_APPEND_DIM,
     attrs_update_mode: (
-        Literal["keep"] | Literal["replace"] | Literal["update"]
+        Literal["keep"] | Literal["replace"] | Literal["update"] | Literal["ignore"]
     ) = DEFAULT_ATTRS_UPDATE_MODE,
     attrs: dict[str, Any] | None = None,
 ) -> xr.Dataset:
@@ -62,6 +67,9 @@ def tailor_slice_dataset(
     elif attrs_update_mode == "update":
         # Update from last slice dataset
         slice_ds.attrs = target_metadata.attrs | slice_ds.attrs
+    elif attrs_update_mode == "ignore":
+        # Ignore attributes from slice dataset
+        slice_ds.attrs = {}
     if attrs:
         # Always update by configured attributes
         slice_ds.attrs.update(attrs)
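
To see all four modes side by side, here is a small standalone sketch of the resulting attribute dictionaries. The `update` and `ignore` branches follow the code above; the `keep` and `replace` branches are not visible in this hunk and are reconstructed from the option descriptions, so treat them as assumptions rather than zappend's actual code.

```python
# Standalone illustration, not zappend's actual implementation.
target_attrs = {"title": "from first slice", "history": "v1"}    # current target attrs
slice_attrs = {"title": "from last slice", "source": "slice-9"}  # attrs of the new slice
extra_attrs = {"Conventions": "CF-1.10"}                         # the `attrs` setting

def merged_attrs(mode: str) -> dict:
    if mode == "keep":        # assumed: keep what the target already has
        result = dict(target_attrs)
    elif mode == "replace":   # assumed: take the last slice's attributes as-is
        result = dict(slice_attrs)
    elif mode == "update":    # as in the code above: slice values win over target
        result = target_attrs | slice_attrs
    elif mode == "ignore":    # as in the code above: drop slice attributes entirely
        result = {}
    else:
        raise ValueError(f"unknown attrs_update_mode: {mode!r}")
    result.update(extra_attrs)  # the `attrs` setting always applies last
    return result

for mode in ("keep", "replace", "update", "ignore"):
    print(mode, "->", merged_attrs(mode))
```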
