Commit 4a79e95

Merge pull request #30 from bcdev/forman-xcube_mldatasets
Support xcube multi-level datasets
2 parents 9c53a01 + 0955eda commit 4a79e95

69 files changed: +1439 -380 lines changed

CHANGES.md

Lines changed: 34 additions & 1 deletion
@@ -2,7 +2,40 @@

 ## Version 0.4.0 (in development)

-
+- New xcube multi-level dataset rules:
+  - `ml-dataset-meta`: verifies that a meta info file exists and is consistent
+  - `ml-dataset-xy`: verifies that the levels have expected spatial resolutions
+  - `ml-dataset-time`: verifies that the levels have expected time dimension, if any
+- Now supporting xcube multi-level datasets `*.levels`:
+  - Added xcube plugin processor `"xcube/multi-level-dataset"` that is used
+    inside the predefined xcube configurations "all" and "recommended".
+- Directories that are recognized by file patterns associated with a non-empty
+  configuration object are no longer recursively
+  traversed.
+- Introduced method `Plugin.define_config` which defines a named plugin
+  configuration. It takes a name and a configuration object or list of
+  configuration objects.
+- Changed the way how configuration is defined and exported from
+  Python configuration files:
+  - Renamed function that exports configuration from `export_configs`
+    into `export_config`.
+  - The returned value should be a list of values that can be
+    converted into configuration objects: mixed `Config` instances,
+    dictionary, or a name that refers to a named configuration of a plugin.
+- Node path names now contain the dataset index if a file path
+  has been opened by a processor produced multiple
+  datasets to validate.
+
+- Other changes:
+  - Changed type of `Plugin.configs` from `dict[str, Config]` to
+    `dict[str, list[Config]]`.
+  - Inbuilt plugin rules now import their `plugin` instance from
+    `xrlint.plugins.<plugin>.plugin` module.
+  - `JsonSerializable` now recognizes `dataclass` instances and no longer
+    serializes property values that are also default values.
+  - Pinned zarr dependency to be >=2.18, <3 until test
+    `tests.plugins.xcube.processors.test_mldataset.MultiLevelDatasetProcessorTest`
+    is adjusted or fsspec's memory filesystem is updated.

 ## Version 0.3.0 (from 2025-01-20)
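
The changelog entries for `Plugin.define_config` and the new `dict[str, list[Config]]` type of `Plugin.configs` suggest that a named configuration is always stored as a list, whether one configuration object or several were passed in. The following is a minimal sketch of that normalization only. It uses plain dictionaries as stand-ins for `Config`, and the free function `define_config` is a hypothetical helper written for this illustration, not xrlint's implementation.

```python
from typing import Any, Union

# Stand-in for xrlint's `Config`; using a plain dict is an assumption of this sketch.
ConfigLike = dict[str, Any]


def define_config(
    configs: dict[str, list[ConfigLike]],
    name: str,
    value: Union[ConfigLike, list[ConfigLike]],
) -> list[ConfigLike]:
    """Register `value` under `name`, always storing a list of config objects."""
    # A single configuration object is wrapped into a one-element list.
    config_list = value if isinstance(value, list) else [value]
    configs[name] = config_list
    return config_list


configs: dict[str, list[ConfigLike]] = {}
# One configuration object ...
define_config(configs, "recommended", {"rules": {"hello/good-title": "warn"}})
# ... or a list of configuration objects.
define_config(
    configs,
    "all",
    [{"files": ["**/*.zarr"]}, {"rules": {"hello/good-title": "error"}}],
)
```

Storing a list in both cases means code that reads the registry, such as a documentation generator, only has to handle one shape.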

docs/config.md

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ Same using JSON:
 And as Python script:

 ```python
-def export_configs():
+def export_config():
     return [
         {"files": ["**/*.zarr", "**/*.nc"]},
         {

docs/rule-ref.md

Lines changed: 21 additions & 0 deletions
@@ -114,6 +114,27 @@ Latitude and longitude coordinates and dimensions should be called 'lat' and 'lo

 Contained in: `all`-:material-lightning-bolt: `recommended`-:material-lightning-bolt:

+### :material-lightbulb: `ml-dataset-meta`
+
+Multi-level datasets should provide '.zlevels' meta information file and if so, it should be consistent.
+[:material-information-variant:](https://xcube.readthedocs.io/en/latest/mldatasets.html#the-xcube-levels-format)
+
+Contained in: `all`-:material-lightning-bolt: `recommended`-:material-lightning-bolt:
+
+### :material-bug: `ml-dataset-time`
+
+The `time` dimension of multi-level datasets should use a chunk size of 1. This allows for faster image tile generation for visualisation.
+[:material-information-variant:](https://xcube.readthedocs.io/en/latest/mldatasets.html#definition)
+
+Contained in: `all`-:material-lightning-bolt: `recommended`-:material-alert:
+
+### :material-bug: `ml-dataset-xy`
+
+Multi-level dataset levels should provide spatial resolutions decreasing by powers of two.
+[:material-information-variant:](https://xcube.readthedocs.io/en/latest/mldatasets.html#definition)
+
+Contained in: `all`-:material-lightning-bolt: `recommended`-:material-lightning-bolt:
+
 ### :material-bug: `single-grid-mapping`

 A single grid mapping shall be used for all spatial data variables of a datacube.
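
As a rough illustration of the condition that `ml-dataset-xy` describes, the sketch below checks that the spatial size of level `i` equals the size of level 0 divided by `2**i`. The function name, the use of plain `(height, width)` tuples, and rounding up for sizes that do not divide evenly are all assumptions of this example; the actual rule in xrlint may check the levels differently.

```python
import math


def levels_follow_powers_of_two(shapes: list[tuple[int, int]]) -> bool:
    """Return True if level i has the base size divided by 2**i (rounded up)."""
    if not shapes:
        # No levels at all cannot form a valid pyramid.
        return False
    base_height, base_width = shapes[0]
    for level, (height, width) in enumerate(shapes):
        factor = 2**level
        expected = (
            math.ceil(base_height / factor),
            math.ceil(base_width / factor),
        )
        if (height, width) != expected:
            return False
    return True


good = [(2048, 4096), (1024, 2048), (512, 1024)]
bad = [(2048, 4096), (1000, 2048)]
```

With these inputs, `levels_follow_powers_of_two(good)` is `True` and `levels_follow_powers_of_two(bad)` is `False`, because the second level of `bad` is not half the height of the first.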

docs/todo.md

Lines changed: 0 additions & 4 deletions
@@ -14,10 +14,6 @@
 ## Desired

 - project logo
-- support validating xcube 'levels' format. Options:
-  - implement xarray backend so we can open them using `xr.open_dataset`
-    with `opener_options: {"engine": "xc-levels"}`.
-  - implement a `xrlint.processor.Processor` for that purpose.
 - add some more tests so we reach 99% coverage
 - support rule op args/kwargs schema validation
 - Support `RuleTest.expected`, it is currently unused

environment.yml

Lines changed: 2 additions & 1 deletion
@@ -20,7 +20,8 @@ dependencies:
   - requests-mock
   - ruff
   # Testing Datasets
+  - dask
   - pandas
   - netcdf4
   - numpy
-  - zarr
+  - zarr >=2.18,<3  # tests fail with zarr 3+

examples/plugin_config.py

Lines changed: 17 additions & 18 deletions
@@ -3,27 +3,11 @@
 using the `Plugin` class and its `define_rule()` decorator method.
 """

-from xrlint.config import Config
 from xrlint.node import DatasetNode
 from xrlint.plugin import new_plugin
 from xrlint.rule import RuleContext, RuleOp

-plugin = new_plugin(
-    name="hello-plugin",
-    version="1.0.0",
-    configs={
-        # "configs" entries must be `Config` objects!
-        "recommended": Config.from_value(
-            {
-                "rules": {
-                    "hello/good-title": "warn",
-                    # Configure more rules here...
-                },
-            }
-        ),
-        # Add more configurations here...
-    },
-)
+plugin = new_plugin(name="hello-plugin", version="1.0.0")


 @plugin.define_rule("good-title")
@@ -42,7 +26,22 @@ def dataset(self, ctx: RuleContext, node: DatasetNode):
 # Define more rules here...


-def export_configs():
+plugin.define_config(
+    "recommended",
+    [
+        {
+            "rules": {
+                "hello/good-title": "warn",
+                # Configure more rules here...
+            },
+        }
+    ],
+)
+
+# Add more configurations here...
+
+
+def export_config():
     return [
         # Use "hello" plugin
         {

examples/virtual_plugin_config.py

Lines changed: 9 additions & 7 deletions
@@ -22,7 +22,7 @@ def dataset(self, ctx: RuleContext, node: DatasetNode):
 # Define more rules here...


-def export_configs():
+def export_config():
     return [
         # Define and use "hello" plugin
         {
@@ -37,12 +37,14 @@ def export_configs():
                         # Add more rules here...
                     },
                     "configs": {
-                        "recommended": {
-                            "rules": {
-                                "hello/good-title": "warn",
-                                # Configure more rules here...
-                            },
-                        },
+                        "recommended": [
+                            {
+                                "rules": {
+                                    "hello/good-title": "warn",
+                                    # Configure more rules here...
+                                },
+                            }
+                        ],
                         # Add more configurations here...
                     },
                 },

mkruleref.py

Lines changed: 20 additions & 4 deletions
@@ -1,4 +1,5 @@
 from xrlint.plugin import Plugin
+from xrlint.rule import RuleConfig

 # for icons, see
 # https://squidfunk.github.io/mkdocs-material/reference/icons-emojis/
@@ -39,7 +40,7 @@ def write_rule_ref_page():


 def write_plugin_rules(stream, plugin: Plugin):
-    configs = plugin.configs
+    config_rules = get_plugin_rule_configs(plugin)
     for rule_id in sorted(plugin.rules.keys()):
         rule_meta = plugin.rules[rule_id].meta
         stream.write(
@@ -51,9 +52,8 @@ def write_plugin_rules(stream, plugin: Plugin):
         stream.write("\n\n")
         # List the predefined configurations that contain the rule
         stream.write("Contained in: ")
-        for config_id in sorted(configs.keys()):
-            config = configs[config_id]
-            rule_configs = config.rules or {}
+        for config_id in sorted(config_rules.keys()):
+            rule_configs = config_rules[config_id]
             rule_config = rule_configs.get(rule_id) or rule_configs.get(
                 f"{plugin.meta.name}/{rule_id}"
             )
@@ -62,5 +62,21 @@ def write_plugin_rules(stream, plugin: Plugin):
         stream.write("\n\n")


+def get_plugin_rule_configs(plugin):
+    configs = plugin.configs
+    config_rules: dict[str, dict[str, RuleConfig]] = {}
+    for config_name, config_list in configs.items():
+        # note, here we assume most plugins configure their rules
+        # in one dedicated config object only. However, this is not
+        # the general case as file patterns may be used to make the
+        # rules configurations specific.
+        rule_configs = {}
+        for config in config_list:
+            if config.rules:
+                rule_configs.update(config.rules)
+        config_rules[config_name] = rule_configs
+    return config_rules
+
+
 if __name__ == "__main__":
     write_rule_ref_page()
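
To make the effect of the new `get_plugin_rule_configs` helper concrete, here is a self-contained sketch that applies the same merging logic to stand-in objects. `SimpleNamespace` replaces the real `Config` class, the function name `merge_rule_configs` and the rule names `demo/rule-a` and `demo/rule-b` are made up for this illustration, and the point is only to show how several configuration objects under one name collapse into a single rule mapping.

```python
from types import SimpleNamespace


def merge_rule_configs(configs):
    """Merge the `rules` of every config object in each named config list."""
    merged = {}
    for config_name, config_list in configs.items():
        rule_configs = {}
        for config in config_list:
            if config.rules:
                # Later config objects override earlier ones for the same rule.
                rule_configs.update(config.rules)
        merged[config_name] = rule_configs
    return merged


configs = {
    "recommended": [
        # A config object without rules, e.g. one that only sets file patterns.
        SimpleNamespace(rules=None),
        SimpleNamespace(rules={"demo/rule-a": "warn"}),
        SimpleNamespace(rules={"demo/rule-a": "error", "demo/rule-b": "warn"}),
    ],
}
result = merge_rule_configs(configs)
```

Because `dict.update` is applied in list order, the last configuration object that mentions a rule wins. This matches the caveat in the code comment of the diff: when file patterns make rule settings specific to some files, a flat merge like this loses that distinction.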

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -73,10 +73,11 @@ dev = [
   "ruff",
   "twine",
   # Dataset testing
+  "dask",
   "netcdf4",
   "numpy",
   "pandas",
-  "zarr",
+  "zarr >=2.18,<3",
 ]
 doc = [
   "mkdocs",

tests/_linter/test_rulectx.py

Lines changed: 3 additions & 2 deletions
@@ -12,14 +12,15 @@ class RuleContextImplTest(TestCase):
     def test_defaults(self):
         config = Config()
         dataset = xr.Dataset()
-        context = RuleContextImpl(config, dataset, "./ds.zarr")
+        context = RuleContextImpl(config, dataset, "./ds.zarr", None)
         self.assertIs(config, context.config)
         self.assertIs(dataset, context.dataset)
         self.assertEqual({}, context.settings)
         self.assertEqual("./ds.zarr", context.file_path)
+        self.assertEqual(None, context.file_index)

     def test_report(self):
-        context = RuleContextImpl(Config(), xr.Dataset(), "./ds.zarr")
+        context = RuleContextImpl(Config(), xr.Dataset(), "./ds.zarr", None)
         with context.use_state(rule_id="no-xxx"):
             context.report(
                 "What the heck do you mean?",
