11 changes: 11 additions & 0 deletions CHANGES.md
@@ -2,6 +2,17 @@

## Version 0.4.1 (in development)

### Changes

- Added core rule `conventions` that checks for the `Conventions` attribute.
- Added core rule `content-desc` that checks the content description.
- Added core rule `var-desc` that checks data variable descriptions.
- Renamed rules for consistency:
  - `var-units-attr` into `var-units`
  - `flags` into `var-flags`
### Fixes

- Fixed an issue that prevented recursively traversing folders referred
to by URLs (such as `s3://<bucket>/<path>/`) rather than local directory
paths. (#39)
47 changes: 38 additions & 9 deletions docs/rule-ref.md
@@ -5,6 +5,27 @@ New rules will be added by upcoming XRLint releases.

## Core Rules

### :material-lightbulb: `content-desc`

A dataset should provide information about where the data came from and what has been done to it. This information is mainly for the benefit of human readers. The rule accepts the following configuration parameters:

- `globals`: list of names of required global attributes. Defaults to `['title', 'history']`.
- `commons`: list of names of required variable attributes that can also be defined globally. Defaults to `['institution', 'source', 'references', 'comment']`.
- `skip_vars`: do not check variables at all. Defaults to `False`.
- `ignored_vars`: list of ignored variables (regex patterns). Defaults to `['crs', 'spatial_ref']`.

[:material-information-variant:](https://cfconventions.org/cf-conventions/cf-conventions.html#description-of-file-contents)

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-alert:
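
The interplay of these parameters can be sketched in plain Python. This is a minimal illustration, not XRLint's implementation: `check_content_desc` is a hypothetical helper, the message texts are invented, datasets are modelled as plain dicts, the "skip variables" flag is named `skip_vars` as in the test module of this change, and full-string regex matching for `ignored_vars` is an assumption.

```python
import re


def check_content_desc(
    global_attrs,
    var_attrs,
    globals_=("title", "history"),
    commons=("institution", "source", "references", "comment"),
    skip_vars=False,
    ignored_vars=("crs", "spatial_ref"),
):
    """Return messages for missing descriptive attributes.

    `global_attrs` is the dataset's attribute dict; `var_attrs` maps each
    data variable name to its attribute dict. Hypothetical helper, not
    part of XRLint.
    """
    # Required global attributes must be present on the dataset itself.
    messages = [
        f"Missing global attribute {name!r}."
        for name in globals_
        if name not in global_attrs
    ]
    # Common attributes that are defined globally need no per-variable check.
    missing_commons = [name for name in commons if name not in global_attrs]
    if skip_vars:
        # Variables are not inspected, so every common attribute must be global.
        messages += [f"Missing global attribute {name!r}." for name in missing_commons]
        return messages
    for var_name, attrs in var_attrs.items():
        # Assumption: a pattern has to match the whole variable name.
        if any(re.fullmatch(pattern, var_name) for pattern in ignored_vars):
            continue
        messages += [
            f"Missing attribute {name!r} in variable {var_name!r}."
            for name in missing_commons
            if name not in attrs
        ]
    return messages
```

Under these assumptions, a dataset without any attributes and one data variable yields six messages (two globals plus four commons), which agrees with the count expected by the tests in this change.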

### :material-lightbulb: `conventions`

Datasets should identify the applicable conventions using the `Conventions` attribute.
The rule has an optional configuration parameter `match`, a regex pattern that the value of the `Conventions` attribute must match. If `match` is not provided, the rule only verifies that the attribute exists and that its value is a character string.
[:material-information-variant:](https://cfconventions.org/cf-conventions/cf-conventions.html#identification-of-conventions)

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-alert:
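
The three checks described above (presence, string type, optional pattern) can be sketched as follows. `check_conventions` is a hypothetical helper that works on a plain attribute dict; the message texts are taken from the tests in this change, while anchoring the pattern with `re.match` is an assumption about how the rule applies it.

```python
import re


def check_conventions(attrs, match=None):
    """Return messages for a missing, non-string, or non-matching `Conventions`."""
    if "Conventions" not in attrs:
        return ["Missing attribute 'Conventions'."]
    value = attrs["Conventions"]
    if not isinstance(value, str):
        return [f"Invalid attribute 'Conventions': {value!r}."]
    # Assumption: the pattern is applied from the start of the value (re.match).
    if match is not None and not re.match(match, value):
        return [f"Invalid attribute 'Conventions': {value!r} doesn't match {match!r}."]
    return []
```

For example, `check_conventions({"Conventions": "CF-1.10"}, match=r"CF-.*")` returns an empty list, whereas the value `"CF 1.10"` is reported because the space does not match the hyphen in the pattern.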

### :material-bug: `coords-for-dims`

Dimensions of data variables should have corresponding coordinates.
@@ -17,13 +38,6 @@ Datasets should be given a non-empty title.

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-alert:

### :material-lightbulb: `flags`

Validate attributes 'flag_values', 'flag_masks' and 'flag_meanings' that make variables that contain flag values self describing.
[:material-information-variant:](https://cfconventions.org/cf-conventions/cf-conventions.html#flags)

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-lightning-bolt:

### :material-bug: `grid-mappings`

Grid mappings, if any, shall have valid grid mapping coordinate variables.
@@ -64,9 +78,24 @@ Time coordinates should have valid and unambiguous time units encoding.

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-lightning-bolt:

### :material-lightbulb: `var-units-attr`
### :material-lightbulb: `var-desc`

Check that each data variable provides an identification and description of its content. The rule can be configured with the parameter `attrs`, a list of names of attributes that provide descriptive information. It defaults to `['standard_name', 'long_name']`.
[:material-information-variant:](https://cfconventions.org/cf-conventions/cf-conventions.html#standard-name)

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-alert:
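
As a rough sketch of the check, every name listed in `attrs` is expected on each data variable. `check_var_desc` is a hypothetical helper operating on one variable's attribute dict, not XRLint's API; the message format follows the tests in this change.

```python
def check_var_desc(var_attrs, attrs=("standard_name", "long_name")):
    """Return one message per descriptive attribute missing from a variable."""
    return [f"Missing attribute {name!r}." for name in attrs if name not in var_attrs]
```

Note that with `attrs=["description"]` a variable carrying only `standard_name` and `long_name` is reported, since the configured list replaces the defaults rather than extending them.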

### :material-lightbulb: `var-flags`

Validate the attributes 'flag_values', 'flag_masks' and 'flag_meanings', which make variables that contain flag values self-describing.
[:material-information-variant:](https://cfconventions.org/cf-conventions/cf-conventions.html#flags)

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-lightning-bolt:
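
One consistency requirement of the CF conventions is that `flag_meanings` is a blank-separated string with as many entries as `flag_values` (or `flag_masks`). The sketch below illustrates only that requirement; `check_flag_attrs` and its message texts are hypothetical, and the actual rule may perform further checks.

```python
def check_flag_attrs(attrs):
    """Return messages for inconsistent CF flag attributes (illustrative only)."""
    messages = []
    meanings = attrs.get("flag_meanings")
    for name in ("flag_values", "flag_masks"):
        if name not in attrs:
            continue
        if not isinstance(meanings, str):
            # Values or masks cannot be interpreted without their meanings.
            messages.append(f"Attribute {name!r} requires a string 'flag_meanings'.")
        elif len(attrs[name]) != len(meanings.split()):
            # Each value or mask needs exactly one blank-separated meaning.
            messages.append(f"Lengths of {name!r} and 'flag_meanings' differ.")
    return messages
```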

### :material-lightbulb: `var-units`

Every variable should have a valid 'units' attribute.
Every variable should provide a description of its units.
[:material-information-variant:](https://cfconventions.org/cf-conventions/cf-conventions.html#units)

Contained in: `all`-:material-lightning-bolt: `recommended`-:material-alert:

7 changes: 7 additions & 0 deletions docs/todo.md
@@ -10,6 +10,13 @@
- use mkdocstrings ref syntax in docstrings
- provide configuration examples (use as tests?)
- add `docs_url` to all existing rules
- API changes for v0.5:
  - clarify when users can pass configuration-object-like values
    and when configuration-like values
  - config class naming is confusing,
    change `Config` -> `ConfigObject`, `ConfigList` -> `Config`
  - change `verify` -> `validate`,
    prefix `RuleOp` methods with `validate_` for clarity.

## Desired

98 changes: 98 additions & 0 deletions tests/plugins/core/rules/test_content_desc.py
@@ -0,0 +1,98 @@
import xarray as xr

from xrlint.plugins.core.rules.content_desc import ContentDesc
from xrlint.testing import RuleTest, RuleTester

global_attrs = dict(
    title="OC-Climatology",
    history="2025-01-26: created",
)

common_attrs = dict(
    institution="ESA",
    source="a.nc; b.nc",
    references="!",
    comment="?",
)

all_attrs = global_attrs | common_attrs

time_coord = xr.DataArray(
    [1, 2, 3], dims="time", attrs=dict(units="days since 2025-01-01")
)

valid_dataset_0 = xr.Dataset(
    attrs=all_attrs,
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=dict())),
    coords=dict(time=time_coord),
)
valid_dataset_1 = xr.Dataset(
    attrs=global_attrs,
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=common_attrs)),
    coords=dict(time=time_coord),
)
valid_dataset_1a = xr.Dataset(
    attrs=global_attrs,
    data_vars=dict(
        chl=xr.DataArray([1, 2, 3], dims="time", attrs=common_attrs),
        crs=xr.DataArray(0, attrs=dict(grid_mapping_name="...")),
    ),
    coords=dict(time=time_coord),
)
valid_dataset_1b = xr.Dataset(
    attrs=global_attrs,
    data_vars=dict(
        chl=xr.DataArray([1, 2, 3], dims="time", attrs=common_attrs),
        chl_unc=xr.DataArray(0, attrs=dict(units="...")),
    ),
    coords=dict(time=time_coord),
)
valid_dataset_2 = xr.Dataset(
    attrs=global_attrs,
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=dict())),
    coords=dict(time=time_coord),
)
valid_dataset_3 = xr.Dataset(
    attrs=global_attrs,
    data_vars=dict(
        chl=xr.DataArray([1, 2, 3], dims="time", attrs=dict(description="Bla!"))
    ),
    coords=dict(time=time_coord),
)

invalid_dataset_0 = xr.Dataset()
invalid_dataset_1 = xr.Dataset(
    attrs=dict(),
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=dict())),
    coords=dict(time=time_coord),
)
invalid_dataset_2 = xr.Dataset(
    attrs=global_attrs,
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=dict())),
    coords=dict(time=time_coord),
)

ContentDescTest = RuleTester.define_test(
    "content-desc",
    ContentDesc,
    valid=[
        RuleTest(dataset=valid_dataset_0, name="0"),
        RuleTest(dataset=valid_dataset_1, name="1"),
        RuleTest(dataset=valid_dataset_1a, name="1a"),
        RuleTest(
            dataset=valid_dataset_1b, name="1b", kwargs={"ignored_vars": ["chl_unc"]}
        ),
        RuleTest(dataset=valid_dataset_2, name="2", kwargs={"commons": []}),
        RuleTest(
            dataset=valid_dataset_2, name="2", kwargs={"commons": [], "skip_vars": True}
        ),
        RuleTest(
            dataset=valid_dataset_3, name="3", kwargs={"commons": ["description"]}
        ),
    ],
    invalid=[
        RuleTest(dataset=invalid_dataset_0, expected=2),
        RuleTest(dataset=invalid_dataset_1, expected=6),
        RuleTest(dataset=invalid_dataset_2, kwargs={"skip_vars": True}, expected=4),
    ],
)
37 changes: 37 additions & 0 deletions tests/plugins/core/rules/test_conventions.py
@@ -0,0 +1,37 @@
import xarray as xr

from xrlint.plugins.core.rules.conventions import Conventions
from xrlint.testing import RuleTest, RuleTester

valid_dataset_0 = xr.Dataset(attrs=dict(Conventions="CF-1.10"))

invalid_dataset_0 = xr.Dataset()
invalid_dataset_1 = xr.Dataset(attrs=dict(Conventions=1.12))
invalid_dataset_2 = xr.Dataset(attrs=dict(Conventions="CF 1.10"))


ConventionsTest = RuleTester.define_test(
    "conventions",
    Conventions,
    valid=[
        RuleTest(dataset=valid_dataset_0),
        RuleTest(dataset=valid_dataset_0, kwargs={"match": r"CF-.*"}),
    ],
    invalid=[
        RuleTest(
            dataset=invalid_dataset_0,
            expected=["Missing attribute 'Conventions'."],
        ),
        RuleTest(
            dataset=invalid_dataset_1,
            expected=["Invalid attribute 'Conventions': 1.12."],
        ),
        RuleTest(
            dataset=invalid_dataset_2,
            kwargs={"match": r"CF-.*"},
            expected=[
                "Invalid attribute 'Conventions': 'CF 1.10' doesn't match 'CF-.*'."
            ],
        ),
    ],
)
92 changes: 92 additions & 0 deletions tests/plugins/core/rules/test_var_desc.py
@@ -0,0 +1,92 @@
import xarray as xr

from xrlint.plugins.core.rules.var_desc import VarDesc
from xrlint.testing import RuleTest, RuleTester

pressure_attrs = dict(
    long_name="mean sea level pressure",
    units="hPa",
    standard_name="air_pressure_at_sea_level",
)

time_coord = xr.DataArray(
    [1, 2, 3], dims="time", attrs=dict(units="days since 2025-01-01")
)

valid_dataset_0 = xr.Dataset(
    coords=dict(time=time_coord),
)
valid_dataset_1 = xr.Dataset(
    data_vars=dict(pressure=xr.DataArray([1, 2, 3], dims="time", attrs=pressure_attrs)),
    coords=dict(time=time_coord),
)
valid_dataset_2 = xr.Dataset(
    data_vars=dict(
        chl=xr.DataArray(
            [1, 2, 3], dims="time", attrs=dict(description="It is air pressure")
        )
    ),
    coords=dict(time=time_coord),
)

invalid_dataset_0 = xr.Dataset(
    attrs=dict(),
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=dict())),
    coords=dict(time=time_coord),
)

invalid_dataset_1 = xr.Dataset(
    attrs=dict(),
    data_vars=dict(
        chl=xr.DataArray(
            [1, 2, 3],
            dims="time",
            attrs=dict(standard_name="air_pressure_at_sea_level"),
        )
    ),
    coords=dict(time=time_coord),
)
invalid_dataset_2 = xr.Dataset(
    attrs=dict(),
    data_vars=dict(
        chl=xr.DataArray(
            [1, 2, 3], dims="time", attrs=dict(long_name="mean sea level pressure")
        )
    ),
    coords=dict(time=time_coord),
)
invalid_dataset_3 = xr.Dataset(
    attrs=dict(),
    data_vars=dict(chl=xr.DataArray([1, 2, 3], dims="time", attrs=pressure_attrs)),
    coords=dict(time=time_coord),
)

VarDescTest = RuleTester.define_test(
    "var-desc",
    VarDesc,
    valid=[
        RuleTest(dataset=valid_dataset_0),
        RuleTest(dataset=valid_dataset_1),
        RuleTest(dataset=valid_dataset_2, kwargs={"attrs": ["description"]}),
    ],
    invalid=[
        RuleTest(
            dataset=invalid_dataset_0,
            expected=[
                "Missing attribute 'standard_name'.",
                "Missing attribute 'long_name'.",
            ],
        ),
        RuleTest(
            dataset=invalid_dataset_1, expected=["Missing attribute 'long_name'."]
        ),
        RuleTest(
            dataset=invalid_dataset_2, expected=["Missing attribute 'standard_name'."]
        ),
        RuleTest(
            dataset=invalid_dataset_3,
            kwargs={"attrs": ["description"]},
            expected=["Missing attribute 'description'."],
        ),
    ],
)
@@ -1,7 +1,7 @@
import numpy as np
import xarray as xr

from xrlint.plugins.core.rules.flags import Flags
from xrlint.plugins.core.rules.var_flags import VarFlags
from xrlint.testing import RuleTest, RuleTester

valid_dataset_0 = xr.Dataset()
@@ -73,9 +73,9 @@
    np.float64
)

FlagsTest = RuleTester.define_test(
    "flags",
    Flags,
VarFlagsTest = RuleTester.define_test(
    "var-flags",
    VarFlags,
    valid=[
        RuleTest(dataset=valid_dataset_0),
        RuleTest(dataset=valid_dataset_1),
@@ -1,6 +1,6 @@
import xarray as xr

from xrlint.plugins.core.rules.var_units_attr import VarUnitsAttr
from xrlint.plugins.core.rules.var_units import VarUnits
from xrlint.testing import RuleTest, RuleTester

valid_dataset_1 = xr.Dataset()
@@ -19,9 +19,9 @@
invalid_dataset_3.v.attrs = {"units": 1}


VarUnitsAttrTest = RuleTester.define_test(
    "var-units-attr",
    VarUnitsAttr,
VarUnitsTest = RuleTester.define_test(
    "var-units",
    VarUnits,
    valid=[
        RuleTest(dataset=valid_dataset_1),
        RuleTest(dataset=valid_dataset_2),
7 changes: 5 additions & 2 deletions tests/plugins/core/test_plugin.py
@@ -8,16 +8,19 @@ def test_rules_complete(self):
        plugin = export_plugin()
        self.assertEqual(
            {
                "content-desc",
                "conventions",
                "coords-for-dims",
                "dataset-title-attr",
                "flags",
                "grid-mappings",
                "lat-coordinate",
                "lon-coordinate",
                "no-empty-attrs",
                "time-coordinate",
                "no-empty-chunks",
                "var-units-attr",
                "var-desc",
                "var-flags",
                "var-units",
            },
            set(plugin.rules.keys()),
        )