Commit 28225e8

Resolve "Validation of open enumerations does not search for @Custom attribute" (#669)
* search for custom attribute with open enum
* fix tests
* use feature branch from pynxtools-xps
* reorganize error messages
* support custom attribute in validate_nexus
* use pynxtools-xps main branch again
* differentiate between HDF5 and template validation in the case of missing custom attribute
* simplify logic for testing ignored sections
* function docstrings
* add docs for ignore_sections in test framework
1 parent 3dd00ce commit 28225e8

7 files changed, +421 -83 lines changed

docs/how-tos/pynxtools/using-pynxtools-test-framework.md

Lines changed: 36 additions & 5 deletions
@@ -63,17 +63,48 @@ def test_foo_reader(nxdl, reader_name, files_or_dir, tmp_path, caplog):
     # of the log files of the reference -nxs file and the one created in the test.
 ```

-Alongside the test data in `tests/data`, it is also possible to add other types of test data inside the test directory of the plugin.
+The `ReaderTest.convert_to_nexus` method tries to convert all files in the `files_or_dir` directory to a NeXus file that is compliant with the application definition (`nxdl`), using a specific pynxtools reader (`reader_name`). In this example, the `foo` reader is used to convert to files following the `NXfoo` application definition.

-You can also pass additional parameters to `test.convert_to_nexus`:
+There are some possibilities to configure this test for your specific plugin:

-- `caplog_level` (str): This parameter determines the level at which the caplog is set during testing. This can be either "ERROR" (by default) or "WARNING". If it is "WARNING", the test will also fail if any warnings are reported by the reader.
+- You can configure the test data that is used. Typically, this data should be located in `tests/data`, but it is also possible to use other data inside or even outside the test directory of the plugin.
+- You can also pass additional parameters to `test.convert_to_nexus`:
+    - `caplog_level` (str): This parameter determines the level at which the caplog is set during testing. This can be either "ERROR" (by default) or "WARNING". If it is "WARNING", the test will also fail if any warnings are reported by the reader.

-- `ignore_undocumented` (boolean): If true, the test skips the verification of undocumented keys. Otherwise, a warning message for undocumented keys is logged.
+    - `ignore_undocumented` (boolean): If true, the test skips the verification of undocumented keys. Otherwise, a warning message for undocumented keys is logged.
+
+Afterwards, the `ReaderTest.convert_to_nexus` method uses the NeXus annotator tool [`read_nexus`](../../learn/pynxtools/nexus-validation.md#read_nexus-nexus-file-reader-and-debugger) (which is part of `pynxtools`) to create log files both of the reference NeXus file located in `files_or_dir` as well as the freshly created NeXus files. These log files are compared line-by-line to check that the created NeXus file is indeed the same as the reference file.
+
+This test can also be configured:
+
+- You can pass a keyword argument `ignore_lines` to `check_reproducibility_of_nexus`. `ignore_lines` is expected to be a list of lines for which the comparison shall be skipped. Specifically, any line that starts with any of the strings in `ignore_lines` is ignored.
+- In addition, you can disable the comparison for a given line for a NeXus concept in the `read_nexus` output using the `ignore_sections` keyword. As an example, a typical section for a NeXus field in the output looks like this:
+
+```
+DEBUG:
+===== FIELD (//entry/start_time): <HDF5 dataset "start_time": shape (), type "|O">
+DEBUG: ===== FIELD (//entry/start_time): <HDF5 dataset "start_time": shape (), type "|O">
+value: 2018-05-01T07:22:00+02:00
+DEBUG: value: 2018-05-01T07:22:00+02:00
+classpath: ['NXentry', 'NX_DATE_TIME']
+DEBUG: classpath: ['NXentry', 'NX_DATE_TIME']
+classes:
+NXarpes.nxdl.xml:/ENTRY/start_time
+NXentry.nxdl.xml:/start_time
+DEBUG: classes:
+NXarpes.nxdl.xml:/ENTRY/start_time
+NXentry.nxdl.xml:/start_time
+<<REQUIRED>>
+DEBUG: <<REQUIRED>>
+documentation (NXarpes.nxdl.xml:/ENTRY/start_time):
+DEBUG: documentation (NXarpes.nxdl.xml:/ENTRY/start_time):
+```
+
+If you do want to disable the comparison for the value of `entry/start_time`, you can pass a dictionary to `ignore_sections`. In this example, the dictionary `{"FIELD (//entry/start_time)": ["value:"]}` would disable the comparison of the `value` line. Any other line in this section can be disabled by adding more strings to the list (e.g. `DEBUG - value:`), whereas additional sections can be ignored by adding to the `ignore_sections` dictionary.

 ## How to write an integration test for a NOMAD example in a reader plugin

-It is also possible to ship NOMAD Example Uploads directly with the reader plugin. As an example, `pynxtools-mpes` comes with its own NOMAD example (see [here](https://github.com/FAIRmat-NFDI/pynxtools-mpes/tree/bring-in-examples/src/pynxtools_mpes/nomad)) using the `ExampleUploadEntryPoint` of NOMAD (see [here](https://nomad-lab.eu/prod/v1/staging/docs/howto/plugins/example_uploads.html) for more documentation).
+It is also possible to ship NOMAD Example Uploads directly with the reader plugin. As an example, `pynxtools-mpes` comes with its own [NOMAD example](https://github.com/FAIRmat-NFDI/pynxtools-mpes/tree/bring-in-examples/src/pynxtools_mpes/nomad) using the [`ExampleUploadEntryPoint`](https://nomad-lab.eu/prod/v1/staging/docs/howto/plugins/example_uploads.html) of NOMAD.

 The `testing` sub-package of `pynxtools` provides two functionalities for testing the `ExampleUploadEntryPoint` defined in a `pynxtools` plugin:
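Note on the documentation change above: the following is a minimal, self-contained sketch of the filtering semantics that the new text describes for `ignore_lines` and `ignore_sections`. It is not the pynxtools implementation. The helper name `filter_log`, the rule that a section starts at a line containing `=====`, and the second timestamp are assumptions made for illustration only.

```python
from typing import Optional


def filter_log(
    lines: list[str],
    ignore_lines: Optional[list[str]] = None,
    ignore_sections: Optional[dict[str, list[str]]] = None,
) -> list[str]:
    """Drop lines that a comparison should skip (hypothetical helper)."""
    ignore_lines = ignore_lines or []
    ignore_sections = ignore_sections or {}
    kept = []
    section_prefixes: list[str] = []
    for line in lines:
        stripped = line.strip()
        if "=====" in stripped:
            # Assumed section marker: reset and look up prefixes for this section.
            section_prefixes = []
            for key, prefixes in ignore_sections.items():
                if key in stripped:
                    section_prefixes = prefixes
                    break
        if any(stripped.startswith(prefix) for prefix in ignore_lines):
            continue  # globally ignored line
        if any(stripped.startswith(prefix) for prefix in section_prefixes):
            continue  # ignored only inside the current section
        kept.append(line)
    return kept


header = 'DEBUG: ===== FIELD (//entry/start_time): <HDF5 dataset "start_time": shape (), type "|O">'
reference = [header, "value: 2018-05-01T07:22:00+02:00", "DEBUG: <<REQUIRED>>"]
# Made-up timestamp, standing in for a freshly converted file.
generated = [header, "value: 2024-01-01T00:00:00+00:00", "DEBUG: <<REQUIRED>>"]

ignore = {"FIELD (//entry/start_time)": ["value:"]}
same_with_ignore = filter_log(reference, ignore_sections=ignore) == filter_log(
    generated, ignore_sections=ignore
)
same_without_ignore = filter_log(reference) == filter_log(generated)
print(same_with_ignore, same_without_ignore)
```

In a plugin test, the same dictionary would presumably be passed as the `ignore_sections` keyword alongside `ignore_lines` in `check_reproducibility_of_nexus`, as the documentation text above suggests.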

src/pynxtools/data/NXtest.nxdl.xml

Lines changed: 6 additions & 0 deletions
@@ -99,6 +99,12 @@
                 <item value="1st type open"/>
                 <item value="2nd type open"/>
             </enumeration>
+            <attribute name="attribute_with_open_enum" optional="true">
+                <enumeration open="true">
+                    <item value="1st option"/>
+                    <item value="2nd option"/>
+                </enumeration>
+            </attribute>
         </field>
         <attribute name="group_attribute">
         </attribute>
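Note on the NXDL change above: as a rough illustration of what the new test attribute declares, the sketch below reads the allowed items and the `open` flag from the added snippet with the standard library. The function name `read_enumeration` is hypothetical, and this is not how pynxtools parses NXDL. Real NXDL files also declare an XML namespace, which this namespace-free snippet ignores.

```python
import xml.etree.ElementTree as ET

# The attribute added to NXtest.nxdl.xml in this commit, without namespace.
NXDL_SNIPPET = """
<attribute name="attribute_with_open_enum" optional="true">
    <enumeration open="true">
        <item value="1st option"/>
        <item value="2nd option"/>
    </enumeration>
</attribute>
"""


def read_enumeration(attribute_xml: str) -> tuple[list, bool]:
    """Return (allowed items, is_open) for the first enumeration found."""
    root = ET.fromstring(attribute_xml)
    enumeration = root.find("enumeration")
    if enumeration is None:
        return [], False
    items = [item.get("value") for item in enumeration.findall("item")]
    # A missing 'open' attribute is treated as a closed enumeration here.
    is_open = enumeration.get("open", "false") == "true"
    return items, is_open


items, is_open = read_enumeration(NXDL_SNIPPET)
print(items, is_open)
```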

src/pynxtools/dataconverter/helpers.py

Lines changed: 136 additions & 40 deletions
@@ -21,7 +21,7 @@
 import logging
 import os
 import re
-from collections.abc import Mapping, Sequence
+from collections.abc import Mapping, MutableMapping, Sequence
 from datetime import datetime, timezone
 from enum import Enum, auto
 from functools import cache, lru_cache
@@ -87,7 +87,9 @@ class ValidationProblem(Enum):
     UnitWithoutDocumentation = auto()
     InvalidUnit = auto()
     InvalidEnum = auto()
-    OpenEnumWithNewItem = auto()
+    OpenEnumWithCustom = auto()
+    OpenEnumWithCustomFalse = auto()
+    OpenEnumWithMissingCustom = auto()
     MissingRequiredGroup = auto()
     MissingRequiredField = auto()
     MissingRequiredAttribute = auto()
@@ -152,12 +154,27 @@ def _log(self, path: str, log_type: ValidationProblem, value: Optional[Any], *ar

         elif log_type == ValidationProblem.InvalidEnum:
             logger.warning(
-                f"The value at {path} should be one of the following: {value}."
+                f"The value '{args[0]}' at {path} should be one of the following: {value}."
             )
-        elif log_type == ValidationProblem.OpenEnumWithNewItem:
+        elif log_type == ValidationProblem.OpenEnumWithCustom:
             logger.info(
-                f"The value at {path} does not match with the enumerated items from the open enumeration: {value}."
+                f"The value '{args[0]}' at {path} does not match with the enumerated items from the open enumeration: {value}."
             )
+        elif log_type == ValidationProblem.OpenEnumWithCustomFalse:
+            logger.warning(
+                f"The value '{args[0]}' at {path} does not match with the enumerated items from the open enumeration: {value}. "
+                "When a different value is used, the boolean 'custom' attribute cannot be False."
+            )
+        elif log_type == ValidationProblem.OpenEnumWithMissingCustom:
+            log_text = (
+                f"The value '{args[0]}' at {path} does not match with the enumerated items from the open enumeration: {value}. "
+                "When a different value is used, a boolean 'custom=True' attribute must be added."
+            )
+            if args[1] is True:
+                log_text += " It was added here automatically."
+                logger.info(log_text)
+            else:
+                logger.warning(log_text)
         elif log_type == ValidationProblem.MissingRequiredGroup:
             logger.warning(f"The required group {path} hasn't been supplied.")
         elif log_type == ValidationProblem.MissingRequiredField:
@@ -287,9 +304,10 @@ def collect_and_log(
         # info messages should not fail validation
         if log_type in (
             ValidationProblem.UnitWithoutDocumentation,
-            ValidationProblem.OpenEnumWithNewItem,
             ValidationProblem.CompressionStrengthZero,
             ValidationProblem.MissingNXclass,
+            ValidationProblem.OpenEnumWithCustom,
+            ValidationProblem.OpenEnumWithMissingCustom,
         ):
             if self.logging and message not in self.data["info"]:
                 self._log(path, log_type, value, *args, **kwargs)
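Note on the hunk above: judging from the comment "info messages should not fail validation", the two new problems `OpenEnumWithCustom` and `OpenEnumWithMissingCustom` are treated as informational, while `OpenEnumWithCustomFalse` is not in that tuple and so presumably counts against validation. The sketch below only illustrates that split. The `INFO_ONLY` set and `fails_validation` helper are hypothetical names, and the enum is reduced to the members relevant here.

```python
from enum import Enum, auto


class ValidationProblem(Enum):
    """Reduced copy of the enum, limited to the enumeration-related members."""

    InvalidEnum = auto()
    OpenEnumWithCustom = auto()
    OpenEnumWithCustomFalse = auto()
    OpenEnumWithMissingCustom = auto()


# Assumption: mirrors the enumeration-related entries of the tuple in the hunk.
INFO_ONLY = {
    ValidationProblem.OpenEnumWithCustom,
    ValidationProblem.OpenEnumWithMissingCustom,
}


def fails_validation(problems: list) -> bool:
    """Return True if at least one collected problem is not informational."""
    return any(problem not in INFO_ONLY for problem in problems)


print(fails_validation([ValidationProblem.OpenEnumWithMissingCustom]))
print(fails_validation([ValidationProblem.OpenEnumWithCustomFalse]))
```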
@@ -804,15 +822,11 @@ def convert_int_to_float(value):
         return value


-def is_valid_data_field(
-    value: Any, nxdl_type: str, nxdl_enum: list, nxdl_enum_open: bool, path: str
-) -> Any:
+def is_valid_data_field(value: Any, nxdl_type: str, path: str) -> Any:
     """Checks whether a given value is valid according to the type defined in the NXDL."""

-    def validate_data_value(
-        value: Any, nxdl_type: str, nxdl_enum: list, nxdl_enum_open: bool, path: str
-    ) -> Any:
-        """Validate and possibly convert a primitive value according to NXDL type/enum rules."""
+    def validate_data_value(value: Any, nxdl_type: str, path: str) -> Any:
+        """Validate and possibly convert a primitive value according to NXDL type rules."""
         accepted_types = NEXUS_TO_PYTHON_DATA_TYPES[nxdl_type]
         original_value = value

@@ -843,26 +857,6 @@ def validate_data_value(
                     path, ValidationProblem.InvalidDatetime, value
                 )

-        if nxdl_enum is not None:
-            if (
-                isinstance(value, np.ndarray)
-                and isinstance(nxdl_enum, list)
-                and isinstance(nxdl_enum[0], list)
-            ):
-                enum_value = list(value)
-            else:
-                enum_value = value
-
-            if enum_value not in nxdl_enum:
-                if nxdl_enum_open:
-                    collector.collect_and_log(
-                        path, ValidationProblem.OpenEnumWithNewItem, nxdl_enum
-                    )
-                else:
-                    collector.collect_and_log(
-                        path, ValidationProblem.InvalidEnum, nxdl_enum
-                    )
-
         return value

     if isinstance(value, dict) and set(value.keys()) == {"compress", "strength"}:
@@ -878,18 +872,120 @@ def validate_data_value(
                     path, ValidationProblem.InvalidCompressionStrength, value
                 )
             # In this case, we remove the compression.
-            return validate_data_value(
-                value["compress"], nxdl_type, nxdl_enum, nxdl_enum_open, path
-            )
+            return validate_data_value(value["compress"], nxdl_type, path)

         # Apply standard validation to compressed value
-        value["compress"] = validate_data_value(
-            compressed_value, nxdl_type, nxdl_enum, nxdl_enum_open, path
-        )
+        value["compress"] = validate_data_value(compressed_value, nxdl_type, path)

         return value

-    return validate_data_value(value, nxdl_type, nxdl_enum, nxdl_enum_open, path)
+    return validate_data_value(value, nxdl_type, path)
+
+
+def get_custom_attr_path(path: str) -> str:
+    """
+    Generate the path for the 'custom' attribute for open enumerations for a
+    given path.
+
+    If a NeXus concept has an open enumeration and a different value than the suggested ones are used,
+
+    - for fields, an attribute @custom=True.
+    - for attributes, an additional attribute @my_attribute_custom=True (where my_attribute is the name
+      of the attribute with the open enumeration)
+
+    shall be added to the file. This function creates the path for this custom attribute.
+
+    Args:
+        path (str): The original path string.
+
+    Returns:
+        str: The modified path string representing the custom attribute path.
+    """
+    if path.split("/")[-1].startswith("@"):
+        attr_name = path.split("/")[-1][1:]  # remove "@"
+        return f"{path}_custom"
+    return f"{path}/@custom"
+
+
+def is_valid_enum(
+    value: Any,
+    nxdl_enum: list,
+    nxdl_enum_open: bool,
+    path: str,
+    mapping: MutableMapping,
+):
+    """Validate a value against an NXDL enumeration and handle custom attributes.
+
+    This function checks whether a given value conforms to the specified NXDL
+    enumeration. If the enumeration is open (`nxdl_enum_open`), it may create or
+    check a corresponding custom attribute in the `mapping`.
+
+    Args:
+        value (Any): The value to validate.
+        nxdl_enum (list): The NXDL enumeration to validate against.
+        nxdl_enum_open (bool): Whether the enumeration is open to custom values.
+        path (str): The path of the value in the dataset.
+        mapping (MutableMapping): The object (dict or HDF5 group) holding custom attributes.
+
+    """
+
+    if isinstance(value, dict) and set(value.keys()) == {"compress", "strength"}:
+        value = value["compress"]
+
+    if nxdl_enum is not None:
+        if (
+            isinstance(value, np.ndarray)
+            and isinstance(nxdl_enum, list)
+            and isinstance(nxdl_enum[0], list)
+        ):
+            enum_value = list(value)
+        else:
+            enum_value = value
+
+        if enum_value not in nxdl_enum:
+            if nxdl_enum_open:
+                custom_path = get_custom_attr_path(path)
+
+                if isinstance(mapping, h5py.Group):
+                    parent_path, attr_name = custom_path.rsplit("@", 1)
+                    custom_attr = mapping.get(parent_path).attrs.get(attr_name)
+                    custom_added_auto = False
+                else:
+                    custom_attr = mapping.get(custom_path)
+                    custom_added_auto = True
+
+                if custom_attr == True:  # noqa: E712
+                    collector.collect_and_log(
+                        path,
+                        ValidationProblem.OpenEnumWithCustom,
+                        nxdl_enum,
+                        value,
+                    )
+                elif custom_attr == False:  # noqa: E712
+                    collector.collect_and_log(
+                        path,
+                        ValidationProblem.OpenEnumWithCustomFalse,
+                        nxdl_enum,
+                        value,
+                    )
+
+                elif custom_attr is None:
+                    try:
+                        mapping[custom_path] = True
+                    except ValueError:
+                        # we are in the HDF5 validation, cannot set custom attribute.
+                        pass
+                    collector.collect_and_log(
+                        path,
+                        ValidationProblem.OpenEnumWithMissingCustom,
+                        nxdl_enum,
+                        value,
+                        custom_added_auto,
+                    )
+            else:
+                collector.collect_and_log(
+                    path, ValidationProblem.InvalidEnum, nxdl_enum, value
+                )


 def split_class_and_name_of(name: str) -> tuple[Optional[str], str]:
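Note on the hunk above: to make the template-side behaviour of the open-enumeration branch easier to follow, here is a simplified, self-contained sketch for plain dict templates. `get_custom_attr_path` follows the rule shown in the diff (without the unused `attr_name` variable). `classify_open_enum` is a hypothetical stand-in that returns the name of the problem instead of calling the collector, uses plain truthiness instead of the `== True` / `== False` comparisons, and leaves out the `h5py.Group` branch. The template path `/ENTRY[entry]/hypothetical_field` is made up.

```python
from collections.abc import MutableMapping
from typing import Any


def get_custom_attr_path(path: str) -> str:
    """'.../@attr' becomes '.../@attr_custom'; a field path gets '/@custom' appended."""
    if path.split("/")[-1].startswith("@"):
        return f"{path}_custom"
    return f"{path}/@custom"


def classify_open_enum(
    value: Any, allowed: list, path: str, template: MutableMapping
) -> str:
    """Simplified stand-in for the open-enum branch (dict templates only)."""
    if value in allowed:
        return "ok"
    custom_path = get_custom_attr_path(path)
    custom_attr = template.get(custom_path)
    if custom_attr is None:
        # Template validation: the custom flag is added automatically.
        template[custom_path] = True
        return "OpenEnumWithMissingCustom"
    if bool(custom_attr):
        return "OpenEnumWithCustom"
    return "OpenEnumWithCustomFalse"


path = "/ENTRY[entry]/hypothetical_field"  # made-up template key
template = {path: "my own type"}
allowed = ["1st type open", "2nd type open"]

first = classify_open_enum(template[path], allowed, path, template)
second = classify_open_enum(template[path], allowed, path, template)
print(first, second, template[get_custom_attr_path(path)])
```

The first call reports the missing flag and adds it to the template, so a second pass over the same template sees `custom=True`. In the HDF5 case shown in the diff, the flag is only read from the parent object's attributes and is not written.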
