Validation based on HDF tree traversal #333
Conversation
Pull Request Test Coverage Report for Build 9517068942 (Coveralls)
RubelMozumder left a comment
Continuing from the last comment!
Currently, I found a few warnings for NXmpes:
WARNING: A link was used for /entry/sample/bias_env/voltmeter, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/gas_pressure_env/pressure_gauge, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/temperature_env/temperature_sensor, but no '@target' attribute was found.
IMO, these should not be warnings, because they are recommended fields. But it is not a problem if it does not affect our tests.
There are also some unit warnings in the XPS example, although the units are present in the NeXus file. The reason is probably that you have not used the unit registry from pynxtools.
The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/raw/@units = counts has no documentation.
The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/cycle0_scan0/@units = counts has no documentation.
Regarding SPM, I will skip the validation. The application definition will be restructured very soon.
Other than that, this PR looks good to me.
One should write the target attribute when one uses a link, so it is reasonable to give a warning if it is not given in the file. For the tests that produce this warning, we should just regenerate the test files.
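To illustrate the point, here is a minimal sketch (assuming h5py; the file name and paths are hypothetical, mirroring the warnings above) of writing the target attribute alongside a hard link so that a validator can resolve the canonical location of the linked dataset:

```python
import h5py

# Hypothetical example: a voltmeter dataset linked into the sample
# environment group, with a 'target' attribute recording its canonical path.
with h5py.File("example.nxs", "w") as f:
    src = f.create_dataset("/entry/instrument/voltmeter", data=1.5)
    # Record where the data canonically lives, so link consumers can tell
    # the original location from the linked one.
    src.attrs["target"] = "/entry/instrument/voltmeter"
    # Hard-link the dataset under the sample environment group.
    grp = f.require_group("/entry/sample/bias_env")
    grp["voltmeter"] = src
```

With the attribute in place, a traversal that encounters the link can compare the path it arrived by against `attrs["target"]` instead of emitting the warning.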
These are somewhat expected.
I think in this particular case you do not need to change anything in pynxtools-spm, but it would be great if you could just regenerate the files once again. As you can see here, the tests only fail because we are now automatically writing a target attribute.
RubelMozumder left a comment
LGTM!
As you mentioned, I created the test files, and they work fine.
But the tests are only passing on the hdf-based-validation branch. So you can merge it; I have created a PR prepared for this PR.
rettigl left a comment
This introduces a lot of new code, and some of it, in particular in validation.py, seems to partly duplicate existing code. This will make future maintenance rather difficult. Is it possible to move common code to a different place and reuse it?
Also, while playing around, I noticed some errors which are not recognized. This wrong dict:
"/ENTRY/CALIBRATION[energy_referencing]":{
"calibrated_axis": "@link:/entry/data",
"calibrated_axis/@units": "eV",
"physical_quantity": "energy",
"reference_peak": "@attrs:metadata/scan_info/reference_energy",
"binding_energy": 0.0,
"binding_energy/@units": "eV"
},
generates the error
ERROR: Expected a field at /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis, but found a group.
during validation, but the link is still being set, and thus a units attribute is being written to /entry/data. This is picked up neither by the converter nor the validator. The wrong type of /entry/energy_referencing/calibrated_axis is also not reported by the validator.
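One way the HDF-based validator could catch the wrong-type case is a small type check during traversal. The following is only an illustrative sketch, not the pynxtools API; the helper name and error wording are hypothetical (the wording mirrors the error message quoted above):

```python
import h5py

def check_is_field(h5file, path):
    # Hypothetical helper: report an error when a path that the
    # application definition expects to be a field (an h5py.Dataset)
    # is missing or actually holds a group.
    node = h5file.get(path)
    if node is None:
        return [f"ERROR: Expected a field at {path}, but found nothing."]
    if isinstance(node, h5py.Group):
        return [f"ERROR: Expected a field at {path}, but found a group."]
    return []
```

Run against a file where /entry/energy_referencing/calibrated_axis ended up as a group, this would flag exactly the case described.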
That PR was prepared for the branch hdf-based-validation.
Indeed, in #642 (specifically in commit 5941b1f), we decided that such keys shall not be removed, even if they are erroneous. I think this was the wrong decision then, and we should remove such keys in order to still produce a valid file. I have implemented this.
Here's the result for your example: [...]
And for the other way around: [...]
Note that in the latter case, we actually find two different conflicts (with the named group [...]).
I think this should work nicely, and in the resulting files there is no confusion anymore.
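The key-removal step can be sketched in a few lines. This is not the actual pynxtools implementation; the function name is made up for illustration. The idea is to drop an erroneous key together with its dependent sub-keys (such as "key/@units"), so that the file written afterwards stays valid:

```python
def drop_conflicting_keys(template, conflicting):
    # Hypothetical sketch: filter out each conflicting key and every
    # sub-key nested under it (attributes like "key/@units" included),
    # leaving the rest of the template untouched.
    return {
        key: value
        for key, value in template.items()
        if not any(key == bad or key.startswith(bad + "/") for bad in conflicting)
    }
```

For the dict from the example above, dropping the calibrated_axis key would also remove its @units sub-key, avoiding the stray units attribute on /entry/data.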
rettigl left a comment
Thanks for looking into my comments. I did not check all changes again, but I tested my issues from before, and they seem to be solved. So LGTM.
This is new functionality: validation not of the template, but of an existing NeXus HDF5 file.
Generally, it seems much more straightforward than the verification on the dict, because there are far fewer special cases to consider. However, performance might be worse, because we traverse the HDF5 tree each time. To offset that, we build a cached recursive function.
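The caching idea can be sketched roughly as follows. This is a simplified illustration, not the actual pynxtools implementation; it assumes h5py and caches only the child names of each group, so repeated traversals of the same subtree hit the cache instead of touching the HDF5 file again:

```python
from functools import lru_cache

import h5py

def make_cached_walker(h5file):
    # Hypothetical sketch: memoize the listing of each group's children,
    # so a second traversal of the same subtree costs no HDF5 I/O.
    @lru_cache(maxsize=None)
    def children(path):
        node = h5file[path]
        return tuple(node.keys()) if isinstance(node, h5py.Group) else ()

    def walk(path="/"):
        # Yield the current path, then recurse into (cached) children.
        yield path
        for name in children(path):
            yield from walk(path.rstrip("/") + "/" + name)

    return walk
```

Note that the cache must be scoped to one open file (here via the closure), since cached listings go stale if the file is modified.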
ToDo: