Validation based on HDF tree traversal #333
Conversation
Pull Request Test Coverage Report for Build 9517068942 (Coveralls)
RubelMozumder left a comment
Continuing from the last comment!
Currently, I found a few warnings for NXmpes:
WARNING: A link was used for /entry/sample/bias_env/voltmeter, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/gas_pressure_env/pressure_gauge, but no '@target' attribute was found.
WARNING: A link was used for /entry/sample/temperature_env/temperature_sensor, but no '@target' attribute was found.
IMO, these should not be warnings, because they are recommended fields. But it is not a problem if it does not affect our tests.
There are also some unit warnings in the XPS example, although the units are present in the NeXus file. The reason is probably that you have not used the unit registry from pynxtools.
The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/raw/@units = counts has no documentation.
The unit /1_as_loaded__Survey/instrument/electronanalyzer/detector/raw_data/cycle0_scan0/@units = counts has no documentation.
Regarding SPM, I will skip the validation. The application definition will be restructured very soon.
Other than that, this PR looks good to me.
One should write the target attribute when one uses a link, so it is reasonable to give a warning if it is not given in the file. For the tests that produce this warning, we should just regenerate the test files.
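To illustrate the point, here is a minimal sketch (assuming h5py; the file name and paths are hypothetical, mirroring the warnings above) of writing the target attribute alongside a hard link so that a validator can resolve the canonical location of the linked dataset:

```python
import h5py

# Hypothetical example: a voltmeter dataset linked into the sample
# environment group, with a 'target' attribute recording its canonical path.
with h5py.File("example.nxs", "w") as f:
    src = f.create_dataset("/entry/instrument/voltmeter", data=1.5)
    # Record where the data canonically lives, so link consumers can tell
    # the original location from the linked one.
    src.attrs["target"] = "/entry/instrument/voltmeter"
    # Hard-link the dataset under the sample environment group.
    grp = f.require_group("/entry/sample/bias_env")
    grp["voltmeter"] = src
```

With the attribute in place, a traversal that encounters the link can compare the path it arrived by against `attrs["target"]` instead of emitting the warning.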
These are somewhat expected.
I think in this particular case you do not need to change anything in pynxtools-spm, but it would be great if you could just regenerate the files once again. As you can see here, the tests only fail because we are now automatically writing a target attribute.
RubelMozumder left a comment
LGTM!
As you mentioned, I created the test files, and they work fine.
But the tests are only passing on the hdf-based-validation branch. So you can merge it; I have created a PR prepared for this PR.
rettigl left a comment
This introduces a lot of new code, and some of it, in particular in validation.py, seems to partly duplicate existing code. This will make future maintenance rather difficult. Is it possible to move common code to a different place and reuse it?
Also, while playing around, I noticed some errors which are not recognized. This wrong dict:
"/ENTRY/CALIBRATION[energy_referencing]":{
"calibrated_axis": "@link:/entry/data",
"calibrated_axis/@units": "eV",
"physical_quantity": "energy",
"reference_peak": "@attrs:metadata/scan_info/reference_energy",
"binding_energy": 0.0,
"binding_energy/@units": "eV"
},
generates the error
ERROR: Expected a field at /ENTRY[entry]/CALIBRATION[energy_referencing]/calibrated_axis, but found a group.
during validation, but the link is still being set, and thus a units attribute is being written to /entry/data. This is picked up neither by the converter nor the validator. The wrong type of /entry/energy_referencing/calibrated_axis is also not reported by the validator.
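One way the HDF-based validator could catch the wrong-type case is a small type check during traversal. The following is only an illustrative sketch, not the pynxtools API; the helper name and error wording are hypothetical (the wording mirrors the error message quoted above):

```python
import h5py

def check_is_field(h5file, path):
    # Hypothetical helper: report an error when a path that the
    # application definition expects to be a field (an h5py.Dataset)
    # is missing or actually holds a group.
    node = h5file.get(path)
    if node is None:
        return [f"ERROR: Expected a field at {path}, but found nothing."]
    if isinstance(node, h5py.Group):
        return [f"ERROR: Expected a field at {path}, but found a group."]
    return []
```

Run against a file where /entry/energy_referencing/calibrated_axis ended up as a group, this would flag exactly the case described.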
That PR was prepared for the branch hdf-based-validation.
Indeed, in #642 (specifically in commit 5941b1f), we decided that such keys shall not be removed, even if they are erroneous. I think this was the wrong decision then, and we should remove such keys in order to still produce a valid file. I have implemented this.
Here's the result for your example: [...]
And for the other way around: [...]
Note that in the latter case, we actually find two different conflicts (with the named group [...]).
I think this should work nicely, and in the resulting files there is no confusion anymore.
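The key-removal step can be sketched in a few lines. This is not the actual pynxtools implementation; the function name is made up for illustration. The idea is to drop an erroneous key together with its dependent sub-keys (such as "key/@units"), so that the file written afterwards stays valid:

```python
def drop_conflicting_keys(template, conflicting):
    # Hypothetical sketch: filter out each conflicting key and every
    # sub-key nested under it (attributes like "key/@units" included),
    # leaving the rest of the template untouched.
    return {
        key: value
        for key, value in template.items()
        if not any(key == bad or key.startswith(bad + "/") for bad in conflicting)
    }
```

For the dict from the example above, dropping the calibrated_axis key would also remove its @units sub-key, avoiding the stray units attribute on /entry/data.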
rettigl left a comment
Thanks for looking into my comments. I did not check all changes again, but I tested my issues from before, and they seem to be solved. So LGTM.
This is new functionality: validation not of the template, but of an existing NeXus HDF5 file.
Generally, it seems much more straightforward than the verification on the dict, because there are far fewer special cases to consider. However, performance might be worse, because we traverse the HDF5 tree each time. To offset that, we build a cached recursive function.
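The caching idea can be sketched roughly as follows. This is a simplified illustration, not the actual pynxtools implementation; it assumes h5py and caches only the child names of each group, so repeated traversals of the same subtree hit the cache instead of touching the HDF5 file again:

```python
from functools import lru_cache

import h5py

def make_cached_walker(h5file):
    # Hypothetical sketch: memoize the listing of each group's children,
    # so a second traversal of the same subtree costs no HDF5 I/O.
    @lru_cache(maxsize=None)
    def children(path):
        node = h5file[path]
        return tuple(node.keys()) if isinstance(node, h5py.Group) else ()

    def walk(path="/"):
        # Yield the current path, then recurse into (cached) children.
        yield path
        for name in children(path):
            yield from walk(path.rstrip("/") + "/" + name)

    return walk
```

Note that the cache must be scoped to one open file (here via the closure), since cached listings go stale if the file is modified.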
ToDo: