Merged
84 commits
a2df01c
First ideas of validation based on hdf tree traversal
domna May 17, 2024
76310cb
Improved get_children_names function
domna May 21, 2024
fa2cb26
Read extends keyword from file
domna Jun 4, 2024
31d0cbc
Insert extends parents into the inheritance chain
domna Jun 6, 2024
cfffe29
Automatically populate tree from appdef parents
domna Jun 6, 2024
6c98778
Only populate tree if parents are present
domna Jun 6, 2024
58de3fb
Docstring improvements
domna Jun 6, 2024
b759166
If minOccurs == 0, set the group to optional
domna Jun 7, 2024
c938beb
Add extended NXtest
domna Jun 7, 2024
b9842df
Fix function name
domna Jun 7, 2024
7ce64ec
Do actual field checks
domna Jun 7, 2024
8f34dda
Add cli and units support
domna Jun 10, 2024
311f26f
Include units files in package
domna Jun 12, 2024
2054374
Add working validation
domna Jun 12, 2024
e972a61
Use node_type in find_node_for
domna Jun 12, 2024
f88ebd9
Fix tests
domna Jun 12, 2024
2262db3
Fixes from merging
domna Aug 12, 2024
6dd0f27
Import ureg for units validation
domna Aug 12, 2024
efecd29
Adding some test files for hdf5_validator.
RubelMozumder Aug 13, 2024
ae4ec4e
pytest for hdf5 validation.
RubelMozumder Aug 14, 2024
ca06732
test for nexus file validation.
RubelMozumder Aug 14, 2024
b94d8a5
VisitingCode-1
RubelMozumder Oct 17, 2024
6d4c70a
fix rebase issues
lukaspie Jul 23, 2025
3d5534d
clean up namefitting
lukaspie Jul 23, 2025
38e87d3
add ignore-undocumented flag for verify_nexus
lukaspie Jul 23, 2025
936eca1
implement special checks for NXcollection
lukaspie Jul 23, 2025
53d940b
implement check for reserved prefixes
lukaspie Jul 23, 2025
55aa22e
implement check for reserved suffixes
lukaspie Jul 23, 2025
cbd78a9
haandle 'object' dtype separately (for lists from HDF5 files)
lukaspie Jul 23, 2025
67b50b3
proper handling of NXdata groups
lukaspie Jul 23, 2025
a408342
handle lists and enumerated lists
lukaspie Jul 23, 2025
966b53e
separate function from cli tool to use in test framework
lukaspie Jul 23, 2025
87cdee8
fix selection of nxdl roots
lukaspie Jul 23, 2025
42981b9
remove unneeded NXDL test files
lukaspie Jul 23, 2025
9edfb37
start fixing tests
lukaspie Jul 23, 2025
191050a
check for required groups
lukaspie Jul 24, 2025
145d0d6
fix test for reserved prefixes
lukaspie Jul 24, 2025
5eb9250
adjust error messages, check for documented in variadic concepts, ch…
lukaspie Jul 25, 2025
8438d3c
clear collector between runs
lukaspie Jul 25, 2025
0cf6bc4
use the same error message for missing concepts
lukaspie Jul 25, 2025
0e76e1d
further fixes for finding undocumented terms
lukaspie Jul 25, 2025
b42b1c7
fixes for checking documented concepts
lukaspie Jul 28, 2025
cfa557d
rename functionality to 'validate'
lukaspie Jul 28, 2025
adeecf1
check target attribute
lukaspie Jul 28, 2025
9de28a8
follow links, check target attribute
lukaspie Jul 28, 2025
879f187
validate link targets
lukaspie Jul 28, 2025
d651e7c
use custom visititems for link checking
lukaspie Jul 28, 2025
7b36f93
fix required children checking
lukaspie Jul 28, 2025
24ad3fb
use correct function in test framework
lukaspie Jul 28, 2025
22ba545
add docstrings
lukaspie Jul 28, 2025
520bb21
remove unneeded changes
lukaspie Jul 28, 2025
d20aa9d
mypy fixes
lukaspie Jul 28, 2025
b2380b8
use hints for NXdata fields, log only once
lukaspie Jul 29, 2025
b2ca51f
use caplog_level for validation in testing framework
lukaspie Jul 29, 2025
2b811cf
add test for compressed payload from template
lukaspie Jul 30, 2025
36f5eed
unpack compressed tuples
lukaspie Jul 30, 2025
ace0a1e
check enums that are lists
lukaspie Aug 7, 2025
0f6dcdd
check for target attribute in template link
lukaspie Aug 7, 2025
3bf51c1
rename function argument
lukaspie Aug 7, 2025
b93de90
update docs
lukaspie Aug 8, 2025
1efa653
remove test_nxdl path in dev_tools
lukaspie Aug 11, 2025
f17a877
make clean_str_attr function more robust
lukaspie Aug 11, 2025
b9443dd
return non-strings as-is
lukaspie Aug 11, 2025
f815e89
New rules around `@target` attribute
lukaspie Aug 11, 2025
be2aefc
fix a couple more test cases
lukaspie Aug 11, 2025
612bafd
Apply suggestions from code review
lukaspie Aug 13, 2025
64593b4
replace custom func with a function from dev_tools
lukaspie Aug 13, 2025
cf04151
define level map at object level
lukaspie Aug 13, 2025
0bbbe5b
refactor docs
lukaspie Aug 14, 2025
1d5e94b
ignore groups without NX_class attribute (and fields/attributes) within
lukaspie Aug 14, 2025
68f51de
remove duplicate import
lukaspie Aug 14, 2025
4db564b
mypy fixes
lukaspie Aug 14, 2025
df77ac7
simpliy one test case
lukaspie Aug 14, 2025
3433ccd
clean up unneeded imports
lukaspie Aug 15, 2025
ede91a0
warning message for invalid entries
lukaspie Aug 15, 2025
f871da9
rename node.type to node.nx_type
lukaspie Aug 15, 2025
1c83210
revert package downgrade
lukaspie Aug 15, 2025
85012e9
extract function for checking reserved prefixes
lukaspie Aug 15, 2025
e149934
remove unusable code for checking link types
lukaspie Aug 15, 2025
769bfd8
use a unified function for checking reserved suffixes
lukaspie Aug 18, 2025
28371b0
clean up template if group/field was linked to field/group
lukaspie Aug 19, 2025
d1fa472
report missing if an invalid link was used for required concept
lukaspie Aug 20, 2025
ecb741f
fix remaning test case
lukaspie Aug 20, 2025
372a879
align output for appdef and baseclass checks
lukaspie Aug 20, 2025
2 changes: 2 additions & 0 deletions dev-requirements.txt
@@ -8,6 +8,8 @@ babel==2.17.0
# via mkdocs-material
backrefs==5.9
# via mkdocs-material
cachetools==6.1.0
# via pynxtools (pyproject.toml)
certifi==2025.7.14
# via requests
cfgv==3.4.0
40 changes: 30 additions & 10 deletions docs/learn/pynxtools/nexus-validation.md
@@ -1,23 +1,43 @@
# Validation of NeXus file
# Validation of NeXus files

!!! info "This page is intended to give more information about the validation tools that are part of `pynxtools`. Please also have a look at our comprehensive [how-to guide](../../how-tos/pynxtools/validate-nexus-file.md) on NeXus validation."

One of the main advantages of using pynxtools is that it comes with its own validation tools. That is, it can be used to validate that a given NeXus/HDF5 file is compliant with a NeXus application definition.
One of the main advantages of using `pynxtools` is that it comes with its own validation tools. That is, it can be used to validate that a given NeXus/HDF5 file is compliant with a NeXus application definition.

## As part of the dataconverter

During [data conversion](./dataconverter-and-readers.md), before writing the HDF5 file, the data is first checked against the provided application definition.
During [data conversion](./dataconverter-and-readers.md) within `pynxtools`, before writing the HDF5 file, the data is first checked against the provided application definition.

<!--## verify-nexus: Testing existing NeXus/HDF5 files
This CLI tool can be used to validate _existing_ HDF5 files that claim to be NeXus-compliant. See [here](reference/cli-api.md#verify-nexus) for the API documentation.-->
## validate_nexus: Validate existing NeXus/HDF5 files

While we encourage NeXus users to convert their data using the `pynxtools` data converter, we also realize that a lot of NeXus files are created using other applications. For such use cases,
`pynxtools` provides a standalone validator (called `validate_nexus`). This CLI tool can be used to validate _existing_ HDF5 files against the NeXus application definition they claim to comply with. Read more in the [API documentation](../../reference/cli-api.md#validate_nexus).

The following example dataset can be used to test the `validate_nexus` module:

[201805_WSe2_arpes.nxs](https://github.com/FAIRmat-NFDI/pynxtools/blob/master/src/pynxtools/data/201805_WSe2_arpes.nxs){:target="_blank" .md-button }

This is an angular-resolved photoelectron spectroscopy (ARPES) dataset that is formatted according to the [`NXarpes` application definition](https://manual.nexusformat.org/classes/applications/NXarpes.html#nxarpes).

If you have `pynxtools` installed, you can call the validator on this file using the command

```bash
validate_nexus 201805_WSe2_arpes.nxs
```

You will see some warning messages that will give you an impression of the kind of messages the validator tool provides.
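
If you prefer to drive the validator from a script, the command above can be wrapped with the standard library. This is only a minimal sketch: the helper names `build_command` and `run_validator` are hypothetical, and the sketch assumes nothing about the tool beyond the `validate_nexus <file>` form shown above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(nexus_file: str) -> List[str]:
    """Return the argument list for `validate_nexus <file>`."""
    return ["validate_nexus", str(Path(nexus_file))]


def run_validator(nexus_file: str) -> Optional[subprocess.CompletedProcess]:
    """Run the validator if it is on PATH; return None if it is not installed."""
    if shutil.which("validate_nexus") is None:
        return None
    # check=False: a non-zero exit code should not raise, we only collect output.
    return subprocess.run(
        build_command(nexus_file), capture_output=True, text=True, check=False
    )


if __name__ == "__main__":
    result = run_validator("201805_WSe2_arpes.nxs")
    if result is None:
        print("validate_nexus not found; is pynxtools installed?")
    else:
        # The docs do not say which stream carries the messages, so print both.
        print(result.stdout)
        print(result.stderr)
```

Capturing both streams keeps the sketch independent of how the tool configures its logging.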

## read_nexus: NeXus file reader and debugger

This utility outputs a debug log for a given NeXus file by annotating the data and metadata entries with the schema definitions from the respective NeXus base classes and application definitions to which the file refers. See [here](../../reference/cli-api.md#nexus-file-validation) for the API documentation.

The following example dataset can be used to test the `read_nexus` module: [src/pynxtools/data/201805_WSe2_arpes.nxs](https://github.com/FAIRmat-NFDI/pynxtools/blob/master/src/pynxtools/data/201805_WSe2_arpes.nxs). This is an angular-resolved photoelectron spectroscopy (ARPES) dataset that is formatted according to the [NXarpes application definition of NeXus](https://manual.nexusformat.org/classes/applications/NXarpes.html#nxarpes).
If you have `pynxtools` installed, you can call the tool on the file mentioned above using the command

```bash
read_nexus 201805_WSe2_arpes.nxs
```

!!! info "Using a different set of NeXus definitions"
??? info "Using a different set of NeXus definitions"

The environment variable "NEXUS_DEF_PATH" can be set to a directory which contains the NeXus definitions as NXDL XML files. If this environment variable is not defined, the module will use the definitions in its bundle (see `src/pynxtools/definitions`)._
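
The lookup order described above (environment variable first, bundled definitions as fallback) can be sketched as follows. This is not the actual `pynxtools` implementation: the function name `resolve_definitions_dir` is hypothetical, and the relative path is only a stand-in for the bundled folder named in the text.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Stand-in for the bundled definitions folder mentioned in the docs (assumption).
BUNDLED_DEFINITIONS = Path("src/pynxtools/definitions")


def resolve_definitions_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Prefer NEXUS_DEF_PATH when it is set and non-empty, else use the bundle."""
    env = os.environ if env is None else env
    override = env.get("NEXUS_DEF_PATH")
    if override:
        return Path(override)
    return BUNDLED_DEFINITIONS
```

Passing the environment as an argument keeps the function easy to test without modifying `os.environ`.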

@@ -26,7 +46,7 @@ The following example dataset can be used to test the `read_nexus` module: [src/
export 'NEXUS_DEF_PATH'=<folder_path_that_contains_nexus_defs>
```

!!! info "A note to Windows users"
??? info "A note to Windows users"

If you run `read_nexus` from `git bash`, you need to set the environmental variable
`MSYS_NO_PATHCONV` to avoid the [path translation in Windows Git MSys](https://stackoverflow.com/questions/7250130/how-to-stop-mingw-and-msys-from-mangling-path-names-given-at-the-command-line#34386471).
@@ -40,10 +60,10 @@ The following example dataset can be used to test the `read_nexus` module: [src/

## Other approaches (not part of pynxtools)

Aside from the tools we developed within FAIRmat, the [official NeXus website](https://manual.nexusformat.org/validation.html) lists additional programs for the validation of NeXus files:
Aside from the tools we develop within FAIRmat, the [official NeXus website](https://manual.nexusformat.org/validation.html) lists additional programs for the validation of NeXus files:

1. [cnxvalidate: NeXus validation tool written in C](https://github.com/nexusformat/cnxvalidate)
2. [punx: Python Utilities for NeXus HDF5 files](https://github.com/prjemian/punx)
3. [nexpy/nxvalidate: A python API for validating NeXus file](https://github.com/nexpy/nxvalidate)

We will not discuss the details of these programs here, but you can find some information about the in the how-to guide linked above.
We will not discuss the details of these programs here, but you can find some information about them when following these links.
15 changes: 9 additions & 6 deletions docs/reference/cli-api.md
@@ -15,13 +15,16 @@ Note that simply calling `dataconverter` defaults to `dataconverter convert`.
:list_subcommands: True

## NeXus file validation
<!-- ::: mkdocs-click
:module: "pynxtools.dataconverter.verify
:command: verify_nexus
:prog_name: verify_nexus
:depth: 1

::: mkdocs-click
:module: pynxtools.dataconverter.validate_file
:command: validate_cli
:prog_name: validate_nexus
:depth: 2
:style: table
:list_subcommands: True -->
:list_subcommands: True

## NeXus annotator

::: mkdocs-click
:module: pynxtools.nexus.nexus
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -36,6 +36,7 @@ dependencies = [
"toposort>=1.10.0",
"anytree",
"pint",
"cachetools",
]

[project.urls]
@@ -108,6 +109,7 @@ simple_nexus_example = "pynxtools.nomad.entrypoints:simple_nexus_example"
read_nexus = "pynxtools.nexus.nexus:main"
dataconverter = "pynxtools.dataconverter.convert:main_cli"
generate_eln = "pynxtools.eln_mapper.eln_mapper:get_eln"
validate_nexus = "pynxtools.dataconverter.validate_file:validate_cli"

[tool.setuptools.package-data]
pynxtools = ["definitions/**/*.xml", "definitions/**/*.xsd"]
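
The new `validate_nexus` script entry follows the usual `"module:attribute"` form of console-script entry points. As a rough illustration of how such a string maps to a callable, here is a minimal sketch; `load_entry_point` is a hypothetical helper, not part of `pynxtools` or of the packaging tools, and installers generate their own wrapper code.

```python
import importlib
from typing import Any


def load_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method'; skip empty parts.
    for part in filter(None, attr_path.split(".")):
        obj = getattr(obj, part)
    return obj


# Demonstrated with a standard-library target so the sketch runs anywhere.
dumps = load_entry_point("json:dumps")
print(dumps({"a": 1}))
```

With `pynxtools` installed, the same helper applied to `"pynxtools.dataconverter.validate_file:validate_cli"` should return the function that the `validate_nexus` command invokes.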
4 changes: 2 additions & 2 deletions src/pynxtools/data/NXtest.nxdl.xml
@@ -45,14 +45,14 @@
</group>
<group type="NXdata" name="specified_group" nameType="specified">
<doc>A group with a name and nameType="specified".</doc>
<field name="specified_field" nameType="specified" type="NX_FLOAT" units="NX_ANY">
<field name="specified_field" nameType="specified" optional="true" type="NX_FLOAT" units="NX_ANY">
<attribute name="specified_attr_in_field" nameType="specified"/>
</field>
<attribute name="specified_attr"/>
</group>
<group type="NXdata" name="any_groupGROUP" nameType="any">
<doc>A group with a name and nameType="any".</doc>
<field name="any_fieldFIELD" nameType="any" optional="true" type="NX_FLOAT" units="NX_ANY">
<field name="any_fieldFIELD" nameType="any" type="NX_FLOAT" units="NX_ANY">
<attribute name="any_attrATTR_in_field" nameType="any"/>
</field>
<attribute name="any_attrATTR" nameType="any"/>
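
To see what toggling `optional="true"` means for a validator, the sketch below parses a trimmed copy of the `specified_group` element and lists the children that would be treated as required. It rests on stated assumptions: a child counts as optional if it carries `optional="true"` or `minOccurs="0"` (the latter taken from the commit message "If minOccurs == 0, set the group to optional"), `recommended` is ignored, and the XML namespace of real NXDL files is left out. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed, namespace-free copy of the updated NXtest group (illustration only).
NXDL_SNIPPET = """
<group type="NXdata" name="specified_group" nameType="specified">
  <field name="specified_field" nameType="specified" optional="true"
         type="NX_FLOAT" units="NX_ANY">
    <attribute name="specified_attr_in_field" nameType="specified"/>
  </field>
  <attribute name="specified_attr"/>
</group>
"""


def is_optional(elem: ET.Element) -> bool:
    """Assumed rule: optional="true" or minOccurs="0" marks a concept optional."""
    return elem.get("optional", "false").lower() == "true" or elem.get("minOccurs") == "0"


def required_children(group: ET.Element) -> List[str]:
    """Names of direct child groups, fields and attributes that are not optional."""
    return [
        child.get("name", "")
        for child in group
        if child.tag in ("group", "field", "attribute") and not is_optional(child)
    ]


group = ET.fromstring(NXDL_SNIPPET)
print(required_children(group))
```

Under these assumptions, only `specified_attr` remains required in `specified_group` after the change, because `specified_field` is now marked optional.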
6 changes: 3 additions & 3 deletions src/pynxtools/dataconverter/convert.py
@@ -216,8 +216,8 @@ def convert(
fair : bool, default False
If True, a warning is given that there are undocumented paths
in the template.
undocumented : bool, default False
If True, an undocumented warning is given.
ignore_undocumented : bool, default False
If True, all undocumented items are ignored in the validation.
skip_verify: bool, default False
Skips verification routine if set to True

@@ -324,7 +324,7 @@ def main_cli():
"--ignore-undocumented",
is_flag=True,
default=False,
help="Ignore all undocumented fields during validation.",
help="Ignore all undocumented concepts during validation.",
)
@click.option(
"--fail",
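
The `--ignore-undocumented` option above is a boolean flag that defaults to `False`. The PR defines it with `click`; the sketch below swaps in the standard-library `argparse` so that it runs without extra dependencies. The parser and its program name are hypothetical, and only the flag name, default and help text are taken from the diff.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a boolean flag mirroring the option shown in the diff."""
    parser = argparse.ArgumentParser(prog="dataconverter-sketch")
    parser.add_argument(
        "--ignore-undocumented",
        action="store_true",  # present -> True, absent -> False
        default=False,
        help="Ignore all undocumented concepts during validation.",
    )
    return parser


# argparse turns the dashes into underscores for the attribute name,
# matching the `ignore_undocumented` argument documented in convert().
args = build_parser().parse_args(["--ignore-undocumented"])
print(args.ignore_undocumented)
```

As in the docstring change, the value is meant to be passed on as `ignore_undocumented`, so that undocumented items are skipped rather than reported.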