| 1 | +# NeXus definitions in pynxtools |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +`pynxtools` converts experimental data into the NeXus format and validates the resulting HDF5 files against NeXus application definitions. These definitions formally describe how experimental data and metadata must be structured in a NeXus file. |
| 6 | + |
| 7 | +The NeXus definitions themselves are not part of the pynxtools source code. Instead, they are maintained in a dedicated repository and included in pynxtools as a Git submodule: |
| 8 | + |
| 9 | +```bash |
| 10 | +src/pynxtools/definitions |
| 11 | +``` |
| 12 | + |
| 13 | +This page explains: |
| 14 | + |
| 15 | +- what the definitions repository contains, |
| 16 | + |
| 17 | +- why it is included as a submodule, |
| 18 | + |
| 19 | +- how it is used inside `pynxtools`, |
| 20 | + |
| 21 | +- and how to manage it using the provided helper script. |
| 22 | + |
| 23 | +## What the NeXus definitions are |
| 24 | + |
| 25 | +NeXus defines a standardized structure for scientific data. The structure is defined in XML files written in the NeXus Definition Language (NXDL). These XML files define: |
| 26 | + |
| 27 | +- base classes (common structural components and their semantic concepts), |
| 28 | +- application definitions (experiment-specific schemas), |
| 29 | +- contributed definitions from the community, which can be either base classes or application definitions. |
| 30 | + |
| 31 | +These definitions specify the naming, hierarchy, and allowed metadata of NeXus files, as well as whether each concept is required, recommended, or optional. |
| 32 | + |
| 33 | +In practice, they serve as: |
| 34 | + |
| 35 | +- the schema against which data is validated, |
| 36 | +- the reference for generating templates and mappings, |
| 37 | +- the source of truth for how experimental data must be organized. |
| 38 | + |
| 39 | +Validation and interoperability depend directly on the exact version of these definitions. |
| 40 | + |
| 41 | +!!! info "To learn more about the different versions of the NeXus definitions, see [Reference > NeXus definitions](../../reference/definitions.md)." |
| 42 | + |
| 43 | +## Role of the definitions in `pynxtools` |
| 44 | + |
| 45 | +`pynxtools` relies on a fixed, versioned set of NeXus definitions to ensure reproducibility and consistency across data conversions. |
| 46 | + |
| 47 | +Specifically, the definitions are used to: |
| 48 | + |
| 49 | +- generate internal representations of application definitions, |
| 50 | +- validate generated NeXus files, |
| 51 | +- ensure consistent interpretation of metadata across plugins, |
| 52 | +- make conversions reproducible by tying results to a specific definitions version. |
| 53 | + |
| 54 | +Because the definitions evolve independently of `pynxtools`, the repository is included as a submodule rather than copied into the code base. |
| 55 | + |
| 56 | +This has several advantages: |
| 57 | + |
| 58 | +- updates can be performed independently of `pynxtools` releases, |
| 59 | +- the exact definitions commit used for conversion is tracked, |
| 60 | +- different branches or commits can be tested when developing new application definitions. |
| 61 | + |
| 62 | +The version currently in use is written to `nexus-version.txt`, allowing downstream users or workflows to reconstruct which definitions were used. |
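
As a minimal sketch of how a downstream workflow might read the recorded version: the helper below, `read_definitions_version`, is a hypothetical name and not part of `pynxtools`. It assumes the version string sits on the first line of `nexus-version.txt`, and it takes the file path as an argument because the file's location is not specified here.

```bash
# Hypothetical helper (not part of pynxtools): print the version string
# stored in a nexus-version.txt file.
# Assumption: the version string is on the first line of the file.
read_definitions_version() {
    if [ ! -f "$1" ]; then
        echo "version file not found: $1" >&2
        return 1
    fi
    # Take the first line and strip whitespace, including the trailing newline.
    head -n 1 "$1" | tr -d '[:space:]'
}
```

Calling `read_definitions_version path/to/nexus-version.txt` prints the recorded version, which a workflow could then store alongside its converted files.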
| 63 | + |
| 64 | +## Why a Git submodule is used |
| 65 | + |
| 66 | +The definitions repository is large and changes independently of the Python code. Using a submodule ensures that: |
| 67 | + |
| 68 | +- Every `pynxtools` commit references an exact definitions commit. |
| 69 | +- Users obtain reproducible behavior when cloning the repository. |
| 70 | +- Developers can temporarily test newer or experimental definitions without changing the recorded version. |
| 71 | + |
| 72 | +The superproject (`pynxtools`) therefore defines the authoritative version of the definitions. |
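
To see which definitions commit a given `pynxtools` commit pins, you can inspect the submodule entry (a "gitlink") in the superproject's tree. The sketch below parses the output of `git ls-tree`. The function name `pinned_commit` is hypothetical, and the path in the usage comment is the submodule location given at the top of this page.

```bash
# Hypothetical helper: read `git ls-tree` output on stdin and print the
# commit hash of a submodule entry.
# git ls-tree prints "<mode> <type> <hash><TAB><path>"; a submodule entry
# has mode 160000 and type "commit".
pinned_commit() {
    awk '$1 == "160000" && $2 == "commit" { print $3 }'
}

# Usage, from the root of a pynxtools checkout:
#   git ls-tree HEAD -- src/pynxtools/definitions | pinned_commit
```

`git submodule status` reports the commit that is currently checked out in the submodule, so comparing the two shows whether the working tree has moved away from the pinned version.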
| 73 | + |
| 74 | +## Managing the definitions submodule |
| 75 | + |
| 76 | +The submodule should be managed through [a dedicated script](https://github.com/FAIRmat-NFDI/pynxtools/blob/master/scripts/definitions.sh): |
| 77 | + |
| 78 | +```bash |
| 79 | +scripts/definitions.sh |
| 80 | +``` |
| 81 | + |
| 82 | +The script provides a small abstraction over common Git submodule operations and ensures that: |
| 83 | + |
| 84 | +- the correct definitions version is checked out, |
| 85 | +- `.gitmodules` remains consistent, |
| 86 | +- `nexus-version.txt` is updated automatically. |
| 87 | + |
| 88 | +### Commands |
| 89 | + |
| 90 | +- **Update to the tracked branch**: |
| 91 | + |
| 92 | + ```bash |
| 93 | + ./scripts/definitions.sh update |
| 94 | + ``` |
| 95 | + |
| 96 | + Updates the definitions submodule to the latest commit of the tracked branch (if one is configured). Internally, this runs `git submodule update --remote`. Use this when you intentionally want to move to a newer definitions version. |
| 97 | + |
| 98 | +- **Reset to the recorded version**: |
| 99 | + |
| 100 | + ```bash |
| 101 | + ./scripts/definitions.sh reset |
| 102 | + ``` |
| 103 | + |
| 104 | + Resets the submodule to the exact commit recorded in the `pynxtools` repository. |
| 105 | + This is the safe operation when you want to switch branches, discard local experiments, |
| 106 | + or restore a clean state. |
| 107 | + |
| 108 | + Importantly, this does not update to the latest definitions version. It restores the version pinned by the current `pynxtools` commit. |
| 109 | + |
| 110 | +- **Checkout a specific revision**: |
| 111 | + |
| 112 | + ```bash |
| 113 | + ./scripts/definitions.sh checkout <REV> |
| 114 | + ``` |
| 115 | + |
| 116 | + `<REV>` can be any Git revision that `git rev-parse` can resolve (e.g., a commit hash, a tag, or a branch name). |
| 117 | + The behavior depends on the revision type: |
| 118 | + |
| 119 | + | Revision type | Result | |
| 120 | + |---|---| |
| 121 | + | commit or tag | submodule in detached HEAD | |
| 122 | + | branch | branch is checked out and tracked | |
| 123 | + |
| 124 | + If a branch is checked out, the script updates `.gitmodules` so that future `update` calls follow that branch. If a commit or tag is checked out, branch tracking is removed. |
| 125 | + |
| 126 | + This allows temporary testing of definitions changes without permanently modifying the repository state. |
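
The branch tracking described above is stored in the `branch` key of the submodule's section in `.gitmodules`, which `git submodule update --remote` consults. As a rough illustration (not the actual implementation of `scripts/definitions.sh`), the hypothetical functions below set, clear, and read that key with `git config -f`. The submodule name is assumed to equal its path; check your `.gitmodules` for the real section name.

```bash
# Sketch only: not the actual implementation of scripts/definitions.sh.
# Assumption: the submodule's name in .gitmodules equals its path.
SUBMODULE_NAME="src/pynxtools/definitions"

# Record branch $2 as the tracked branch in the .gitmodules file $1.
track_branch() {
    git config -f "$1" "submodule.${SUBMODULE_NAME}.branch" "$2"
}

# Remove branch tracking from file $1; succeed even if none was recorded.
untrack_branch() {
    git config -f "$1" --unset "submodule.${SUBMODULE_NAME}.branch" || true
}

# Print the tracked branch recorded in file $1, or nothing if there is none.
tracked_branch() {
    git config -f "$1" --get "submodule.${SUBMODULE_NAME}.branch" || true
}
```

For example, `track_branch .gitmodules main` followed by `tracked_branch .gitmodules` prints `main`, and `untrack_branch .gitmodules` removes the key again. Because `.gitmodules` is a versioned file in the superproject, such changes show up in `git status` until they are committed or reverted.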
| 127 | + |
| 128 | +## Summary |
| 129 | + |
| 130 | +The definitions submodule provides the schema layer that enables `pynxtools` to generate and validate NeXus-compliant data. By pinning a specific definitions commit, `pynxtools` guarantees reproducibility while still allowing developers to test newer or experimental definitions when required. |
| 131 | + |
| 132 | +The helper script exists to make these operations explicit and safe, avoiding common pitfalls of manual submodule handling while keeping the underlying Git behavior transparent. |