Commit 367b6cd

lukaspie authored
align definitions scripts with typical git behavior, add docs (#735)
Co-authored-by: Markus Kühbach <mkuehbach@users.noreply.github.com>
Co-authored-by: lukaspie <lukaspie@github.com>
1 parent b23c8af commit 367b6cd

7 files changed: +228 -82 lines changed

.cspell/custom-dictionary.txt

Lines changed: 3 additions & 1 deletion
@@ -95,8 +95,8 @@ edgeitems
 ekey
 electronanalyzer
 elems
-ellips
 elist
+ellips
 ellipsometry
 emittance
 execinfo
@@ -107,6 +107,7 @@ fluence
 fxcef
 getlink
 getroottree
+gitmodules
 groupgroup
 groupnames
 hashkey
@@ -184,6 +185,7 @@ showgrid
 showlegend
 straße
 submoduled
+superproject
 tnxdl
 tofile
 tommaso

.pre-commit-config.yaml

Lines changed: 2 additions & 2 deletions
@@ -19,12 +19,12 @@ repos:
 args: [--py36-plus] # modernizes syntax for Python 3.6+
 
 - repo: https://github.com/kynan/nbstripout
-rev: 0.6.0
+rev: 0.9.0
 hooks:
 - id: nbstripout # removes notebook outputs before committing
 
 - repo: https://github.com/streetsidesoftware/cspell-cli
-rev: v9.3.0
+rev: v9.6.0
 hooks:
 - id: cspell # spellchecking
 pass_filenames: false

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ We are offering a small guide to getting started with NeXus, `pynxtools`, and NO
 
 #### pynxtools
 
+- [NeXus definitions in `pynxtools`](learn/pynxtools/nexus-definitions.md)
 - [Data conversion in `pynxtools`](learn/pynxtools/dataconverter-and-readers.md)
 - [Validation of NeXus files](learn/pynxtools/nexus-validation.md)
 - [The `MultiFormatReader` as a reader superclass](learn/pynxtools/multi-format-reader.md)

docs/learn/pynxtools/nexus-definitions.md

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
1+
# NeXus definitions in pynxtools
2+
3+
## Overview
4+
5+
`pynxtools` converts experimental data into the NeXus format and validates the resulting HDF5 files against NeXus application definitions. These definitions formally describe how experimental data and metadata must be structured in a NeXus file.
6+
7+
The NeXus definitions themselves are not part of the `pynxtools` source code. Instead, they are maintained in a dedicated repository and included in `pynxtools` as a Git submodule at the following path:
8+
9+
```text
10+
src/pynxtools/definitions
11+
```
12+
13+
This page explains:
14+
15+
- what the definitions repository contains,
16+
17+
- why it is included as a submodule,
18+
19+
- how it is used inside `pynxtools`,
20+
21+
- and how to manage it using the provided helper script.
22+
23+
## What the NeXus definitions are
24+
25+
NeXus defines a standardized structure for scientific data. The structure is defined in XML files written in the NeXus Definition Language (NXDL). These XML files define:
26+
27+
- base classes (common structural components and their semantic concepts),
28+
- application definitions (experiment-specific schemas),
29+
- contributed definitions from the community (either base classes or application definitions).
30+
31+
These definitions specify the naming, the hierarchy, whether individual concepts are required, and the allowed metadata for NeXus files.
32+
33+
In practice, they serve as:
34+
35+
- the schema against which data is validated,
36+
- the reference for generating templates and mappings,
37+
- the source of truth for how experimental data must be organized.
38+
39+
Validation and interoperability depend directly on the exact version of these definitions.
40+
41+
!!! info "To learn more about the different versions of the NeXus definitions, see [Reference > NeXus definitions](../../reference/definitions.md)."
42+
43+
## Role of the definitions in `pynxtools`
44+
45+
`pynxtools` relies on a fixed, versioned set of NeXus definitions to ensure reproducibility and consistency across data conversions.
46+
47+
Specifically, the definitions are used to:
48+
49+
- generate internal representations of application definitions,
50+
- validate generated NeXus files,
51+
- ensure consistent interpretation of metadata across plugins,
52+
- make conversions reproducible by tying results to a specific definitions version.
53+
54+
Because definitions evolve independently from `pynxtools`, the repository is included as a submodule rather than copied into the code base.
55+
56+
This has several advantages:
57+
58+
- updates can be performed independently of `pynxtools` releases,
59+
- the exact definitions commit used for conversion is tracked,
60+
- different branches or commits can be tested when developing new application definitions.
61+
62+
The version currently in use is written to `nexus-version.txt`, allowing downstream users or workflows to reconstruct which definitions were used.
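
How the version string is derived is not spelled out on this page. As a minimal sketch, assuming the file simply holds the output of `git describe` for the checked-out definitions commit (an assumption, not confirmed behavior of `pynxtools`), it could be produced as follows in a throwaway repository. The repository, the tag name `v2024.02`, and the location of the output file are hypothetical:

```bash
set -eu

# Throwaway sandbox; repository and tag names are hypothetical stand-ins.
work="$(mktemp -d)"
g() { git -c user.name=demo -c user.email=demo@example.org "$@"; }

# A stand-in definitions repository: one tagged release plus one later commit.
g init -q "$work/definitions"
g -C "$work/definitions" commit -q --allow-empty -m "release"
g -C "$work/definitions" tag v2024.02
g -C "$work/definitions" commit -q --allow-empty -m "later change"

# Assumption: the version is recorded as <tag>-<commits since tag>-g<short hash>.
version="$(g -C "$work/definitions" describe --tags --long)"
printf '%s\n' "$version" > "$work/nexus-version.txt"
cat "$work/nexus-version.txt"
```

A string of this shape identifies both the nearest release and the exact commit, which is what a downstream workflow needs to reconstruct the definitions that were used.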
63+
64+
## Why a Git submodule is used
65+
66+
The definitions repository is large and changes independently from the Python code. Using a submodule ensures that:
67+
68+
- Every `pynxtools` commit references an exact definitions commit.
69+
- Users obtain reproducible behavior when cloning the repository.
70+
- Developers can temporarily test newer or experimental definitions without changing the recorded version.
71+
72+
The superproject (`pynxtools`) therefore defines the authoritative version of the definitions.
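
The pinning described above can be observed with plain Git. The following sketch builds two throwaway repositories (hypothetical stand-ins for the definitions repository and for `pynxtools`; the paths `super` and `defs` are made up) and shows that a fresh clone receives the recorded commit even after the definitions repository has moved on:

```bash
set -eu

# Throwaway sandbox; all repository names and paths are hypothetical.
work="$(mktemp -d)"
g() { git -c user.name=demo -c user.email=demo@example.org \
          -c protocol.file.allow=always "$@"; }

# Stand-in for the definitions repository, with a first commit.
g init -q "$work/definitions"
g -C "$work/definitions" commit -q --allow-empty -m "definitions v1"

# Stand-in for the superproject, which includes it as a submodule.
g init -q "$work/super"
g -C "$work/super" submodule --quiet add "$work/definitions" defs
g -C "$work/super" commit -q -m "pin definitions v1"

# The superproject tree stores the submodule as an exact commit id.
pinned="$(g -C "$work/super" rev-parse HEAD:defs)"

# The definitions repository moves on after the pin was recorded.
g -C "$work/definitions" commit -q --allow-empty -m "definitions v2"
latest="$(g -C "$work/definitions" rev-parse HEAD)"

# A fresh clone still checks out the pinned commit, not the latest one.
g clone -q --recurse-submodules "$work/super" "$work/clone"
cloned="$(g -C "$work/clone/defs" rev-parse HEAD)"

echo "pinned: $pinned"
echo "latest: $latest"
echo "cloned: $cloned"
```

The `protocol.file.allow=always` setting is only needed because the sketch clones from local paths; it is not required for submodules fetched over HTTPS or SSH.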
73+
74+
## Managing the definitions submodule
75+
76+
The submodule should be managed through [a dedicated script](https://github.com/FAIRmat-NFDI/pynxtools/blob/master/scripts/definitions.sh):
77+
78+
```bash
79+
scripts/definitions.sh
80+
```
81+
82+
The script provides a small abstraction over common Git submodule operations and ensures that:
83+
84+
- the correct definitions version is checked out,
85+
- `.gitmodules` remains consistent,
86+
- `nexus-version.txt` is updated automatically.
87+
88+
### Commands
89+
90+
- **Update to the tracked branch**:
91+
92+
```bash
93+
./scripts/definitions.sh update
94+
```
95+
96+
Updates the definitions submodule to the latest commit of the tracked branch (if one is configured). Internally, this runs `git submodule update --remote`. Use this when you intentionally want to move to a newer definitions version.
97+
98+
- **Reset to the recorded version**:
99+
100+
```bash
101+
./scripts/definitions.sh reset
102+
```
103+
104+
Resets the submodule to the exact commit recorded in the `pynxtools` repository.
105+
This is the safe operation when you want to switch branches, discard local experiments,
106+
or restore a clean state.
107+
108+
Importantly, this does not update to the latest definitions version. It restores the version pinned by the current `pynxtools` commit.
109+
110+
- **Checkout a specific revision**:
111+
112+
```bash
113+
./scripts/definitions.sh checkout <REV>
114+
```
115+
116+
`<REV>` can be any Git reference that is resolvable by `git rev-parse` (e.g., a commit hash, a tag, or a branch name).
117+
The behavior depends on the type:
118+
119+
| Revision type | Result |
120+
|---|---|
121+
| commit or tag | submodule in detached HEAD |
122+
| branch | branch is checked out and tracked |
123+
124+
If a branch is checked out, the script updates `.gitmodules` so that future `update` calls follow that branch. If a commit or tag is used, branch tracking is removed.
125+
126+
This allows temporary testing of definitions changes without permanently modifying the repository state.
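
The three commands can be related to plain Git operations. The sketch below is not the helper script itself: only `git submodule update --remote` is named above as what `update` runs, while the commands shown for the reset and for branch tracking are assumptions about equivalent Git behavior. All repository names and the `defs` path are hypothetical:

```bash
set -eu

# Throwaway sandbox; all repository names and paths are hypothetical.
work="$(mktemp -d)"
g() { git -c user.name=demo -c user.email=demo@example.org \
          -c protocol.file.allow=always "$@"; }

# Stand-in definitions repository on branch "main".
g init -q "$work/definitions"
g -C "$work/definitions" symbolic-ref HEAD refs/heads/main
g -C "$work/definitions" commit -q --allow-empty -m "definitions v1"

# Stand-in superproject that tracks "main" (recorded in .gitmodules).
g init -q "$work/super"
g -C "$work/super" submodule --quiet add -b main "$work/definitions" defs
g -C "$work/super" commit -q -m "pin definitions v1"
pinned="$(g -C "$work/super" rev-parse HEAD:defs)"

# Upstream gains a newer commit.
g -C "$work/definitions" commit -q --allow-empty -m "definitions v2"
latest="$(g -C "$work/definitions" rev-parse HEAD)"

# Like "update": move the submodule to the tip of the tracked branch.
g -C "$work/super" submodule --quiet update --remote defs
after_update="$(g -C "$work/super/defs" rev-parse HEAD)"

# Like "reset" (assumed equivalent): return to the commit recorded by the superproject.
g -C "$work/super" submodule --quiet update --init defs
after_reset="$(g -C "$work/super/defs" rev-parse HEAD)"

# Branch tracking lives in .gitmodules; removing the key stops following a branch.
tracked="$(g -C "$work/super" config -f .gitmodules submodule.defs.branch)"
g -C "$work/super" config -f .gitmodules --unset submodule.defs.branch
untracked="$(g -C "$work/super" config -f .gitmodules --get submodule.defs.branch || true)"

echo "after update: $after_update"
echo "after reset:  $after_reset"
echo "tracked branch before unset: $tracked"
```

Note that moving the submodule with `--remote` does not change the recorded version by itself: the superproject keeps pointing at the old commit until the new submodule state is staged and committed.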
127+
128+
## Summary
129+
130+
The definitions submodule provides the schema layer that enables `pynxtools` to generate and validate NeXus-compliant data. By pinning a specific definitions commit, `pynxtools` guarantees reproducibility while still allowing developers to test newer or experimental definitions when required.
131+
132+
The helper script exists to make these operations explicit and safe, avoiding common pitfalls of manual submodule handling while keeping the underlying Git behavior transparent.

docs/reference/definitions.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # NeXus definitions
 
+!!! info "To learn more about how the NeXus definitions are integrated in `pynxtools`, see [Learn > pynxtools > NeXus definitions in pynxtools](../learn/pynxtools/nexus-definitions.md)."
+
 We link two references here.
 The first links to the official definitions by the [NIAC](http://www.nexusformat.org) and the second one links to latest FAIRmat definitions.
 

mkdocs.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ nav:
 - learn/nexus/nexus-rules.md
 # - learn/nexus/multiple-appdefs.md
 - pynxtools:
+- learn/pynxtools/nexus-definitions.md
 - learn/pynxtools/dataconverter-and-readers.md
 - learn/pynxtools/nexus-validation.md
 - learn/pynxtools/multi-format-reader.md
