Commit 628094e

Fixes
1 parent 34e7da6

2 files changed, 18 insertions(+), 15 deletions(-)


paper/paper.bib

Lines changed: 7 additions & 4 deletions
@@ -24,6 +24,7 @@ @article{Draxl:2019
   title={The NOMAD laboratory: from data sharing to artificial intelligence},
   author={Draxl, Claudia and Scheffler, Matthias},
   journal={Journal of Physics: Materials},
+  doi = {10.1088/2515-7639/ab13bb},
   volume={2},
   number={3},
   pages={036001},
@@ -35,6 +36,7 @@ @article{Ghiringhelli:2017
   title={Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats},
   author={Ghiringhelli, Luca M and Carbogno, Christian and Levchenko, Sergey and Mohamed, Fawzi and Huhs, Georg and L{\"u}ders, Martin and Oliveira, Micael and Scheffler, Matthias},
   journal={npj computational materials},
+  doi = {10.1038/s41524-017-0048-5},
   volume={3},
   number={1},
   pages={46},
@@ -214,6 +216,7 @@ @software{Druskat:2021
 @article{Hoyer:2017,
   title = {xarray: {N-D} labeled arrays and datasets in {Python}},
   author = {Hoyer, S. and J. Hamman},
+  doi = {10.5334/jors.148},
   journal = {J. Open Res. Software},
   year = {2017}
 }
@@ -311,7 +314,8 @@ @InProceedings{ McKinney:2010
 @misc{Behnel:2005,
   title={ {l}xml: XML and HTML with Python},
   author={Behnel, Stefan and Faassen, Martijn and Bicking, Ian},
-  year={2005}
+  year={2005},
+  url = {https://lxml.de}
 }
 
 @article{Hjorth:2017,
@@ -336,8 +340,7 @@ @software{Pint:2012
   title = {Pint: {O}perate and manipulate physical quantities in {P}ython},
   howpublished = {\url{https://github.com/hgrecco/pint}},
   url = {https://github.com/hgrecco/pint},
-  year = {2012},
-  note = {[Accessed 17-06-2025]},
+  year = {2012}
 }
 
 @software{Click:2014,
@@ -357,7 +360,7 @@ @software{Clarke:2019
 }
 
 @software{H5py:2008,
-  author = {Andrew Collette et al.},
+  author = {Andrew Collette and et al.},
   title = {h5py: HDF5 for Python},
   month = may,
   year = 2008,

paper/paper.md

Lines changed: 11 additions & 11 deletions
@@ -1,5 +1,5 @@
 ---
-title: 'pynxtools: A framework for generating NeXus files from formats across disciplines.'
+title: 'pynxtools: A framework for generating NeXus files from formats across disciplines'
 tags:
   - Python
   - NeXus
@@ -131,47 +131,47 @@ bibliography: paper.bib
 
 # Summary
 
-Scientific data across physics, materials science, and materials engineering often lacks adherence to FAIR principles [@Wilkinson:2016; @Jacobsen:2020; @Barker:2022; @Wilkinson:2025] due to incompatible instrument-specific formats and diverse standardization practices. pynxtools is a Python software development framework with a command line interface (CLI) that standardizes data conversion for scientific experiments in materials characterization to the NeXus format [@Koennecke:2015; @Koennecke:2006; @Klosowski:1997] across diverse scientific domains. NeXus uses NeXus application definitions as their data storage specifications. pynxtools provides a fixed, versioned set of NeXus application definitions that ensures convergence and alignment in data specifications across atom probe tomography, electron microscopy, optical spectroscopy, photoemission spectroscopy, scanning probe microscopy, X-ray diffraction. Through its modular plugin architecture, pynxtools provides maps for instrument-specific raw data, and electronic lab notebook metadata, to these unified definitions, while performing validation to ensure data correctness and NeXus compliance. By simplifying the adoption of standardized application definitions, the framework enables true data interoperability and FAIR data management across multiple experimental techniques.
+Scientific data across physics, materials science, and materials engineering often lacks adherence to FAIR principles [@Wilkinson:2016; @Jacobsen:2020; @Barker:2022; @Wilkinson:2025] due to incompatible instrument-specific formats and diverse standardization practices. `pynxtools` is a Python software development framework with a command line interface (CLI) that standardizes data conversion for scientific experiments in materials characterization to the NeXus format [@Koennecke:2015; @Koennecke:2006; @Klosowski:1997] across diverse scientific domains. NeXus defines data storage specifications for different experimental techniques through application definitions. `pynxtools` provides a fixed, versioned set of NeXus application definitions that ensures convergence and alignment in data specifications across atom probe tomography, electron microscopy, optical spectroscopy, photoemission spectroscopy, scanning probe microscopy, and X-ray diffraction. Through its modular plugin architecture `pynxtools` provides maps for instrument-specific raw data and electronic lab notebook metadata to these unified definitions, while performing validation to ensure data correctness and NeXus compliance. By simplifying the adoption of standardized application definitions, the framework enables true data interoperability and FAIR data management across multiple experimental techniques.
 
 # Statement of need
 
-Achieving FAIR (Findable, Accessible, Interoperable, and Reproducible) data principles in experimental physics and materials science requires consistent implementation of standardized data formats. NeXus provides comprehensive data specifications for structured storage of scientific data. pynxtools simplifies the use of NeXus for developers and researchers by providing guided workflows and automated validation to ensure complete compliance. Existing tools [@Koennecke:2024; @Jemian:2025] provide solutions with individual capabilities, but none offers a comprehensive end-to-end workflow for proper NeXus adoption. pynxtools addresses this critical gap by providing a framework that enforces complete NeXus application definition compliance through automated validation, detailed error reporting for missing required data points, and clear implementation pathways via configuration files and extensible plugins. This approach transforms NeXus from a complex specification into a practical solution, enabling researchers to achieve true data interoperability without deep technical expertise in the underlying standards.
+Achieving FAIR (Findable, Accessible, Interoperable, and Reproducible) data principles in experimental physics and materials science requires consistent implementation of standardized data formats. NeXus provides comprehensive data specifications for structured storage of scientific data. `pynxtools` simplifies the use of NeXus for developers and researchers by providing guided workflows and automated validation to ensure complete compliance. Existing tools [@Koennecke:2024; @Jemian:2025] provide solutions with individual capabilities, but none offers a comprehensive end-to-end workflow for proper NeXus adoption. `pynxtools` addresses this critical gap by providing a framework that enforces complete NeXus application definition compliance through automated validation, detailed error reporting for missing required data points, and clear implementation pathways via configuration files and extensible plugins. This approach transforms NeXus from a complex specification into a practical solution, enabling researchers to achieve true data interoperability without deep technical expertise in the underlying standards.
 
 # Dataconverter and validation
 
-The _dataconverter_, core module of pynxtools, combines instrument output files and data from electronic lab notebooks into NeXus-compliant HDF5 files. The converter performs three key operations: reading experimental data through specialized readers, validating against NeXus application definitions to ensure compliance with existence, shape, and format constraints, and writing valid NeXus/HDF5 output files.
+The `dataconverter`, core module of pynxtools, combines instrument output files and data from electronic lab notebooks into NeXus-compliant HDF5 files. The converter performs three key operations: extracting experimental data through specialized readers, validating against NeXus application definitions to ensure compliance with existence, shape, and format constraints, and writing valid NeXus/HDF5 output files.
 
 The `dataconverter` provides a CLI to produce NeXus files where users can use one of the built-in readers for generic functionality or technique-specific reader plugins, which are distributed as separate Python packages.
 
 For developers, the `dataconverter` provides an abstract `reader` class for building plugins that process experiment-specific formats and populate the NeXus specification. It passes a `Template`, a subclass of Python’s dictionary, to the `reader` as a form to fill. The `Template` ensures structural compliance with the chosen NeXus application definition and organizes data by NeXus's required, recommended, and optional levels.
 
-The _dataconverter_ validates _reader_ output against the selected NeXus application definition, checking for instances of required concepts, complex dependencies (like inheritance and nested group rules), and data integrity (type, shape, constraints). It reports errors for invalid required concepts and emits CLI warnings for unmatched or invalid data, aiding practical NeXus file creation.
+The `dataconverter` validates `reader` output against the selected NeXus application definition, checking for instances of required concepts, complex dependencies (like inheritance and nested group rules), and data integrity (type, shape, constraints). It reports errors for invalid required concepts and emits CLI warnings for unmatched or invalid data, aiding practical NeXus file creation.
 
 All reader plugins are tested using the pynxtools.testing suite, which runs automatically via GitHub CI to ensure compatibility with the dataconverter, the NeXus specification, and integration across plugins.
 
 The dataconverter includes an ELN generator that creates either a fillable `YAML` file or a `NOMAD` [@Scheidgen:2023] ELN schema based on a selected NeXus application definition.
 
 # NeXus reader and annotator
 
-_read_nexus_ enables semantic access to NeXus files by linking data items to NeXus concepts, allowing applications to locate relevant data without hardcoding file paths. It supports concept-based queries that return all data items associated with a specific NeXus Vocabulary term. Each data item is annotated by traversing its group path and resolving its corresponding NeXus concept, included inherited definitions.
+`read_nexus` enables semantic access to NeXus files by linking data items to NeXus concepts, allowing applications to locate relevant data without hardcoding file paths. It supports concept-based queries that return all data items associated with a specific NeXus Vocabulary term. Each data item is annotated by traversing its group path and resolving its corresponding NeXus concept, included inherited definitions.
 
 Items not part of the NeXus schema are explicitly marked as such, aiding in validation and debugging. Targeted documentation of individual data items is supported through path-specific annotation. The tool also identifies and summarizes the file’s default plottable data based on the NXdata definition.
 
 # `NOMAD` integration
 
-While pynxtools works as a standalone tool, it can also be integrated directly into Research Data Management Systems (RDMS). Out of the box, the package functions as a plugin within the `NOMAD` platform [@Scheidgen:2023; @Draxl:2019]. This enables data in the NeXus format to be integrated into `NOMAD`'s metadata model, making it searchable and interoperable with other data from theory and experiment. The plugin consists of several key components (so called entry points):
+While `pynxtools` works as a standalone tool, it can also be integrated directly into Research Data Management Systems (RDMS). Out of the box, the package functions as a plugin within the `NOMAD` platform [@Scheidgen:2023; @Draxl:2019]. This enables data in the NeXus format to be integrated into `NOMAD`'s metadata model, making it searchable and interoperable with other data from theory and experiment. The plugin consists of several key components (so called entry points):
 
-pynxtools extends `NOMAD`'s data schema (called _Metainfo_ [@Ghiringhelli:2017]) by integrating NeXus definitions as a `NOMAD` _Schema Package_, adding NeXus-specific quantities and enabling interoperability through links to other standardized data representations in `NOMAD`. The _dataconverter_ is integrated into `NOMAD`, making the conversion of data to NeXus accessible via the `NOMAD` GUI. The _dataconverter_ also processes manually entered `NOMAD` ELN data in the conversion.
+`pynxtools` extends `NOMAD`'s data schema (called `Metainfo` [@Ghiringhelli:2017]) by integrating NeXus definitions as a `NOMAD` `Schema Package`, adding NeXus-specific quantities and enabling interoperability through links to other standardized data representations in `NOMAD`. The `dataconverter` is integrated into `NOMAD`, making the conversion of data to NeXus accessible via the `NOMAD` GUI. The `dataconverter` also processes manually entered `NOMAD` ELN data in the conversion.
 
-The `NOMAD` Parser module in pynxtools (_NexusParser_) extracts structured data from NeXus HDF5 files to populate `NOMAD` with _Metainfo_ object instances as defined by the pynxtools schema package. This enables ingestion of NeXus data directly into `NOMAD`. Parsed data is post-processed using `NOMAD`'s _Normalization_ pipeline. This includes automatic handling of units, linking references (including sample and instrument identifiers defined elsewhere in `NOMAD`), and populating derived quantities needed for advanced search and visualization.
+The `NOMAD` Parser module in `pynxtools` (`NexusParser`) extracts structured data from NeXus HDF5 files to populate `NOMAD` with `Metainfo` object instances as defined by the `pynxtools` schema package. This enables ingestion of NeXus data directly into `NOMAD`. Parsed data is post-processed using `NOMAD`'s `Normalization` pipeline. This includes automatic handling of units, linking references (including sample and instrument identifiers defined elsewhere in `NOMAD`), and populating derived quantities needed for advanced search and visualization.
 
-`pynxtools` contains an integrated _Search Application_ for NeXus data within `NOMAD`, powered by `Elasticsearch` [@elasticsearch:2025]. This provides a search dashboard whereby users can efficiently filter uploaded data based on parameters like experiment type, upload timestamp, and domain- and technique-specific quantities. The entire `pynxtools` workflow (conversion, parsing, and normalization) is exemplified in a representative `NOMAD` _Example Upload_ that is shipped with the package. This example helps new users understand the workflow and serves as a template to adapt the plugin to new NeXus applications.
+`pynxtools` contains an integrated `Search Application` for NeXus data within `NOMAD`, powered by `Elasticsearch` [@elasticsearch:2025]. This provides a search dashboard whereby users can efficiently filter uploaded data based on parameters like experiment type, upload timestamp, and domain- and technique-specific quantities. The entire `pynxtools` workflow (conversion, parsing, and normalization) is exemplified in a representative `NOMAD` `Example Upload` that is shipped with the package. This example helps new users understand the workflow and serves as a template to adapt the plugin to new NeXus applications.
 
 # Funding
 The work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - project 460197019 (FAIRmat).
 
 # Acknowledgements
 
-We acknowledge the following software packages our package depends on: [@H5py:2008], [@Harris:2020], [@Click:2014], [@Druskat:2021], [@Hoyer:2017], [@Hoyer:2025], [@Pandas:2020], [@McKinney:2010], [@Behnel:2005], [@Clarke:2019], [@Hjorth:2017], [@Pint:2012].
+We acknowledge the following software packages our package depends on: `h5py` [@H5py:2008], `numpy` [@Harris:2020], `click` [@Click:2014] , `CFF` [@Druskat:2021], `xarray` [@Hoyer:2017], [@Hoyer:2025], `pandas` [@Pandas:2020], `scipy` [@McKinney:2010], `lxml` [@Behnel:2005], `mergedeep` [@Clarke:2019], `Atomic Simulation Environment` [@Hjorth:2017], `pint` [@Pint:2012].
 
 # References
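
For context on the `Template` mentioned in the paper.md text above (a subclass of Python's dictionary that organizes data by NeXus requiredness levels), here is a minimal, self-contained sketch of how such a structure can work. The class, its method names, and the example paths are hypothetical illustrations, not pynxtools' actual API:

```python
class Template(dict):
    """Toy stand-in for a Template-style dict keyed by NeXus-like paths.

    Hypothetical sketch: every path starts out unfilled (None), a reader
    fills values in, and required-but-empty paths can be reported.
    """

    def __init__(self, levels_by_path):
        # levels_by_path maps each path to "required"/"recommended"/"optional".
        super().__init__({path: None for path in levels_by_path})
        self._levels = dict(levels_by_path)

    def missing_required(self):
        # Required concepts that have not been populated yet.
        return [path for path, value in self.items()
                if self._levels[path] == "required" and value is None]


template = Template({
    "/ENTRY[entry]/title": "required",
    "/ENTRY[entry]/experiment_description": "optional",
})
template["/ENTRY[entry]/title"] = "Example measurement"
print(template.missing_required())  # -> []
```

In this sketch a validator can flag unfilled required paths before writing, which mirrors the error-reporting behavior the paper describes for the real `dataconverter`.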
