
Commit 71650ef

committed
add flag append_mode, complete documentation, add tests accordingly
1 parent 83e7363 commit 71650ef

6 files changed, +64 -13 lines changed


.cspell/custom-dictionary.txt

Lines changed: 1 addition & 0 deletions
@@ -116,6 +116,7 @@ hashvalue
 hdfdict
 hdfgroup
 hdfobject
+hfive
 hixu
 hypothes
 idname

.pre-commit-config.yaml

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@ repos:
 rev: v1.19.1
 hooks:
 - id: mypy # static type checking
+additional_dependencies:
+- types-PyYAML
 
 - repo: https://github.com/asottile/pyupgrade
 rev: v3.21.1

docs/learn/pynxtools/dataconverter-append-mode.md

Lines changed: 31 additions & 6 deletions
@@ -4,19 +4,44 @@ There are cases where users wish to compose a NeXus/HDF5 file with data from mul
 
 - A file should contain multiple `NXentry` instances where each instance applies a different application definition.
 - Content under `NXentry` instances is composed from running a specific pynxtools parser plugin plus additional content
-that is injected via software that might not be written in Python.
+that is injected via software other than `pynxtools` or not even software that is written in Python.
 
 Enabling such use cases while minimizing data copying is the idea behind the append mode of the dataconverter. It is activated by
 passing the `--append` flag during [command line invocation](../../tutorial/converting-data-to-nexus.md).
 
-**The append mode must not be understood as a functionality that allows overwriting data.** Convinced that data should be immutable,
-once generated, the append mode feature works with the following key assumptions:
+## Possibilities and limitations
+
+**The append mode must not be understood as a functionality that allows an overwriting of existent data.**
+We are convinced that written data should be immutable. Therefore, using the append mode demands to accept the following assumptions:
 
 - Only groups, datasets, or attributes not yet existent can be added when in append mode.
-The implementation catches attempts of overwriting existent such type of HDF5 objects,
+The implementation catches attempts of overwriting existent HDF5 objects,
 emitting respective logging messages.
 - When in append mode, the internal validation of the `template` dictionary is switched off,
 irrespective if `--skip-verify` is passed or not.
-Instead, users should validate [the HDF5 file a posteriori](../../how-tos/pynxtools/validate-nexus-files.md).
-- Despite the HDF5 library offers the functionality, a reshaping of HDF5 datasets is not supported.
+Instead, users should validate [the HDF5 file](../../how-tos/pynxtools/validate-nexus-files.md) when having the file compositing completed.
+- The HDF5 library's functionality to reshape existent HDF5 datasets is not supported by `pynxtools`.
+
+## Interpreting root level attributes
+
+Note that `pynxtools` sets several attributes at the root level of a NeXus/HDF5 file. These values are defined by whichever tool writes them first.
+A subsequent writing to the HDF5 file in append mode does not modify these. This makes the interpretation of the following attributes ambiguous
+`NeXus_repository`, `NeXus_release`, `HDF5_Version`, `h5py_version`, `creator`, `creator_version`, `file_time` and `file_update_time`.
+
+When in append mode, `pynxtools` adds the root level attribute `append_mode = "True"` which flags the file as an artifact that was composed
+from at least one pynxtools tool running in append mode. Note that the absence of this flag does not guarantee that the file was written
+by `pynxtools` or its plugins, as also other software could have written the NeXus file.
+
+Until the NeXus standard allows users to link or define these attributes at the HDF5 object level, i.e. for groups, datasets, and attributes, separately,
+we advise to no mix tools that write content that adheres to different versions of the NeXus definitions. Note that the `validate` functionality
+of `pynxtools` can currently not detect which objects within an HDF5 file were written with which NeXus or tool version. The validation concludes from
+the combination of the `ENTRY/definition`, `ENTRY/definition/@version`, and `/@NeXus_version` attributes.
+
+## Time-stamped HDF5 objects
+
+Note that the HDF5 library has the low-level feature to timestamp individual HDF5 objects. By default though, this feature is deactivated
+as per decision of the HDF5 Consortium. The choice was made to prevent that changing timestamp values change the hash of the entire file content.
+Note that the `pynxtools-em` plugin includes a [`hfive_base` parser](https://github.com/FAIRmat-NFDI/pynxtools-em/blob/main/src/pynxtools_em/parsers/hfive_base.py)
+that can compute hashes from the content of individual HDF5 objects. Users are advised to blacklist timestamp attributes like `file_time`, and `file_update_time`
+when comparing the binary content of two HDF5 files using this parser.
 
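The last paragraph of the documentation change recommends blacklisting timestamp attributes such as `file_time` and `file_update_time` when comparing HDF5 files by content. A minimal sketch of that idea with `h5py`, `numpy`, and `hashlib` follows. `content_hash` and `TIMESTAMP_ATTRS` are hypothetical names, and this is not the interface of the `pynxtools-em` `hfive_base` parser. It hashes object paths, dataset values, and attribute names and values in sorted order while skipping blacklisted attribute names, so two files that differ only in those attributes produce the same digest.

```python
import hashlib
import os
import tempfile

import h5py
import numpy as np

# Hypothetical blacklist: root attributes whose values change with every write.
TIMESTAMP_ATTRS = frozenset({"file_time", "file_update_time"})


def _to_bytes(value) -> bytes:
    """Serialize an attribute or dataset value for hashing."""
    arr = np.asarray(value)
    if arr.dtype == object:
        # Variable-length strings come back as Python objects; hash their text instead.
        return repr(arr.tolist()).encode()
    return arr.tobytes()


def content_hash(path: str, blacklist: frozenset = TIMESTAMP_ATTRS) -> str:
    """Hash object paths, dataset values, and attributes, skipping blacklisted attribute names."""
    digest = hashlib.sha256()

    def add_attrs(obj) -> None:
        for key in sorted(obj.attrs.keys()):
            if key not in blacklist:
                digest.update(key.encode())
                digest.update(_to_bytes(obj.attrs[key]))

    with h5py.File(path, "r") as f:
        add_attrs(f)  # root-level attributes
        names: list = []
        f.visit(names.append)  # list.append returns None, so the visit continues
        for name in sorted(names):  # sorted for a deterministic order
            obj = f[name]
            digest.update(name.encode())
            if isinstance(obj, h5py.Dataset):
                digest.update(_to_bytes(obj[()]))
            add_attrs(obj)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, name) for name in ("a.h5", "b.h5")]
        for path, stamp in zip(paths, ("2024-01-01T00:00:00", "2025-06-30T12:00:00")):
            with h5py.File(path, "w") as f:
                f.attrs["file_time"] = stamp  # the only difference between the files
                f.create_dataset("entry/data", data=np.arange(5))
        print(content_hash(paths[0]) == content_hash(paths[1]))
```

Feeding every object into one running digest keeps the sketch short; computing one digest per object would additionally reveal which object differs.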
src/pynxtools/dataconverter/convert.py

Lines changed: 5 additions & 1 deletion
@@ -227,7 +227,11 @@ def convert(
 **kwargs,
 )
 
-helpers.add_default_root_attributes(data=data, filename=os.path.basename(output))
+helpers.add_default_root_attributes(
+data=data,
+filename=os.path.basename(output),
+append=True if "append" in kwargs else False,
+)
 Writer(
 data=data,
 nxdl_f_path=nxdl_f_path,
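The committed expression `True if "append" in kwargs else False` is equivalent to `"append" in kwargs`: it tests whether an `append` keyword argument is present, not what its value is. The two readings only diverge when a caller passes `append=False` explicitly; whether that happens depends on how the `--append` option reaches `convert`, which this commit does not show. The standalone sketch below contrasts them; both function names are hypothetical.

```python
def append_by_presence(**kwargs) -> bool:
    # Same logic as the committed expression: True whenever the key exists.
    return True if "append" in kwargs else False


def append_by_value(**kwargs) -> bool:
    # Hypothetical value-based reading: False unless append is truthy.
    return bool(kwargs.get("append", False))


for kwargs in ({}, {"append": True}, {"append": False}):
    print(kwargs, append_by_presence(**kwargs), append_by_value(**kwargs))
# {} False False
# {'append': True} True True
# {'append': False} True False
```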

src/pynxtools/dataconverter/helpers.py

Lines changed: 3 additions & 1 deletion
@@ -1259,7 +1259,7 @@ def convert_to_hill(atoms_typ):
 return atom_list + list(atoms_typ)
 
 
-def add_default_root_attributes(data, filename):
+def add_default_root_attributes(data, filename, append: bool = False):
 """
 Takes a dict/Template and adds NXroot fields/attributes that are inherently available
 """
@@ -1284,6 +1284,8 @@ def update_and_warn(key: str, value: str):
 update_and_warn("/@NeXus_release", get_nexus_version())
 update_and_warn("/@HDF5_Version", ".".join(map(str, h5py.h5.get_libversion())))
 update_and_warn("/@h5py_version", h5py.__version__)
+if append:
+update_and_warn("/@append_mode", "True")
 
 
 def write_nexus_def_to_entry(data, entry_name: str, nxdl_def: str):
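With `append=True`, `add_default_root_attributes` adds the template key `/@append_mode` with the string value `"True"`, which the writer stores as a root-level attribute. A reader can check for it with `h5py` as sketched below. `root_append_flag` is a hypothetical helper, and the files are written directly with `h5py` as stand-ins for `pynxtools` runs. Attributes are looked up through `.attrs`; a membership test on the file object itself, such as `"append_mode" in f`, searches for groups and datasets instead. As the documentation notes, a missing flag does not prove which tool wrote the file.

```python
import os
import tempfile
from typing import Optional

import h5py


def root_append_flag(path: str) -> Optional[str]:
    """Return the root-level append_mode attribute, or None if it is absent."""
    with h5py.File(path, "r") as f:
        return f.attrs.get("append_mode")


with tempfile.TemporaryDirectory() as tmp:
    flagged = os.path.join(tmp, "flagged.nxs")
    plain = os.path.join(tmp, "plain.nxs")
    with h5py.File(flagged, "w") as f:
        f.attrs["append_mode"] = "True"  # stand-in for a run in append mode
        print("append_mode" in f.attrs, "append_mode" in f)  # True False
    with h5py.File(plain, "w"):
        pass  # stand-in for a run without append mode
    print(root_append_flag(flagged), root_append_flag(plain))  # True None
```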

tests/dataconverter/test_writer.py

Lines changed: 22 additions & 5 deletions
@@ -25,6 +25,7 @@
 import pytest
 
 from pynxtools.dataconverter.exceptions import InvalidDictProvided
+from pynxtools.dataconverter.helpers import add_default_root_attributes
 from pynxtools.dataconverter.template import Template
 from pynxtools.dataconverter.writer import Writer
 
@@ -38,10 +39,12 @@
 @pytest.fixture(name="writer")
 def fixture_writer(filled_test_data, tmp_path):
 """pytest fixture to setup Writer object to be used by tests with dummy data."""
+output_file_path = os.path.join(tmp_path, "test.nxs")
+add_default_root_attributes(filled_test_data, filename=output_file_path)
 writer = Writer(
 filled_test_data,
 os.path.join(os.getcwd(), "src", "pynxtools", "data", "NXtest.nxdl.xml"),
-os.path.join(tmp_path, "test.nxs"),
+output_file_path,
 )
 yield writer
 del writer
@@ -55,10 +58,11 @@ def test_init(writer):
 def test_write(writer):
 """Test for the Writer's write function. Checks whether entries given above get written out."""
 writer.write()
-test_nxs = h5py.File(writer.output_path, "r")
-assert test_nxs["/my_entry/nxodd_name/int_value"][()] == 2
-assert test_nxs["/my_entry/nxodd_name/int_value"].attrs["units"] == "eV"
-assert test_nxs["/my_entry/nxodd_name/posint_value"].shape == (3,) # pylint: disable=no-member
+with h5py.File(writer.output_path, "r") as test_nxs:
+assert "/append_mode" not in test_nxs
+assert test_nxs["/my_entry/nxodd_name/int_value"][()] == 2
+assert test_nxs["/my_entry/nxodd_name/int_value"].attrs["units"] == "eV"
+assert test_nxs["/my_entry/nxodd_name/posint_value"].shape == (3,) # pylint: disable=no-member
 
 
 def test_write_link(writer):
@@ -129,6 +133,7 @@ def fixture_normal_write_then_attempt_append(writer):
 template["/ENTRY[my_entry]/definition/@version"] = "2.4.6"
 template["/ENTRY[my_entry]/required_group/description"] = "An example description"
 template["/already/existing_value"] = np.zeros((125000,), dtype=np.float64)
+add_default_root_attributes(template, filename=writer.output_path, append=True)
 overwrite = Writer(
 template,
 os.path.join(os.getcwd(), "src", "pynxtools", "data", "NXtest.nxdl.xml"),
@@ -143,13 +148,25 @@ def test_overwrite(writer_overwrite, caplog):
 """Test whether append is correctly working for the writer."""
 writer_overwrite.write()
 
+with h5py.File(writer_overwrite.output_path, "r") as test_nxs:
+assert "append_mode" in test_nxs["/"].attrs
+assert test_nxs["/"].attrs["append_mode"] == "True"
+
 with caplog.at_level(logging.INFO):
 observed_infos = [
 r.getMessage() for r in caplog.records if r.levelno == logging.INFO
 ]
 
 prefix = "Prevented the overwriting of"
 expected_infos = [
+f"{prefix} attribute /@HDF5_Version",
+f"{prefix} attribute /@NX_class",
+f"{prefix} attribute /@NeXus_release",
+f"{prefix} attribute /@NeXus_repository",
+f"{prefix} attribute /@file_name",
+f"{prefix} attribute /@file_time",
+f"{prefix} attribute /@file_update_time",
+f"{prefix} attribute /@h5py_version",
 f"{prefix} attribute /ENTRY[my_entry]/@NX_class",
 f"{prefix} dataset /ENTRY[my_entry]/definition",
 f"{prefix} dataset /ENTRY[my_entry]/required_group/description",
0 commit comments
