
Tame the memory hungriness of the dataconverter and NOMAD parser for large datasets #737

@mkuehbach

Description

  • Observed for a 15.5 GiB NeXus/HDF5 NXem dataset in which essentially all content sits in one 3D array that is 32 GiB uncompressed. Although the array is stored chunked, parsing allocates the uncompressed payload multiple times, which causes parsing to break on small systems.

Suggestions: we should check for the following inefficiencies, which become costly when a dataset is large:

  • Compute summary statistics and finiteness checks chunk-by-chunk, taking advantage of the chunked layout
  • It could be useful to skip computing statistics for contiguously stored datasets that are larger than some fraction of the maximum memory available to the user in the deployment, i.e. "drop on load"
  • Access HDF5 object metadata directly rather than via np.shape() calls on loaded data
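The first bullet could be sketched as follows. This is a minimal illustration, not code from the parser: it streams over fixed-size blocks of a NumPy array so that only one block is resident at a time, maintaining running aggregates. For an actual chunked HDF5 dataset one would iterate `dset.iter_chunks()` from h5py and read each chunk selection individually; the block slicing below stands in for that.

```python
import numpy as np

def blockwise_stats(data, block_size=1_000_000):
    """Running min/max/mean/finiteness over flat blocks of `data`,
    keeping at most `block_size` values in memory at a time.
    With h5py, iterate `dset.iter_chunks()` and read one chunk
    selection per iteration instead of slicing a loaded array."""
    flat = data.reshape(-1)
    total = 0.0
    vmin, vmax = np.inf, -np.inf
    all_finite = True
    for start in range(0, flat.size, block_size):
        block = flat[start:start + block_size]
        # accumulate in float64 to limit rounding drift across blocks
        total += block.sum(dtype=np.float64)
        vmin = min(vmin, float(block.min()))
        vmax = max(vmax, float(block.max()))
        all_finite = all_finite and bool(np.isfinite(block).all())
    return {"min": vmin, "max": vmax,
            "mean": total / flat.size, "finite": all_finite}
```

Peak memory is then bounded by one block (or one HDF5 chunk) rather than the full uncompressed payload.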
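The second and third bullets can be combined into a metadata-only gate; the function name, the 25% default fraction, and the caller-supplied memory budget below are illustrative assumptions, not existing parser API. The point is that the uncompressed size is computable from `shape` and `dtype` alone (both available directly on an h5py `Dataset` object), so the "drop on load" decision never touches the payload.

```python
import numpy as np

def should_compute_stats(shape, dtype, memory_budget_bytes, max_fraction=0.25):
    """Decide from metadata alone whether full statistics should be
    computed for a dataset. If the uncompressed size would exceed
    `max_fraction` of the memory budget, skip it ("drop on load").
    `shape`/`dtype` come straight from the HDF5 object's metadata,
    so no data is read and np.shape() on loaded arrays is avoided."""
    nbytes = int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize
    return nbytes <= max_fraction * memory_budget_bytes
```

For the reported case, a 2000 x 2000 x 2000 float32 array (32 GB uncompressed) would be rejected under an 8 GiB budget, while small metadata-sized datasets pass.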
