Observed with a 15.5 GiB NeXus/HDF5 NXem dataset in which essentially all content sits in one 3D array (32 GiB uncompressed, stored chunked): during parsing, the uncompressed payload is allocated multiple times, which makes parsing fail on systems with little memory.
Suggestions: we should check for the following inefficiencies, considering what happens when a dataset is large:
- Compute summary statistics and finiteness checks chunk-by-chunk, taking advantage of the chunked layout.
- For contiguously stored datasets, it could be useful to drop the statistics step entirely when the dataset is larger than a fraction of the maximum memory available to the user in the deployment ("drop on load").
- Access the HDF5 objects' metadata directly (shape, dtype, chunks) rather than via np.shape() calls on loaded arrays.
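The first suggestion could be sketched roughly like this, assuming h5py (the dataset path and function name below are illustrative, not the parser's actual API). `Dataset.iter_chunks()` (available in recent h5py) yields one slice tuple per stored chunk, so peak memory stays at one chunk rather than the full uncompressed array:

```python
# Sketch: streaming summary statistics over a chunked HDF5 dataset.
import os
import tempfile

import h5py
import numpy as np


def chunked_stats(dset):
    """Accumulate min/max/mean/finiteness one chunk at a time."""
    total = 0.0
    count = 0
    vmin, vmax = np.inf, -np.inf
    all_finite = True
    # iter_chunks() yields a tuple of slices per stored chunk,
    # so each read materializes only that chunk.
    for sl in dset.iter_chunks():
        block = dset[sl]
        finite = np.isfinite(block)
        all_finite &= bool(finite.all())
        vals = block[finite]
        if vals.size:
            vmin = min(vmin, float(vals.min()))
            vmax = max(vmax, float(vals.max()))
            total += float(vals.sum())
            count += vals.size
    mean = total / count if count else float("nan")
    return {"min": vmin, "max": vmax, "mean": mean, "all_finite": all_finite}


# Demo on a small chunked dataset.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    data = np.arange(64, dtype=np.float64).reshape(8, 8)
    f.create_dataset("entry/data", data=data, chunks=(4, 4))
with h5py.File(path, "r") as f:
    stats = chunked_stats(f["entry/data"])
print(stats)
```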
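The "drop on load" idea could look something like the guard below; the budget, fraction, and function name are all assumptions for illustration. A contiguous dataset cannot be streamed chunk-by-chunk, so if its uncompressed size exceeds the threshold we skip statistics rather than risk allocating the full payload:

```python
# Sketch: skip statistics for contiguous datasets that exceed a
# fraction of the deployment's memory budget (hypothetical policy).
import os
import tempfile

import h5py
import numpy as np


def should_compute_stats(dset, budget, fraction=0.5):
    """Return False for contiguous datasets whose uncompressed size
    exceeds `fraction` of `budget` (bytes); chunked ones can stream."""
    nbytes = dset.size * dset.dtype.itemsize  # uncompressed payload
    if dset.chunks is None and nbytes > fraction * budget:
        return False  # contiguous and too large: drop statistics
    return True


path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    zeros = np.zeros((100, 100))  # 80 kB of float64
    f.create_dataset("contig", data=zeros)                # contiguous layout
    f.create_dataset("chunked", data=zeros, chunks=(10, 10))
with h5py.File(path, "r") as f:
    ok_small = should_compute_stats(f["contig"], budget=1024**3)
    skip_big = should_compute_stats(f["contig"], budget=100_000)   # 80 kB > 50 kB
    ok_chunk = should_compute_stats(f["chunked"], budget=100_000)  # chunked: stream
print(ok_small, skip_big, ok_chunk)
```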
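For the metadata point: shape, dtype, and size are properties of the `h5py.Dataset` object itself, so reading them touches no array data, whereas `np.shape(dset[...])` first loads the whole array into memory. A minimal sketch (file and dataset names are illustrative):

```python
# Sketch: read shape/dtype from the h5py object's metadata instead of
# materializing the array just to inspect it.
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("entry/data", data=np.zeros((32, 16), dtype="f4"))

with h5py.File(path, "r") as f:
    dset = f["entry/data"]
    # Metadata only -- no data is read from disk:
    shape = dset.shape
    nbytes = dset.size * dset.dtype.itemsize
    # Anti-pattern: np.shape(dset[...]) would load the full array first.
print(shape, nbytes)
```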