Our HDF5 implementation continues to be a memory hog #347

@oesteban

What happened?

Despite previous efforts, using HDF5 as a memory-mapping backend is not working: the data still end up fully loaded in RAM. It is critical that memory mapping works in this case.

Potential explanations (as per Codex, all of which sound reasonable):

  • The BaseDataset API hard-requires in-memory numpy.ndarray objects for dataobj and affine, and from_filename eagerly converts every HDF5 dataset into full NumPy arrays before constructing the object. This prevents use of HDF5 datasets or memory-mapped arrays as backends and guarantees that all volumes occupy RAM, defeating the intended low-memory design. We should allow BaseDataset to use lazy/backed arrays instead of forcing numpy.ndarray.
  • Writing to HDF5 via to_filename always materializes every field (including the full data array) in memory and never updates _filepath to serve as a backing store, so even after writing, there is no mechanism to drop the in-memory copy or reopen lazily. This duplicative write path increases peak memory usage rather than reducing it. We should rework to_filename to create and reuse on-disk backing stores.
  • The DWI initializer removes b=0 volumes via boolean masking (self.dataobj = self.dataobj[..., ~b0_mask]). Boolean indexing in NumPy always returns a copy, so this allocates a second array holding every retained volume while the original is still alive; the b=0 reference is also computed from the in-memory data. Combined with the base class constraints, this further increases transient memory during the construction of diffusion datasets.
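
To make the first and third points concrete, here is a minimal sketch of what a lazy, HDF5-backed dataobj could look like. This is not the current implementation: the LazyVolumes class, its select method, the "dataobj" dataset name, the b < 50 threshold, and the use of a mean for the b=0 reference are all assumptions made for illustration. The idea is to keep the h5py.Dataset handle plus a list of retained volume indices, so that dropping b=0 volumes only touches the index list and each volume is read from disk on demand.

```python
import os
import tempfile

import h5py
import numpy as np


class LazyVolumes:
    """Hypothetical read-only view over a 4D on-disk HDF5 dataset.

    Stores the indices of the retained volumes instead of a masked copy.
    """

    def __init__(self, dataset, indices=None):
        self._dataset = dataset  # h5py.Dataset: voxel data stay on disk
        n_vols = dataset.shape[-1]
        self._indices = (
            np.arange(n_vols) if indices is None else np.asarray(indices)
        )

    def __len__(self):
        return len(self._indices)

    def select(self, mask):
        """Return a view restricted by a boolean mask; reads no voxel data."""
        keep = np.asarray(mask, dtype=bool)
        return LazyVolumes(self._dataset, self._indices[keep])

    def __getitem__(self, i):
        """Read one 3D volume from disk when it is actually needed."""
        return self._dataset[..., int(self._indices[i])]


def demo():
    # Small synthetic DWI-like array (illustrative shape and b-values).
    rng = np.random.default_rng(0)
    data = rng.random((4, 4, 4, 6), dtype=np.float32)
    bvals = np.array([0, 1000, 1000, 0, 2000, 2000])
    b0_mask = bvals < 50  # assumed threshold, for illustration only

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "dwi.h5")
        with h5py.File(path, "w") as h5file:
            # One chunk per volume, so reading a volume touches one chunk.
            h5file.create_dataset("dataobj", data=data, chunks=(4, 4, 4, 1))

        with h5py.File(path, "r") as h5file:
            dset = h5file["dataobj"]

            # Drop b=0 volumes without copying the 4D array.
            dwi = LazyVolumes(dset).select(~b0_mask)

            # Accumulate the b=0 reference one volume at a time.
            acc = np.zeros(dset.shape[:3], dtype=np.float64)
            b0_idx = np.flatnonzero(b0_mask)
            for idx in b0_idx:
                acc += dset[..., int(idx)]
            bzero = acc / len(b0_idx)

            first = dwi[0]  # only this volume is loaded into RAM

    first_ok = np.allclose(first, data[..., 1])
    bzero_ok = np.allclose(bzero, data[..., b0_mask].mean(axis=-1))
    return len(dwi), first.shape, first_ok, bzero_ok


if __name__ == "__main__":
    print(demo())
```

With something along these lines, peak memory during construction would scale with a single volume rather than with the whole series. The file handle has to stay open for as long as the view is in use, which is one of the design questions a real fix would need to settle.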

What command did you use?

n/a

What version of the software are you running?

main

How are you running this software?

Local installation ("bare-metal")

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

No

Please copy and paste any relevant log output.

Additional information / screenshots

No response

Metadata

Labels: bug
Status: Backlog