Description
What happened?
Despite our efforts, using HDF5 as a memory-mapping backend is not working: the data still end up fully in RAM. Memory mapping is critical for this use case.
Potential explanations (as per codex, all sounding pretty reasonable):
- The `BaseDataset` API hard-requires in-memory `numpy.ndarray` objects for `dataobj` and `affine`, and `from_filename` eagerly converts every HDF5 dataset into full NumPy arrays before constructing the object. This prevents the use of HDF5 datasets or memory-mapped arrays as backends and guarantees that all volumes occupy RAM, defeating the intended low-memory design. We should allow `BaseDataset` to use lazy/backed arrays instead of forcing `numpy.ndarray` (see the first sketch after this list).
- Writing to HDF5 via `to_filename` always materializes every field (including the full data array) in memory and never updates `_filepath` to serve as a backing store, so even after writing there is no mechanism to drop the in-memory copy or reopen lazily. This duplicative write path increases peak memory usage rather than reducing it. We should rework `to_filename` to create and reuse on-disk backing stores (see the second sketch below).
- The DWI initializer removes b=0 volumes via boolean masking (`self.dataobj = self.dataobj[..., ~b0_mask]`), which copies the full array; the b=0 reference is also computed from the in-memory data. Combined with the base-class constraints, this further increases transient memory during the construction of diffusion datasets (a copy-free alternative is sketched third, below).
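A minimal sketch of what a lazy-backed dataset could look like, assuming h5py and the field names mentioned above (`dataobj`, `affine`); the `LazyDataset` class and `get_volume` helper are hypothetical illustrations, not the current API:

```python
import h5py
import numpy as np


class LazyDataset:
    """Hypothetical dataset that keeps its data HDF5-backed."""

    def __init__(self, dataobj, affine):
        # Accept either an in-memory array or an h5py.Dataset;
        # no np.asarray() here, so HDF5-backed data stays on disk.
        self.dataobj = dataobj
        self.affine = np.asarray(affine)  # the affine is tiny; RAM is fine

    @classmethod
    def from_filename(cls, filename):
        # Keep the file handle open so dataobj remains disk-backed
        # instead of being eagerly read into a full NumPy array.
        h5file = h5py.File(filename, "r")
        obj = cls(h5file["dataobj"], h5file["affine"][()])
        obj._h5file = h5file  # retain the handle; close it when done
        return obj

    def get_volume(self, index):
        # Slicing an h5py.Dataset reads only the requested volume.
        return self.dataobj[..., index]
```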
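Likewise, a hedged sketch of the proposed `to_filename` rework: write once, then re-bind `dataobj` to the on-disk copy so `_filepath` serves as the backing store and the in-memory array can be released. The HDF5 layout here is assumed for illustration, not the project's actual on-disk format:

```python
import h5py
import numpy as np


def to_filename(dataset, filename):
    # Write every field to HDF5 once...
    with h5py.File(filename, "w") as out:
        out.create_dataset("dataobj", data=dataset.dataobj)
        out.create_dataset("affine", data=np.asarray(dataset.affine))

    # ...then reopen read-only and swap the in-memory array for the
    # disk-backed dataset, so the RAM copy can be garbage-collected.
    dataset._filepath = filename
    dataset._h5file = h5py.File(filename, "r")
    dataset.dataobj = dataset._h5file["dataobj"]
```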
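Finally, one way the b=0 masking could avoid the full-array copy: keep the original (possibly disk-backed) array plus an index map instead of materializing `dataobj[..., ~b0_mask]`. This is an assumption about a possible fix, not existing code:

```python
import numpy as np

# Boolean masking copies the whole array:
#     self.dataobj = self.dataobj[..., ~b0_mask]
# Keeping indices instead defers any copy to per-volume access:
b0_mask = np.asarray([True, False, False, True, False])
dwi_indices = np.flatnonzero(~b0_mask)

def get_dwi_volume(dataobj, i):
    # Reads/copies only one volume, even for HDF5-backed dataobj.
    return dataobj[..., dwi_indices[i]]
```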
What command did you use?
n/a
What version of the software are you running?
main
How are you running this software?
Local installation ("bare-metal")
Is your data BIDS valid?
Yes
Are you reusing any previously computed results?
No
Please copy and paste any relevant log output.
No response
Additional information / screenshots
No response