Our HDF5 implementation continues to be a memory hog #347

### What happened?

Despite earlier efforts, using HDF5 as a memory-mapped backing store is not working: datasets still end up fully loaded in RAM. Memory mapping must work for this use case.

Potential explanations (suggested by Codex; all sound reasonable):

- The `BaseDataset` API hard-requires in-memory `numpy.ndarray` objects for `dataobj` and `affine`, and `from_filename` eagerly converts every HDF5 dataset into a full NumPy array before constructing the object. This rules out HDF5 datasets or memory-mapped arrays as backends and guarantees that all volumes occupy RAM, defeating the intended low-memory design. We should allow `BaseDataset` to use lazy/backed arrays instead of forcing `numpy.ndarray`.
- Writing to HDF5 via `to_filename` always materializes every field (including the full data array) in memory and never updates `_filepath` to serve as a backing store, so even after writing there is no mechanism to drop the in-memory copy or reopen the file lazily. This duplicative write path increases peak memory usage rather than reducing it. We should rework `to_filename` to create and reuse on-disk backing stores.
- The DWI initializer removes b=0 volumes via boolean masking (`self.dataobj = self.dataobj[..., ~b0_mask]`), which copies the full array; the b=0 reference is also computed from the in-memory data. Combined with the base-class constraints, this further increases transient memory during the construction of diffusion datasets.
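
To make the direction above concrete, here is a minimal sketch of a lazily backed container. It is not the project's current API: the `LazyDWI` class, its methods, and the assumption that the HDF5 file stores datasets named `dataobj` and `affine` are all hypothetical and used only for illustration. It keeps `dataobj` as an `h5py.Dataset` handle, tracks the non-b=0 volumes by index rather than boolean-masking the array, and accumulates the b=0 reference one volume at a time.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np


class LazyDWI:
    """Hypothetical container that leaves `dataobj` on disk (illustration only)."""

    def __init__(self, filepath, b0_mask):
        self._filepath = Path(filepath)
        # Opening the file reads metadata only; voxel data stays on disk.
        self._file = h5py.File(self._filepath, "r")
        self.dataobj = self._file["dataobj"]    # h5py.Dataset handle, not an ndarray
        self.affine = self._file["affine"][()]  # 4x4 matrix, cheap to load eagerly
        mask = np.asarray(b0_mask, dtype=bool)
        # Store indices instead of slicing with `~b0_mask`, which would copy the data.
        self._dwi_index = np.flatnonzero(~mask)
        self._b0_index = np.flatnonzero(mask)

    def __len__(self):
        return len(self._dwi_index)

    def __getitem__(self, i):
        # Reads a single 3D volume from disk on demand.
        return self.dataobj[..., int(self._dwi_index[i])]

    def b0_reference(self):
        # Mean of the b=0 volumes, accumulated one volume at a time.
        acc = np.zeros(self.dataobj.shape[:3], dtype="float64")
        for j in self._b0_index:
            acc += self.dataobj[..., int(j)]
        return acc / max(len(self._b0_index), 1)

    def close(self):
        self._file.close()


def _demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "dwi.h5"
        b0_mask = np.array([True, False, False, True, False, False])
        with h5py.File(path, "w") as f:
            # One chunk per volume, so reading a volume touches a single chunk.
            dset = f.create_dataset(
                "dataobj", shape=(4, 4, 4, 6), dtype="float32", chunks=(4, 4, 4, 1)
            )
            for k in range(6):
                # Write volume by volume; volume k is filled with the value k.
                dset[..., k] = np.full((4, 4, 4), k, dtype="float32")
            f.create_dataset("affine", data=np.eye(4))

        dwi = LazyDWI(path, b0_mask)
        try:
            is_lazy = isinstance(dwi.dataobj, h5py.Dataset)
            return len(dwi), dwi[0], dwi.b0_reference(), is_lazy
        finally:
            dwi.close()


n_volumes, first_volume, b0_ref, is_lazy = _demo()
```

In this sketch, peak memory scales with a single volume rather than with the whole 4D series. Whether per-volume chunking suits the real access patterns, and how such a handle should be closed or reopened across processes, are open design questions.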

### What command did you use?

```shell
n/a
```

### What version of the software are you running?

main

### How are you running this software?

Local installation ("bare-metal")

### Is your data BIDS valid?

Yes

### Are you reusing any previously computed results?

No

### Please copy and paste any relevant log output.

```shell

```

### Additional information / screenshots

_No response_
