
Design issues of parallelization #158

@oesteban

Description


What happened?

In advancing #142 and its spin-off #157, and in reflecting on serialization in #82, I believe our parallelization approach could reduce its memory footprint dramatically at little cost to run time.

In particular, we currently use joblib to run replicas of the model on chunks of the data. It would be more efficient to pass only the path to the HDF5 file on disk, move the chunking logic into each worker's body, and have each worker select its chunk from a worker identifier that is passed in alongside the path.
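As a minimal sketch of the worker-side chunk selection (the helper name `chunk_bounds` is hypothetical, not part of the codebase), each worker could derive its own contiguous range of voxels from nothing more than the total count, the number of chunks, and its identifier:

```python
def chunk_bounds(n_items: int, n_chunks: int, chunk_id: int) -> tuple[int, int]:
    """Return the half-open [start, stop) range of items assigned to ``chunk_id``.

    Items are split as evenly as possible: the first ``n_items % n_chunks``
    chunks receive one extra item, so chunk sizes differ by at most one.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be >= 1")
    if not 0 <= chunk_id < n_chunks:
        raise ValueError(f"chunk_id must be in [0, {n_chunks})")
    base, extra = divmod(n_items, n_chunks)
    # Every earlier chunk contributes `base` items, plus one if it got an extra
    start = chunk_id * base + min(chunk_id, extra)
    stop = start + base + (1 if chunk_id < extra else 0)
    return start, stop


# Ten voxels split across three workers
print([chunk_bounds(10, 3, i) for i in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```

Because the ranges are computed deterministically inside each worker, the parent process never has to materialize or pickle the per-chunk arrays.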

In essence, the data object's HDF5 file already holds all the data we need; we know the index to leave out (or None for a single fit); we can pass all of the model's arguments; and we can pass the chunk index for the chunking logic.
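A rough sketch of such a worker body follows, under several assumptions: the dataset name `"dataobj"` and its 2D layout (voxels × volumes) are hypothetical, and `placeholder_fit` is a stand-in for the real model fit, which would receive `model_kwargs` instead. The point it illustrates is that the worker receives only a file path and a few small arguments, then reads just its own hyperslab from disk.

```python
from __future__ import annotations

import h5py
import numpy as np


def placeholder_fit(block: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Hypothetical stand-in for a model fit: scaled mean signal per voxel."""
    return scale * block.mean(axis=1)


def fit_chunk(
    hdf5_path: str,
    chunk_id: int,
    n_chunks: int,
    leave_out: int | None = None,
    model_kwargs: dict | None = None,
    dataset: str = "dataobj",
) -> tuple[int, np.ndarray]:
    """Fit one chunk of voxels, reading only that chunk from the HDF5 file.

    Assumes (hypothetically) that ``dataset`` is a 2D array shaped
    (n_voxels, n_volumes).
    """
    with h5py.File(hdf5_path, "r") as f:
        dset = f[dataset]
        n_voxels = dset.shape[0]
        # Contiguous, nearly equal voxel ranges; the last bound equals n_voxels
        bounds = np.linspace(0, n_voxels, n_chunks + 1).astype(int)
        start, stop = bounds[chunk_id], bounds[chunk_id + 1]
        # Slicing an h5py dataset reads only this hyperslab from disk
        block = dset[start:stop]
    if leave_out is not None:
        # Leave-one-out case: drop the held-out volume before fitting
        block = np.delete(block, leave_out, axis=1)
    return chunk_id, placeholder_fit(block, **(model_kwargs or {}))
```

With joblib, dispatch could then look like `Parallel(n_jobs=4)(delayed(fit_chunk)(path, i, 4, leave_out=k) for i in range(4))`, so each task pickles a path string and a handful of scalars rather than data arrays; the returned `chunk_id` makes it straightforward to reassemble the results in order.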

What command did you use?

n/a

What version of the software are you running?

main

How are you running this software?

Docker

Is your data BIDS valid?

Yes

Are you reusing any previously computed results?

No

Please copy and paste any relevant log output.

Additional information / screenshots

No response

Metadata


Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

Status

Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
