ENH: Centralize fMRIPrep's and MRIQC's guidelines for Docker & DataLad #47

Merged
merged 6 commits into from
Jan 7, 2025
54 changes: 35 additions & 19 deletions docs/apps/datalad.md
@@ -1,10 +1,4 @@
Apps may be able to identify if the input dataset is handled with
*DataLad* or *Git-Annex*, and pull down linked data that has not
been fetched yet.
One example of one such application is *MRIQC*, and all the examples
on this documentation page will refer to it.

!!! important "Summary"
!!! abstract "Summary"

Executing *BIDS-Apps* leveraging *DataLad*-controlled datasets
within containers can be tricky.
@@ -18,6 +12,12 @@ on this documentation page will refer to it.

## *DataLad* and *Docker*

Apps may be able to identify if the input dataset is handled with
Contributor

could/should I propose (separate PR on top?) in this section and/or https://github.com/nipreps/nipreps.github.io/blob/HEAD/docs/apps/singularity.md file to mention our https://github.com/ReproNim/containers which contains (automatically updates) pre-created singularity images for all bids-apps (thus including fmriprep, mriqc), and providing some helpers to streamline their use and more guaranteed reproducibility (isolated environment execution etc).

Note that the wrapper also tries to support non-Linux systems (OSX), where we could run singularity under docker. It could also be used on Linux if there is no singularity installation.

https://github.com/ReproNim/containers?tab=readme-ov-file#runnable-script provides a "typical" use example based on mriqc.

https://github.com/OpenNeuroDerivatives/ by @jbwexler (and @effigies ?) use that ReproNim/containers as a subdataset archive of the images with reproman run for execution.

Contributor

also that should include YODA aspects whenever talking about containers... with them it becomes possible to encapsulate all digital objects nicely and reproducibly (there is no guarantee that docker:// would later give you the images used etc)

Member Author

Yes! Mentioning ReproNim on a separate PR would be fantastic.

[*DataLad*](https://www.datalad.org/) or [*git-annex*](https://git-annex.branchable.com), and pull down linked data that has not
been fetched yet.
One example of one such application is *MRIQC*, and all the examples
on this documentation page will refer to it.

When executing *MRIQC* within *Docker* on a *DataLad* dataset
(for instance, installed from [*OpenNeuro*](https://openneuro.org)),
we will need to ensure the following settings are observed:
@@ -27,9 +27,29 @@ we will need to ensure the following settings are observed:
* the uid who is *executing MRIQC* within the container must
have sufficient permissions to write in the tree.

### Setting execution uid
!!! tip "Check *ReproNim* if the suggestions here did not work"

The actions suggested here are expected to work in most circumstances,
but your system may have specific circumstances that require additional
or alternative approaches.
For instance, [the *ReproNim* project](https://www.repronim.org/) maintains
[ReproNim/containers](https://github.com/ReproNim/containers), a
*DataLad* dataset with ready-to-use Singularity images for released *BIDS Apps*, *NeuroDesktop* applications,
and other containers.
Its [`README.md`](https://github.com/ReproNim/containers?tab=readme-ov-file#runnable-script) walks through an approach based on that dataset, whose *built-in* execution helper takes care of bind-mounts,
proxies critical *Git* configuration, and can potentially execute *Singularity* images via *Docker* (e.g., under OSX).

In the particular case of *MRIQC*, please consider updating (if necessary)
and fetching the required data before execution and then
add the `--no-datalad-get` argument to work around issues with
*DataLad*.

### Setting a regular user's execution uid

If the uid is not correct, we will likely encounter the following error:
If the execution uid does not match the uid of the user who installed
the *DataLad* dataset, we will likely encounter the following error
with relatively recent
[*Git* versions (+2.35.2)](https://github.blog/open-source/git/git-security-vulnerability-announced/#):

```
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else.
@@ -40,20 +60,16 @@ git config --global --add safe.directory /data
git-annex: automatic initialization failed due to above problems']
```

Confusingly, following the suggestion from *DataLad* directly on the host
(`git config --global --add safe.directory /data`) will not work in this
Confusingly, following the suggestion from *DataLad*
(just propagated from *Git*) of executing
`git config --global --add safe.directory /data` will not work in this
case, because this line must be executed within the container.
However, containers are *transient* and setting this configuration
on *Git* will not be propagated between executions unless advanced
actions are taken (such as mounting a *HOME* folder with the necessary settings).

Instead, we can override the default user executing within the container
(which is `root`, or uid = 0).
This can be achieved with
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):

```
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
```

We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set.
Let's update the last example in the previous
[*Docker* execution section](docker.md#running-a-niprep-directly-interacting-with-the-docker-engine):
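
The updated example is not reproduced here. As a minimal sketch of the idea only, the snippet below assembles a `docker run` command line whose `-u` value comes from `id -u` and `id -g`. The image tag `nipreps/mriqc:latest`, the host paths under `$HOME`, and the `/out` mount point are assumptions chosen for illustration, not values taken from the documentation being changed. The command is printed rather than executed.

```shell
# Sketch: run the container as the invoking user instead of root (uid 0).
# Assumed, illustrative values (not from the original documentation):
image="nipreps/mriqc:latest"
bids_dir="$HOME/data/ds-example"
out_dir="$HOME/data/ds-example-mriqc"

# `id -u` prints the current user's numeric uid, `id -g` the primary gid.
user_spec="$(id -u):$(id -g)"

# Store the command as positional parameters so each argument stays intact.
set -- docker run --rm \
    -u "$user_spec" \
    -v "$bids_dir:/data" \
    -v "$out_dir:/out" \
    "$image" /data /out participant

# Print the command for inspection; replace `echo "$@"` with `"$@"` to run it.
echo "$@"
```

Because the uid inside the container then matches the owner of the mounted dataset, *Git* should no longer treat `/data` as owned by someone else, provided the same user installed the dataset on the host.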
