ENH: Centralize fMRIPrep's and MRIQC's guidelines for Docker & DataLad #47


Merged: 6 commits merged on Jan 7, 2025
Changes from 2 commits
37 changes: 18 additions & 19 deletions docs/apps/datalad.md
@@ -1,10 +1,4 @@
Apps may be able to identify if the input dataset is handled with
*DataLad* or *Git-Annex*, and pull down linked data that has not
been fetched yet.
One example of one such application is *MRIQC*, and all the examples
on this documentation page will refer to it.

!!! important "Summary"
!!! abstract "Summary"

Executing *BIDS-Apps* leveraging *DataLad*-controlled datasets
within containers can be tricky.
@@ -18,6 +12,12 @@ on this documentation page will refer to it.

## *DataLad* and *Docker*

Apps may be able to identify if the input dataset is handled with
Contributor:
could/should I propose (separate PR on top?) in this section and/or https://github.com/nipreps/nipreps.github.io/blob/HEAD/docs/apps/singularity.md file to mention our https://github.com/ReproNim/containers which contains (automatically updates) pre-created singularity images for all bids-apps (thus including fmriprep, mriqc), and providing some helpers to streamline their use and more guaranteed reproducibility (isolated environment execution etc).

Note that the wrapper also tries to support non-Linux systems (OSX), where we could run singularity under docker. It could also be used on Linux if there is no singularity installation.

https://github.com/ReproNim/containers?tab=readme-ov-file#runnable-script provides a "typical" use example based on mriqc.

https://github.com/OpenNeuroDerivatives/ by @jbwexler (and @effigies ?) use that ReproNim/containers as a subdataset archive of the images, with reproman run for execution.

Contributor:
also that should include YODA aspects whenever talking about containers... with them it becomes possible to encapsulate all digital objects nicely and reproducibly (there is no guarantee that docker:// would later give you the images used etc)

Member Author:
Yes! Mentioning ReproNim on a separate PR would be fantastic.

[*DataLad*](https://www.datalad.org/) or [*git-annex*](https://git-annex.branchable.com), and pull down linked data that has not
been fetched yet.
One such application is *MRIQC*, and all the examples
on this documentation page will refer to it.

When executing *MRIQC* within *Docker* on a *DataLad* dataset
(for instance, installed from [*OpenNeuro*](https://openneuro.org)),
we will need to ensure the following settings are observed:
@@ -27,9 +27,12 @@ we will need to ensure the following settings are observed:
* the uid who is *executing MRIQC* within the container must
have sufficient permissions to write in the tree.

### Setting execution uid
### Setting a regular user's execution uid

If the uid is not correct, we will likely encounter the following error:
If the execution uid does not match the uid of the user who installed
the *DataLad* dataset, we will likely encounter the following error
with relatively recent
[*Git* versions (2.35.2 and newer)](https://github.blog/open-source/git/git-security-vulnerability-announced/#):

```
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else.
@@ -40,20 +43,16 @@ git config --global --add safe.directory /data
git-annex: automatic initialization failed due to above problems']
```

Confusingly, following the suggestion from *DataLad* directly on the host
(`git config --global --add safe.directory /data`) will not work in this
Confusingly, following the suggestion from *DataLad*
(just propagated from *Git*) of executing
`git config --global --add safe.directory /data` will not work in this
case, because this line must be executed within the container.
However, containers are *transient*, and this *Git* configuration will
not be propagated between executions unless advanced actions are taken
(such as mounting a *HOME* folder with the necessary settings).

Instead, we can override the default user executing within the container
(which is `root`, or uid = 0).
This can be achieved with
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):

```
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
```

We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set.
Let's update the last example in the previous
[*Docker* execution section](docker.md#running-a-niprep-directly-interacting-with-the-docker-engine):
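
As a rough sketch (the *MRIQC* image tag and host paths are placeholders,
and the dataset mount is left writable so that annexed files can be
fetched), the updated command could look like:

``` {.shell hl_lines="2 4"}
$ docker run -ti --rm \
-v path/to/datalad-dataset:/data \
-v path/to/output:/out \
-u $(id -u):$(id -g) \
nipreps/mriqc:<latest-version> \
/data /out/mriqc-<latest-version> \
participant
```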

242 changes: 194 additions & 48 deletions docs/apps/docker.md
@@ -1,4 +1,4 @@
!!! important "Summary"
!!! abstract "Summary"

Here, we describe how to run *NiPreps* with Docker containers.
To illustrate the process, we will show the execution of *fMRIPrep*, but these guidelines extend to any other end-user *NiPrep*.
@@ -41,6 +41,15 @@ For more examples and ideas, visit:

After checking your Docker Engine is capable of running Docker images, you are ready to pull your first *NiPreps* container image.

!!! tip "Troubleshooting"

If you encounter issues while executing a containerized application,
it is critical to identify where the fault originates.
For issues emerging from the *Docker Engine*, please read the
[corresponding troubleshooting guidelines](https://docs.docker.com/desktop/troubleshoot-and-support/troubleshoot/#volumes).
Once you have verified that the problem is not related to the container system,
follow the specific application's debugging guidelines.

## Docker images

For every new version of the particular *NiPrep* app that is released, a corresponding Docker image is generated.
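
For instance, pulling a tagged image from *Docker Hub* before the first
run could look like the following sketch (replace the application name
and the `<latest-version>` placeholder with the image and tag you need):

``` Shell
$ docker pull nipreps/fmriprep:<latest-version>
```
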
@@ -82,73 +91,210 @@ This tutorial also provides valuable troubleshooting insights and advice on what

If you need a finer control over the container execution, or you feel comfortable with the Docker Engine, avoiding the extra software layer of the wrapper might be a good decision.

**Accessing filesystems in the host within the container**:
Containers are confined in a sandbox, so they can't access the host in any ways
unless you explicitly prescribe acceptable accesses to the host. The
Docker Engine provides mounting filesystems into the container with the
`-v` argument and the following syntax:
`-v some/path/in/host:/absolute/path/within/container:ro`, where the
trailing `:ro` specifies that the mount is read-only. The mount
permissions modifiers can be omitted, which means the mount will have
read-write permissions. In general, you'll want to at least provide two
mount-points: one set in read-only mode for the input data and one
read/write to store the outputs. Potentially, you'll want to provide
one or two more mount-points: one for the working directory, in case you
need to debug some issue or reuse pre-cached results; and a
[TemplateFlow](https://www.templateflow.org) folder to preempt the
download of your favorite templates in every run.

**Running containers as a user**:
By default, Docker will run the
container as **root**. Some share systems my limit this feature and only
allow running containers as a user. When the container is run as
**root**, files written out to filesystems mounted from the host will
have the user id `1000` by default. In other words, you'll need to be
able to run as root in the host to change permissions or manage these
files. Alternatively, running as a user allows preempting these
permissions issues. It is possible to run as a user with the `-u`
argument. In general, we will want to use the same user ID as the
running user in the host to ensure the ownership of files written during
the container execution. Therefore, you will generally run the container
with `-u $( id -u )`.

You may also invoke `docker` directly:
### Accessing filesystems in the host within the container

Containers are confined in a sandbox, so they cannot access the host
in any way unless you explicitly prescribe acceptable accesses
to the host.
The Docker Engine allows mounting filesystems into the container with the `-v` argument and the following syntax:
`-v some/path/in/host:/absolute/path/within/container:ro`,
where the trailing `:ro` specifies that the mount is read-only.
The mount permissions modifiers can be omitted, which means the mount
will have read-write permissions.
In general, you'll want to at least provide two mount-points:
one set in read-only mode for the input data and one read/write
to store the outputs:

``` {.shell hl_lines="2 3"}
$ docker run -ti --rm \
-v path/to/data:/data:ro \ # read-only, for data
-v path/to/output:/out \ # read-write, for outputs
nipreps/fmriprep:<latest-version> \
/data /out/out \
participant
```

``` Shell
When **debugging** or **reusing pre-cached intermediate results**,
you'll also need to mount some working directory that otherwise
is not exposed by the application.
In the case of *NiPreps*, we typically instruct the *BIDS App*
to override the work directory by setting the `-w`/`--work-dir`
argument (please note that this is not defined by the *BIDS Apps*
specifications and it may change across applications):

``` {.shell hl_lines="4 8"}
$ docker run -ti --rm \
-v path/to/data:/data:ro \
-v path/to/output:/out \
-v path/to/work:/work \ # mount from host
nipreps/fmriprep:<latest-version> \
/data /out/out \
Contributor:

btw -- isn't there freesurfer license needed to be passed somehow?

Contributor (@mckenziephagen, Jan 7, 2025):

Further up, on line 18, it says that all examples will refer to MRIQC (which doesn't require freesurfer license).
nvm, didn't realize this was the docker.md file, not the datalad.md file.

Member Author:

Yes, I had the freesurfer license issue in mind. At the moment it's defined in fMRIPrep's documentation but we probably want to bubble it up here.

participant
-w /work # override default directory
```

*BIDS Apps* relying on [TemplateFlow](https://www.templateflow.org)
for atlases and templates management may require
the *TemplateFlow Archive* be mounted from the host.
Mounting the *Archive* from the host is an effective way
to preempt the download of your favorite templates in every run:

``` {.shell hl_lines="5 6"}
$ docker run -ti --rm \
-v path/to/data:/data:ro \
-v path/to/output:/out \
-v path/to/work:/work \
-v path/to/tf-cache:/opt/templateflow \ # mount from host
-e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home
nipreps/fmriprep:<latest-version> \
/data /out/out \
participant
-w /work
```

!!! warning "*Docker for Windows* requires enabling Shared Drives"

On *Windows* installations, the `-v` argument will not work
by default because it is necessary to enable shared drives.
Please check this [Stackoverflow post](https://stackoverflow.com/a/51822083) to learn how to enable them.

### Running containers as a user
By default, Docker will run the container with the
user id (uid) **0**, which is reserved for the default **root**
account in *Linux*.
In other words, by default *Docker* will use the superuser account
Contributor:
Unless the actual docker there is podman which by default would run rootless ;)

ha@hopa:~$ rm -f 123; docker run -it -v $PWD:$PWD -w $PWD --entrypoint bash nipreps/fmriprep:latest -c 'echo `whoami` am in; id; pwd; touch 123; ls -l 123' ; echo "`whoami` is out"; ls -ld 123
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
root am in
uid=0(root) gid=0(root) groups=0(root)
/home/ha
-rw-r--r-- 1 root root 0 Jan  7 14:19 123
ha is out
-rw-r--r-- 1 ha ha 0 Jan  7 09:19 123

I am yet to try podman on hpc/for hpc, using primarily for services, but googled into https://github.com/NERSC/podman-hpc ... worth researching and at least pointing users that there is another OCI solution podman which might give them easier means to run compute on their infrastructure.

to execute the container and will write files with the corresponding
uid=0 unless configured otherwise.
Executing as superuser may result in permissions and security issues,
for example, [with *DataLad* (discussed later)](datalad.md#).
One paramount example of a permissions issue that beginners typically
run into is deleting files after a containerized execution.
If the uid is not overridden, the outputs of a containerized execution
will be owned by user **root** and group **root**.
Therefore, normal users will not be able to modify the outputs, and
superuser permissions will be required to delete data generated
by the containerized application.
Some shared systems only allow running containers as a normal user
because the user would not otherwise be able to act on the outputs.

Either way (whether the container runs with its default settings
or the system restricts execution to normal users),
running as a normal user preempts these permissions issues.
This can be achieved with
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):

```
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
```

For example: :
We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set:

``` {.shell hl_lines="4"}
$ docker run -ti --rm \
-v path/to/data:/data:ro \
-v path/to/output:/out \
-u $(id -u):$(id -g) \ # set execution uid:gid
-v path/to/tf-cache:/opt/templateflow \ # mount from host
-e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home
nipreps/fmriprep:<latest-version> \
/data /out/out \
participant
```

For example:

``` Shell
$ docker run -ti --rm \
-v $HOME/ds005:/data:ro \
-v $HOME/ds005/derivatives:/out \
-v $HOME/tmp/ds005-workdir:/work \
-u $(id -u):$(id -g) \
-v $HOME/.cache/templateflow:/opt/templateflow \
-e TEMPLATEFLOW_HOME=/opt/templateflow \
nipreps/fmriprep:<latest-version> \
/data /out/fmriprep-<latest-version> \
participant \
-w /work
```

### Application-specific options

Once the Docker Engine arguments are written, the remainder of the
command line follows the [usage](https://fmriprep.readthedocs.io/en/latest/usage.html).
In other words, the first section of the command line is all equivalent to the
`fmriprep` executable in a *bare-metal* installation: :
command line follows the interface defined by the specific
*BIDS App* (for instance,
[*fMRIPrep*](https://fmriprep.readthedocs.io/en/latest/usage.html)
or [*MRIQC*](https://mriqc.readthedocs.io/en/latest/running.html#command-line-interface)).

``` Shell
$ docker run -ti --rm \ # These lines
-v $HOME/ds005:/data:ro \ # are equivalent to
-v $HOME/ds005/derivatives:/out \ # a call to the App's
-v $HOME/tmp/ds005-workdir:/work \ # entry-point.
nipreps/fmriprep:<latest-version> \ #
\
/data /out/fmriprep-<latest-version> \ # These lines correspond
participant \ # to the particular BIDS
-w /work # App arguments.
```
The first section of a call comprises arguments specific to *Docker*,
which configure the execution of the container:

``` {.shell hl_lines="1-7"}
$ docker run -ti --rm \
-v $HOME/ds005:/data:ro \
-v $HOME/ds005/derivatives:/out \
-v $HOME/tmp/ds005-workdir:/work \
-u $(id -u):$(id -g) \
-v $HOME/.cache/templateflow:/opt/templateflow \
-e TEMPLATEFLOW_HOME=/opt/templateflow \
nipreps/fmriprep:<latest-version> \
/data /out/fmriprep-<latest-version> \
participant \
-w /work
```

Then, we specify the container image that we execute:

``` {.shell hl_lines="8"}
$ docker run -ti --rm \
-v $HOME/ds005:/data:ro \
-v $HOME/ds005/derivatives:/out \
-v $HOME/tmp/ds005-workdir:/work \
-u $(id -u):$(id -g) \
-v $HOME/.cache/templateflow:/opt/templateflow \
-e TEMPLATEFLOW_HOME=/opt/templateflow \
nipreps/fmriprep:<latest-version> \
/data /out/fmriprep-<latest-version> \
participant \
-w /work
```

Finally, the application-specific options can be added.
We already described the work directory setting above for
*NiPreps* such as *MRIQC* and *fMRIPrep*.
Some options are *BIDS Apps* standard, such as
the *analysis level* (`participant` or `group`)
and specific participant identifier(s) (`--participant-label`):

``` {.shell hl_lines="9-12"}
$ docker run -ti --rm \
-v $HOME/ds005:/data:ro \
-v $HOME/ds005/derivatives:/out \
-v $HOME/tmp/ds005-workdir:/work \
-u $(id -u):$(id -g) \
-v $HOME/.cache/templateflow:/opt/templateflow \
-e TEMPLATEFLOW_HOME=/opt/templateflow \
nipreps/fmriprep:<latest-version> \
/data /out/fmriprep-<latest-version> \
participant \
--participant-label 001 002 \
-w /work
```

### Resource constraints

*Docker* may be executed with limited resources.
Please [read the documentation](https://docs.docker.com/engine/containers/resource_constraints/)
to limit resources such as memory, memory policies, number of CPUs, etc.
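
As an illustrative sketch (the limits below are arbitrary and should be
adapted to your dataset and hardware), such constraints can be passed
directly to `docker run`:

``` {.shell hl_lines="2 3"}
$ docker run -ti --rm \
--memory=8g \
--cpus=4 \
-v $HOME/ds005:/data:ro \
-v $HOME/ds005/derivatives:/out \
nipreps/fmriprep:<latest-version> \
/data /out/fmriprep-<latest-version> \
participant
```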

**Memory will be a common culprit** when working with large datasets
(larger than 10GB).
Contributor:

another shameless plug which might be of interest/help.

Inspired by our reproman, and BrainLife's helper to monitor execution of compute, we created a simple helper https://github.com/con/duct which could be of help to monitor/identify memory and cpu requirements for e.g. future informed specification for job parameters or plotting resource consumption during compute. It also takes care about storing stdout/stderr outputs produced, thus making it possible (if used along with datalad *run) to capture those for possible future troubleshooting etc.

Member Author:

If you finally send an additional PR, I'm happy to see this documented there :)

However, the *Docker* engine is limited to 2GB of RAM by default
in some installations of *Docker* for *macOS* and *Windows*.
The general resource settings can also be modified through the *Docker Desktop*
graphical user interface.
On a shell, the memory limit can be overridden with:

```
$ service docker stop
$ dockerd --storage-opt dm.basesize=30G
```