From 82de14fd8c6b7df8a38441bddec27a171ca0715e Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Sun, 5 Jan 2025 12:06:20 +0100 Subject: [PATCH 1/6] enh: update datatalad and docker guidelines --- docs/apps/datalad.md | 37 ++++--- docs/apps/docker.md | 242 ++++++++++++++++++++++++++++++++++--------- 2 files changed, 212 insertions(+), 67 deletions(-) diff --git a/docs/apps/datalad.md b/docs/apps/datalad.md index 003d06c..1fd4b65 100644 --- a/docs/apps/datalad.md +++ b/docs/apps/datalad.md @@ -1,10 +1,4 @@ -Apps may be able to identify if the input dataset is handled with -*DataLad* or *Git-Annex*, and pull down linked data that has not -been fetched yet. -One example of one such application is *MRIQC*, and all the examples -on this documentation page will refer to it. - -!!! important "Summary" +!!! abstract "Summary" Executing *BIDS-Apps* leveraging *DataLad*-controlled datasets within containers can be tricky. @@ -18,6 +12,12 @@ on this documentation page will refer to it. ## *DataLad* and *Docker* +Apps may be able to identify if the input dataset is handled with +*DataLad* or *Git-Annex*, and pull down linked data that has not +been fetched yet. +One example of one such application is *MRIQC*, and all the examples +on this documentation page will refer to it. + When executing *MRIQC* within *Docker* on a *DataLad* dataset (for instance, installed from [*OpenNeuro*](https://openneuro.org)), we will need to ensure the following settings are observed: @@ -27,9 +27,12 @@ we will need to ensure the following settings are observed: * the uid who is *executing MRIQC* within the container must have sufficient permissions to write in the tree. -### Setting execution uid +### Setting a regular user's execution uid -If the uid is not correct, we will likely encounter the following error: +If the execution uid does not match the uid of the user who installed +the *DataLad* dataset, we will likely encounter the following error +with relatively recent +[*Git* versions (+2.35.2)](https://github.blog/open-source/git/git-security-vulnerability-announced/#): ``` datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else. 
@@ -40,20 +43,16 @@ git config --global --add safe.directory /data git-annex: automatic initialization failed due to above problems'] ``` -Confusingly, following the suggestion from *DataLad* directly on the host -(`git config --global --add safe.directory /data`) will not work in this +Confusingly, following the suggestion from *DataLad* +(just propagated from *Git*) of executing +`git config --global --add safe.directory /data` will not work in this case, because this line must be executed within the container. +However, containers are *transient* and the setting this configuration +on *Git* will not be propagated between executions unless advanced +actions are taken (such as mounting a *Home* folder with the necessary settings). Instead, we can override the default user executing within the container (which is `root`, or uid = 0). -This can be achieved with -[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user): - -``` ---user=[ user | user:group | uid | uid:gid | user:gid | uid:group ] -``` - -We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set. Let's update the last example in the previous [*Docker* execution section](docker.md#running-a-niprep-directly-interacting-with-the-docker-engine): diff --git a/docs/apps/docker.md b/docs/apps/docker.md index 42d6004..221bfaf 100644 --- a/docs/apps/docker.md +++ b/docs/apps/docker.md @@ -1,4 +1,4 @@ -!!! important "Summary" +!!! abstract "Summary" Here, we describe how to run *NiPreps* with Docker containers. To illustrate the process, we will show the execution of *fMRIPrep*, but these guidelines extend to any other end-user *NiPrep*. @@ -41,6 +41,15 @@ For more examples and ideas, visit: After checking your Docker Engine is capable of running Docker images, you are ready to pull your first *NiPreps* container image. +!!! tip "Troubleshooting" + + If you encounter issues while executing a containerized application, + it is critical to identify where the fault is sourced. + For issues emerging from the *Docker Engine*, please read the + [corresponding troubleshooting guidelines](https://docs.docker.com/desktop/troubleshoot-and-support/troubleshoot/#volumes). + Once verified the problem is not related to the container system, + then follow the specific application debugging guidelines. + ## Docker images For every new version of the particular *NiPrep* app that is released, a corresponding Docker image is generated. @@ -82,73 +91,210 @@ This tutorial also provides valuable troubleshooting insights and advice on what If you need a finer control over the container execution, or you feel comfortable with the Docker Engine, avoiding the extra software layer of the wrapper might be a good decision. -**Accessing filesystems in the host within the container**: -Containers are confined in a sandbox, so they can't access the host in any ways -unless you explicitly prescribe acceptable accesses to the host. The -Docker Engine provides mounting filesystems into the container with the -`-v` argument and the following syntax: -`-v some/path/in/host:/absolute/path/within/container:ro`, where the -trailing `:ro` specifies that the mount is read-only. The mount -permissions modifiers can be omitted, which means the mount will have -read-write permissions. In general, you'll want to at least provide two -mount-points: one set in read-only mode for the input data and one -read/write to store the outputs. 
Potentially, you'll want to provide -one or two more mount-points: one for the working directory, in case you -need to debug some issue or reuse pre-cached results; and a -[TemplateFlow](https://www.templateflow.org) folder to preempt the -download of your favorite templates in every run. - -**Running containers as a user**: -By default, Docker will run the -container as **root**. Some share systems my limit this feature and only -allow running containers as a user. When the container is run as -**root**, files written out to filesystems mounted from the host will -have the user id `1000` by default. In other words, you'll need to be -able to run as root in the host to change permissions or manage these -files. Alternatively, running as a user allows preempting these -permissions issues. It is possible to run as a user with the `-u` -argument. In general, we will want to use the same user ID as the -running user in the host to ensure the ownership of files written during -the container execution. Therefore, you will generally run the container -with `-u $( id -u )`. - -You may also invoke `docker` directly: +### Accessing filesystems in the host within the container + +Containers are confined in a sandbox, so they can't access the host +in any ways unless you explicitly prescribe acceptable accesses +to the host. +The Docker Engine provides mounting filesystems into the container with the `-v` argument and the following syntax: +`-v some/path/in/host:/absolute/path/within/container:ro`, +where the trailing `:ro` specifies that the mount is read-only. +The mount permissions modifiers can be omitted, which means the mount +will have read-write permissions. +In general, you'll want to at least provide two mount-points: +one set in read-only mode for the input data and one read/write +to store the outputs: + +``` {.shell hl_lines="2 3"} +$ docker run -ti --rm \ + -v path/to/data:/data:ro \ # read-only, for data + -v path/to/output:/out \ # read-write, for outputs + nipreps/fmriprep: \ + /data /out/out \ + participant +``` -``` Shell +When **debugging** or **reusing pre-cached intermediate results**, +you'll also need to mount some working directory that otherwise +is not exposed by the application. +In the case of *NiPreps*, we typically inform the *BIDS Apps* +to override the work directory by setting the `-w`/`--work-dir` +argument (please note that this is not defined by the *BIDS Apps* +specifications and it may change across applications): + +``` {.shell hl_lines="4 8"} +$ docker run -ti --rm \ + -v path/to/data:/data:ro \ + -v path/to/output:/out \ + -v path/to/work:/work \ # mount from host + nipreps/fmriprep: \ + /data /out/out \ + participant + -w /work # override default directory +``` + +*BIDS Apps* relying on [TemplateFlow](https://www.templateflow.org) +for atlases and templates management may require +the *TemplateFlow Archive* be mounted from the host. +Mounting the *Archive* from the host is an effective way +to preempt the download of your favorite templates in every run: + +``` {.shell hl_lines="5 6"} $ docker run -ti --rm \ -v path/to/data:/data:ro \ -v path/to/output:/out \ + -v path/to/work:/work \ + -v path/to/tf-cache:/opt/templateflow \ # mount from host + -e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home nipreps/fmriprep: \ /data /out/out \ participant + -w /work +``` + +!!! 
warning "*Docker for Windows* requires enabling Shared Drives" + + On *Windows* installations, the `-v` argument will not work + by default because it is necessary to enable shared drives. + Please check on this [Stackoverflow post](https://stackoverflow.com/a/51822083) how to enable them. + +### Running containers as a user +By default, Docker will run the container with the +user id (uid) **0**, which is reserved for the default **root** +account in *Linux*. +In other words, by default *Docker* will use the superuser account +to execute the container and will write files with the corresponding +uid=0 unless configured otherwise. +Executing as superuser may derive in permissions and security issues, +for example, [with *DataLad* (discussed later)](datalad.md#). +One paramount example of permissions issues where beginners typically +run into is deleting files after a containerized execution. +If the uid is not overridden, the outputs of a containerized execution +will be owned by **root** and group **root**. +Therefore, normal users will not be able to modify the output and +superuser permissions will be required to deleted data generated +by the containerized application. +Some shared systems only allow running containers as a normal user +because the user will not be able to action on the outputs otherwise. + +Either way (whether the container is available with default settings +or the execution has been customized to normal users), +running as a normal user allows preempting these permissions issues. +This can be achieved with +[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user): + +``` +--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ] ``` -For example: : +We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set: +``` {.shell hl_lines="4"} +$ docker run -ti --rm \ + -v path/to/data:/data:ro \ + -v path/to/output:/out \ + -u $(id -u):$(id -g) \ # set execution uid:gid + -v path/to/tf-cache:/opt/templateflow \ # mount from host + -e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home + nipreps/fmriprep: \ + /data /out/out \ + participant ``` + +For example: + +``` Shell $ docker run -ti --rm \ -v $HOME/ds005:/data:ro \ -v $HOME/ds005/derivatives:/out \ -v $HOME/tmp/ds005-workdir:/work \ + -u $(id -u):$(id -g) \ + -v $HOME/.cache/templateflow:/opt/templateflow \ + -e TEMPLATEFLOW_HOME=/opt/templateflow \ nipreps/fmriprep: \ /data /out/fmriprep- \ participant \ -w /work ``` +### Application-specific options + Once the Docker Engine arguments are written, the remainder of the -command line follows the [usage](https://fmriprep.readthedocs.io/en/latest/usage.html). -In other words, the first section of the command line is all equivalent to the -`fmriprep` executable in a *bare-metal* installation: : +command line follows the interface defined by the specific +*BIDS App* (for instance, +[*fMRIPrep*](https://fmriprep.readthedocs.io/en/latest/usage.html) +or [*MRIQC*](https://mriqc.readthedocs.io/en/latest/running.html#command-line-interface)). -``` Shell -$ docker run -ti --rm \ # These lines - -v $HOME/ds005:/data:ro \ # are equivalent to - -v $HOME/ds005/derivatives:/out \ # a call to the App's - -v $HOME/tmp/ds005-workdir:/work \ # entry-point. - nipreps/fmriprep: \ # - \ - /data /out/fmriprep- \ # These lines correspond - participant \ # to the particular BIDS - -w /work # App arguments. 
-``` \ No newline at end of file +The first section of a call comprehends arguments specific to *Docker*, +and configure the execution of the container: + +``` {.shell hl_lines="1-7"} +$ docker run -ti --rm \ + -v $HOME/ds005:/data:ro \ + -v $HOME/ds005/derivatives:/out \ + -v $HOME/tmp/ds005-workdir:/work \ + -u $(id -u):$(id -g) \ + -v $HOME/.cache/templateflow:/opt/templateflow \ + -e TEMPLATEFLOW_HOME=/opt/templateflow \ + nipreps/fmriprep: \ + /data /out/fmriprep- \ + participant \ + -w /work +``` + +Then, we specify the container image that we execute: + +``` {.shell hl_lines="8"} +$ docker run -ti --rm \ + -v $HOME/ds005:/data:ro \ + -v $HOME/ds005/derivatives:/out \ + -v $HOME/tmp/ds005-workdir:/work \ + -u $(id -u):$(id -g) \ + -v $HOME/.cache/templateflow:/opt/templateflow \ + -e TEMPLATEFLOW_HOME=/opt/templateflow \ + nipreps/fmriprep: \ + /data /out/fmriprep- \ + participant \ + -w /work +``` + +Finally, the application-specific options can be added. +We already described the work directory setting before, in the case +of *NiPreps* such as *MRIQC* and *fMRIPrep*. +Some options are *BIDS Apps* standard, such as +the *analysis level* (`participant` or `group`) +and specific participant identifier(s) (`--participant-label`): + +``` {.shell hl_lines="9-12"} +$ docker run -ti --rm \ + -v $HOME/ds005:/data:ro \ + -v $HOME/ds005/derivatives:/out \ + -v $HOME/tmp/ds005-workdir:/work \ + -u $(id -u):$(id -g) \ + -v $HOME/.cache/templateflow:/opt/templateflow \ + -e TEMPLATEFLOW_HOME=/opt/templateflow \ + nipreps/fmriprep: \ + /data /out/fmriprep- \ + participant \ + --participant-label 001 002 \ + -w /work +``` + +### Resource constraints + +*Docker* may be executed with limited resources. +Please [read the documentation](https://docs.docker.com/engine/containers/resource_constraints/) +to limit resources such as memory, memory policies, number of CPUs, etc. + +**Memory will be a common culprit** when working with large datasets +(+10GB). +However, *Docker* engine is limited to 2GB of RAM by default +for some installations of *Docker* for *MacOSX* and *Windows*. +The general resource settings can be also modified through the *Docker Desktop* +graphical user interface. +On a shell, the memory limit can be overridden with: + +``` +$ service docker stop +$ dockerd --storage-opt dm.basesize=30G +``` From fc90f824ae290590bea3949ba855a47dcd86d103 Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Mon, 6 Jan 2025 08:43:59 +0100 Subject: [PATCH 2/6] Apply suggestions from code review Co-authored-by: Yaroslav Halchenko --- docs/apps/datalad.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/apps/datalad.md b/docs/apps/datalad.md index 1fd4b65..fcaf799 100644 --- a/docs/apps/datalad.md +++ b/docs/apps/datalad.md @@ -13,7 +13,7 @@ ## *DataLad* and *Docker* Apps may be able to identify if the input dataset is handled with -*DataLad* or *Git-Annex*, and pull down linked data that has not +[*DataLad*](https://www.datalad.org/) or [*git-annex*](https://git-annex.branchable.com), and pull down linked data that has not been fetched yet. One example of one such application is *MRIQC*, and all the examples on this documentation page will refer to it. @@ -49,7 +49,7 @@ Confusingly, following the suggestion from *DataLad* case, because this line must be executed within the container. 
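For illustration only, the suggested command could be run within the container by overriding the image entrypoint (this sketch assumes *Git* is available inside the image; `<version>` and the dataset path are placeholders):

``` Shell
$ docker run -ti --rm \
    -v $HOME/ds000001:/data \
    --entrypoint=git \
    nipreps/mriqc:<version> \
    config --global --add safe.directory /data
```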
However, containers are *transient* and the setting this configuration on *Git* will not be propagated between executions unless advanced -actions are taken (such as mounting a *Home* folder with the necessary settings). +actions are taken (such as mounting a *HOME* folder with the necessary settings). Instead, we can override the default user executing within the container (which is `root`, or uid = 0). From c66ae8e8f7200c5314ec29137f974e7529e34f38 Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Mon, 6 Jan 2025 09:31:16 +0100 Subject: [PATCH 3/6] enh: include @yarikoptic's suggestions Co-authored-by: Yaroslav Halchenko --- docs/apps/datalad.md | 14 +++++++++ docs/apps/docker.md | 71 ++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 80 insertions(+), 5 deletions(-) diff --git a/docs/apps/datalad.md b/docs/apps/datalad.md index fcaf799..66714d2 100644 --- a/docs/apps/datalad.md +++ b/docs/apps/datalad.md @@ -27,6 +27,20 @@ we will need to ensure the following settings are observed: * the uid who is *executing MRIQC* within the container must have sufficient permissions to write in the tree. +!!! tip "Check *ReproNim* if the suggestions here did not work" + + The actions suggested here are relatively fundamental, and your + settings may showcase specific circumstances that render them + insufficient. + For a more detailed discussion about orchestrating containers, version + control, and reproducible pipelines, please check + [the *ReproNim* guidelines](https://www.repronim.org/). + + In the particular case of *MRIQC*, please consider updating (if necessary) + and fetching the required data before execution and then + add the `--no-datalad-get` argument to workaround issues with + *DataLad*. + ### Setting a regular user's execution uid If the execution uid does not match the uid of the user who installed diff --git a/docs/apps/docker.md b/docs/apps/docker.md index 221bfaf..3d063d3 100644 --- a/docs/apps/docker.md +++ b/docs/apps/docker.md @@ -101,6 +101,13 @@ The Docker Engine provides mounting filesystems into the container with the `-v` where the trailing `:ro` specifies that the mount is read-only. The mount permissions modifiers can be omitted, which means the mount will have read-write permissions. + +!!! warning "*Docker for Windows* requires enabling Shared Drives" + + On *Windows* installations, the `-v` argument will not work + by default because it is necessary to enable shared drives. + Please check on this [Stackoverflow post](https://stackoverflow.com/a/51822083) how to enable them. + In general, you'll want to at least provide two mount-points: one set in read-only mode for the input data and one read/write to store the outputs: @@ -133,7 +140,48 @@ $ docker run -ti --rm \ -w /work # override default directory ``` -*BIDS Apps* relying on [TemplateFlow](https://www.templateflow.org) +!!! tip "Best practices" + + The [*ReproNim* initiative](https://www.repronim.org/) + distributes materials and documentation of best practices + for containerized execution of neuroimaging workflows. + Most of these are organized within the + [*YODA* (Yoda's Organigram on Data Analysis)](https://github.com/myyoda) principles. + + For example, mounting `$PWD` into `$PWD` and setting that path + as current working directory can effectively resolve many issues. 
+ This strategy may be combined with the above suggestion about + the application's work directory as follows: + + ``` {.shell hl_lines="4 5 9"} + $ docker run -ti --rm \ + -v path/to/data:/data:ro \ + -v path/to/output:/out \ + -v $PWD:$PWD \ + -w $PWD \ # DO NOT confuse with the application's work directory + nipreps/fmriprep: \ + /data /out/out \ + participant + -w $PWD/work + ``` + + Mounting `$PWD` may be used with YODA so that all necessary *parts* + in execution are reachable from under `$PWD`. + This effectively + (i) makes it easy to *transfer* configurations from + *outside* the container to the *inside* execution runtime; + (ii) the *outside*/*inside* filesystem trees are homologous, which + makes post-processing and orchestration easier; + (iii) execution in shared systems is easier as everything is + sort of *self-contained*. + + In addition to mounting `$PWD`, other advanced practices + include mounting specific configuration files (for example, a + [*Nipype* configuration file](https://miykael.github.io/nipype_tutorial/notebooks/basic_execution_configuration.html)) + into the appropriate paths within the container. + + +*BIDS Apps* relying on [*TemplateFlow*](https://www.templateflow.org) for atlases and templates management may require the *TemplateFlow Archive* be mounted from the host. Mounting the *Archive* from the host is an effective way @@ -152,13 +200,26 @@ $ docker run -ti --rm \ -w /work ``` -!!! warning "*Docker for Windows* requires enabling Shared Drives" +!!! danger "Sharing the *TemplateFlow* cache can cause race conditions in parallel execution" - On *Windows* installations, the `-v` argument will not work - by default because it is necessary to enable shared drives. - Please check on this [Stackoverflow post](https://stackoverflow.com/a/51822083) how to enable them. + When sharing the *TemplateFlow* *HOME* folder across several parallel + executions against a single filesystem, these instance will likely + attempt to fetch unavailable templates without sufficient time between + actions for the data to be fully downloaded (in other words, + data downloads will be *racing* each other). + + To resolve this issue, you will need to make sure all necessary + templates are already downloaded within the cache folder. + If the *TemplateFlow Client* is properly installed in your system, + this is possible with the following command line + (example shows how to fully download `MNI152NLin2009cAsym`: + + ``` Shell + $ templateflow get MNI152NLin2009cAsym + ``` ### Running containers as a user + By default, Docker will run the container with the user id (uid) **0**, which is reserved for the default **root** account in *Linux*. From 3f79fc29a1cfa825b6f0d598e5ed6e70ea61aba4 Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Tue, 7 Jan 2025 13:46:49 +0100 Subject: [PATCH 4/6] Apply suggestions from code review Co-authored-by: Chris Markiewicz Co-authored-by: Yaroslav Halchenko --- docs/apps/datalad.md | 15 +++++++++------ docs/apps/docker.md | 38 ++++++++++++++++++-------------------- 2 files changed, 27 insertions(+), 26 deletions(-) diff --git a/docs/apps/datalad.md b/docs/apps/datalad.md index 66714d2..ccedffd 100644 --- a/docs/apps/datalad.md +++ b/docs/apps/datalad.md @@ -29,12 +29,15 @@ we will need to ensure the following settings are observed: !!! tip "Check *ReproNim* if the suggestions here did not work" - The actions suggested here are relatively fundamental, and your - settings may showcase specific circumstances that render them - insufficient. 
- For a more detailed discussion about orchestrating containers, version - control, and reproducible pipelines, please check - [the *ReproNim* guidelines](https://www.repronim.org/). + The actions suggested here are expected to work in most circumstances, + but your system may have specific circumstances that require additional + or alternative approaches. + For instance, [the *ReproNim* project](https://www.repronim.org/) maintains + [ReproNim/containers](https://github.com/ReproNim/containers), a + *DataLad* dataset with ready-to-use Singularity images for released *BIDS Apps*, *NeuroDesktop* applications, + and other containers. + Its [`README.md`](https://github.com/ReproNim/containers?tab=readme-ov-file#runnable-script) guides through an approach via that dataset with *built-in* execution helper taking care about bind-mounts, + proxying critical *Git* configuration and potentially executing *Singularity* images via *Docker* (e.g., under OSX). In the particular case of *MRIQC*, please consider updating (if necessary) and fetching the required data before execution and then diff --git a/docs/apps/docker.md b/docs/apps/docker.md index 3d063d3..8bfa819 100644 --- a/docs/apps/docker.md +++ b/docs/apps/docker.md @@ -93,9 +93,8 @@ If you need a finer control over the container execution, or you feel comfortabl ### Accessing filesystems in the host within the container -Containers are confined in a sandbox, so they can't access the host -in any ways unless you explicitly prescribe acceptable accesses -to the host. +Containers are confined in a sandbox, so they can't access the data on the host +unless explicitly enabled. The Docker Engine provides mounting filesystems into the container with the `-v` argument and the following syntax: `-v some/path/in/host:/absolute/path/within/container:ro`, where the trailing `:ro` specifies that the mount is read-only. @@ -121,9 +120,10 @@ $ docker run -ti --rm \ participant ``` -When **debugging** or **reusing pre-cached intermediate results**, -you'll also need to mount some working directory that otherwise -is not exposed by the application. +We recommend mounting a work (or scratch) directory for intermediate workflow results. +This is particularly useful for **debugging** or **reusing pre-cached intermediate results**, +but can also be useful to control where these (large) directories get created, +as the default location for files created inside a docker container may not have sufficient space. In the case of *NiPreps*, we typically inform the *BIDS Apps* to override the work directory by setting the `-w`/`--work-dir` argument (please note that this is not defined by the *BIDS Apps* @@ -155,12 +155,10 @@ $ docker run -ti --rm \ ``` {.shell hl_lines="4 5 9"} $ docker run -ti --rm \ - -v path/to/data:/data:ro \ - -v path/to/output:/out \ - -v $PWD:$PWD \ - -w $PWD \ # DO NOT confuse with the application's work directory + -v $PWD:$PWD \ # Mount the current directory with its own name + -w $PWD \ # DO NOT confuse with the application's work directory nipreps/fmriprep: \ - /data /out/out \ + inputs/raw-data outputs/fmriprep \ # With YODA, the inputs are inside the working directory participant -w $PWD/work ``` @@ -185,7 +183,7 @@ $ docker run -ti --rm \ for atlases and templates management may require the *TemplateFlow Archive* be mounted from the host. 
Mounting the *Archive* from the host is an effective way -to preempt the download of your favorite templates in every run: +to prevent multiple downloads of templates that are not bundled in the image: ``` {.shell hl_lines="5 6"} $ docker run -ti --rm \ @@ -226,21 +224,21 @@ account in *Linux*. In other words, by default *Docker* will use the superuser account to execute the container and will write files with the corresponding uid=0 unless configured otherwise. -Executing as superuser may derive in permissions and security issues, +Executing as superuser may result in permissions and security issues, for example, [with *DataLad* (discussed later)](datalad.md#). One paramount example of permissions issues where beginners typically run into is deleting files after a containerized execution. If the uid is not overridden, the outputs of a containerized execution will be owned by **root** and group **root**. Therefore, normal users will not be able to modify the output and -superuser permissions will be required to deleted data generated +superuser permissions will be required to delete data generated by the containerized application. Some shared systems only allow running containers as a normal user -because the user will not be able to action on the outputs otherwise. +because the user will not be able to operate on the outputs otherwise. -Either way (whether the container is available with default settings -or the execution has been customized to normal users), -running as a normal user allows preempting these permissions issues. +Whether the container is available with default settings, +or the execution has been customized to normal users, +running as a normal user avoids these permissions issues. This can be achieved with [*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user): @@ -286,8 +284,8 @@ command line follows the interface defined by the specific [*fMRIPrep*](https://fmriprep.readthedocs.io/en/latest/usage.html) or [*MRIQC*](https://mriqc.readthedocs.io/en/latest/running.html#command-line-interface)). -The first section of a call comprehends arguments specific to *Docker*, -and configure the execution of the container: +The first section of a call consists of arguments specific to *Docker*, +which configure the execution of the container: ``` {.shell hl_lines="1-7"} $ docker run -ti --rm \ From 3c9b2533e5b597911001af500d21e545df6b92a1 Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Tue, 7 Jan 2025 13:54:23 +0100 Subject: [PATCH 5/6] enh: Docker italicalization & link to installation docs Related-To: https://github.com/nipreps/nipreps.github.io/pull/47/files#r1904836364 Co-Authored-By: McKenzie P. Hagen --- docs/apps/docker.md | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/docs/apps/docker.md b/docs/apps/docker.md index 8bfa819..bc90cd5 100644 --- a/docs/apps/docker.md +++ b/docs/apps/docker.md @@ -5,10 +5,11 @@ ## Before you start: install Docker -Probably, the most popular framework to execute containers is Docker. +Probably, the most popular framework to execute containers is *Docker*. If you are to run a *NiPrep* on your PC/laptop, this is the **RECOMMENDED** way of execution. -Please make sure you follow the Docker installation instructions. -You can check your Docker Runtime installation running their `hello-world` image: +Please make sure you follow the +[*Docker Engine*'s installation instructions](https://docs.docker.com/engine/install/). 
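Optionally, running `docker version` offers a quick check that the command-line client can reach the engine, as it reports the versions of both components:

```Shell
$ docker version
```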
+You can check your installation running their `hello-world` image: ```Shell $ docker run --rm hello-world @@ -39,7 +40,8 @@ For more examples and ideas, visit: https://docs.docker.com/get-started/ ``` -After checking your Docker Engine is capable of running Docker images, you are ready to pull your first *NiPreps* container image. +After checking your *Docker Engine* is capable of running *Docker* +images, you are ready to pull your first *NiPreps* container image. !!! tip "Troubleshooting" @@ -50,10 +52,10 @@ After checking your Docker Engine is capable of running Docker images, you are r Once verified the problem is not related to the container system, then follow the specific application debugging guidelines. -## Docker images +## *Docker* images -For every new version of the particular *NiPrep* app that is released, a corresponding Docker image is generated. -The Docker image *becomes* a container when the execution engine loads the image and adds an extra layer that makes it *runnable*. In order to run *NiPreps* Docker images, the Docker Runtime must be installed. +For every new version of the particular *NiPreps* application that is released, a corresponding *Docker* image is generated. +The Docker image *becomes* a container when the execution engine loads the image and adds an extra layer that makes it *runnable*. In order to run *NiPreps*' *Docker* images, the *Docker Engine* must be installed. @@ -63,7 +65,7 @@ Taking *fMRIPrep* to illustrate the usage, first you might want to make sure of $ docker pull nipreps/fmriprep: ``` -You can run *NiPreps* interacting directly with the Docker Engine via the `docker run` interface. +You can run *NiPreps* interacting directly with the *Docker Engine* via the `docker run` interface. ## Running a *NiPrep* with a lightweight wrapper @@ -81,21 +83,21 @@ RUNNING: docker run --rm -it -v /path/to/data/dir:/data:ro \ ... ``` -`fmriprep-docker` implements [the unified command-line interface of BIDS Apps](framework.md#a-unified-command-line-interface), and automatically translates directories into Docker mount points for you. +`fmriprep-docker` implements [the unified command-line interface of *BIDS Apps*](framework.md#a-unified-command-line-interface), and automatically translates directories into *Docker* mount points for you. We have published a [step-by-step tutorial](http://reproducibility.stanford.edu/fmriprep-tutorial-running-the-docker-image/) illustrating how to run `fmriprep-docker`. This tutorial also provides valuable troubleshooting insights and advice on what to do after *fMRIPrep* has run. -## Running a *NiPrep* directly interacting with the Docker Engine +## Running a *NiPrep* directly interacting with the *Docker Engine* -If you need a finer control over the container execution, or you feel comfortable with the Docker Engine, avoiding the extra software layer of the wrapper might be a good decision. +If you need a finer control over the container execution, or you feel comfortable with the *Docker Engine*, avoiding the extra software layer of the wrapper might be a good decision. ### Accessing filesystems in the host within the container Containers are confined in a sandbox, so they can't access the data on the host unless explicitly enabled. 
-The Docker Engine provides mounting filesystems into the container with the `-v` argument and the following syntax: +The *Docker Engine* provides mounting filesystems into the container with the `-v` argument and the following syntax: `-v some/path/in/host:/absolute/path/within/container:ro`, where the trailing `:ro` specifies that the mount is read-only. The mount permissions modifiers can be omitted, which means the mount @@ -218,7 +220,7 @@ $ docker run -ti --rm \ ### Running containers as a user -By default, Docker will run the container with the +By default, *Docker* will run the container with the user id (uid) **0**, which is reserved for the default **root** account in *Linux*. In other words, by default *Docker* will use the superuser account @@ -278,7 +280,7 @@ $ docker run -ti --rm \ ### Application-specific options -Once the Docker Engine arguments are written, the remainder of the +Once the *Docker Engine* arguments are written, the remainder of the command line follows the interface defined by the specific *BIDS App* (for instance, [*fMRIPrep*](https://fmriprep.readthedocs.io/en/latest/usage.html) @@ -347,7 +349,7 @@ to limit resources such as memory, memory policies, number of CPUs, etc. **Memory will be a common culprit** when working with large datasets (+10GB). -However, *Docker* engine is limited to 2GB of RAM by default +However, the *Docker Engine* is limited to 2GB of RAM by default for some installations of *Docker* for *MacOSX* and *Windows*. The general resource settings can be also modified through the *Docker Desktop* graphical user interface. From b3b52d8ef079a60b5b97113a59fc8ce7ad27a1d7 Mon Sep 17 00:00:00 2001 From: Oscar Esteban Date: Tue, 7 Jan 2025 13:56:19 +0100 Subject: [PATCH 6/6] Apply suggestions from code review Co-authored-by: Chris Markiewicz --- docs/apps/docker.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/apps/docker.md b/docs/apps/docker.md index bc90cd5..f6f8df8 100644 --- a/docs/apps/docker.md +++ b/docs/apps/docker.md @@ -182,7 +182,7 @@ $ docker run -ti --rm \ *BIDS Apps* relying on [*TemplateFlow*](https://www.templateflow.org) -for atlases and templates management may require +for atlas and template management may require the *TemplateFlow Archive* be mounted from the host. Mounting the *Archive* from the host is an effective way to prevent multiple downloads of templates that are not bundled in the image: