ENH: Centralize fMRIPrep's and MRIQC's guidelines for Docker & DataLad #47
Changes from 1 commit
82de14f
fc90f82
c66ae8e
3f79fc2
3c9b253
b3b52d8
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
!!! important "Summary" | ||
!!! abstract "Summary" | ||
|
||
Here, we describe how to run *NiPreps* with Docker containers. | ||
To illustrate the process, we will show the execution of *fMRIPrep*, but these guidelines extend to any other end-user *NiPrep*. | ||
|
@@ -41,6 +41,15 @@ For more examples and ideas, visit: | |
|
||
After checking your Docker Engine is capable of running Docker images, you are ready to pull your first *NiPreps* container image. | ||
|
||
!!! tip "Troubleshooting" | ||
|
||
If you encounter issues while executing a containerized application, | ||
it is critical to identify where the fault originates. | ||
For issues emerging from the *Docker Engine*, please read the | ||
[corresponding troubleshooting guidelines](https://docs.docker.com/desktop/troubleshoot-and-support/troubleshoot/#volumes). | ||
Once you have verified that the problem is not related to the container system, | ||
follow the debugging guidelines of the specific application. | ||
|
||
## Docker images | ||
|
||
For every new version of the particular *NiPrep* app that is released, a corresponding Docker image is generated. | ||
|
@@ -82,73 +91,210 @@ This tutorial also provides valuable troubleshooting insights and advice on what | |
|
||
If you need finer control over the container execution, or you feel comfortable with the Docker Engine, avoiding the extra software layer of the wrapper might be a good decision. | ||
|
||
**Accessing filesystems in the host within the container**: | ||
Containers are confined to a sandbox, so they can't access the host in any way | ||
unless you explicitly grant access to the host. The | ||
Docker Engine provides mounting filesystems into the container with the | ||
`-v` argument and the following syntax: | ||
`-v some/path/in/host:/absolute/path/within/container:ro`, where the | ||
trailing `:ro` specifies that the mount is read-only. The mount | ||
permissions modifiers can be omitted, which means the mount will have | ||
read-write permissions. In general, you'll want to at least provide two | ||
mount-points: one set in read-only mode for the input data and one | ||
read/write to store the outputs. Potentially, you'll want to provide | ||
one or two more mount-points: one for the working directory, in case you | ||
need to debug some issue or reuse pre-cached results; and a | ||
[TemplateFlow](https://www.templateflow.org) folder to preempt the | ||
download of your favorite templates in every run. | ||
|
||
**Running containers as a user**: | ||
By default, Docker will run the | ||
container as **root**. Some shared systems may limit this feature and only | ||
allow running containers as a user. When the container is run as | ||
**root**, files written out to filesystems mounted from the host will | ||
have the user id `1000` by default. In other words, you'll need to be | ||
able to run as root in the host to change permissions or manage these | ||
files. Alternatively, running as a user allows preempting these | ||
permissions issues. It is possible to run as a user with the `-u` | ||
argument. In general, we will want to use the same user ID as the | ||
running user in the host to ensure the ownership of files written during | ||
the container execution. Therefore, you will generally run the container | ||
with `-u $( id -u )`. | ||
|
||
You may also invoke `docker` directly: | ||
### Accessing filesystems in the host within the container | ||
|
||
Containers are confined to a sandbox, so they cannot access the host | ||
in any way unless you explicitly grant access | ||
to the host. | ||
The Docker Engine mounts host filesystems into the container with the `-v` argument, using the following syntax: | ||
|
||
`-v some/path/in/host:/absolute/path/within/container:ro`, | ||
|
||
where the trailing `:ro` specifies that the mount is read-only. | ||
The mount-permission modifier can be omitted, in which case the mount | ||
will have read-write permissions. | ||
In general, you'll want to provide at least two mount points: | ||
one set in read-only mode for the input data and one read/write | ||
to store the outputs: | ||
|
||
``` {.shell hl_lines="2 3"} | ||
$ docker run -ti --rm \ | ||
-v path/to/data:/data:ro \ # read-only, for data | ||
-v path/to/output:/out \ # read-write, for outputs | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/out \ | ||
participant | ||
``` | ||
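The host side of `-v` is most portable when given as an absolute path; depending on the Docker version, other values may be interpreted as a volume name rather than a folder. The following is a minimal sketch, not part of the original guidelines, that resolves host folders to absolute paths before composing the mount arguments. The folder names and the temporary directory are hypothetical stand-ins, and the command is only printed, not executed.

```shell
# Hypothetical host folders (assumption); a temporary directory keeps
# this sketch free of side effects.
host_root=$(mktemp -d)
mkdir -p "$host_root/data" "$host_root/output"

# Resolve each folder to an absolute path; `pwd -P` also resolves symlinks.
data_dir=$(cd "$host_root/data" && pwd -P)
out_dir=$(cd "$host_root/output" && pwd -P)

data_mount="${data_dir}:/data:ro"   # read-only, for input data
out_mount="${out_dir}:/out"         # read-write, for outputs

# Print the resulting command for inspection rather than executing it.
echo docker run -ti --rm -v "$data_mount" -v "$out_mount" \
    "nipreps/fmriprep:<latest-version>" /data /out/out participant
```

Dropping the leading `echo` would hand the same arguments to the Docker Engine.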
|
||
When **debugging** or **reusing pre-cached intermediate results**, | ||
you'll also need to mount a working directory that the application | ||
would not otherwise expose. | ||
In the case of *NiPreps*, we typically instruct the *BIDS App* | ||
to override the work directory with the `-w`/`--work-dir` | ||
argument (please note that this argument is not defined by the *BIDS Apps* | ||
specification and may differ across applications): | ||
|
||
|
||
``` {.shell hl_lines="4 8"} | ||
$ docker run -ti --rm \ | ||
-v path/to/data:/data:ro \ | ||
-v path/to/output:/out \ | ||
-v path/to/work:/work \ # mount from host | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/out \ | ||
participant \ | ||
-w /work # override default directory | ||
``` | ||

Review comment: btw -- isn't there

Review comment: Yes, I had the freesurfer license issue in mind. At the moment it's defined in fMRIPrep's documentation but we probably want to bubble it up here.
|
||
*BIDS Apps* relying on [TemplateFlow](https://www.templateflow.org) | ||
for atlas and template management may require | ||
that the *TemplateFlow Archive* be mounted from the host. | ||
|
||
Mounting the *Archive* from the host is an effective way | ||
to avoid downloading your favorite templates in every run: | ||
|
||
|
||
``` {.shell hl_lines="5 6"} | ||
$ docker run -ti --rm \ | ||
-v path/to/data:/data:ro \ | ||
-v path/to/output:/out \ | ||
-v path/to/work:/work \ | ||
-v path/to/tf-cache:/opt/templateflow \ # mount from host | ||
-e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/out \ | ||
participant \ | ||
-w /work | ||
``` | ||
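In a shell, a comment placed after a trailing backslash stops the line continuation, so annotated listings like the ones above are best read as illustrations rather than pasted verbatim. One way to keep a comment next to each argument in a runnable script is to accumulate the arguments first. The sketch below assumes placeholder paths under `$HOME` and a hypothetical function name `run_fmriprep`; it is one possible arrangement, not the project's recommended wrapper.

```shell
# Accumulate the arguments one step at a time, so that each step can
# carry its own comment without breaking line continuation.
run_fmriprep() {
    # "$@" initially holds the launcher: `docker`, or `echo docker` for a dry run.
    set -- "$@" run -ti --rm
    set -- "$@" -v "$HOME/ds005:/data:ro"                         # read-only, for data
    set -- "$@" -v "$HOME/ds005/derivatives:/out"                 # read-write, for outputs
    set -- "$@" -v "$HOME/tmp/ds005-workdir:/work"                # work directory from host
    set -- "$@" -v "$HOME/.cache/templateflow:/opt/templateflow"  # TemplateFlow Archive from host
    set -- "$@" -e TEMPLATEFLOW_HOME=/opt/templateflow            # override TF home
    set -- "$@" "nipreps/fmriprep:<latest-version>"               # container image
    set -- "$@" /data /out/out participant -w /work               # BIDS App arguments
    "$@"
}

# Dry run: print the composed command instead of executing it.
run_fmriprep echo docker
```

Calling `run_fmriprep docker` instead would execute the container with the same arguments.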
|
||
!!! warning "*Docker for Windows* requires enabling Shared Drives" | ||
|
||
On *Windows* installations, the `-v` argument will not work | ||
by default because it is necessary to enable shared drives. | ||
Please check this [Stack Overflow post](https://stackoverflow.com/a/51822083) to learn how to enable them. | ||
|
||
### Running containers as a user | ||
By default, Docker will run the container with the | ||
user id (uid) **0**, which is reserved for the default **root** | ||
account in *Linux*. | ||
In other words, by default *Docker* will use the superuser account | ||
to execute the container and will write files with the corresponding | ||
uid=0 unless configured otherwise. | ||
Executing as superuser may result in permission and security issues, | ||
for example, [with *DataLad* (discussed later)](datalad.md#). | ||

Review comment: Unless the actual docker there is
I am yet to try podman on hpc/for hpc, using primarily for services, but googled into https://github.com/NERSC/podman-hpc ... worth researching and at least pointing users that there is another OCI solution podman which might give them easier means to run compute on their infrastructure.
|
||
One common permission issue that beginners typically | ||
run into is deleting files after a containerized execution. | ||
If the uid is not overridden, the outputs of a containerized execution | ||
will be owned by **root** and group **root**. | ||
Therefore, normal users will not be able to modify the output and | ||
superuser permissions will be required to delete data generated | ||
by the containerized application. | ||
|
||
Some shared systems only allow running containers as a normal user | ||
because the user will not be able to act on the outputs otherwise. | ||
|
||
|
||
Either way (whether *Docker* is available with its default settings | ||
or its execution has been restricted to normal users), | ||
running as a normal user avoids these permission issues. | ||
|
||
This can be achieved with | ||
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user): | ||
|
||
``` | ||
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ] | ||
``` | ||
|
||
For example: | ||
We can combine this option with the `id` command to ensure the current user's uid and group id (gid) are set: | ||
|
||
``` {.shell hl_lines="4"} | ||
$ docker run -ti --rm \ | ||
-v path/to/data:/data:ro \ | ||
-v path/to/output:/out \ | ||
-u $(id -u):$(id -g) \ # set execution uid:gid | ||
-v path/to/tf-cache:/opt/templateflow \ # mount from host | ||
-e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/out \ | ||
participant | ||
``` | ||
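After a run, it can be useful to confirm that the outputs belong to the user who launched the container. The sketch below is only illustrative: a temporary folder and a scratch file stand in for the host folder mounted at `/out`, which is an assumption made so that the check can be tried without Docker.

```shell
# Scratch folder and file standing in for the mounted output folder
# (assumption: real outputs would live under the host path bound to /out).
out_dir=$(mktemp -d)
touch "$out_dir/example-output.txt"

# The value passed to `-u` in the listing above.
user_spec="$(id -u):$(id -g)"

# List files NOT owned by the current uid; an empty list means that the
# current user can manage (for instance, delete) every output.
foreign=$(find "$out_dir" ! -user "$(id -u)")
if [ -z "$foreign" ]; then
    echo "all outputs are owned by uid:gid $user_spec"
else
    echo "files owned by another user:"
    echo "$foreign"
fi
```

If the container ran as **root**, the second branch would list the files that require superuser permissions to modify.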
|
||
For example: | ||
|
||
``` Shell | ||
$ docker run -ti --rm \ | ||
-v $HOME/ds005:/data:ro \ | ||
-v $HOME/ds005/derivatives:/out \ | ||
-v $HOME/tmp/ds005-workdir:/work \ | ||
-u $(id -u):$(id -g) \ | ||
-v $HOME/.cache/templateflow:/opt/templateflow \ | ||
-e TEMPLATEFLOW_HOME=/opt/templateflow \ | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/fmriprep-<latest-version> \ | ||
participant \ | ||
-w /work | ||
``` | ||
|
||
### Application-specific options | ||
|
||
Once the Docker Engine arguments are written, the remainder of the | ||
command line follows the [usage](https://fmriprep.readthedocs.io/en/latest/usage.html). | ||
In other words, the first section of the command line is all equivalent to the | ||
`fmriprep` executable in a *bare-metal* installation: | ||
command line follows the interface defined by the specific | ||
*BIDS App* (for instance, | ||
[*fMRIPrep*](https://fmriprep.readthedocs.io/en/latest/usage.html) | ||
or [*MRIQC*](https://mriqc.readthedocs.io/en/latest/running.html#command-line-interface)). | ||
|
||
``` Shell | ||
$ docker run -ti --rm \ # These lines | ||
-v $HOME/ds005:/data:ro \ # are equivalent to | ||
-v $HOME/ds005/derivatives:/out \ # a call to the App's | ||
-v $HOME/tmp/ds005-workdir:/work \ # entry-point. | ||
nipreps/fmriprep:<latest-version> \ # | ||
\ | ||
/data /out/fmriprep-<latest-version> \ # These lines correspond | ||
participant \ # to the particular BIDS | ||
-w /work # App arguments. | ||
``` | ||
The first section of a call comprises arguments specific to *Docker*, | ||
which configure the execution of the container: | ||
|
||
|
||
``` {.shell hl_lines="1-7"} | ||
$ docker run -ti --rm \ | ||
-v $HOME/ds005:/data:ro \ | ||
-v $HOME/ds005/derivatives:/out \ | ||
-v $HOME/tmp/ds005-workdir:/work \ | ||
-u $(id -u):$(id -g) \ | ||
-v $HOME/.cache/templateflow:/opt/templateflow \ | ||
-e TEMPLATEFLOW_HOME=/opt/templateflow \ | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/fmriprep-<latest-version> \ | ||
participant \ | ||
-w /work | ||
``` | ||
|
||
Then, we specify the container image that we execute: | ||
|
||
``` {.shell hl_lines="8"} | ||
$ docker run -ti --rm \ | ||
-v $HOME/ds005:/data:ro \ | ||
-v $HOME/ds005/derivatives:/out \ | ||
-v $HOME/tmp/ds005-workdir:/work \ | ||
-u $(id -u):$(id -g) \ | ||
-v $HOME/.cache/templateflow:/opt/templateflow \ | ||
-e TEMPLATEFLOW_HOME=/opt/templateflow \ | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/fmriprep-<latest-version> \ | ||
participant \ | ||
-w /work | ||
``` | ||
|
||
Finally, the application-specific options can be added. | ||
We described the work directory setting above for | ||
*NiPreps* such as *MRIQC* and *fMRIPrep*. | ||
Some options are standard across *BIDS Apps*, such as | ||
the *analysis level* (`participant` or `group`) | ||
and specific participant identifier(s) (`--participant-label`): | ||
|
||
``` {.shell hl_lines="9-12"} | ||
$ docker run -ti --rm \ | ||
-v $HOME/ds005:/data:ro \ | ||
-v $HOME/ds005/derivatives:/out \ | ||
-v $HOME/tmp/ds005-workdir:/work \ | ||
-u $(id -u):$(id -g) \ | ||
-v $HOME/.cache/templateflow:/opt/templateflow \ | ||
-e TEMPLATEFLOW_HOME=/opt/templateflow \ | ||
nipreps/fmriprep:<latest-version> \ | ||
/data /out/fmriprep-<latest-version> \ | ||
participant \ | ||
--participant-label 001 002 \ | ||
-w /work | ||
``` | ||
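The labels passed to `--participant-label` can be derived from the dataset itself. The sketch below assumes a BIDS dataset with one `sub-<label>` folder per participant; the temporary folder and its two subjects are hypothetical stand-ins created only for illustration.

```shell
# Stand-in for a BIDS dataset with two participants (assumption).
bids_dir=$(mktemp -d)
mkdir -p "$bids_dir/sub-001" "$bids_dir/sub-002"

# Collect the labels of all `sub-*` folders, without the "sub-" prefix.
labels=""
for subject in "$bids_dir"/sub-*/; do
    [ -d "$subject" ] || continue
    name=$(basename "$subject")     # e.g., sub-001
    labels="$labels ${name#sub-}"   # strip the "sub-" prefix
done
labels=${labels# }                  # drop the leading space

echo "--participant-label $labels"
# prints: --participant-label 001 002
```

The resulting string can then be appended to the application arguments of the call.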
|
||
### Resource constraints | ||
|
||
*Docker* may be executed with limited resources. | ||
Please [read the documentation](https://docs.docker.com/engine/containers/resource_constraints/) | ||
to limit resources such as memory, memory policies, number of CPUs, etc. | ||
|
||
**Memory will be a common culprit** when working with large datasets | ||
(over 10GB). | ||
Review comment: another shameless plug which might be of interest/help. Inspired by our reproman, and BrainLife's helper to monitor execution of compute, we created a simple helper https://github.com/con/duct which could be of help to monitor/identify memory and cpu requirements for e.g. future informed specification for job parameters or plotting resource consumption during compute. It also takes care about storing stdout/stderr outputs produced, thus making it possible (if used along with

Review comment: If you finally send an additional PR, I'm happy to see this documented there :)
||
However, the *Docker* engine is limited to 2GB of RAM by default | ||
in some installations of *Docker* for *macOS* and *Windows*. | ||
The general resource settings can also be modified through the *Docker Desktop* | ||
graphical user interface. | ||
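On a shell, the storage limit of images and containers (`dm.basesize`, an option of the *devicemapper* storage driver that affects disk space rather than RAM) can be overridden with: | ||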
|
||
``` | ||
$ service docker stop | ||
$ dockerd --storage-opt dm.basesize=30G | ||
``` |
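Limits can also be set for a single container with `docker run` options such as `--memory` and `--cpus`, which are covered by the resource-constraints documentation linked above. The sketch below uses illustrative values (16 GB of RAM and 8 CPUs are assumptions, not recommendations) and placeholder paths under `$HOME`; the command is printed rather than executed.

```shell
mem_limit="16g"   # illustrative value (assumption)
cpu_limit="8"     # illustrative value (assumption)

# Left unquoted on purpose below, so that it expands to four separate words.
limit_args="--memory $mem_limit --cpus $cpu_limit"

# Print the command for inspection rather than executing it.
echo docker run -ti --rm $limit_args \
    -v "$HOME/ds005:/data:ro" \
    -v "$HOME/ds005/derivatives:/out" \
    "nipreps/fmriprep:<latest-version>" \
    /data "/out/fmriprep-<latest-version>" \
    participant
```

A container that exceeds its memory limit may be stopped by the Docker Engine, so the values should be chosen with the size of the dataset in mind.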
Review comment: could/should I propose (separate PR on top?) in this section and/or the https://github.com/nipreps/nipreps.github.io/blob/HEAD/docs/apps/singularity.md file to mention our https://github.com/ReproNim/containers, which contains (and automatically updates) pre-created singularity images for all bids-apps (thus including fmriprep, mriqc), and provides some helpers to streamline their use and more guaranteed reproducibility (isolated environment execution etc).
Note that the wrapper also tries to support non-Linux systems (OSX) where we could run singularity under docker. Or it could also be used on Linux if there is no singularity installation.
https://github.com/ReproNim/containers?tab=readme-ov-file#runnable-script provides a "typical" use example based on mriqc.
https://github.com/OpenNeuroDerivatives/ by @jbwexler (and @effigies ?) use that ReproNim/containers as a subdataset archive of the images with `reproman run` for execution.

Review comment: also that should include YODA aspects whenever talking about containers... with them it becomes possible to encapsulate all digital objects nicely and reproducibly (there is no guarantee that docker:// would later give you the images used etc)

Review comment: Yes! Mentioning ReproNim on a separate PR would be fantastic.