Commit 82de14f

committed
enh: update datatalad and docker guidelines
1 parent 9af55c7 commit 82de14f

2 files changed

+212
-67
lines changed

docs/apps/datalad.md

Lines changed: 18 additions & 19 deletions
@@ -1,10 +1,4 @@
1-
Apps may be able to identify if the input dataset is handled with
2-
*DataLad* or *Git-Annex*, and pull down linked data that has not
3-
been fetched yet.
4-
One example of one such application is *MRIQC*, and all the examples
5-
on this documentation page will refer to it.
6-
7-
!!! important "Summary"
1+
!!! abstract "Summary"
82

93
Executing *BIDS-Apps* leveraging *DataLad*-controlled datasets
104
within containers can be tricky.
@@ -18,6 +12,12 @@ on this documentation page will refer to it.
1812

1913
## *DataLad* and *Docker*
2014

15+
Apps may be able to identify if the input dataset is handled with
16+
*DataLad* or *Git-Annex*, and pull down linked data that has not
17+
been fetched yet.
18+
One example of such an application is *MRIQC*, and all the examples
19+
on this documentation page will refer to it.
20+
2121
When executing *MRIQC* within *Docker* on a *DataLad* dataset
2222
(for instance, installed from [*OpenNeuro*](https://openneuro.org)),
2323
we will need to ensure the following settings are observed:
@@ -27,9 +27,12 @@ we will need to ensure the following settings are observed:
2727
* the uid who is *executing MRIQC* within the container must
2828
have sufficient permissions to write in the tree.
2929

30-
### Setting execution uid
30+
### Setting a regular user's execution uid
3131

32-
If the uid is not correct, we will likely encounter the following error:
32+
If the execution uid does not match the uid of the user who installed
33+
the *DataLad* dataset, we will likely encounter the following error
34+
with relatively recent
35+
[*Git* versions (2.35.2 and later)](https://github.blog/open-source/git/git-security-vulnerability-announced/#):
3336

3437
```
3538
datalad.runner.exception.CommandError: CommandError: 'git -c diff.ignoreSubmodules=none -c core.quotepath=false -c annex.merge-annex-branches=false annex find --not --in . --json --json-error-messages -c annex.dotfiles=true -- sub-0001/func/sub-0001_task-restingstate_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-emomatching_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-restingstate_acq-mb3_bold.nii.gz sub-0001/func/sub-0001_task-emomatching_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-faces_acq-mb3_bold.nii.gz sub-0001/dwi/sub-0001_dwi.nii.gz sub-0002/func/sub-0002_task-workingmemory_acq-seq_bold.nii.gz sub-0001/anat/sub-0001_T1w.nii.gz sub-0002/anat/sub-0002_T1w.nii.gz sub-0001/func/sub-0001_task-gstroop_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-faces_acq-mb3_bold.nii.gz sub-0002/func/sub-0002_task-anticipation_acq-seq_bold.nii.gz sub-0002/dwi/sub-0002_dwi.nii.gz sub-0001/func/sub-0001_task-anticipation_acq-seq_bold.nii.gz sub-0001/func/sub-0001_task-workingmemory_acq-seq_bold.nii.gz sub-0002/func/sub-0002_task-gstroop_acq-seq_bold.nii.gz' failed with exitcode 1 under /data [info keys: stdout_json] [err: 'git-annex: Git refuses to operate in this repository, probably because it is owned by someone else.
@@ -40,20 +43,16 @@ git config --global --add safe.directory /data
4043
git-annex: automatic initialization failed due to above problems']
4144
```
4245

43-
Confusingly, following the suggestion from *DataLad* directly on the host
44-
(`git config --global --add safe.directory /data`) will not work in this
46+
Confusingly, following the suggestion from *DataLad*
47+
(just propagated from *Git*) of executing
48+
`git config --global --add safe.directory /data` will not work in this
4549
case, because this line must be executed within the container.
50+
However, containers are *transient* and setting this configuration
51+
on *Git* will not be propagated between executions unless advanced
52+
actions are taken (such as mounting a *Home* folder with the necessary settings).
4653

4754
Instead, we can override the default user executing within the container
4855
(which is `root`, or uid = 0).
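
Before overriding the user, it can help to confirm that the mismatch described above actually applies. The following is a minimal sketch that compares the numeric owner of the dataset with the current user's uid. The `dataset` path is a placeholder (a temporary directory here, so the snippet runs as-is), and the variable names are illustrative:

```shell
# Placeholder dataset path: a temporary directory keeps the sketch runnable.
# In practice, point this at the DataLad dataset installed on the host.
dataset="$(mktemp -d)"

# Numeric uid of the owner of the dataset's top-level directory
owner_uid="$(ls -nd "$dataset" | awk '{print $3}')"
# Numeric uid of the user running this shell
current_uid="$(id -u)"

if [ "$owner_uid" = "$current_uid" ]; then
    uid_check="match"
else
    uid_check="mismatch"
fi
echo "dataset owner uid=$owner_uid, current uid=$current_uid ($uid_check)"
```

If the check reports a match, passing the current uid to the container should make the execution uid agree with the dataset owner.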
49-
This can be achieved with
50-
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):
51-
52-
```
53-
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
54-
```
55-
56-
We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set.
5756
Let's update the last example in the previous
5857
[*Docker* execution section](docker.md#running-a-niprep-directly-interacting-with-the-docker-engine):
5958

docs/apps/docker.md

Lines changed: 194 additions & 48 deletions
@@ -1,4 +1,4 @@
1-
!!! important "Summary"
1+
!!! abstract "Summary"
22

33
Here, we describe how to run *NiPreps* with Docker containers.
44
To illustrate the process, we will show the execution of *fMRIPrep*, but these guidelines extend to any other end-user *NiPrep*.
@@ -41,6 +41,15 @@ For more examples and ideas, visit:
4141

4242
After checking your Docker Engine is capable of running Docker images, you are ready to pull your first *NiPreps* container image.
4343

44+
!!! tip "Troubleshooting"
45+
46+
If you encounter issues while executing a containerized application,
47+
it is critical to identify where the fault originates.
48+
For issues emerging from the *Docker Engine*, please read the
49+
[corresponding troubleshooting guidelines](https://docs.docker.com/desktop/troubleshoot-and-support/troubleshoot/#volumes).
50+
Once you have verified that the problem is not related to the container system,
51+
then follow the specific application debugging guidelines.
52+
4453
## Docker images
4554

4655
For every new version of the particular *NiPrep* app that is released, a corresponding Docker image is generated.
@@ -82,73 +91,210 @@ This tutorial also provides valuable troubleshooting insights and advice on what
8291

8392
If you need a finer control over the container execution, or you feel comfortable with the Docker Engine, avoiding the extra software layer of the wrapper might be a good decision.
8493

85-
**Accessing filesystems in the host within the container**:
86-
Containers are confined in a sandbox, so they can't access the host in any ways
87-
unless you explicitly prescribe acceptable accesses to the host. The
88-
Docker Engine provides mounting filesystems into the container with the
89-
`-v` argument and the following syntax:
90-
`-v some/path/in/host:/absolute/path/within/container:ro`, where the
91-
trailing `:ro` specifies that the mount is read-only. The mount
92-
permissions modifiers can be omitted, which means the mount will have
93-
read-write permissions. In general, you'll want to at least provide two
94-
mount-points: one set in read-only mode for the input data and one
95-
read/write to store the outputs. Potentially, you'll want to provide
96-
one or two more mount-points: one for the working directory, in case you
97-
need to debug some issue or reuse pre-cached results; and a
98-
[TemplateFlow](https://www.templateflow.org) folder to preempt the
99-
download of your favorite templates in every run.
100-
101-
**Running containers as a user**:
102-
By default, Docker will run the
103-
container as **root**. Some share systems my limit this feature and only
104-
allow running containers as a user. When the container is run as
105-
**root**, files written out to filesystems mounted from the host will
106-
have the user id `1000` by default. In other words, you'll need to be
107-
able to run as root in the host to change permissions or manage these
108-
files. Alternatively, running as a user allows preempting these
109-
permissions issues. It is possible to run as a user with the `-u`
110-
argument. In general, we will want to use the same user ID as the
111-
running user in the host to ensure the ownership of files written during
112-
the container execution. Therefore, you will generally run the container
113-
with `-u $( id -u )`.
114-
115-
You may also invoke `docker` directly:
94+
### Accessing filesystems in the host within the container
95+
96+
Containers are confined in a sandbox, so they can't access the host
97+
in any way unless you explicitly prescribe acceptable accesses
98+
to the host.
99+
The Docker Engine provides mounting filesystems into the container with the `-v` argument and the following syntax:
100+
`-v some/path/in/host:/absolute/path/within/container:ro`,
101+
where the trailing `:ro` specifies that the mount is read-only.
102+
The mount permissions modifiers can be omitted, which means the mount
103+
will have read-write permissions.
104+
In general, you'll want to at least provide two mount-points:
105+
one set in read-only mode for the input data and one read/write
106+
to store the outputs:
107+
108+
``` {.shell hl_lines="2 3"}
109+
$ docker run -ti --rm \
110+
-v path/to/data:/data:ro \ # read-only, for data
111+
-v path/to/output:/out \ # read-write, for outputs
112+
nipreps/fmriprep:<latest-version> \
113+
/data /out/out \
114+
participant
115+
```
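
The `path/to/...` values above are placeholders. A minimal sketch of preparing the two mount points follows, assuming absolute host paths are wanted (depending on the *Docker* version, a relative host path may be interpreted as a volume name) and that the output folder should exist beforehand so that it is not created by the *Docker* daemon. The temporary `base` directory and variable names are illustrative only, and the command is printed rather than executed:

```shell
# Hypothetical host locations; a temporary directory keeps the sketch runnable.
base="$(mktemp -d)"
mkdir -p "$base/data" "$base/output"

# Resolve absolute paths for the bind mounts
data_dir="$(cd "$base/data" && pwd)"
out_dir="$(cd "$base/output" && pwd)"

# Dry run: print the command instead of executing it.
# Remove the leading 'echo' to actually run the container.
echo docker run -ti --rm \
    -v "$data_dir:/data:ro" \
    -v "$out_dir:/out" \
    "nipreps/fmriprep:<latest-version>" \
    /data /out/out \
    participant
```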
116116

117-
``` Shell
117+
When **debugging** or **reusing pre-cached intermediate results**,
118+
you'll also need to mount some working directory that otherwise
119+
is not exposed by the application.
120+
In the case of *NiPreps*, we typically inform the *BIDS Apps*
121+
to override the work directory by setting the `-w`/`--work-dir`
122+
argument (please note that this is not defined by the *BIDS Apps*
123+
specifications and it may change across applications):
124+
125+
``` {.shell hl_lines="4 8"}
126+
$ docker run -ti --rm \
127+
-v path/to/data:/data:ro \
128+
-v path/to/output:/out \
129+
-v path/to/work:/work \ # mount from host
130+
nipreps/fmriprep:<latest-version> \
131+
/data /out/out \
132+
participant \
133+
-w /work # override default directory
134+
```
135+
136+
*BIDS Apps* relying on [TemplateFlow](https://www.templateflow.org)
137+
for atlases and templates management may require
138+
the *TemplateFlow Archive* be mounted from the host.
139+
Mounting the *Archive* from the host is an effective way
140+
to preempt the download of your favorite templates in every run:
141+
142+
``` {.shell hl_lines="5 6"}
118143
$ docker run -ti --rm \
119144
-v path/to/data:/data:ro \
120145
-v path/to/output:/out \
146+
-v path/to/work:/work \
147+
-v path/to/tf-cache:/opt/templateflow \ # mount from host
148+
-e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home
121149
nipreps/fmriprep:<latest-version> \
122150
/data /out/out \
123151
participant \
152+
-w /work
153+
```
154+
155+
!!! warning "*Docker for Windows* requires enabling Shared Drives"
156+
157+
On *Windows* installations, the `-v` argument will not work
158+
by default because it is necessary to enable shared drives.
159+
Please see this [Stack Overflow post](https://stackoverflow.com/a/51822083) for how to enable them.
160+
161+
### Running containers as a user
162+
By default, Docker will run the container with the
163+
user id (uid) **0**, which is reserved for the default **root**
164+
account in *Linux*.
165+
In other words, by default *Docker* will use the superuser account
166+
to execute the container and will write files with the corresponding
167+
uid=0 unless configured otherwise.
168+
Executing as superuser may result in permission and security issues,
169+
for example, [with *DataLad* (discussed later)](datalad.md#).
170+
One typical permissions issue that beginners run into
171+
is deleting files after a containerized execution.
172+
If the uid is not overridden, the outputs of a containerized execution
173+
will be owned by **root** and group **root**.
174+
Therefore, normal users will not be able to modify the output and
175+
superuser permissions will be required to delete data generated
176+
by the containerized application.
177+
Some shared systems only allow running containers as a normal user
178+
because the user will not be able to act on the outputs otherwise.
179+
180+
Either way (whether the container is available with default settings
181+
or the execution has been customized to normal users),
182+
running as a normal user allows preempting these permissions issues.
183+
This can be achieved with
184+
[*Docker*'s `-u`/`--user` option](https://docs.docker.com/engine/containers/run/#user):
185+
186+
```
187+
--user=[ user | user:group | uid | uid:gid | user:gid | uid:group ]
124188
```
125189

126-
For example: :
190+
We can combine this option with *Bash*'s `id` command to ensure the current user's uid and group id (gid) are being set:
127191

192+
``` {.shell hl_lines="4"}
193+
$ docker run -ti --rm \
194+
-v path/to/data:/data:ro \
195+
-v path/to/output:/out \
196+
-u $(id -u):$(id -g) \ # set execution uid:gid
197+
-v path/to/tf-cache:/opt/templateflow \ # mount from host
198+
-e TEMPLATEFLOW_HOME=/opt/templateflow \ # override TF home
199+
nipreps/fmriprep:<latest-version> \
200+
/data /out/out \
201+
participant
128202
```
203+
204+
For example:
205+
206+
``` Shell
129207
$ docker run -ti --rm \
130208
-v $HOME/ds005:/data:ro \
131209
-v $HOME/ds005/derivatives:/out \
132210
-v $HOME/tmp/ds005-workdir:/work \
211+
-u $(id -u):$(id -g) \
212+
-v $HOME/.cache/templateflow:/opt/templateflow \
213+
-e TEMPLATEFLOW_HOME=/opt/templateflow \
133214
nipreps/fmriprep:<latest-version> \
134215
/data /out/fmriprep-<latest-version> \
135216
participant \
136217
-w /work
137218
```
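
After a run, one way to confirm that the `-u` option had the intended effect is to look for outputs that are not owned by the current user. The following is a minimal sketch; `out_dir` stands in for the output folder mounted at `/out`, and a temporary directory with a stand-in file is used so the snippet runs as-is:

```shell
# Placeholder for the host folder mounted at /out
out_dir="$(mktemp -d)"
# Stand-in for a file written by the containerized application
touch "$out_dir/report.html"

# Collect entries NOT owned by the current user (e.g., written as root)
foreign="$(find "$out_dir" ! -user "$(id -u)")"

if [ -z "$foreign" ]; then
    echo "all outputs are owned by uid $(id -u)"
else
    echo "entries owned by another user:"
    echo "$foreign"
fi
```

An empty list suggests the outputs can be modified or deleted without superuser permissions.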
138219

220+
### Application-specific options
221+
139222
Once the Docker Engine arguments are written, the remainder of the
140-
command line follows the [usage](https://fmriprep.readthedocs.io/en/latest/usage.html).
141-
In other words, the first section of the command line is all equivalent to the
142-
`fmriprep` executable in a *bare-metal* installation: :
223+
command line follows the interface defined by the specific
224+
*BIDS App* (for instance,
225+
[*fMRIPrep*](https://fmriprep.readthedocs.io/en/latest/usage.html)
226+
or [*MRIQC*](https://mriqc.readthedocs.io/en/latest/running.html#command-line-interface)).
143227

144-
``` Shell
145-
$ docker run -ti --rm \ # These lines
146-
-v $HOME/ds005:/data:ro \ # are equivalent to
147-
-v $HOME/ds005/derivatives:/out \ # a call to the App's
148-
-v $HOME/tmp/ds005-workdir:/work \ # entry-point.
149-
nipreps/fmriprep:<latest-version> \ #
150-
\
151-
/data /out/fmriprep-<latest-version> \ # These lines correspond
152-
participant \ # to the particular BIDS
153-
-w /work # App arguments.
154-
```
228+
The first section of a call comprises arguments specific to *Docker*,
229+
which configure the execution of the container:
230+
231+
``` {.shell hl_lines="1-7"}
232+
$ docker run -ti --rm \
233+
-v $HOME/ds005:/data:ro \
234+
-v $HOME/ds005/derivatives:/out \
235+
-v $HOME/tmp/ds005-workdir:/work \
236+
-u $(id -u):$(id -g) \
237+
-v $HOME/.cache/templateflow:/opt/templateflow \
238+
-e TEMPLATEFLOW_HOME=/opt/templateflow \
239+
nipreps/fmriprep:<latest-version> \
240+
/data /out/fmriprep-<latest-version> \
241+
participant \
242+
-w /work
243+
```
244+
245+
Then, we specify the container image that we execute:
246+
247+
``` {.shell hl_lines="8"}
248+
$ docker run -ti --rm \
249+
-v $HOME/ds005:/data:ro \
250+
-v $HOME/ds005/derivatives:/out \
251+
-v $HOME/tmp/ds005-workdir:/work \
252+
-u $(id -u):$(id -g) \
253+
-v $HOME/.cache/templateflow:/opt/templateflow \
254+
-e TEMPLATEFLOW_HOME=/opt/templateflow \
255+
nipreps/fmriprep:<latest-version> \
256+
/data /out/fmriprep-<latest-version> \
257+
participant \
258+
-w /work
259+
```
260+
261+
Finally, the application-specific options can be added.
262+
We already described the work directory setting before, in the case
263+
of *NiPreps* such as *MRIQC* and *fMRIPrep*.
264+
Some options are *BIDS Apps* standard, such as
265+
the *analysis level* (`participant` or `group`)
266+
and specific participant identifier(s) (`--participant-label`):
267+
268+
``` {.shell hl_lines="9-12"}
269+
$ docker run -ti --rm \
270+
-v $HOME/ds005:/data:ro \
271+
-v $HOME/ds005/derivatives:/out \
272+
-v $HOME/tmp/ds005-workdir:/work \
273+
-u $(id -u):$(id -g) \
274+
-v $HOME/.cache/templateflow:/opt/templateflow \
275+
-e TEMPLATEFLOW_HOME=/opt/templateflow \
276+
nipreps/fmriprep:<latest-version> \
277+
/data /out/fmriprep-<latest-version> \
278+
participant \
279+
--participant-label 001 002 \
280+
-w /work
281+
```
282+
283+
### Resource constraints
284+
285+
*Docker* may be executed with limited resources.
286+
Please [read the documentation](https://docs.docker.com/engine/containers/resource_constraints/)
287+
to limit resources such as memory, memory policies, number of CPUs, etc.
288+
289+
**Memory will be a common culprit** when working with large datasets
290+
(over 10GB).
291+
However, the *Docker* engine is limited to 2GB of RAM by default
292+
for some installations of *Docker* for *MacOSX* and *Windows*.
293+
The general resource settings can also be modified through the *Docker Desktop*
294+
graphical user interface.
295+
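On a shell, the commands below restart the *Docker* daemon with a larger base device size (note that `dm.basesize` is a storage option of the `devicemapper` driver, so it limits disk space rather than RAM):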
296+
297+
```
298+
$ service docker stop
299+
$ dockerd --storage-opt dm.basesize=30G
300+
```
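
Memory and CPU limits can also be requested per container at run time with the `--memory` and `--cpus` options of `docker run`. The sketch below only assembles and prints such a command. The limit values and the `path/to/...` mounts are illustrative placeholders, not recommendations:

```shell
# Illustrative limits; adjust to the dataset and the host
mem_limit="16g"
n_cpus="4"

# Dry run: assemble the command as text and print it.
# To execute it, type the printed command with real paths and a real image tag.
cmd="docker run -ti --rm \
--memory $mem_limit \
--cpus $n_cpus \
-v path/to/data:/data:ro \
-v path/to/output:/out \
nipreps/fmriprep:<latest-version> \
/data /out/out participant"

echo "$cmd"
```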
