Commit 5e34541

asmacdo and claude committed
Add design doc for containers-run support
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 0e50464 commit 5e34541

File tree

1 file changed: +211 -0 lines changed


design/containers-run.md

# Design: `datalad containers-run` support in BABS

Issue: https://github.com/PennLINC/babs/issues/328

## Goal

Use `datalad containers-run` instead of direct `singularity run` for BIDS app
execution. This provides:

- **Dynamic container path discovery** — `datalad containers-list` reads the
  path from the container dataset's `.datalad/config`, so BABS works with any
  container dataset layout (not just the hardcoded
  `containers/.datalad/environments/{name}/image`).
- **Provenance tracking** — datalad records exactly which container ran and how.
- **Compatibility with repronim/containers** — which uses
  `images/bids/{name}.sif` instead of the default datalad-containers layout.

## How it works

### During `babs init`

1. Clone container dataset as subdataset of analysis (unchanged)
2. `datalad containers-list` discovers the image path in the container dataset
3. `datalad containers-add` registers the container at the analysis level with:
   - The discovered image path (relative to analysis)
   - A `call_fmt` built from the user's `singularity_args` config
4. `datalad get` fetches the container image content so it's available for job
   symlinks
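The net effect of steps 2–3 is a registration entry in the analysis dataset's `.datalad/config`. A sketch of what that entry could look like, assuming a container named `my-app` (the name and image path are hypothetical; the `cmdexec` value is whatever `call_fmt` BABS assembles):

```ini
[datalad "containers.my-app"]
	image = containers/.datalad/environments/my-app/image
	cmdexec = singularity run -B $PWD --pwd $PWD {img} {cmd}
```

`{img}` and `{cmd}` are datalad-container's own placeholders, filled in at `containers-run` time.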

### During job execution (`participant_job.sh`)

Three-step flow replaces the single `datalad run` that called a combined
singularity+zip script:

1. **`datalad containers-run`** — runs the BIDS app. Container is accessed via
   a PROJECT_ROOT symlink to the analysis dataset's fetched image.
2. **`datalad run`** — calls a zip-only script (no longer contains singularity
   invocation or output removal).
3. **`git rm --sparse`** — removes raw outputs. Needed because `datalad run
   --explicit` doesn't track file deletions. The `--sparse` flag is required
   because outputs are outside the sparse-checkout cone.
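The three steps above could look roughly like this in the generated script. This is a non-runnable, pseudocode-level sketch: the container name, zip script path, paths, and commit messages are made up, only the shape of the flow is the point.

```shell
# Sketch only -- names and paths are hypothetical.

# 1. Run the BIDS app via the registered container
datalad containers-run \
    --container-name my-app \
    --explicit \
    --input inputs/data/BIDS \
    --output outputs \
    -m "run BIDS app" \
    participant

# 2. Zip results with a zip-only script (no singularity call, no output removal)
datalad run --explicit \
    --output results.zip \
    -m "zip outputs" \
    "bash code/zip_outputs.sh"

# 3. Remove raw outputs; `datalad run --explicit` does not record deletions,
#    and --sparse is needed because outputs/ is outside the sparse cone
git rm -r --sparse --quiet outputs
git commit -m "remove raw outputs"
```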

## Prior work

An initial proof-of-concept was developed before upstream's sparse-checkout PR
(#337) landed: https://github.com/asmacdo/babs/pull/6

That POC was validated end-to-end on a SLURM cluster (Discovery) with both
handmade container datasets and repronim/containers. Key findings from that
work:

- **Concurrent `datalad get` failures** — when multiple jobs run simultaneously,
  they all try to fetch the same container image. Git-annex uses transfer locks,
  causing all but one job to fail. This is a known git-annex bug:
  https://git-annex.branchable.com/bugs/concurrent_get_from_separate_clones_fails/
  The POC solved this with ephemeral clones for the containers subdataset.

- **Ephemeral clones work for containers but not inputs** — container images are
  resolved on the host before singularity launches, so annex symlinks outside
  `$PWD` are fine. Input data needs to be accessible inside the container with
  `--containall -B $PWD`, so symlinks outside `$PWD` break.

- **`datalad run --explicit` doesn't track deletions** — discovered and
  documented as an upstream issue. The workaround (separate `git rm` step) is
  carried forward into this implementation.

The current implementation was recreated from scratch on top of the
sparse-checkout changes, using the POC as reference. The PROJECT_ROOT symlink
approach (from upstream), combined with `datalad get` of the image during init,
avoids the concurrent get problem entirely: jobs never call `datalad get` for
the container image; they just follow the symlink to the pre-fetched content.
Ephemeral clones remain a potential future alternative.

## Problems encountered

### `git sparse-checkout` interaction

The upstream sparse-checkout PR (#337) changed job clones to use
`--no-checkout` + `git sparse-checkout`. This caused two issues:

- **`.datalad/config` not checked out** — cone mode doesn't include
  dot-directories by default. Fixed by adding `.datalad` to the sparse-checkout
  set. Without it, `containers-run` fails with "No known containers."

- **`git rm` blocked by sparse-checkout** — outputs created by `containers-run`
  are outside the cone, so `git rm` refuses to touch them. Fixed with
  `git rm --sparse`.

### Container image access in job clones

Job clones don't have the container image content — they clone from input RIA
with `--no-checkout`. The existing PROJECT_ROOT symlink approach works: symlink
from the job clone's container path to the analysis dataset's copy. But this
requires the analysis dataset to have the actual content (not just an annex
pointer), so `datalad get` during init is necessary.

### `--containall` requires `--pwd $PWD` and `-B $PWD`

When users specify `--containall` in `singularity_args`, singularity disables
auto-binding of `$PWD` and sets the working directory to `$HOME` inside the
container. This breaks relative paths to inputs/outputs. The `call_fmt` built
during init always includes `-B $PWD --pwd $PWD` to handle this.
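As a concrete sketch of the assembly, assuming the user's config supplies `--containall` (variable names here are illustrative; `$PWD`, `{img}`, and `{cmd}` must survive as literals because singularity and datalad expand them later):

```shell
# User-supplied singularity_args from the BABS config (example value)
user_args="--containall"

# call_fmt as BABS would assemble it: single quotes keep $PWD and the
# datalad placeholders {img}/{cmd} unexpanded.
call_fmt='singularity run -B $PWD --pwd $PWD '"$user_args"' {img} {cmd}'
echo "$call_fmt"
# prints: singularity run -B $PWD --pwd $PWD --containall {img} {cmd}
```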

### datalad `{placeholder}` conflicts with shell `${variables}`

`datalad run` interprets `{name}` as a substitution placeholder, which
conflicts with shell variable expansion like `${subid}`. The POC initially
tried inlining zip commands in `datalad run`, but this caused placeholder
conflicts. The solution is a separate zip script called by `datalad run`.

In the jinja2 templates, shell variables use `{%raw%}/${subid}{%endraw%}`
blocks to prevent jinja2 from interpreting them, and datalad sees the literal
`${subid}`, which it passes through to bash.
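For illustration, a template fragment in this style (hypothetical variable and file names; only the escaping pattern is the point):

```jinja
{# jinja2 emits ${subid} literally; the shell expansion happens inside the
   zip script, not on the datalad run command line. #}
zip_file=derivatives_{% raw %}${subid}{% endraw %}.zip
```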

### `datalad run --explicit` doesn't track deletions

When outputs are deleted inside a `datalad run --explicit` command, datalad
doesn't record the deletion — only additions/modifications are tracked. This is
why raw output removal is a separate `git rm` step outside of `datalad run`.

This was fixed upstream in https://github.com/datalad/datalad/pull/7823, but
we keep the `git rm` workaround to avoid requiring the latest datalad version.

## Open questions

### Pipeline support

Pipeline currently uses a separate code path (`_bootstrap_pipeline_scripts`)
that hardcodes container paths and generates a combined script
(`pipeline_zip.sh`) with direct `singularity run` calls per step.

Questions:

- Should pipeline be "a pipeline of 1"? i.e., unify single-app and pipeline
  into one code path where single-app is just a pipeline with one step.
- If not unified, pipeline needs the same `containers-list` / `containers-add`
  treatment per container step.
- Pipeline currently puts all singularity invocations in one script called by
  one `datalad run`. With `containers-run`, each step would be a separate
  `datalad containers-run` call — better provenance but different structure.

### Reusing `call_fmt` / `cmdexec` from container datasets

Currently we build `call_fmt` from the user's `singularity_args` config:

```
singularity run -B $PWD --pwd $PWD {user_args} {img} {cmd}
```

Some container datasets (like repronim/containers) already define `cmdexec` in
their `.datalad/config`. Should we reuse that instead of building our own?

Related: datalad-container's freeze script approach
(https://github.com/datalad/datalad-container/issues/287) could provide a
standardized way to handle this.
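For reference, a repronim/containers-style entry looks roughly like this (illustrative: the container name and image filename are made up, but `{img_dspath}`, `{img}`, and `{cmd}` are datalad-container's placeholders):

```ini
[datalad "containers.bids-app"]
	image = images/bids/bids-app--1.0.0.sing
	cmdexec = {img_dspath}/scripts/singularity_cmd run {img} {cmd}
```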

### Container image fetch during init

`datalad get` of the container image during `babs init` ensures the symlink
approach works at job time. But this makes init slow if the image isn't already
local (e.g., pulling a multi-GB .sif from a remote).

Options:

- Accept the cost during init (current approach)
- Advise users to `datalad get` the image in their source container dataset
  before running `babs init`, so the fetch is a fast local copy
- Add a separate `babs prepare` step for data orchestration that runs after
  init but before submit

### Container access mechanism

Currently using PROJECT_ROOT symlink (inherited from upstream). This depends on:

- `PROJECT_ROOT` environment variable propagating through SLURM
- Shared filesystem visible from compute nodes

An alternative is ephemeral clones (`datalad get -n --reckless ephemeral
containers`), which removes these dependencies but requires changing the clone
source from input RIA to the analysis dataset path. Deferred for now.

### Backwards-incompatible: features from `bidsapp_run.sh.jinja2` not yet ported

The old `bidsapp_run.sh.jinja2` template had several BABS-managed features that
are not yet in the `containers-run` path. These are **backwards-incompatible**
for users who depend on them and need to be restored or replaced.

- **TemplateFlow bind mount** — BABS read `TEMPLATEFLOW_HOME` from the
  environment and added `-B path:path_in_container` + `--env` to the singularity
  command. Could be restored in BABS, or users could handle this via
  `singularity_args` in their config.

- **FreeSurfer license bind mount** — BABS intercepted `--fs-license-file` from
  `bids_app_args` and converted it to a `-B` bind mount. Same options: restore
  in BABS or let users specify via `singularity_args`.

- **BIDS filter file** — dynamically generated per-job for session-level
  processing, restricting BIDS layout indexing to the current session. This is
  BIDS-app-specific (fmriprep gets `bold`, qsiprep gets `dwi`, etc.) and can't
  be handled by user config alone. Could potentially be supported via a hook or
  user-provided script mechanism (nothing like this exists in BABS today).

- **`${PWD}/` prefix on paths** — the old script passed absolute paths
  (`${PWD}/inputs/data/BIDS`). The `containers-run` path passes relative paths,
  which should work because `call_fmt` includes `--pwd $PWD`. Needs verification
  with BIDS apps that resolve paths differently.
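If the `singularity_args` escape hatch is the chosen route for the first two items, a user config could look like this (a sketch: the `singularity_args` key is from this doc, but the YAML shape, host paths, and in-container mount points are assumptions):

```yaml
singularity_args:
  - --containall
  - -B /home/me/templateflow:/templateflow
  - --env TEMPLATEFLOW_HOME=/templateflow
  - -B /home/me/freesurfer/license.txt:/opt/freesurfer/license.txt
```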

### `--explicit` flag

We use `--explicit` on both `containers-run` and the zip `datalad run`. The
POC required `--explicit` because the old subject pruning (`rm -rf` of other
subjects' directories) dirtied the dataset, and datalad refuses to run on a
dirty dataset without `--explicit`. Upstream's sparse-checkout PR (#337)
eliminated subject pruning, so the dataset should stay clean. `--explicit` may
no longer be strictly necessary — kept for safety; could test removing it.
