Problem
BABS hardcodes the container path as `containers/.datalad/environments/<name>/image`. This is the default for `containers-add`, but it breaks if `-i` was used to specify a custom path. For one-off container datasets built specifically for use with BABS this isn't a problem, but there are also pre-existing container datasets that would be useful, e.g. repronim/containers, which stores images at `images/bids/<name>.sif`.
Suggested approach
Use `datalad containers-run` instead of a direct `singularity run`. This:
- Reads the container path from `.datalad/config` (works with any layout)
- Provides native datalad provenance tracking
- Enables `cmdexec` customization
`cmdexec` is a template in `.datalad/config` that defines how to execute the container (e.g. the default is `singularity exec {img} {cmd}`). It can be overridden by passing `--call-fmt` at `containers-add` time. This allows custom wrappers or execution methods without changing the calling code.
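For reference, the stored form can be sketched as a `.datalad/config` fragment. The section and key names below follow datalad-container's convention of keeping per-container settings under `datalad "containers.<name>"`; the exact paths and values are illustrative:

```ini
[datalad "containers.bids-mriqc"]
	image = containers/images/bids/bids-mriqc--24.0.2.sif
	cmdexec = singularity exec {img} {cmd}
```

Passing `--call-fmt` at `containers-add` time simply changes the stored `cmdexec` value.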
Making this work with any container dataset
For a low-friction transition, the cmdexec could execute the existing BABS-generated script, getting containers-run benefits without rewriting script generation logic.
To support any datalad-containers compatible dataset without hardcoding paths:
- BABS clones the container dataset as a subdataset (`analysis/containers/`)
- BABS gets the image path via `datalad -f json containers-list -d containers/`
- BABS runs `containers-add` from the analysis dataset, pointing to `containers/<path>`, with `--call-fmt` pointing to the BABS-generated script
- `containers-run` now works from the analysis dataset
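The "get the image path" step could be sketched as below. The JSON record shape (and the field names `name`/`path`) is an assumption about `containers-list -f json` output, and a sample record stands in for the real datalad call:

```shell
# Sketch: extract the image path from containers-list JSON output.
# Assumption: one JSON object per container, with "name" and "path" fields;
# the real field names may differ. A sample record replaces the datalad call.
sample='{"name": "bids-mriqc", "path": "containers/images/bids/bids-mriqc--24.0.2.sif"}'
# In real use: sample=$(datalad -f json containers-list -d containers/)
img_path=$(printf '%s\n' "$sample" | python3 -c '
import json, sys
for line in sys.stdin:          # one JSON record per line
    rec = json.loads(line)
    if rec.get("name") == "bids-mriqc":
        print(rec["path"])
')
echo "$img_path"
```

The extracted path then becomes the `-i` argument to the subsequent `containers-add` call.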
Example:

```shell
# BABS gets image path from subdataset, then:
datalad containers-add bids-mriqc \
    -i containers/images/bids/bids-mriqc--24.0.2.sif \
    --call-fmt "code/bids-mriqc-24-0-2_zip.sh {cmd}"
```

Then `participant_job.sh` just calls:

```shell
datalad containers-run -n bids-mriqc -- "$subid"
```

This invokes `code/bids-mriqc-24-0-2_zip.sh sub-01`: the existing script stays as-is, but we get `containers-run` provenance tracking.
Works with DIY containers-add datasets, repronim/containers, or any other datalad-containers compatible layout.
Going further: cleaner provenance
The above works as a low-friction transition, but we can go further (and simpler). By separating the BIDS app execution from zipping, we get explicit provenance for each step - and zipping becomes trivially optional (see related issue on optional zipping).
```shell
# Step 1: containers-run (clean provenance)
datalad containers-run -m "Compute MRIQC for ${subid}" -n bids-mriqc -- mriqc ...

# Step 2 (perhaps optional): zip and drop raw files
datalad run -m "Zip MRIQC outputs for ${subid}" -- 7z a ${subid}_mriqc.zip ${subid}/
datalad drop ${subid}/
```

This produces explicit commits:
- "Ran MRIQC" - shows exactly what BIDS app ran, including the exact singularity invocation (rather than just pointing to a script)
- "Zipped outputs" - separate step/commit (if enabled)
With datalad drop, raw files aren't duplicated - they're tracked but not present. Only zips take up space.
Note: With this approach, `--call-fmt` is no longer needed in the `containers-add` step; it seems the default `cmdexec` could be used. `participant_job.sh` calls `containers-run` directly, then optionally zips.
Additionally, this removes one level of indirection: there is only one script, `participant_job.sh`, removing the need for the separate zip script that currently does the heavy lifting.
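Putting the pieces together, the simplified `participant_job.sh` could look like the sketch below. Variable names and commit messages follow the examples above; the `run` helper only prints each command (a dry run), so the shape is visible without a datalad installation:

```shell
#!/bin/bash
# Dry-run sketch of a simplified participant_job.sh.
# Assumption: $subid would normally come from the job scheduler; it is
# fixed here for illustration.
subid="sub-01"

# 'run' prints each command instead of executing it; swap the echo for
# real execution ("$@") when datalad is available.
run() { echo "+ $*"; }

# Step 1: BIDS app execution with native containers-run provenance
run datalad containers-run -m "Compute MRIQC for ${subid}" -n bids-mriqc -- "${subid}"

# Step 2 (optional): zip outputs, then drop the raw files
run datalad run -m "Zip MRIQC outputs for ${subid}" -- 7z a "${subid}_mriqc.zip" "${subid}/"
run datalad drop "${subid}/"
```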