
Feature Request: Use datalad containers-run #328

@asmacdo

Description


Problem

BABS hardcodes the container path as containers/.datalad/environments/<name>/image. This is the default location used by containers-add, but it breaks if -i was used to specify a custom path. For one-off container datasets built specifically for BABS this isn't a problem, but there are also pre-existing container datasets that would be useful, e.g. repronim/containers, which stores images at images/bids/<name>.sif.

Suggested approach

Use datalad containers-run instead of direct singularity run. This:

  • Reads container path from .datalad/config (works with any layout)
  • Provides native datalad provenance tracking
  • Enables cmdexec customization

cmdexec is a template in .datalad/config that defines how to execute the container (e.g. the default is singularity exec {img} {cmd}). It can be overridden by specifying --call-fmt at containers-add time. This allows custom wrappers or execution methods without changing the calling code.
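For concreteness, here is a sketch of the config stanza that containers-add produces and how the values can be read back with plain git config. The section and key names follow datalad-container's convention; the file name here is illustrative so the snippet is self-contained:

```shell
# Write an example of the stanza containers-add records in .datalad/config
# (section/key names follow datalad-container; the file name is illustrative).
cat > datalad-config-example <<'EOF'
[datalad "containers.bids-mriqc"]
    image = images/bids/bids-mriqc--24.0.2.sif
    cmdexec = singularity exec {img} {cmd}
EOF

# containers-run resolves the image and call format from these keys:
git config -f datalad-config-example datalad.containers.bids-mriqc.image
# → images/bids/bids-mriqc--24.0.2.sif
git config -f datalad-config-example datalad.containers.bids-mriqc.cmdexec
# → singularity exec {img} {cmd}
```

This is why no path needs to be hardcoded: whatever layout the container dataset uses, the image key records it.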

Making this work with any container dataset

For a low-friction transition, the cmdexec could execute the existing BABS-generated script, getting containers-run benefits without rewriting script generation logic.

To support any datalad-container compatible dataset without hardcoding paths:

  1. BABS clones the container dataset as subdataset (analysis/containers/)
  2. BABS gets the image path via datalad -f json containers-list -d containers/
  3. BABS runs containers-add from analysis dataset, pointing to containers/<path>, with --call-fmt pointing to the BABS-generated script
  4. containers-run now works from analysis
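Step 2's lookup could be parsed as sketched below. The JSON record is a stand-in for one result emitted by `datalad -f json containers-list -d containers/`; the field names are an assumption for illustration:

```shell
# Stand-in for one JSON result record from `datalad -f json containers-list`
# (field names are an assumption; a real record has more fields).
record='{"name": "bids-mriqc", "path": "containers/images/bids/bids-mriqc--24.0.2.sif"}'

# Extract the image path so containers-add can be pointed at it.
printf '%s\n' "$record" | python3 -c 'import json, sys; print(json.load(sys.stdin)["path"])'
# → containers/images/bids/bids-mriqc--24.0.2.sif
```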

Example:

# BABS gets image path from subdataset, then:
datalad containers-add bids-mriqc \
  -i containers/images/bids/bids-mriqc--24.0.2.sif \
  --call-fmt "code/bids-mriqc-24-0-2_zip.sh {cmd}"

Then participant_job.sh just calls:

datalad containers-run -n bids-mriqc -- "$subid"

This invokes code/bids-mriqc-24-0-2_zip.sh sub-01 (for subid sub-01): the existing script stays as-is, but we gain containers-run provenance tracking.
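The substitution can be illustrated without datalad installed; this is a simplified stand-in for the placeholder expansion that datalad-container performs in Python:

```shell
# Simplified stand-in for containers-run's {cmd} substitution
# (the real expansion is done inside datalad-container).
call_fmt='code/bids-mriqc-24-0-2_zip.sh {cmd}'
cmd='sub-01'   # what follows `--` in the containers-run call
printf '%s\n' "$call_fmt" | sed "s/{cmd}/$cmd/"
# → code/bids-mriqc-24-0-2_zip.sh sub-01
```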

This works with DIY containers-add datasets, repronim/containers, or any other datalad-container compatible layout.

Going further: cleaner provenance

The above works as a low-friction transition, but we can go further (and simplify). By separating the BIDS app execution from zipping, we get explicit provenance for each step, and zipping becomes trivially optional (see the related issue on optional zipping).

# Step 1: containers-run (clean provenance)
datalad containers-run -m "Compute MRIQC for ${subid}" -n bids-mriqc -- mriqc ...

# Step 2 (perhaps optional): zip and drop raw files
datalad run -m "Zip MRIQC outputs for ${subid}" -- 7z a ${subid}_mriqc.zip ${subid}/
datalad drop ${subid}/

This produces explicit commits:

  1. "Ran MRIQC" - shows exactly which BIDS app command ran, including the exact singularity invocation (rather than just pointing to a script)
  2. "Zipped outputs" - separate step/commit (if enabled)

With datalad drop, raw files aren't duplicated - they're tracked but not present. Only zips take up space.

Note: With this approach, --call-fmt is no longer needed in the containers-add step; it seems like we could use the default cmdexec. participant_job.sh calls containers-run directly, then optionally zips, as in the two-step example above.

Additionally, this removes one layer of indirection: there is only one script, participant_job.sh, eliminating the need for the separate zip script that currently does the heavy lifting.
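Putting it together, the simplified participant_job.sh could look roughly like this. Everything here is a sketch: the mriqc arguments, output paths, and zip name are illustrative assumptions, not BABS's actual values, and the logic is wrapped in a function so it is easy to source and inspect:

```shell
# Hypothetical sketch of a simplified participant_job.sh; the mriqc
# arguments and paths are illustrative, not BABS's actual values.
participant_job() {
    subid="$1"

    # Step 1: run the BIDS app via containers-run (clean provenance commit)
    datalad containers-run -m "Compute MRIQC for ${subid}" -n bids-mriqc -- \
        mriqc inputs/data "derivatives/mriqc" participant \
        --participant-label "${subid}"

    # Step 2 (optional): zip outputs, then drop the raw (still-tracked) files
    datalad run -m "Zip MRIQC outputs for ${subid}" -- \
        7z a "${subid}_mriqc.zip" "derivatives/mriqc/${subid}/"
    datalad drop "derivatives/mriqc/${subid}/"
}
```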
