
Feature Request: Use datalad containers-run #328

@asmacdo

Description


Problem

BABS hardcodes the container path as containers/.datalad/environments/<name>/image. This is the default location used by containers-add, but it breaks if -i was used to specify a custom path. For one-off container datasets built specifically for BABS this isn't a problem, but there are also pre-existing container datasets that would be useful, e.g. repronim/containers, which stores images at images/bids/<name>.sif.

Suggested approach

Use datalad containers-run instead of direct singularity run. This:

  • Reads container path from .datalad/config (works with any layout)
  • Provides native datalad provenance tracking
  • Enables cmdexec customization

cmdexec is a template in .datalad/config that defines how to execute the container (e.g. the default is singularity exec {img} {cmd}). It can be overridden by specifying --call-fmt at containers-add time. This allows custom wrappers or execution methods without changing the calling code.
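For concreteness, here is a sketch of the config stanza that containers-add produces and how the values can be read back with plain git config. The section and key names follow datalad-container's convention; the file name here is illustrative so the snippet is self-contained:

```shell
# Write an example of the stanza containers-add records in .datalad/config
# (section/key names follow datalad-container; the file name is illustrative).
cat > datalad-config-example <<'EOF'
[datalad "containers.bids-mriqc"]
    image = images/bids/bids-mriqc--24.0.2.sif
    cmdexec = singularity exec {img} {cmd}
EOF

# containers-run resolves the image and call format from these keys:
git config -f datalad-config-example datalad.containers.bids-mriqc.image
# → images/bids/bids-mriqc--24.0.2.sif
git config -f datalad-config-example datalad.containers.bids-mriqc.cmdexec
# → singularity exec {img} {cmd}
```

This is why no path needs to be hardcoded: whatever layout the container dataset uses, the image key records it.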

Making this work with any container dataset

For a low-friction transition, the cmdexec could execute the existing BABS-generated script, getting containers-run benefits without rewriting script generation logic.

To support any datalad-container compatible dataset without hardcoding paths:

  1. BABS clones the container dataset as subdataset (analysis/containers/)
  2. BABS gets the image path via datalad -f json containers-list -d containers/
  3. BABS runs containers-add from analysis dataset, pointing to containers/<path>, with --call-fmt pointing to the BABS-generated script
  4. containers-run now works from analysis
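Step 2's lookup could be parsed as sketched below. The JSON record is a stand-in for one result emitted by `datalad -f json containers-list -d containers/`; the field names are an assumption for illustration:

```shell
# Stand-in for one JSON result record from `datalad -f json containers-list`
# (field names are an assumption; a real record has more fields).
record='{"name": "bids-mriqc", "path": "containers/images/bids/bids-mriqc--24.0.2.sif"}'

# Extract the image path so containers-add can be pointed at it.
printf '%s\n' "$record" | python3 -c 'import json, sys; print(json.load(sys.stdin)["path"])'
# → containers/images/bids/bids-mriqc--24.0.2.sif
```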

Example:

# BABS gets image path from subdataset, then:
datalad containers-add bids-mriqc \
  -i containers/images/bids/bids-mriqc--24.0.2.sif \
  --call-fmt "code/bids-mriqc-24-0-2_zip.sh {cmd}"

Then participant_job.sh just calls:

datalad containers-run -n bids-mriqc -- "$subid"

This invokes code/bids-mriqc-24-0-2_zip.sh sub-01 (for subid sub-01): the existing script stays as-is, but we gain containers-run provenance tracking.
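The substitution can be illustrated without datalad installed; this is a simplified stand-in for the placeholder expansion that datalad-container performs in Python:

```shell
# Simplified stand-in for containers-run's {cmd} substitution
# (the real expansion is done inside datalad-container).
call_fmt='code/bids-mriqc-24-0-2_zip.sh {cmd}'
cmd='sub-01'   # what follows `--` in the containers-run call
printf '%s\n' "$call_fmt" | sed "s/{cmd}/$cmd/"
# → code/bids-mriqc-24-0-2_zip.sh sub-01
```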

This works with DIY containers-add datasets, repronim/containers, or any other datalad-container compatible layout.

Going further: cleaner provenance

The above works as a low-friction transition, but we can go further (and simplify). By separating the BIDS app execution from zipping, we get explicit provenance for each step, and zipping becomes trivially optional (see the related issue on optional zipping).

# Step 1: containers-run (clean provenance)
datalad containers-run -m "Compute MRIQC for ${subid}" -n bids-mriqc -- mriqc ...

# Step 2 (perhaps optional): zip and drop raw files
datalad run -m "Zip MRIQC outputs for ${subid}" -- 7z a ${subid}_mriqc.zip ${subid}/
datalad drop ${subid}/

This produces explicit commits:

  1. "Ran MRIQC" - shows exactly which BIDS app command ran, including the exact singularity invocation (rather than just pointing to a script)
  2. "Zipped outputs" - separate step/commit (if enabled)

With datalad drop, raw files aren't duplicated - they're tracked but not present. Only zips take up space.

Note: With this approach, --call-fmt is no longer needed in the containers-add step; it seems like we could use the default cmdexec. participant_job.sh calls containers-run directly, then optionally zips, as in the two-step example above.

Additionally, this removes one layer of indirection: there is only one script, participant_job.sh, eliminating the need for the separate zip script that currently does the heavy lifting.
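Putting it together, the simplified participant_job.sh could look roughly like this. Everything here is a sketch: the mriqc arguments, output paths, and zip name are illustrative assumptions, not BABS's actual values, and the logic is wrapped in a function so it is easy to source and inspect:

```shell
# Hypothetical sketch of a simplified participant_job.sh; the mriqc
# arguments and paths are illustrative, not BABS's actual values.
participant_job() {
    subid="$1"

    # Step 1: run the BIDS app via containers-run (clean provenance commit)
    datalad containers-run -m "Compute MRIQC for ${subid}" -n bids-mriqc -- \
        mriqc inputs/data "derivatives/mriqc" participant \
        --participant-label "${subid}"

    # Step 2 (optional): zip outputs, then drop the raw (still-tracked) files
    datalad run -m "Zip MRIQC outputs for ${subid}" -- \
        7z a "${subid}_mriqc.zip" "derivatives/mriqc/${subid}/"
    datalad drop "derivatives/mriqc/${subid}/"
}
```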
