Use `git sparse-checkout` in `participant_job` for more efficient dataset cloning by tien-tong · Pull Request #337 · PennLINC/babs

tien-tong · 2026-02-27T14:38:31Z

Summary

This branch switches participant_job.sh to use git sparse-checkout when cloning, instead of checking out the full repository tree.

1. participant_job: sparse-checkout

File: participant_job.sh.jinja2

Participant jobs now clone the analysis dataset with:

# Clone the dataset without checking out the working tree.
# This initializes the Git repository and DataLad metadata,
# but does not populate any files on disk.
datalad clone ... -- --no-checkout

# Enable sparse-checkout in "cone" mode (optimized for directory-level patterns).
# This allows us to specify top-level directories to materialize.
git sparse-checkout init --cone

# Restrict the working tree to only the required paths:
git sparse-checkout set code containers <input_dataset paths>

# Populate the working tree with only the paths defined above
git checkout -f

Key changes

- Sparse checkout: only the required paths are materialized instead of the full tree.
- Non-zipped BIDS inputs: sparse-checkout is applied per dataset so that only the current subject/session and dataset_description.json are present. This avoids full-dataset scans (e.g., via BIDSLayout) over unretrieved content.
- Cleanup refactor: replaces manual datalad drop / rm -rf calls with a single trap cleanup EXIT, ensuring consistent cleanup behavior.
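
The `trap cleanup EXIT` pattern can be sketched as below. This is only an illustration: `run_job`, `JOB_DIR`, and the body of `cleanup` are hypothetical names chosen for the example, and the real template presumably also drops DataLad content before removing the directory.

```shell
# Hypothetical sketch: run_job, JOB_DIR and the body of cleanup() are
# assumptions for illustration, not the template's actual code.
run_job() {
    # Run in a subshell so the EXIT trap fires when the simulated job ends.
    (
        JOB_DIR="$(mktemp -d)"

        cleanup() {
            # Runs on normal exit and on failure alike.
            rm -rf "${JOB_DIR}"
        }
        trap cleanup EXIT

        echo "${JOB_DIR}"                # report the directory to the caller
        touch "${JOB_DIR}/scratch-file"  # stand-in for real job output
        exit "${1:-0}"                   # simulate success (0) or failure
    )
}
```

Because the trap is registered once, every exit path (success, an explicit `exit`, or a failing command under `set -e`) goes through the same cleanup code, which is the consistency the refactor is after.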

2. Shared container image handling

File: participant_job.sh.jinja2

After sparse-checkout, jobs no longer re-fetch containers via DataLad. Instead:

The container image is symlinked from:

${PROJECT_ROOT}/analysis/containers/.datalad/environments/<container_name>/image

into the job’s local containers/ tree.

This avoids redundant container downloads across jobs and prevents failures that can occur when hundreds of jobs simultaneously attempt to datalad get the same file.
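
A minimal sketch of the symlink step follows. The function name `link_container_image` and the destination layout (mirroring the source path under the job's `containers/` directory) are assumptions for illustration; only the source path comes from the description above.

```shell
# Sketch only: the function name and the destination layout are assumptions.
link_container_image() {
    project_root="$1"     # BABS project root (PROJECT_ROOT above)
    container_name="$2"   # container name as registered in the dataset
    job_clone="$3"        # path to the job's local clone

    rel=".datalad/environments/${container_name}/image"
    src="${project_root}/analysis/containers/${rel}"
    dest="${job_clone}/containers/${rel}"

    # Fail early if the shared image has not been retrieved in the project.
    [ -e "${src}" ] || { echo "missing image: ${src}" >&2; return 1; }

    mkdir -p "$(dirname "${dest}")"
    # -f replaces a stale link; -n avoids following an existing link.
    ln -sfn "${src}" "${dest}"
}
```

Since each job only creates a link, no job has to call `datalad get` on the image, so concurrent jobs do not contend for the same download.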

3. Zip discovery (sparse-checkout safe)

File: determine_zipfilename.sh.jinja2

Replaces filesystem-based find with a Git-tree-based approach:

git -C "${zip_search_path}" ls-tree -r --name-only HEAD

combined with a grep pattern to identify the zip file.

This allows zip filename resolution directly from the Git tree, making it compatible with --no-checkout and sparse-checkout workflows, without relying on materialized files.
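
The idea can be sketched as a filter over the `ls-tree` listing. The function name `find_zip_in_listing` and the grep pattern are assumptions; the pattern actually used in determine_zipfilename.sh.jinja2 is not shown in this description.

```shell
# Sketch only: the function name and the grep pattern are assumptions.
find_zip_in_listing() {
    subid="$1"
    sesid="$2"    # may be empty for single-session data

    if [ -n "${sesid}" ]; then
        prefix="${subid}_${sesid}_"
    else
        prefix="${subid}_"
    fi

    # Read tracked paths from stdin and keep the first matching zip.
    grep -E "(^|/)${prefix}"'.*\.zip$' | head -n 1
}

# Intended use inside a clone (not executed here):
#   git -C "${zip_search_path}" ls-tree -r --name-only HEAD \
#       | find_zip_in_listing "sub-01" "ses-A"
```

Because the listing comes from the commit's tree rather than the working directory, the lookup gives the same answer whether or not the zip file has been checked out.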

4. Merge and status fixes

`merge.py`

Job branch refs are now resolved as origin/<branch> (e.g., origin/job-0001-sub-01) in the merge clone.
These refs are passed to get_git_show_ref_shasum, ensuring correct merge behavior when working with remote refs.

`base.py`

If output_ria_data_dir points to the RIA store root (without a .git directory), it is resolved via the RIA alias/data symlink so that the actual dataset path is used.

`utils.py`

- get_results_branches_from_clone(clone_path): lists job branches via git branch -r in a clone (used by merge_ds), avoiding slow or hanging RIA listing in CI.
- get_results_branches_from_ria(ria_data_dir, timeout): lists job branches via git ls-remote --heads with a timeout to prevent CI hangs.
- update_results_status: uses .infer_objects(copy=False).astype(bool) when filling has_results, avoiding pandas dtype issues in boolean logic.
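
The two branch-listing strategies can be illustrated at the Git level with the shell sketch below. The helpers in utils.py are Python; the function names here and the `job-` prefix filter (taken from the example branch name `job-0001-sub-01`) are assumptions, and the coreutils `timeout` command in the usage comment is a stand-in for however the Python code enforces its timeout.

```shell
# Sketch only: function names and the "job-" prefix filter are assumptions.

# Parse `git branch -r` output (stdin) into bare job branch names.
job_branches_from_branch_r() {
    # Strip leading whitespace, drop the symbolic "origin/HEAD -> ..." line,
    # then keep origin/job-* entries without the "origin/" prefix.
    sed -e 's/^[[:space:]]*//' \
        | grep -v -e '->' \
        | sed -n -e 's|^origin/\(job-.*\)$|\1|p'
}

# Parse `git ls-remote --heads` output (stdin): "<sha><TAB>refs/heads/<branch>".
job_branches_from_ls_remote() {
    sed -n -e 's|^[0-9a-f]*[[:space:]]*refs/heads/\(job-.*\)$|\1|p'
}

# Intended use (not executed here):
#   git -C "${clone_path}" branch -r | job_branches_from_branch_r
#   timeout 60 git ls-remote --heads "${ria_data_dir}" \
#       | job_branches_from_ls_remote
```

Listing from an existing clone needs no network or RIA access at all, while the `ls-remote` variant queries the store directly but is bounded by the timeout.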

codecov-commenter · 2026-02-28T20:15:59Z

Codecov Report

❌ Patch coverage is 84.37500% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.07%. Comparing base (c56b73d) to head (1faf33a).

Files with missing lines	Patch %	Lines
babs/utils.py	83.33%	2 Missing and 2 partials ⚠️
babs/base.py	75.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #337      +/-   ##
==========================================
+ Coverage   79.02%   79.07%   +0.04%     
==========================================
  Files          16       16              
  Lines        1812     1840      +28     
  Branches      304      312       +8     
==========================================
+ Hits         1432     1455      +23     
- Misses        266      268       +2     
- Partials      114      117       +3


Use git sparse-checkout

000b7f2

tien-tong linked an issue Feb 27, 2026 that may be closed by this pull request

Reduce inode usage in participant_job.sh #333

Closed

fix e2e slurm

065e6c6

tien-tong added 2 commits March 1, 2026 16:45

add tests

272de03

fix tests

1faf33a

tien-tong requested a review from mattcieslak March 2, 2026 14:45

tien-tong changed the title ~~Use git sparse-checkout~~ Use git sparse-checkout in participant_job for more efficient dataset cloning Mar 2, 2026

tien-tong added the enhancement New feature or request label Mar 2, 2026

tien-tong merged commit 5740ba2 into main Mar 2, 2026
10 checks passed

tien-tong deleted the git-sparse-checkout branch March 2, 2026 14:51

yarikoptic mentioned this pull request Mar 4, 2026

Investigate/facilitate support for working with git sparse-checkouts datalad/datalad#7815

Open

tien-tong merged 4 commits into main from git-sparse-checkout
