enh: reduce per-task overhead by satra · Pull Request #858 · nipype/pydra

satra · 2026-02-21T14:34:33Z

closes #850

Five targeted optimisations identified via cProfile on the audio-file benchmark from issue #850:

Etelemetry sentinel (job.py, submitter.py): Replace the None initial value of Job._etelemetry_version_data with a distinct _ETELEMETRY_UNCHECKED sentinel. Previously a failed network check returned None, keeping the is None guard True forever and re-issuing the HTTP request on every task (~93 ms each).
Function bytes cache (hash.py): Add a module-level _function_bytes_cache keyed by (module, qualname, mtime_ns). inspect.getsource() + ast.parse() now runs at most once per function per session; subsequent calls cost only a single os.stat().
Skip hash-change check for non-FileSet tasks (job.py): Call TypeParser.contains_type(FileSet, ...) on each input field; if none match, skip the expensive full hash recomputation in _check_for_hash_changes(). Scalar/pure-Python values cannot mutate under Pydra.
In-memory result cache (job.py): Store the completed Result on self._cached_result at the end of run() so same-process callers (e.g. Submitter.__call__ with DebugWorker) do not need to deserialise it back from disk. The field is excluded from __getstate__ so subprocess/Slurm workers still use the disk path.
Once-per-location PersistentCache.clean_up() (hash.py): Track which cache locations have already been scanned in a class-level set (_session_cleanups_done). The O(n) iterdir() + stat() loop no longer runs after every task. path.unlink(missing_ok=True) makes concurrent cleanup by multiple Slurm nodes on shared NFS safe.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Summary

Checklist

I have added tests to cover my changes (if necessary)
I have updated documentation (if necessary)

Five targeted optimisations identified via cProfile on the audio-file benchmark from issue #850: 1. **Etelemetry sentinel** (`job.py`, `submitter.py`): Replace the `None` initial value of `Job._etelemetry_version_data` with a distinct `_ETELEMETRY_UNCHECKED` sentinel. Previously a failed network check returned `None`, keeping the `is None` guard True forever and re-issuing the HTTP request on every task (~93 ms each). 2. **Function bytes cache** (`hash.py`): Add a module-level `_function_bytes_cache` keyed by `(module, qualname, mtime_ns)`. `inspect.getsource()` + `ast.parse()` now runs at most once per function per session; subsequent calls cost only a single `os.stat()`. 3. **Skip hash-change check for non-FileSet tasks** (`job.py`): Call `TypeParser.contains_type(FileSet, ...)` on each input field; if none match, skip the expensive full hash recomputation in `_check_for_hash_changes()`. Scalar/pure-Python values cannot mutate under Pydra. 4. **In-memory result cache** (`job.py`): Store the completed `Result` on `self._cached_result` at the end of `run()` so same-process callers (e.g. `Submitter.__call__` with DebugWorker) do not need to deserialise it back from disk. The field is excluded from `__getstate__` so subprocess/Slurm workers still use the disk path. 5. **Once-per-location PersistentCache.clean_up()** (`hash.py`): Track which cache locations have already been scanned in a class-level set (`_session_cleanups_done`). The O(n) `iterdir()` + `stat()` loop no longer runs after every task. `path.unlink(missing_ok=True)` makes concurrent cleanup by multiple Slurm nodes on shared NFS safe. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov · 2026-02-21T14:35:29Z

Codecov Report

❌ Patch coverage is 71.42857% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.58%. Comparing base (876be56) to head (cab9af2).

Files with missing lines	Patch %	Lines
pydra/utils/hash.py	67.34%	9 Missing and 7 partials ⚠️
pydra/engine/job.py	84.61%	2 Missing ⚠️

❌ Your patch check has failed because the patch coverage (71.42%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (876be56) and HEAD (cab9af2). Click for more details.

HEAD has 46 uploads less than BASE

Flag BASE (876be56) HEAD (cab9af2)

52 6

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #858       +/-   ##
===========================================
- Coverage   88.35%   34.58%   -53.77%     
===========================================
  Files          88       66       -22     
  Lines       18198    11316     -6882     
  Branches     3565     1504     -2061     
===========================================
- Hits        16079     3914    -12165     
- Misses       1737     6975     +5238     
- Partials      382      427       +45

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

enh: reduce per-task overhead#858

enh: reduce per-task overhead#858
satra wants to merge 1 commit intomainfrom
enh/profiling

satra commented Feb 21, 2026

Uh oh!

codecov bot commented Feb 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

satra commented Feb 21, 2026

Types of changes

Summary

Checklist

Uh oh!

codecov bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

codecov bot commented Feb 21, 2026 •

edited

Loading