Five targeted optimisations identified via cProfile on the audio-file benchmark from issue #850:

1. **Etelemetry sentinel** (`job.py`, `submitter.py`): Replace the `None` initial value of `Job._etelemetry_version_data` with a distinct `_ETELEMETRY_UNCHECKED` sentinel. Previously a failed network check returned `None`, keeping the `is None` guard true forever and re-issuing the HTTP request on every task (~93 ms each).
2. **Function bytes cache** (`hash.py`): Add a module-level `_function_bytes_cache` keyed by `(module, qualname, mtime_ns)`. `inspect.getsource()` + `ast.parse()` now run at most once per function per session; subsequent calls cost only a single `os.stat()`.
3. **Skip hash-change check for non-FileSet tasks** (`job.py`): Call `TypeParser.contains_type(FileSet, ...)` on each input field; if none match, skip the expensive full hash recomputation in `_check_for_hash_changes()`. Scalar/pure-Python values cannot mutate under Pydra.
4. **In-memory result cache** (`job.py`): Store the completed `Result` on `self._cached_result` at the end of `run()` so same-process callers (e.g. `Submitter.__call__` with `DebugWorker`) do not need to deserialise it back from disk. The field is excluded from `__getstate__` so subprocess/Slurm workers still use the disk path.
5. **Once-per-location `PersistentCache.clean_up()`** (`hash.py`): Track which cache locations have already been scanned in a class-level set (`_session_cleanups_done`). The O(n) `iterdir()` + `stat()` loop no longer runs after every task. `path.unlink(missing_ok=True)` makes concurrent cleanup by multiple Slurm nodes on shared NFS safe.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
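Optimisation 1 can be sketched as below. The names mirror the PR description, but this is a minimal stand-alone illustration of the sentinel pattern, not pydra's actual code; `_query_etelemetry` is a hypothetical stand-in for the etelemetry HTTP request, and the call counter exists only to make the behaviour observable.

```python
_ETELEMETRY_UNCHECKED = object()  # unique "no check attempted yet" marker

_query_calls = 0  # instrumentation for this sketch only


def _query_etelemetry():
    """Hypothetical stand-in for the real network call; always fails here."""
    global _query_calls
    _query_calls += 1
    raise OSError("network unavailable")


class Job:
    # With a plain None default, a failed check was indistinguishable from
    # "never checked", so every task re-issued the ~93 ms HTTP request.
    _etelemetry_version_data = _ETELEMETRY_UNCHECKED

    @classmethod
    def check_latest_version(cls):
        if cls._etelemetry_version_data is _ETELEMETRY_UNCHECKED:
            try:
                cls._etelemetry_version_data = _query_etelemetry()
            except OSError:
                # A failure now also counts as "checked": store None, which
                # is distinct from the sentinel, so no retry on later tasks.
                cls._etelemetry_version_data = None
        return cls._etelemetry_version_data
```

The sentinel is a fresh `object()` so no legitimate return value (including `None`) can collide with it.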
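Optimisation 2 amounts to a memoisation of the expensive source-to-bytes path. A sketch under the assumptions of the PR description follows; `function_bytes` is a hypothetical helper name, and the real `hash.py` may normalise the source differently before hashing.

```python
import ast
import inspect
import os
import textwrap

# (module, qualname, mtime_ns) -> cached bytes; a later edit to the source
# file changes mtime_ns, which naturally invalidates the stale entry.
_function_bytes_cache: dict = {}


def function_bytes(func):
    src_file = inspect.getsourcefile(func)
    key = (func.__module__, func.__qualname__, os.stat(src_file).st_mtime_ns)
    try:
        return _function_bytes_cache[key]  # cache hit: only the os.stat() above
    except KeyError:
        # Expensive path, now taken at most once per function per session.
        source = textwrap.dedent(inspect.getsource(func))
        tree = ast.parse(source)
        data = ast.dump(tree).encode()
        _function_bytes_cache[key] = data
        return data
```

`textwrap.dedent` keeps the sketch working for methods and nested functions, whose source is indented and would otherwise fail to parse.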
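The gating logic of optimisation 3 can be illustrated with a simplified recursive type check. `FileSet` here is a placeholder for the class from fileformats, and `contains_type` is a much-reduced stand-in for pydra's `TypeParser.contains_type`, which handles many more typing constructs.

```python
import typing


class FileSet:
    """Hypothetical placeholder for fileformats' FileSet class."""


def contains_type(target, tp) -> bool:
    # Direct match: tp is the target class or a subclass of it.
    if isinstance(tp, type) and issubclass(tp, target):
        return True
    # Recurse into generic parameters, e.g. list[FileSet], dict[str, FileSet].
    return any(contains_type(target, arg) for arg in typing.get_args(tp))


def needs_hash_change_check(field_types) -> bool:
    # If no input field can hold a FileSet, nothing on disk can have
    # mutated, so the full hash recomputation is safely skipped.
    return any(contains_type(FileSet, t) for t in field_types)
```

The point is that the cheap per-field type scan happens once, replacing an unconditional re-hash of every input after every task.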
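Optimisation 4 hinges on the `__getstate__` exclusion. The sketch below keeps the shape described in the PR but replaces the on-disk cache directory with an in-memory pickle blob (`_disk`, a hypothetical simplification) so it is self-contained; the real `run()`/`result()` machinery is far richer.

```python
import pickle


class Job:
    def __init__(self):
        self._cached_result = None
        self._disk = None  # stands in for the serialized result on disk

    def _execute(self):
        return {"out": 42}  # placeholder for the actual task work

    def run(self):
        result = self._execute()
        self._disk = pickle.dumps(result)  # persist, as the real Job does
        self._cached_result = result       # same-process fast path
        return result

    def result(self):
        if self._cached_result is not None:
            return self._cached_result     # no deserialisation needed
        return pickle.loads(self._disk)    # subprocess/Slurm path

    def __getstate__(self):
        state = self.__dict__.copy()
        # Excluded from pickling: a Job shipped to another process must
        # reload the result from disk rather than carry a stale copy.
        state["_cached_result"] = None
        return state
```

Same-process callers get the exact object back with no I/O, while anything that crosses a process boundary transparently falls back to the disk path.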
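Finally, optimisation 5 is a once-per-session guard around the cleanup scan. Class and attribute names follow the PR description, but the age handling and constructor are hypothetical simplifications of pydra's actual `PersistentCache`.

```python
import os
import time
from pathlib import Path


class PersistentCache:
    # Locations already scanned this session; class-level, so every
    # instance pointing at the same directory shares the guard.
    _session_cleanups_done: set = set()

    def __init__(self, location, max_age_days=30):
        self.location = Path(location)
        self.max_age = max_age_days * 24 * 3600

    def clean_up(self):
        key = os.fspath(self.location)
        if key in self._session_cleanups_done:
            return  # skip the O(n) iterdir() + stat() loop
        self._session_cleanups_done.add(key)
        cutoff = time.time() - self.max_age
        for path in self.location.iterdir():
            try:
                if path.stat().st_mtime < cutoff:
                    # missing_ok=True: another Slurm node on shared NFS may
                    # have unlinked the same stale entry concurrently.
                    path.unlink(missing_ok=True)
            except FileNotFoundError:
                pass  # file vanished between iterdir() and stat()
```

Both race guards matter on NFS: `missing_ok=True` covers a concurrent unlink, and the `FileNotFoundError` handler covers a file disappearing between listing and `stat()`.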
**Codecov Report** ❌ Patch coverage is 71.42%.

❌ Your patch check has failed because the patch coverage (71.42%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

```
@@            Coverage Diff             @@
##             main     #858       +/-   ##
===========================================
- Coverage   88.35%   34.58%   -53.77%
===========================================
  Files          88       66      -22
  Lines       18198    11316    -6882
  Branches     3565     1504    -2061
===========================================
- Hits        16079     3914   -12165
- Misses       1737     6975    +5238
- Partials      382      427      +45
```
closes #850