
feat: job arrays #174

Merged
cmeesters merged 85 commits into main from feat/jobarrays
Mar 26, 2026

Conversation

@johanneskoester
Contributor

@johanneskoester johanneskoester commented Nov 29, 2024

Summary by CodeRabbit

  • New Features
    • Asynchronous multi-job submission with SLURM job-array support, configurable array limits, and improved array-aware logging and reporting.
  • Documentation
    • Added "SLURM Job Arrays" guidance, updated CHANGELOG and release metadata.
  • Bug Fixes
    • Safer job cancellation handling and more robust status-query command assembly and status resolution for array tasks.
  • Tests
    • Extensive unit tests for array submission, status lookup, routing, chunking and edge cases.
  • Chores
    • Minor workflow input key rename and additional utility helpers.

@coderabbitai
Contributor

coderabbitai bot commented Nov 29, 2024

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds asynchronous, array-capable SLURM submission and status logic to the executor: new run_jobs and run_array_jobs orchestration, thread-safe reporting, array-aware status lookup and settings, supporting helpers, tests, docs, and minor CI/package metadata updates. Public API gains array-related settings and two Executor methods.
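The routing described above — same-rule jobs become one SLURM array, singletons go through the existing path — hinges on grouping the incoming jobs by rule name. A minimal sketch of that grouping step, assuming only that jobs expose `job.rule.name` as in Snakemake's `JobExecutorInterface` (the plugin's actual `run_jobs` differs in detail):

```python
from itertools import groupby


def group_jobs_by_rule(jobs):
    """Group ready jobs by rule name so same-rule jobs can be submitted
    together as one SLURM array.

    Sorting first is essential: itertools.groupby only merges *adjacent*
    items with equal keys, so unsorted input would split one rule into
    several groups.
    """
    def keyfunc(job):
        return job.rule.name

    return {
        rule_name: list(group)
        for rule_name, group in groupby(sorted(jobs, key=keyfunc), key=keyfunc)
    }
```

Groups of size one would then be routed to plain `run_job`, larger groups to array submission.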

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Core executor & array logic**<br>`snakemake_executor_plugin_slurm/__init__.py` | Added `run_jobs`, `run_array_jobs`, thread-safe reporting helpers, array-aware status lookup, background submission executor init, and array tracking/logic. Public API: new `Executor` methods and `ExecutorSettings` fields (`array_jobs`, `array_limit`). |
| **Submit string & status handling**<br>`snakemake_executor_plugin_slurm/submit_string.py`, `snakemake_executor_plugin_slurm/job_status_query.py`, `snakemake_executor_plugin_slurm/job_cancellation.py` | `get_submit_command` accepts `settings` and `failed_nodes` and appends additional SLURM options; status-query commands are assembled with `shlex.join`; `scancel` filters out `None` IDs and returns early when there is nothing to cancel. |
| **Utilities**<br>`snakemake_executor_plugin_slurm/utils.py` | Added `get_job_wildcards` and `pending_jobs_for_rule`, imported `Counter` and `DAGExecutorInterface`, and exposed the wildcard helper. |
| **Tests (array + status)**<br>`tests/test_array_jobs.py`, `tests/test_array_status_lookup.py` | Extensive new tests for array routing, sbatch chunking, per-task logfile mapping, external jobid mapping, status lookup and parent-fallback behavior, and related edge cases and logging. |
| **Docs & changelog**<br>`docs/further.md`, `CHANGELOG.md` | Documentation for SLURM job arrays (`--slurm-array-jobs`, `--slurm-array-limit`) added; changelog bumped to 2.5.3. |
| **Packaging & CI**<br>`pyproject.toml`, `.github/workflows/announce-release.yml` | Version updated to 2.5.3; announce-release workflow input renamed to `repotitle` and template usage adjusted. |
| **Tests (minor formatting)**<br>`tests/test_cli.py` | Whitespace-only cleanup in the module docstring. |
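The status-handling changes above combine two ideas: assembling the query command with `shlex.join` (so every argument is safely quoted) and skipping jobs whose external id is still `None`. A small sketch of how that might look; the particular `sacct` flags here are illustrative assumptions, not necessarily the plugin's exact invocation:

```python
import shlex


def build_sacct_command(jobids):
    """Assemble a status-query command string.

    - Entries without an external job id (None) are filtered out, mirroring
      the scancel change that returns early when nothing remains to act on.
    - shlex.join quotes each argument, so ids cannot break the command line.
    """
    valid = [str(j) for j in jobids if j is not None]
    if not valid:
        return None  # nothing to query
    return shlex.join(["sacct", "-X", "--parsable2", "--jobs", ",".join(valid)])
```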

Sequence Diagram(s)

sequenceDiagram
    participant Executor
    participant ThreadPool as "Submission Worker\n(ThreadPoolExecutor)"
    participant MainLoop as "Main Event Loop\n(asyncio)"
    participant SLURM as "SLURM (sbatch/sacct)"
    participant Job as "JobExecutorInterface"

    Executor->>MainLoop: prepare submission (run_jobs)
    alt group of jobs -> array
        Executor->>ThreadPool: submit run_array_jobs(tasks)
        ThreadPool->>SLURM: sbatch --array ... (create array)
        SLURM-->>ThreadPool: jobid
        ThreadPool->>MainLoop: _report_job_submission_threadsafe(job_info)
        MainLoop-->>Executor: record submitted tasks (external_jobid mapping)
    else single or grouped non-array
        Executor->>ThreadPool: submit run_job(job)
        ThreadPool->>SLURM: sbatch (single job)
        SLURM-->>ThreadPool: jobid
        ThreadPool->>MainLoop: _report_job_submission_threadsafe(job_info)
        MainLoop-->>Executor: record submitted job
    end
    MainLoop->>SLURM: query_job_status (sacct/squeue) for job/task ids
    SLURM-->>MainLoop: status info
    MainLoop->>Executor: resolve statuses (task-level vs parent-level)
    alt failure detected
        MainLoop->>Executor: _report_job_error_threadsafe(...)
    else success
        MainLoop->>Executor: report_job_success(...)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"🐰
I hopped through arrays, tasks in a line,
sbatch and sacct, each job's little sign.
Threads hum like carrots, the loop keeps the beat,
Logs snuggle in folders, each task has a seat.
Hooray for submissions — a burrow of feats!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 67.61%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: job arrays' directly and accurately summarizes the main feature addition across the changeset: comprehensive support for SLURM job-array submissions, including new run_jobs/run_array_jobs methods, array settings, status lookups, and related documentation. |



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

111-124: Enhance error handling and logging

The new method should follow the same error handling and logging patterns as the rest of the codebase. Consider adding:

  1. Error handling for empty job lists
  2. Logging for job grouping operations
  3. Consistent error propagation

Here's a suggested implementation:

def run_jobs(self, jobs: List[JobExecutorInterface]):
+    if not jobs:
+        return
+
+    self.logger.debug(f"Grouping {len(jobs)} jobs by rule name")
     try:
         for rule_name, group in groupby(jobs, key=lambda job: job.rule.name):
             same_rule_jobs = list(group)
+            self.logger.debug(
+                f"Processing {len(same_rule_jobs)} jobs for rule '{rule_name}'"
+            )
             if len(same_rule_jobs) == 1:
                 self.run_job(same_rule_jobs[0])
             else:
                 # Temporary implementation: submit jobs individually
                 for job in same_rule_jobs:
                     self.run_job(job)
+    except Exception as e:
+        raise WorkflowError(f"Failed to process job group: {str(e)}")
🧰 Tools
🪛 Ruff (0.8.0)

114-114: Using the generator returned from itertools.groupby() more than once will do nothing on the second usage

(B031)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 531ebc6 and 3abb561.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.0)
snakemake_executor_plugin_slurm/__init__.py

114-114: Using the generator returned from itertools.groupby() more than once will do nothing on the second usage

(B031)

🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

8-8: LGTM!

The groupby import is correctly placed with other standard library imports and is required for the new job array functionality.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@johanneskoester
Contributor Author

johanneskoester commented Dec 20, 2024

Plan: support aggregation of jobs in the Snakemake scheduler before they are sent to the executor. This can be done by supporting a new standard resource min_jobs that tells the scheduler at least how many jobs of a given rule shall be considered. Of course, this resource will be automatically limited to the theoretical maximum number of jobs from that rule. The latter has to consider that a rule may occur in different topological levels of the workflow, which should be treated independently.

This way we avoid jobs dropping into the executor in too fine-grained a fashion.
Admins can even set this resource to the minimum size they want for array jobs on their cluster.
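A hypothetical sketch of that min_jobs behavior (all names here are invented for illustration; the actual scheduler change would live in Snakemake itself): hold back a rule's ready jobs until at least min_jobs of them have accumulated, capping the threshold by how many jobs of that rule can still appear at all.

```python
from collections import defaultdict


def release_batches(ready_jobs, min_jobs, remaining_per_rule):
    """Decide which ready jobs to hand to the executor now.

    ready_jobs:          jobs currently schedulable, exposing job.rule.name
    min_jobs:            desired minimum batch size per rule (the proposed
                         standard resource)
    remaining_per_rule:  upper bound on jobs of each rule still to come,
                         used to cap the threshold so a rule with few jobs
                         is never held back forever
    """
    by_rule = defaultdict(list)
    for job in ready_jobs:
        by_rule[job.rule.name].append(job)

    released, held = [], []
    for rule_name, group in by_rule.items():
        # A rule cannot yield more jobs than remain, so cap the threshold.
        threshold = min(min_jobs, remaining_per_rule.get(rule_name, len(group)))
        (released if len(group) >= threshold else held).extend(group)
    return released, held
```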

@cmeesters cmeesters changed the title feat: job arrays feat!: job arrays Jan 25, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

177-185: ⚠️ Potential issue

Implement temporary handling for multi-job groups.

The current implementation uses an ellipsis (...) as a placeholder, which silently does nothing, so multi-job groups are never submitted. Until the array job submission is implemented, we should handle these jobs individually.

Apply this improvement to handle multi-job groups temporarily:

             else:
                 # TODO submit as array
                 # share code with run_job
-
-                # TODO in the future: give a hint to the scheduler to select preferably
-                # many jobs from the same rule if possible, in order to have
-                # more efficient array jobs. This should be somehow tunable, because
-                # it might contradict other efficiency goals.
-                ...
+                # Temporary implementation: submit jobs individually until array support is added
+                for job in same_rule_jobs:
+                    self.run_job(job)
🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/__init__.py (2)

172-172: Add docstring to document the method's purpose and parameters.

The method lacks documentation explaining its purpose and parameters.

Apply this improvement:

     def run_jobs(self, jobs: List[JobExecutorInterface]):
+        """Process and submit multiple jobs, potentially as job arrays.
+
+        Args:
+            jobs: List of jobs to be submitted for execution.
+        """

173-173: Fix unused loop variable.

The loop control variable rule_name is not used within the loop body.

Apply this improvement:

-        for rule_name, group in groupby(jobs, key=lambda job: job.rule.name):
+        for _rule_name, group in groupby(jobs, key=lambda job: job.rule.name):
🧰 Tools
🪛 Ruff (0.8.2)

173-173: Loop control variable rule_name not used within loop body

Rename unused rule_name to _rule_name

(B007)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3abb561 and f9cb5ad.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
snakemake_executor_plugin_slurm/__init__.py

173-173: Loop control variable rule_name not used within loop body

Rename unused rule_name to _rule_name

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing
🔇 Additional comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

9-9: LGTM!

The groupby import is correctly placed with other standard library imports and is necessary for the new functionality.

@cmeesters cmeesters marked this pull request as draft March 7, 2025 18:59
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

179-185: ⚠️ Potential issue

Replace the placeholder for array jobs

The current implementation uses an ellipsis (...) as a placeholder, which silently does nothing, so multi-job groups are never submitted. Until the array job submission is implemented, we should handle these jobs individually.

            else:
                # TODO submit as array
                # share code with run_job

                # TODO in the future: give a hint to the scheduler to select preferably
                # many jobs from the same rule if possible, in order to have
                # more efficient array jobs. This should be somehow tunable, because
                # it might contradict other efficiency goals.
-                ...
+                # Temporary implementation: submit jobs individually until array support is added
+                for job in same_rule_jobs:
+                    self.run_job(job)
🧹 Nitpick comments (1)
snakemake_executor_plugin_slurm/__init__.py (1)

173-173: Rename unused variable rule_name to _rule_name

The variable rule_name is defined in the loop statement but never used within the loop body. To follow Python conventions, prefix unused variables with an underscore.

-    def run_jobs(self, jobs: List[JobExecutorInterface]):
-        for rule_name, group in groupby(jobs, key=lambda job: job.rule.name):
+    def run_jobs(self, jobs: List[JobExecutorInterface]):
+        for _rule_name, group in groupby(jobs, key=lambda job: job.rule.name):
🧰 Tools
🪛 Ruff (0.8.2)

173-173: Loop control variable rule_name not used within loop body

Rename unused rule_name to _rule_name

(B007)

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f9cb5ad and 6e3742c.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_slurm/__init__.py (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
snakemake_executor_plugin_slurm/__init__.py

173-173: Loop control variable rule_name not used within loop body

Rename unused rule_name to _rule_name

(B007)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: testing

@github-actions
Contributor

github-actions bot commented Sep 8, 2025

This PR was marked as stale because it has been open for 6 months with no activity.

@github-actions github-actions bot added the stale label Sep 8, 2025
@cmeesters cmeesters self-assigned this Jan 22, 2026
@cmeesters cmeesters added enhancement New feature or request and removed stale labels Jan 22, 2026
@cmeesters
Member

@cademirch & @fbartusch final testing is not really possible, as the dependency can only be set once the jobstep plugin release is ready.

@fbartusch
Collaborator

@cmeesters Works on my cluster.

Collaborator

@fbartusch fbartusch left a comment


❌ Poetry Test Failures

On my laptop I get one failing test case. I exclude tests.py, as those tests need a Slurm cluster.

Error log:
(base) fbartusch@dell:~/github/snakemake-executor-plugin-slurm$ poetry run pytest --ignore=tests/tests.py
===================================================================================== test session starts ======================================================================================
platform linux -- Python 3.12.2, pytest-8.4.2, pluggy-1.6.0
rootdir: /home/fbartusch/github/snakemake-executor-plugin-slurm
configfile: pyproject.toml
collected 117 items                                                                                                                                                                            

tests/test_array_jobs.py .................................                                                                                                                               [ 28%]
tests/test_array_status_lookup.py ...                                                                                                                                                    [ 30%]
tests/test_cli.py F.                                                                                                                                                                     [ 32%]
tests/test_parsing.py ...........                                                                                                                                                        [ 41%]
tests/test_partition_selection.py ...............................                                                                                                                        [ 68%]
tests/test_scontrol_parsing.py .....                                                                                                                                                     [ 72%]
tests/test_time_conversion.py ................................                                                                                                                           [100%]

=========================================================================================== FAILURES ===========================================================================================
_________________________________________________________________________________ test_jobname_prefix_applied __________________________________________________________________________________

    def test_jobname_prefix_applied():
        executor = _make_executor("testprefix")
    
        with patch.object(Executor, "warn_on_jobcontext", return_value=None):
            with patch(
                "snakemake_executor_plugin_slurm.uuid.uuid4",
                return_value=uuid.UUID("00000000-0000-0000-0000-000000000000"),
            ):
>               executor.__post_init__(test_mode=True)

tests/test_cli.py:42: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
snakemake_executor_plugin_slurm/__init__.py:483: in __post_init__
    get_max_array_size(), int(self.workflow.executor_settings.array_limit)
    ^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def get_max_array_size() -> int:
        """
        Function to get the maximum array size for SLURM job arrays. This is used
        to determine how many jobs can be submitted in a single array job.
    
        Returns:
            The maximum array size for SLURM job arrays, as an integer.
            Defaults to 1000 if the SLURM_ARRAY_MAX environment variable is not set
            or cannot be parsed as an integer.
        """
        max_array_size_str = None
        scontrol_cmd = "scontrol show config"
        try:
            res = subprocess.run(
                shlex.split(scontrol_cmd),
                capture_output=True,
                text=True,
                timeout=5,
            )
            out = (res.stdout or "") + (res.stderr or "")
            m = re.search(r"MaxArraySize\s*=?\s*(\d+)", out, re.IGNORECASE)
            if m:
                max_array_size_str = m.group(1)
        except (subprocess.SubprocessError, OSError):
            max_array_size_str = None
    
        try:
>           max_array_size = int(max_array_size_str)
                             ^^^^^^^^^^^^^^^^^^^^^^^
E           TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

snakemake_executor_plugin_slurm/utils.py:46: TypeError
=================================================================================== short test summary info ====================================================================================
FAILED tests/test_cli.py::test_jobname_prefix_applied - TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
================================================================================ 1 failed, 116 passed in 0.93s =================================================================================
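The traceback points at `int(max_array_size_str)` being reached with `None` when `scontrol` is unavailable (no Slurm on the laptop), because `int(None)` raises `TypeError`, which the surrounding code does not catch. Per the function's own docstring the value should fall back to 1000. One way to make the parse total, as a sketch (function name chosen here for illustration):

```python
def parse_max_array_size(raw, default=1000):
    """Parse scontrol's MaxArraySize value, falling back to a default when
    scontrol is unavailable or its output cannot be parsed.

    int(None) raises TypeError rather than ValueError, so both exception
    types must be caught to cover the no-Slurm case seen in the test run.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default
```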

Collaborator

@fbartusch fbartusch left a comment


With the one test fixed, and it running on @cmeesters's and my Slurm clusters, the PR is ready for approval.

cmeesters added a commit to snakemake/snakemake-executor-plugin-slurm-jobstep that referenced this pull request Mar 18, 2026
This is the corresponding PR to snakemake/snakemake-executor-plugin-slurm#174, handling selection and decompression of job information for SLURM array jobs.

## Summary by CodeRabbit

* **New Features**
* Enhanced SLURM array-job support: runtime detection of array task
index, optional per-task command handling (supports hex-encoded
compressed per-task commands), configurable array-exec option, and
clearer parsing/error messages.

* **Tests**
* Added unit tests for array-exec parsing covering JSON, Python-literal,
compact-mapping formats, and invalid inputs.

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
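The "hex-encoded compressed per-task commands" mentioned in the jobstep-plugin notes can be illustrated with a simple zlib round trip. This is only a sketch of the idea (compress, then hex-encode so the payload survives being embedded in an option or environment variable); the jobstep plugin's actual wire format may differ:

```python
import zlib


def encode_task_command(cmd: str) -> str:
    """Compress a per-task command and hex-encode the result, yielding a
    plain [0-9a-f] string that is safe to pass through sbatch options or
    environment variables."""
    return zlib.compress(cmd.encode("utf-8")).hex()


def decode_task_command(blob: str) -> str:
    """Inverse of encode_task_command: hex-decode, then decompress."""
    return zlib.decompress(bytes.fromhex(blob)).decode("utf-8")
```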
@cmeesters cmeesters merged commit a51a5bf into main Mar 26, 2026
5 checks passed
@cmeesters cmeesters deleted the feat/jobarrays branch March 26, 2026 12:22
@github-project-automation github-project-automation bot moved this from In review to Done in Snakemake Hackathon 2026 Mar 26, 2026
cmeesters pushed a commit that referenced this pull request Mar 26, 2026
🤖 I have created a release *beep* *boop*
---


## [2.6.0](v2.5.4...v2.6.0) (2026-03-26)


### Features

* job arrays ([#174](#174)) ([a51a5bf](a51a5bf))
* Set tmpspace with help of `gres="tmpspace:10G"` syntax ([#444](#444)) ([c6f6658](c6f6658))

### Bug Fixes

* logo ([#439](#439)) ([c4e3ec3](c4e3ec3))

### Documentation

* add admin documentation ([#436](#436)) ([8f7491f](8f7491f))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).


## Summary by CodeRabbit

* **New Features**
  * Added support for job arrays
  * Added tmpspace configuration capability via `gres="tmpspace:10G"`

* **Bug Fixes**
  * Fixed logo display issue

* **Documentation**
  * Added admin documentation


Labels

enhancement New feature or request

Projects

Status: Done


4 participants