
Add start-one-worker-per-node to interactive recovery #245

Merged
daniel-thom merged 9 commits into main from fix/start-one-worker-per-node on Apr 2, 2026

@daniel-thom (Collaborator)

  • Allow the user to specify start_one_worker_per_node in the torc recover command.
  • Fix a reporting issue for Slurm partitions.
  • Fix the generated OpenAPI clients for the get_pending_actions API command.
  • Add a lint job that checks parity of the generated OpenAPI clients.
  • Extend the examples to cover all torc features.

Allow the user to specify start_one_worker_per_node for multi-node Slurm
allocations in the torc recover command.
The generated Python and Julia clients were incorrect for the
get_pending_actions API command.

Copilot AI left a comment

Pull request overview

This PR updates Torc’s recovery workflow UX and OpenAPI artifacts, while also improving Slurm partition reporting and expanding the YAML examples suite.

Changes:

  • Add start_one_worker_per_node support to the interactive torc recover flow and propagate it to slurm schedule-nodes.
  • Fix Slurm partition reporting by introducing resolved_partition in scheduler planning and using it for analysis/state queries.
  • Correct get_pending_actions OpenAPI parameter location/encoding, regenerate clients, and add CI parity checks for generated clients.
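The interactive part of the first change amounts to asking a yes/no question and forwarding the answer as a flag to slurm schedule-nodes. The Rust sketch below illustrates that pattern under stated assumptions: a y/N prompt that defaults to "no" on empty input, and an argument vector built before invoking the subcommand. The helpers parse_yes_no and with_worker_per_node_flag are hypothetical and are not the code in src/client/commands/recover.rs; only the --start-one-worker-per-node flag name comes from this PR.

```rust
/// Hypothetical helper: interpret the answer to a y/N prompt.
/// An empty answer falls back to `default`; unrecognized input returns None
/// so the caller can re-prompt.
fn parse_yes_no(answer: &str, default: bool) -> Option<bool> {
    match answer.trim().to_ascii_lowercase().as_str() {
        "" => Some(default),
        "y" | "yes" => Some(true),
        "n" | "no" => Some(false),
        _ => None,
    }
}

/// Hypothetical helper: forward the user's choice to the schedule-nodes
/// invocation by appending the flag only when they opted in.
fn with_worker_per_node_flag(mut args: Vec<String>, start_one_worker_per_node: bool) -> Vec<String> {
    if start_one_worker_per_node {
        args.push("--start-one-worker-per-node".to_string());
    }
    args
}
```

Appending the flag only when it is true keeps the default invocation unchanged, which matters because the option is only meaningful for multi-node Slurm allocations.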

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/server/live_router.rs | Marks pending-actions params as query params for OpenAPI generation. |
| src/client/scheduler_plan.rs | Adds resolved_partition to planned schedulers and populates it during plan generation. |
| src/client/commands/slurm.rs | Uses resolved_partition for allocation analysis and partition state queries. |
| src/client/commands/recover.rs | Interactive recovery now prompts for and forwards --start-one-worker-per-node. |
| src/client/apis/workflow_actions_api.rs | Updates Rust client query serialization for trigger_type (multi). |
| python_client/src/torc/openapi_client/api/workflow_actions_api.py | Makes trigger_type optional and switches collection format to multi. |
| julia_client/Torc/src/api/apis/api_WorkflowActionsApi.jl | Moves trigger_type to an optional query parameter with explode semantics. |
| julia_client/julia_client/docs/WorkflowActionsApi.md | Updates Julia docs to reflect optional trigger_type query param. |
| examples/yaml/stdio_configuration.yaml | Adds a stdio capture/override example workflow. |
| examples/yaml/slurm_staged_pipeline.yaml | Tweaks CPU binding configuration in the Slurm staged pipeline example. |
| examples/yaml/simulation_sweep.yaml | Adds project and metadata fields to example. |
| examples/yaml/ro_crate_provenance.yaml | Adds project and metadata fields to example. |
| examples/yaml/multi_node_slurm.yaml | Adds a multi-node Slurm + start_one_worker_per_node example workflow. |
| examples/yaml/hyperparameter_sweep.yaml | Adds project and metadata fields to example. |
| examples/yaml/fan_in_with_regexes.yaml | Adds an input-file-regex fan-in example workflow. |
| examples/yaml/direct_mode_checkpointing.yaml | Adds a direct-mode checkpointing/resource enforcement example workflow. |
| api/sync_openapi.sh | Adjusts when OpenAPI parity checks run during sync flows. |
| api/regenerate_rust_client.sh | Switches Rust client cleanup to find -delete, preserving ro_crate_api.rs. |
| api/openapi.yaml | Moves trigger_type for pending actions from path → query and makes it optional. |
| api/openapi.codegen.yaml | Mirrors the same pending-actions parameter change for codegen spec. |
| api/check_client_codegen_parity.sh | Adds a script to diff generated Python/Julia clients against checked-in versions. |
| .github/workflows/lint.yml | Adds CI step to run client codegen parity checks. |
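Moving trigger_type from the path to an optional query parameter with "multi" (form style, explode) semantics changes how the value appears on the wire: each value becomes its own key=value pair (trigger_type=a&trigger_type=b) instead of a single comma-separated value (trigger_type=a,b), and an omitted value adds nothing to the URL. The Rust sketch below shows only that serialization rule. multi_query and percent_encode are hypothetical helpers, "a" and "b" are placeholder values, and this is not the generated client in src/client/apis/workflow_actions_api.rs.

```rust
/// Percent-encode a query component, leaving only RFC 3986 unreserved
/// characters (A-Z, a-z, 0-9, '-', '_', '.', '~') unescaped.
fn percent_encode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => out.push(b as char),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Serialize an optional, repeatable query parameter with "multi" semantics:
/// one key=value pair per value, and an empty string when the parameter is absent.
fn multi_query(key: &str, values: Option<&[&str]>) -> String {
    match values {
        None => String::new(),
        Some(vals) => vals
            .iter()
            .map(|v| format!("{}={}", percent_encode(key), percent_encode(v)))
            .collect::<Vec<_>>()
            .join("&"),
    }
}
```

The optionality is what forced the move: the OpenAPI specification requires every path parameter to be marked required, so trigger_type could only become optional once it was a query parameter.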

daniel-thom and others added 4 commits April 1, 2026 08:31
build.rs watched .git/index, which changes on nearly every git operation,
causing cargo to re-run the build script and invalidate all test targets.
Removed the GIT_DIRTY env var entirely and replaced .git/index watching
with tracking the current branch ref file, so rebuilds only happen on
actual commit changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
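The commit message above describes watching the current branch ref instead of .git/index. A minimal sketch of that idea in Rust, assuming a conventional repository layout where .git/HEAD contains either "ref: refs/heads/<branch>" or a bare commit hash (detached HEAD): watch HEAD itself, which changes on branch switches, plus the branch ref file, which changes on each new commit. The helpers watch_paths and emit_rerun_directives are hypothetical; emit_rerun_directives would be called from build.rs's main. Packed refs and linked worktrees are not handled here.

```rust
use std::path::{Path, PathBuf};

/// Given the contents of .git/HEAD, return the files a build script could
/// watch so it reruns only when the checked-out commit changes.
fn watch_paths(git_dir: &Path, head_contents: &str) -> Vec<PathBuf> {
    // HEAD changes when switching branches or detaching.
    let mut paths = vec![git_dir.join("HEAD")];
    // On a branch, HEAD is "ref: refs/heads/<name>"; that ref file is
    // rewritten on every new commit to the branch.
    if let Some(reference) = head_contents.trim().strip_prefix("ref: ") {
        paths.push(git_dir.join(reference));
    }
    paths
}

/// Print Cargo rerun-if-changed directives for the paths above.
/// If .git/HEAD cannot be read (e.g. building outside a git checkout),
/// nothing is printed.
fn emit_rerun_directives() {
    let git_dir = Path::new(".git");
    if let Ok(head) = std::fs::read_to_string(git_dir.join("HEAD")) {
        for path in watch_paths(git_dir, &head) {
            println!("cargo:rerun-if-changed={}", path.display());
        }
    }
}
```

Unlike .git/index, neither file is touched by git status, staging, or other read-mostly operations, so test targets are no longer invalidated by routine git use.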

Copilot AI left a comment

Pull request overview

Copilot reviewed 37 out of 39 changed files in this pull request and generated 1 comment.

@daniel-thom daniel-thom merged commit f9efeda into main Apr 2, 2026
13 checks passed
@daniel-thom daniel-thom deleted the fix/start-one-worker-per-node branch April 2, 2026 01:58