Support a separate set of Sisyphus worker nodes per workflow in order to: #12

@1fish2

  • enable app-specific worker GCE VM parameters (RAM, disk, CPUs, GPUs, TPUs, ACLs, ...),
  • prevent large workflows from starving small ones,
  • enable staging servers to be independent,
  • allow log filtering by run,
  • save time pulling Docker images (locality),
  • prevent cascading problems like repeated-retry-on-failure from spreading between workflows.

Simplest approach:

  • A separate RabbitMQ task queue per workflow.
    • Eventually delete the workflow's task queue and remove its resources from Gaia memory.
  • In the workflow builder, launch the workers with the workflow name as metadata, and use that to find the task queue.
    • Do the same when resuming a workflow. The usability of this step needs improving; how is still open.
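
A minimal sketch of the naming side of the approach above, assuming the queue name is derived deterministically from the workflow name so that the workflow builder and the workers agree on it without extra coordination. The prefix `sisyphus-tasks`, the metadata keys `workflow` and `task-queue`, and both function names are hypothetical and not taken from Sisyphus or Gaia.

```python
import re
from typing import Dict

# Hypothetical prefix; the real queue naming scheme is not specified in this issue.
QUEUE_PREFIX = "sisyphus-tasks"


def task_queue_name(workflow_name: str) -> str:
    """Derive a per-workflow task queue name from the workflow name.

    Lowercases the name and collapses every run of characters other than
    a-z and 0-9 into a single hyphen, so the result is safe to use as a
    queue name and as a VM metadata value.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", workflow_name.lower()).strip("-")
    if not slug:
        raise ValueError(f"workflow name {workflow_name!r} has no usable characters")
    return f"{QUEUE_PREFIX}.{slug}"


def worker_metadata(workflow_name: str) -> Dict[str, str]:
    """Build the metadata a worker VM would be launched with (hypothetical keys).

    A worker could read 'task-queue' at startup to decide which queue to consume.
    """
    return {
        "workflow": workflow_name,
        "task-queue": task_queue_name(workflow_name),
    }
```

Resuming a workflow would call `worker_metadata` with the same workflow name and get the same queue back. One caveat of this sketch: distinct workflow names that differ only in punctuation or case map to the same queue, so a real implementation would need to reject or disambiguate such names.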

Smarter:

  • Auto-launch and shut down workers.
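
One possible shape for the auto-scaling decision, as a sketch only: size each workflow's worker pool from the number of outstanding tasks in its queue, capped at a per-workflow maximum. The function name, its parameters, and the default cap are assumptions made for illustration; how queue depth would be read from RabbitMQ and how VMs would be started or stopped is left out.

```python
import math


def desired_workers(queued_tasks: int, running_tasks: int,
                    tasks_per_worker: int = 1, max_workers: int = 10) -> int:
    """Return how many workers a workflow should have right now.

    Counts both queued and running tasks as outstanding work, so workers
    that are busy are not shut down. Returns 0 when there is no outstanding
    work, which is the signal to shut the workflow's workers down.
    """
    if tasks_per_worker < 1:
        raise ValueError("tasks_per_worker must be at least 1")
    if max_workers < 0:
        raise ValueError("max_workers must not be negative")
    outstanding = max(0, queued_tasks) + max(0, running_tasks)
    needed = math.ceil(outstanding / tasks_per_worker)
    return min(max_workers, needed)
```

Because the cap applies per workflow, a large workflow cannot claim more than `max_workers` nodes, which fits the goal of keeping large workflows from starving small ones.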
