[Improvement][Scheduler] Rerun workflow instance should follow the specified workerGroup parameter #17794

@washingxian

Search before asking

  • I have searched the issues and found no similar feature request.

Description

Rerunning a workflow instance currently ignores the workerGroup parameter that was configured for the original run, so tasks are assigned to any idle worker instead of a worker in the specified group. This breaks resource isolation and scheduling rules, making it impossible to control which nodes execute tasks during a rerun.

Issue Description

When a workflow instance is rerun, the system does not follow the workerGroup specified in the startup parameters and instead assigns the task to any idle worker node. This violates the expected resource isolation and scheduling rules, and it cannot guarantee that the first run and the rerun use the same task execution environment.

What version of DolphinScheduler are you using?

Version: 3.3.2

What Operating System are you using?

OS: Debian 12

What happened?

  1. Create a workflow and set a specific workerGroup (e.g., "w1") in the startup parameters when running the workflow for the first time;
  2. The first run correctly executes on the nodes in the specified workerGroup;
  3. When the failed/finished workflow instance is rerun (via the "Rerun" button), the system ignores the workerGroup parameter;
  4. The rerun task is assigned to any idle worker node, not to a node in the specified workerGroup.

What you expected to happen?

  1. When a workflow instance is rerun, the system should inherit and use the workerGroup parameter specified in the original startup parameters;
  2. The rerun task must be executed only on the nodes in the specified workerGroup, consistent with the first run;
  3. If the specified workerGroup has no idle nodes, the task should wait in the queue instead of being randomly assigned to other worker groups.
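
The expected behavior above can be sketched roughly as follows. This is only an illustration of the requested semantics, not DolphinScheduler's actual code: the class `RerunDispatchSketch`, the methods `resolveWorkerGroup` and `pickWorker`, and the `"default"` fallback group are all hypothetical names and assumptions introduced for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch; none of these names come from the DolphinScheduler code base.
class RerunDispatchSketch {

    // Assumed fallback when the original run did not specify a worker group.
    static final String DEFAULT_GROUP = "default";

    // On rerun, reuse the worker group recorded with the original startup parameters.
    static String resolveWorkerGroup(String originalWorkerGroup) {
        if (originalWorkerGroup == null || originalWorkerGroup.isBlank()) {
            return DEFAULT_GROUP;
        }
        return originalWorkerGroup;
    }

    // Pick an idle node from the requested group only.
    // An empty result means "keep the task queued", never "use another group".
    static Optional<String> pickWorker(String workerGroup,
                                       Map<String, List<String>> groupToNodes,
                                       Set<String> idleNodes) {
        return groupToNodes.getOrDefault(workerGroup, List.of())
                .stream()
                .filter(idleNodes::contains)
                .findFirst();
    }
}
```

The point of returning `Optional.empty()` rather than falling back to another group is that the caller can leave the task in the queue until a node in the specified group becomes idle, which matches item 3 above.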

How to reproduce it (as minimally and clearly as possible)?

  1. Prepare a DolphinScheduler cluster with at least two independent worker groups (e.g., group A: node1/node2, group B: node3/node4);
  2. Create a simple test workflow (e.g., a shell task that prints the worker node name);
  3. Submit the workflow instance with startup parameter workerGroup=group A;
  4. Confirm the first run executes on node1/node2 (group A) by checking the task log;
  5. After the instance finishes/fails, click the "Rerun" button to re-execute the instance (without modifying any parameters);
  6. Check the task execution node: the rerun task runs on node3/node4 (group B) instead of group A.
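
When comparing the hosts seen in steps 4 and 6, a small helper along these lines could make the check explicit. It is a hedged sketch, not part of DolphinScheduler: `RerunPlacementCheck` and `hostsOutsideGroup` are hypothetical names, and the group membership map and the host names read from the task logs are assumed to be collected by hand.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for checking the reproduction result; not a DolphinScheduler API.
class RerunPlacementCheck {

    // Returns the observed hosts that do NOT belong to the expected worker group.
    // An empty list means every task ran inside the group.
    static List<String> hostsOutsideGroup(String workerGroup,
                                          Map<String, Set<String>> groupMembers,
                                          List<String> observedHosts) {
        Set<String> allowed = groupMembers.getOrDefault(workerGroup, Set.of());
        return observedHosts.stream()
                .filter(host -> !allowed.contains(host))
                .collect(Collectors.toList());
    }
}
```

With the cluster layout from step 1, passing the first run's host (for example `node1`) should give an empty list, while passing a rerun host such as `node3` should return that host, which is the behavior reported in this issue.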

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!

Metadata

Labels

backend, help wanted (Extra attention is needed), improvement (make more easy to user or prompt friendly)
