fix: workflow incorrectly marked as completed while nodes are still executing by tomerqodo · Pull Request #83 · qodo-benchmark/dify-combined-qodo

tomerqodo · 2026-01-10T20:17:12Z

…s node is skipped

qodo-code-review · 2026-01-10T20:25:24Z

Code Review by Qodo (Alpha)

🐞 Bugs (1) 📘 Rule Violations (1) 📎 Requirement Gaps (0) 💡 Suggestions (0)

1. Invalid node IDs queued 🐞 Bug

Description

• enqueue_node() now pushes node_id into the ready queue before touching
  self._graph.nodes[node_id]; if the node ID is invalid, a KeyError is raised after the enqueue,
  leaving a poisoned queue item behind.
• Workers dereference self._graph.nodes[node_id] outside the worker’s try: block, so a poisoned
  queue item can crash the worker thread and stall execution.
• Resume flow enqueues paused_nodes without validating the node still exists, increasing the
  chance of this regression surfacing (e.g., stale snapshots or graph changes between pause/resume).
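The second point can be illustrated with a small sketch in which the node lookup is wrapped in its own error handling, so an unknown ID is recorded instead of propagating. The review does not show the real loop in `worker.py`, so the function name `drain_ready_queue`, the `failures` list, and the callable-per-node layout are all assumptions made for illustration.

```python
import queue
from collections.abc import Callable


def drain_ready_queue(
    ready_queue: "queue.Queue[str]",
    nodes: dict[str, Callable[[], None]],
    failures: list[str],
) -> list[str]:
    """Run every queued node; record unknown IDs instead of raising (hypothetical helper)."""
    executed: list[str] = []
    while True:
        try:
            node_id = ready_queue.get_nowait()
        except queue.Empty:
            return executed
        try:
            # The lookup sits inside error handling, so a stale ID cannot kill the loop.
            handler = nodes[node_id]
        except KeyError:
            failures.append(node_id)
            continue
        handler()
        executed.append(node_id)


log: list[str] = []
ready: "queue.Queue[str]" = queue.Queue()
for queued_id in ("a", "ghost", "b"):
    ready.put(queued_id)

failed: list[str] = []
done = drain_ready_queue(
    ready,
    {"a": lambda: log.append("a"), "b": lambda: log.append("b")},
    failed,
)
```

Only the lookup is guarded here, so a `KeyError` raised by the node's own logic is not silently treated as a stale ID.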

reliability

Code

api/core/workflow/graph_engine/graph_state_manager.py[R52-54]

        with self._lock:
-            self._graph.nodes[node_id].state = NodeState.TAKEN
            self._ready_queue.put(node_id)
+            self._graph.nodes[node_id].state = NodeState.TAKEN

Evidence

The new enqueue_node() ordering enqueues first and only then indexes graph.nodes[node_id], so an
invalid node_id can be inserted into the ready queue before the method fails. Worker threads then
pop from the queue and immediately index graph.nodes[node_id] (outside any error handling), which
can raise KeyError and kill the thread. The resume path feeds paused_nodes directly into
enqueue_node() without checking membership in graph.nodes, and paused_nodes are
stored/returned without validation.

api/core/workflow/graph_engine/graph_state_manager.py[42-55]
api/core/workflow/graph_engine/worker.py[100-112]
api/core/workflow/graph_engine/graph_engine.py[317-340]
api/core/workflow/runtime/graph_runtime_state.py[342-352]
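
A minimal reproduction of the ordering problem, assuming a simplified stand-in for the state manager. `BuggyStateManager`, `Node`, and the two-member `NodeState` enum below are hypothetical and only mirror the enqueue-then-index order shown in the diff; they are not the project's classes.

```python
import queue
import threading
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    UNKNOWN = "unknown"
    TAKEN = "taken"


@dataclass
class Node:
    state: NodeState = NodeState.UNKNOWN


class BuggyStateManager:
    """Stand-in that enqueues first and indexes the node map second."""

    def __init__(self, nodes: dict[str, Node]) -> None:
        self._nodes = nodes
        self._lock = threading.RLock()
        self.ready_queue: "queue.Queue[str]" = queue.Queue()

    def enqueue_node(self, node_id: str) -> None:
        with self._lock:
            self.ready_queue.put(node_id)  # side effect happens first
            self._nodes[node_id].state = NodeState.TAKEN  # KeyError for unknown IDs


manager = BuggyStateManager({"start": Node()})
raised = False
try:
    manager.enqueue_node("stale-node")
except KeyError:
    raised = True

# The call failed, yet the invalid ID is still sitting in the queue.
poisoned = manager.ready_queue.get_nowait()
```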

Agent Prompt

### Issue description
`GraphStateManager.enqueue_node()` currently enqueues `node_id` into the ready queue before verifying that `node_id` exists in `self._graph.nodes`. If the node ID is invalid/stale, the method raises after the enqueue, leaving a bad ID in the queue; workers then crash when dereferencing `graph.nodes[node_id]`.

### Issue Context
This is particularly risky in resume flows: `GraphEngine._start_execution(resume=True)` enqueues `paused_nodes` without validating membership in the current graph, and `GraphRuntimeState` does not validate paused IDs.
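
One way the resume path could guard against stale IDs is to partition `paused_nodes` before enqueueing anything. This is a sketch only: `split_resumable_nodes` is a hypothetical helper, and whether stale IDs should be logged, dropped, or treated as a hard error is a design decision the review leaves open.

```python
import logging
from collections.abc import Collection, Iterable

logger = logging.getLogger(__name__)


def split_resumable_nodes(
    paused_nodes: Iterable[str],
    known_node_ids: Collection[str],
) -> tuple[list[str], list[str]]:
    """Partition paused IDs into (still in the graph, stale). Hypothetical helper."""
    resumable: list[str] = []
    stale: list[str] = []
    for node_id in paused_nodes:
        if node_id in known_node_ids:
            resumable.append(node_id)
        else:
            stale.append(node_id)
    if stale:
        logger.warning("Skipping paused nodes missing from the graph: %s", stale)
    return resumable, stale


resumable, stale = split_resumable_nodes(["a", "ghost", "b"], {"a", "b"})
```

Only the IDs in `resumable` would then be passed to `enqueue_node()`.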

### Fix Focus Areas
- api/core/workflow/graph_engine/graph_state_manager.py[42-55]
- api/core/workflow/graph_engine/graph_engine.py[317-340]
- api/core/workflow/graph_engine/worker.py[100-112]
- api/core/workflow/runtime/graph_runtime_state.py[342-352]

### Suggested implementation sketch
- In `enqueue_node()`:
  - resolve/validate the node first (`node = self._graph.nodes.get(node_id)`; if `None` -> raise or no-op, depending on desired behavior)
  - then `self._ready_queue.put(node_id)`
  - then set `node.state = NodeState.TAKEN`
- (Optional hardening) Add a new `schedule_node()` API that performs enqueue + execution-tracking updates atomically and use it consistently across call sites.
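
The first bullet of the sketch could look roughly like the following. The class is a simplified stand-in (the real `GraphStateManager` wraps a graph object, not a plain dict), and raising `ValueError` is just one of the two behaviors the sketch allows; a silent no-op would be the other.

```python
import queue
import threading
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    UNKNOWN = "unknown"
    TAKEN = "taken"


@dataclass
class Node:
    state: NodeState = NodeState.UNKNOWN


class ValidatingStateManager:
    """Simplified stand-in: validate the ID, then enqueue, then mark as TAKEN."""

    def __init__(self, nodes: dict[str, Node]) -> None:
        self._nodes = nodes
        self._lock = threading.RLock()
        self.ready_queue: "queue.Queue[str]" = queue.Queue()

    def enqueue_node(self, node_id: str) -> None:
        with self._lock:
            node = self._nodes.get(node_id)
            if node is None:
                # Fail before any side effect so the queue never sees a bad ID.
                raise ValueError(f"Unknown node ID: {node_id!r}")
            self.ready_queue.put(node_id)
            node.state = NodeState.TAKEN


nodes = {"start": Node()}
manager = ValidatingStateManager(nodes)
manager.enqueue_node("start")

rejected = False
try:
    manager.enqueue_node("stale-node")
except ValueError:
    rejected = True
```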


2. `Any` used in tests 📘 Rule Violation

Description

• The new unit test introduces typing.Any and uses it in annotations for
  get_incoming_edges_side_effect, weakening static typing and making refactors/type-checking less
  effective.
• This conflicts with the requirement to use strong typing and avoid overly-permissive types unless
  strictly necessary.
• It may also mask incorrect mock usage/signatures that stronger types would catch earlier.

reliability

Code

api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/test_skip_propagator.py[3]
+from typing import Any

Evidence

PR Compliance ID 9 requires strong typing and avoiding Any. The added test file explicitly imports
Any and uses it in the get_incoming_edges_side_effect function signature, demonstrating the
introduction of permissive typing in new code.

AGENTS.md
api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/test_skip_propagator.py[3-3]
api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/test_skip_propagator.py[205-210]

Agent Prompt

## Issue description
The new unit tests introduce `typing.Any` and use it in annotations, which violates the strong-typing guideline and reduces the effectiveness of type checking.

## Issue Context
These helper functions are only used as mock side effects and can be typed precisely (e.g., node IDs are `str`, and the functions return lists of `Edge`-like objects).

## Fix Focus Areas
- api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/test_skip_propagator.py[3-3]
- api/tests/unit_tests/core/workflow/graph_engine/graph_traversal/test_skip_propagator.py[205-223]
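
A precisely typed side effect might look like the sketch below. `FakeEdge` and `make_incoming_edges_side_effect` are hypothetical names: the review does not show the project's `Edge` class or the test's fixtures, so the field names here are assumptions and the real test should import the actual edge type instead.

```python
from collections.abc import Callable
from dataclasses import dataclass
from unittest.mock import MagicMock


@dataclass(frozen=True)
class FakeEdge:
    """Hypothetical stand-in for the project's Edge type."""

    id: str
    tail: str
    head: str


def make_incoming_edges_side_effect(
    edges_by_node: dict[str, list[FakeEdge]],
) -> Callable[[str], list[FakeEdge]]:
    """Build a mock side effect typed as str -> list[FakeEdge] instead of Any."""

    def get_incoming_edges_side_effect(node_id: str) -> list[FakeEdge]:
        return edges_by_node.get(node_id, [])

    return get_incoming_edges_side_effect


edge = FakeEdge(id="edge_1", tail="node_1", head="node_2")
mock_graph = MagicMock()
mock_graph.get_incoming_edges.side_effect = make_incoming_edges_side_effect(
    {"node_2": [edge]}
)
```

With the signature spelled out, a type checker can flag a call that passes something other than a `str` node ID to the helper.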


sai and others added 5 commits December 27, 2025 22:46

Fix lost start_execution after enqueue_node when the last previous node is skipped

cb834b8

Add unit test

79c6e98

[autofix.ci] apply automated fixes

8a735f9

Merge branch 'langgenius:main' into bugfix_workflow_skip

6b9dd40

Apply changes for benchmark PR

dfedfef
