Description
Problem Statement
Currently, users can cancel a node execution by setting event.cancel_node = <STR_MSG|True> within a BeforeNodeCallEvent hook. Unlike in Swarm (docs, test), this raises an exception that stops the entire graph execution. I would like to explore whether this is necessary. To help figure this out, we should look at the resulting behavior of returning False from the should_continue call.
If it is appropriate to raise an exception, we should raise an explicit node cancel exception instead of a RuntimeError.
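If raising does turn out to be the right call, a dedicated exception could look roughly like the sketch below. The names NodeCancelledError and run_node are hypothetical and do not exist in the repository; subclassing RuntimeError is an assumption made here so that existing `except RuntimeError` handlers would keep working.

```python
class NodeCancelledError(RuntimeError):
    """Hypothetical explicit exception for an intentionally cancelled node."""

    def __init__(self, node_id: str, message: str = "node cancelled by user") -> None:
        super().__init__(message)
        self.node_id = node_id
        self.message = message


def run_node(node_id: str, cancel_node) -> str:
    """Hypothetical stand-in for node execution; cancel_node is a str message or a bool."""
    if cancel_node:
        # Mirror the issue's convention: a string is the reason, True means the default reason.
        message = cancel_node if isinstance(cancel_node, str) else "node cancelled by user"
        raise NodeCancelledError(node_id, message)
    return f"{node_id} completed"
```

Callers could then catch NodeCancelledError specifically and read node_id from it, instead of inspecting the message of a generic RuntimeError.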
Proposed Solution
No response
Use Case
Cleanly exiting a graph without having to catch a RuntimeError when I intentionally cancel a node execution.
Alternative Solutions
No response
Additional Context
No response
Implementation Requirements
Based on clarification discussion and repository analysis:
Summary
Align Graph cancel_node behavior with Swarm - don't raise an exception, set status to FAILED and return gracefully. This follows the same pattern as should_continue returning False.
Technical Approach
Current Behavior Comparison
Swarm (swarm.py lines 750-759) - Graceful exit:
    if before_event.cancel_node:
        yield MultiAgentNodeCancelEvent(current_node.node_id, cancel_message)
        self.state.completion_status = Status.FAILED
        break  # No exception
Graph (graph.py lines 864-871) - Raises exception:
    if before_event.cancel_node:
        yield MultiAgentNodeCancelEvent(node.node_id, cancel_message)
        raise RuntimeError(cancel_message)  # This needs to change
Reference: should_continue Graceful Exit Pattern (lines 648-651)
When should_continue returns False:
- Sets self.state.status = Status.FAILED
- Returns gracefully (no exception)
- Downstream nodes don't execute
- GraphResult is still built and yielded normally via MultiAgentResultEvent
Implementation Details
1. Modify _execute_node in graph.py (lines 864-871)
Replace the raise RuntimeError with graceful handling:
    if before_event.cancel_node:
        cancel_message = (
            before_event.cancel_node if isinstance(before_event.cancel_node, str) else "node cancelled by user"
        )
        logger.debug("reason=<%s> | cancelling execution", cancel_message)
        yield MultiAgentNodeCancelEvent(node.node_id, cancel_message)
        # Create NodeResult for cancelled node (similar to failure handling)
        node_result = NodeResult(
            result=cancel_message,
            execution_time=0,
            status=Status.FAILED,
            accumulated_usage=Usage(inputTokens=0, outputTokens=0, totalTokens=0),
            accumulated_metrics=Metrics(latencyMs=0),
            execution_count=1,
        )
        node.execution_status = Status.FAILED
        node.result = node_result
        self.state.failed_nodes.add(node)
        self.state.results[node.node_id] = node_result
        yield MultiAgentNodeStopEvent(node_id=node.node_id, node_result=node_result)
        return  # Graceful exit, no exception
2. Add failed_nodes check in _execute_graph (after line 658)
The comment on lines 669-670 notes: "a failure would throw exception and code would not make it here". Since we're removing the exception, that assumption no longer holds, so add an explicit check:
    async for event in self._execute_nodes_parallel(current_batch, invocation_state):
        yield event
    # Check if any nodes failed (including cancelled) - stop execution gracefully
    if self.state.failed_nodes:
        self.state.status = Status.FAILED
        return
    if self.state.status == Status.INTERRUPTED:
        # ... existing interrupt handling
Files to Modify
| File | Changes |
|---|---|
| src/strands/multiagent/graph.py | Modify _execute_node to not raise, add failed_nodes check in _execute_graph |
| tests/strands/multiagent/test_graph.py | Update test_graph_cancel_node - remove pytest.raises(RuntimeError), verify result is yielded |
| tests_integ/hooks/multiagent/test_cancel.py | Update test_graph_cancel_node - remove pytest.raises(RuntimeError), verify result accessible |
Acceptance Criteria
- Setting cancel_node in a BeforeNodeCallEvent hook does NOT raise a RuntimeError
- Graph execution stops gracefully when a node is cancelled
- GraphResult is yielded normally with status=Status.FAILED
- MultiAgentNodeCancelEvent is still emitted
- MultiAgentNodeStopEvent is emitted for the cancelled node
- Downstream nodes do not execute (same as should_continue returning False)
- Behavior is consistent with Swarm cancel_node handling
- Unit tests pass without expecting RuntimeError
- Integration tests pass without expecting RuntimeError
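The event ordering these criteria describe can be illustrated with a small self-contained model. This is a sketch only: ToyGraph, its tuple-shaped events, and the sequential node loop are hypothetical stand-ins, not the strands Graph implementation, which runs batches of nodes and emits typed event objects.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a graph that executes nodes in order."""

    nodes: list[str]
    cancel: dict[str, str] = field(default_factory=dict)  # node_id -> cancel message
    status: Status = Status.COMPLETED
    executed: list[str] = field(default_factory=list)

    async def stream(self):
        for node_id in self.nodes:
            if node_id in self.cancel:
                # Cancel event first, then a stop event for the cancelled node.
                yield ("node_cancel", node_id, self.cancel[node_id])
                yield ("node_stop", node_id, Status.FAILED)
                self.status = Status.FAILED
                break  # graceful exit: no exception, downstream nodes are skipped
            self.executed.append(node_id)
            yield ("node_stop", node_id, Status.COMPLETED)
        # The result is always yielded, even after a cancellation.
        yield ("result", self.status, list(self.executed))


async def collect(graph: ToyGraph) -> list[tuple]:
    return [event async for event in graph.stream()]


events = asyncio.run(collect(ToyGraph(nodes=["a", "b", "c"], cancel={"b": "blocked by hook"})))
```

In this model, cancelling node "b" produces a cancel event and a stop event for "b", node "c" never runs, and the final result event carries Status.FAILED.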
Breaking Change Notice
This is a breaking change for any code that catches RuntimeError during graph node cancellation. The current behavior is considered a bug since it's inconsistent with Swarm behavior and the existing should_continue graceful exit pattern.
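For callers, the migration would roughly mean replacing exception handling with a status check on the returned result. The sketch below uses hypothetical stand-ins (ToyResult, run_graph_old, run_graph_new) rather than the real Graph API, purely to contrast the two calling styles.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ToyResult:
    """Hypothetical stand-in for GraphResult."""

    status: Status


def run_graph_old(cancel: bool) -> ToyResult:
    """Models the current behavior: cancellation surfaces as a RuntimeError."""
    if cancel:
        raise RuntimeError("node cancelled by user")
    return ToyResult(Status.COMPLETED)


def run_graph_new(cancel: bool) -> ToyResult:
    """Models the proposed behavior: cancellation returns a FAILED result."""
    return ToyResult(Status.FAILED if cancel else Status.COMPLETED)


def caller_old(cancel: bool) -> Status:
    # Before: an intentional cancel has to be caught as a generic RuntimeError.
    try:
        return run_graph_old(cancel).status
    except RuntimeError:
        return Status.FAILED


def caller_new(cancel: bool) -> Status:
    # After: no try/except needed; inspect the status on the result instead.
    return run_graph_new(cancel).status
```

Code written in the caller_old style would keep running after the change, but its except branch would no longer be reached for cancellations, so any cleanup placed there would need to move to a status check.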
Related Links
- Original PR introducing cancel_node: hooks - before node call - cancel node #1203
- Swarm interrupt docs: https://github.com/strands-agents/docs/blob/main/docs/user-guide/concepts/interrupts.md#swarm