
[FEATURE] Graph - Cancel Node - Do Not Raise Exception #1500

@pgrayy

Description

Problem Statement

Currently, users can cancel a node execution by setting event.cancel_node = <STR_MSG|True> within a BeforeNodeCallEvent hook. Unlike in Swarm (docs, test), this raises an exception that stops the entire graph execution. I would like to explore whether this is necessary. To help figure this out, we should compare it with the behavior when should_continue returns False.

If raising an exception turns out to be appropriate, we should raise an explicit node-cancellation exception instead of a generic RuntimeError.
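If the exception route were kept, one option is a dedicated exception type. The sketch below is a minimal illustration, assuming a hypothetical NodeCancelledError name that is not part of the Strands SDK. It subclasses RuntimeError so that existing except RuntimeError handlers keep working, while new code can catch cancellation specifically and read the node ID.

```python
class NodeCancelledError(RuntimeError):
    """Hypothetical exception for an intentional node cancellation.

    Subclassing RuntimeError keeps existing `except RuntimeError` handlers
    working while letting new code catch cancellation specifically.
    """

    def __init__(self, node_id: str, message: str) -> None:
        super().__init__(message)
        self.node_id = node_id
        self.message = message


def cancel(node_id: str, message: str) -> None:
    # Hypothetical stand-in for the raise site in Graph._execute_node.
    raise NodeCancelledError(node_id, message)


try:
    cancel("researcher", "budget exceeded")
except NodeCancelledError as err:
    # The node ID travels with the exception, unlike a bare RuntimeError.
    caught = (err.node_id, str(err))
```

The subclass choice trades a cleaner type for backward compatibility; the rest of this issue proposes dropping the exception entirely instead.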

Proposed Solution

No response

Use Case

A clean exit from a graph, without having to catch a RuntimeError, when I intentionally cancel a node execution.

Alternative Solutions

No response

Additional Context

No response


Implementation Requirements

Based on clarification discussion and repository analysis:

Summary

Align Graph's cancel_node behavior with Swarm: instead of raising an exception, set the status to FAILED and return gracefully. This follows the same pattern used when should_continue returns False.

Technical Approach

Current Behavior Comparison

Swarm (swarm.py lines 750-759) - Graceful exit:

if before_event.cancel_node:
    yield MultiAgentNodeCancelEvent(current_node.node_id, cancel_message)
    self.state.completion_status = Status.FAILED
    break  # No exception

Graph (graph.py lines 864-871) - Raises exception:

if before_event.cancel_node:
    yield MultiAgentNodeCancelEvent(node.node_id, cancel_message)
    raise RuntimeError(cancel_message)  # This needs to change

Reference: should_continue Graceful Exit Pattern (lines 648-651)

When should_continue returns False:

  1. Sets self.state.status = Status.FAILED
  2. Returns gracefully (no exception)
  3. Downstream nodes don't execute
  4. GraphResult is still built and yielded normally via MultiAgentResultEvent
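The four steps above can be illustrated with a toy loop. This is a self-contained sketch, not the Graph implementation: ToyGraphState, run, and the limit condition are hypothetical stand-ins for the graph state and the should_continue checks.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Stand-in for the SDK's Status enum; only the members used here.
    EXECUTING = "executing"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ToyGraphState:
    status: Status = Status.EXECUTING
    completed: list[str] = field(default_factory=list)


def run(nodes: list[str], limit: int) -> ToyGraphState:
    """Run nodes in order; stop gracefully once `limit` nodes have executed."""
    state = ToyGraphState()
    for node in nodes:
        # Mirrors should_continue returning False: mark FAILED and return, no raise.
        if len(state.completed) >= limit:
            state.status = Status.FAILED
            return state  # later nodes never execute
        state.completed.append(node)
    state.status = Status.COMPLETED
    return state


result = run(["a", "b", "c"], limit=2)
```

The caller always gets a state object back, so a result can be built from it whether the run completed or stopped early.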

Implementation Details

1. Modify _execute_node in graph.py (lines 864-871)

Replace the raise RuntimeError with graceful handling:

if before_event.cancel_node:
    cancel_message = (
        before_event.cancel_node if isinstance(before_event.cancel_node, str) else "node cancelled by user"
    )
    logger.debug("reason=<%s> | cancelling execution", cancel_message)
    yield MultiAgentNodeCancelEvent(node.node_id, cancel_message)
    
    # Create NodeResult for cancelled node (similar to failure handling)
    node_result = NodeResult(
        result=cancel_message,
        execution_time=0,
        status=Status.FAILED,
        accumulated_usage=Usage(inputTokens=0, outputTokens=0, totalTokens=0),
        accumulated_metrics=Metrics(latencyMs=0),
        execution_count=1,
    )
    
    node.execution_status = Status.FAILED
    node.result = node_result
    self.state.failed_nodes.add(node)
    self.state.results[node.node_id] = node_result
    
    yield MultiAgentNodeStopEvent(node_id=node.node_id, node_result=node_result)
    return  # Graceful exit, no exception
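The cancel-message derivation and the cancelled result above can be checked in isolation. A minimal sketch, assuming a hypothetical ToyNodeResult that mirrors only the NodeResult fields it needs (usage and metrics are omitted); the isinstance fallback is taken directly from the snippet above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Status(Enum):
    # Stand-in for the SDK's Status enum.
    FAILED = "failed"


@dataclass
class ToyNodeResult:
    # Hypothetical stand-in mirroring a subset of NodeResult's fields.
    result: str
    execution_time: int
    status: Status
    execution_count: int


def cancel_message_for(cancel_node: Union[bool, str]) -> str:
    # A string is used verbatim; True falls back to the default message.
    return cancel_node if isinstance(cancel_node, str) else "node cancelled by user"


def cancelled_result(cancel_node: Union[bool, str]) -> ToyNodeResult:
    # Cancelled nodes never ran, so timing is zero and the status is FAILED.
    return ToyNodeResult(
        result=cancel_message_for(cancel_node),
        execution_time=0,
        status=Status.FAILED,
        execution_count=1,
    )
```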

2. Add failed_nodes check in _execute_graph (after line 658)

The comment on lines 669-670 notes: "a failure would throw exception and code would not make it here". Since we are removing the exception, add an explicit check:

async for event in self._execute_nodes_parallel(current_batch, invocation_state):
    yield event

# Check if any nodes failed (including cancelled) - stop execution gracefully
if self.state.failed_nodes:
    self.state.status = Status.FAILED
    return

if self.state.status == Status.INTERRUPTED:
    ...  # existing interrupt handling (unchanged)
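The effect of the new failed_nodes check can be simulated with a toy batch loop. This sketch runs each batch sequentially for simplicity (the real graph runs batches in parallel), and execute_batch and execute_graph are hypothetical stand-ins that use strings in place of the SDK's event objects.

```python
import asyncio
from collections.abc import AsyncIterator


async def execute_batch(batch: list[str], cancelled: set[str], failed: set[str]) -> AsyncIterator[str]:
    for node in batch:
        if node in cancelled:
            yield f"cancel:{node}"  # stands in for MultiAgentNodeCancelEvent
            failed.add(node)        # cancelled nodes are recorded as failed, no raise
            yield f"stop:{node}"    # stands in for MultiAgentNodeStopEvent
            continue
        yield f"stop:{node}"


async def execute_graph(batches: list[list[str]], cancelled: set[str]) -> tuple[str, list[str]]:
    failed: set[str] = set()
    events: list[str] = []
    for batch in batches:
        async for event in execute_batch(batch, cancelled, failed):
            events.append(event)
        # New check: any failed (including cancelled) node stops later batches.
        if failed:
            return "FAILED", events
    return "COMPLETED", events


status, events = asyncio.run(execute_graph([["a", "b"], ["c"]], cancelled={"b"}))
```

Node "a" shares a batch with the cancelled node and still finishes, while the downstream node "c" never runs.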

Files to Modify

| File | Changes |
| --- | --- |
| src/strands/multiagent/graph.py | Modify _execute_node to not raise; add failed_nodes check in _execute_graph |
| tests/strands/multiagent/test_graph.py | Update test_graph_cancel_node: remove pytest.raises(RuntimeError), verify the result is yielded |
| tests_integ/hooks/multiagent/test_cancel.py | Update test_graph_cancel_node: remove pytest.raises(RuntimeError), verify the result is accessible |
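The updated tests might take roughly the following shape. This is a sketch against a hypothetical toy_stream_async that uses dicts in place of the real MultiAgentNodeCancelEvent, MultiAgentNodeStopEvent, and MultiAgentResultEvent objects; the point is that the stream finishes normally and the assertions move from pytest.raises to the emitted events and final status.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any


async def toy_stream_async(cancel_node: bool) -> AsyncIterator[dict[str, Any]]:
    # Hypothetical stand-in for Graph.stream_async after the change.
    if cancel_node:
        yield {"type": "node_cancel", "node_id": "first", "message": "node cancelled by user"}
        yield {"type": "node_stop", "node_id": "first", "status": "FAILED"}
        yield {"type": "result", "status": "FAILED"}
        return
    yield {"type": "node_stop", "node_id": "first", "status": "COMPLETED"}
    yield {"type": "result", "status": "COMPLETED"}


async def collect(cancel_node: bool) -> list[dict[str, Any]]:
    # No pytest.raises(RuntimeError): the stream is expected to finish normally.
    return [event async for event in toy_stream_async(cancel_node)]


def test_graph_cancel_node() -> None:
    events = asyncio.run(collect(cancel_node=True))
    assert [e["type"] for e in events] == ["node_cancel", "node_stop", "result"]
    assert events[-1]["status"] == "FAILED"
```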

Acceptance Criteria

  • Setting cancel_node in a BeforeNodeCallEvent hook does NOT raise a RuntimeError
  • Graph execution stops gracefully when a node is cancelled
  • GraphResult is yielded normally with status=Status.FAILED
  • MultiAgentNodeCancelEvent is still emitted
  • MultiAgentNodeStopEvent is emitted for the cancelled node
  • Downstream nodes do not execute (same as should_continue returning False)
  • Behavior is consistent with Swarm cancel_node handling
  • Unit tests pass without expecting RuntimeError
  • Integration tests pass without expecting RuntimeError

Breaking Change Notice

This is a breaking change for any code that catches RuntimeError during graph node cancellation. The current behavior is considered a bug since it's inconsistent with Swarm behavior and the existing should_continue graceful exit pattern.
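Callers that need to run against both the old and new behavior during a transition could normalize the two outcomes. A minimal sketch, assuming a hypothetical invoke callable that returns the graph status as a string; real code would inspect the returned result's status field (Status.FAILED) instead.

```python
from typing import Callable


def run_and_classify(invoke: Callable[[], str]) -> str:
    """Return the graph status under either the old or the new cancel behavior."""
    try:
        status = invoke()
    except RuntimeError:
        # Old behavior: cancellation surfaced as a RuntimeError.
        return "FAILED"
    # New behavior: cancellation surfaces as a FAILED status on the result.
    return status


def old_graph() -> str:
    # Hypothetical stand-in for the pre-change behavior.
    raise RuntimeError("node cancelled by user")


def new_graph() -> str:
    # Hypothetical stand-in for the post-change behavior.
    return "FAILED"
```

Note that the except branch also swallows unrelated RuntimeErrors, which is one more reason to rely only on the status check once the change lands.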
