Agent and Action Early Termination by igordayen · Pull Request #1537 · embabel/embabel-agent

igordayen · 2026-03-24T22:29:10Z

OVERVIEW

Introduced graceful termination event via BlackBoard
Introduced immediate termination via Termination Exception
Added Termination handling logic to abstract, simple, and concurrent agent
Added Termination handling logic to Default and Parallel tool loops
Agent/Action Early Termination

Please refer to:

Problem Statement

Agents and actions currently lack a mechanism to terminate gracefully or immediately based on
runtime conditions. When a tool detects an unrecoverable situation (e.g., required MCP service
unavailable, invalid state, resource exhaustion), it has no clean way to signal that the action or
entire agent should stop.

This PR introduces a termination API with two scopes (ACTION, AGENT) and two mechanisms (graceful,
immediate), giving tools and actions fine-grained control over execution flow.

Usage

Immediate Termination (Exception-based)

Use when you need to stop right now:

                                                                                  
// Stop current action, agent continues with next action
@Action
fun processData(input: Input): Output {
    if (!validator.isValid(input)) {
        throw TerminateActionException("Invalid input - skipping action")
    }
    // ...
}

// Stop entire agent
@Action
fun criticalOperation(input: Input): Output {
    if (!mcpClient.isConnected("required_service")) {
        throw TerminateAgentException("Required MCP service unavailable")
    }
    // ...
}

Graceful Termination (Signal-based)

Use when you want to stop at the next natural checkpoint:

@Action
fun longRunningOperation(ctx: ProcessContext): Output {
    // Set signal - agent will stop before next tick
    ctx.terminateAgent("User requested shutdown")

    // Current action completes normally
    return Output("Completed, but agent will stop after this")
}

// For LLM-based actions with tool loop
@Tool
fun checkCondition(ctx: ProcessContext): String {
    if (shouldStop()) {
        // Signal - tool loop stops before next tool call
        ctx.terminateAction("Condition met - no more tools needed")
    }
    return "Checked"
}

DETAILED COMPARISON

Important: Graceful terminateAction() only works in LLM-based actions with tool loops. For normal agent actions, use throw TerminateActionException().

Graceful vs Immediate Termination
┌──────────────┬───────────────────────────────┬───────────────────────────────────┐
│ Aspect       │ Graceful (Signal)             │ Immediate (Exception)             │
├──────────────┼───────────────────────────────┼───────────────────────────────────┤
│ Mechanism    │ Sets signal on blackboard     │ Throws exception                  │
├──────────────┼───────────────────────────────┼───────────────────────────────────┤
│ When checked │ At next checkpoint            │ Immediately                       │
├──────────────┼───────────────────────────────┼───────────────────────────────────┤
│ Current work │ Completes                     │ Stops                             │
├──────────────┼───────────────────────────────┼───────────────────────────────────┤
│ API          │ ctx.terminateAgent() /        │ throw TerminateAgentException() / │
│              │ ctx.terminateAction()         │ throw TerminateActionException()  │
├──────────────┼───────────────────────────────┼───────────────────────────────────┤
│ Use case     │ Clean shutdown, allow cleanup │ Critical failure, invalid state   │
└──────────────┴───────────────────────────────┴───────────────────────────────────┘

Tool Loop Handling Comparison

  ┌─────────────┬─────────────────────────────────────────┬─────────────────────────────────────────┐
  │ Termination │             DefaultToolLoop             │            ParallelToolLoop             │
  │     Type    │                                         │                                         │
  ├─────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
  │ Graceful    │ checkForActionTerminationSignal()       │ checkForActionTerminationSignal()       │
  │ ACTION      │ called before each tool call. Throws    │ called once before batch. Throws        │
  │             │ TerminateActionException.               │ TerminateActionException.               │
  ├─────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
  │             │ Not handled by tool loop. Signal        │                                         │
  │ Graceful    │ checked at agent process level          │ Same as DefaultToolLoop. Action         │
  │ AGENT       │ (TerminationSignalPolicy) before next   │ completes normally.                     │
  │             │ tick. Action completes normally.        │                                         │
  ├─────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
  │ Immediate   │ TerminateActionException propagates     │ Exception captured in exceptionally     │
  │ ACTION      │ immediately. Remaining tools in         │ handler. All parallel tools complete,   │
  │             │ sequence not called.                    │ then exception re-thrown.               │
  ├─────────────┼─────────────────────────────────────────┼─────────────────────────────────────────┤
  │             │ TerminateAgentException propagates      │ Exception captured in exceptionally     │
  │ Immediate   │ immediately. Remaining tools in         │ handler. All parallel tools complete,   │
  │ AGENT       │ sequence not called.                    │ then exception re-thrown with priority  │
  │             │                                         │ (agent > action).                       │
  └─────────────┴─────────────────────────────────────────┴─────────────────────────────────────────┘
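
The DefaultToolLoop column can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: `SketchProcess`, `ActionTerminated`, and `runSequentialLoop` are hypothetical stand-ins invented for illustration, not the classes touched by this PR.

```kotlin
// Hypothetical stand-ins; not the embabel-agent classes.
class ActionTerminated(reason: String) : RuntimeException(reason)

class SketchProcess {
    // Set by a tool that wants the loop to stop at the next checkpoint.
    var actionSignal: String? = null
}

// Runs tools in order, checking the graceful signal before each call.
// Returns the names of the tools that were started.
fun runSequentialLoop(
    process: SketchProcess,
    tools: List<Pair<String, (SketchProcess) -> Unit>>,
): List<String> {
    val started = mutableListOf<String>()
    try {
        for ((name, tool) in tools) {
            // Checkpoint: convert a pending signal into an exception.
            process.actionSignal?.let { reason ->
                process.actionSignal = null
                throw ActionTerminated(reason)
            }
            started += name
            tool(process) // an exception thrown here skips the remaining tools
        }
    } catch (e: ActionTerminated) {
        // In this sketch the loop simply stops; the caller sees what was started.
    }
    return started
}
```

In both the graceful and the immediate case the second tool never starts; the difference is that a tool which sets the signal still runs to its own end, while a tool which throws stops at the throw.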

Key Implementation Details

ActionStatusCode.TERMINATED: New status for action-level termination. Maps to
AgentProcessStatusCode.RUNNING (agent continues).
ActionStatusCode.AGENT_TERMINATED: New status for agent-level termination. Maps to
AgentProcessStatusCode.TERMINATED (agent stops).
Retry exclusion: Both termination exceptions excluded from retry policies (SpringAiRetryPolicy).
hasRun handling: Terminated actions don't set hasRun=true, allowing retry on next tick.
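
The two status mappings above can be sketched as a single `when` expression. The enums here are hypothetical, trimmed-down stand-ins; only the TERMINATED and AGENT_TERMINATED rows come from this description, and the SUCCEEDED and FAILED rows are assumptions added so the example is complete.

```kotlin
// Hypothetical, trimmed-down enums: the real ones have more values.
enum class ActionStatus { SUCCEEDED, FAILED, TERMINATED, AGENT_TERMINATED }
enum class ProcessStatus { RUNNING, FAILED, TERMINATED }

// Action-level termination keeps the process running; agent-level stops it.
fun toProcessStatus(status: ActionStatus): ProcessStatus = when (status) {
    ActionStatus.SUCCEEDED -> ProcessStatus.RUNNING        // assumption
    ActionStatus.FAILED -> ProcessStatus.FAILED            // assumption
    ActionStatus.TERMINATED -> ProcessStatus.RUNNING       // from the description
    ActionStatus.AGENT_TERMINATED -> ProcessStatus.TERMINATED // from the description
}
```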

Files Changed

New files:

api/termination/TerminationExtensions.kt - Extension functions for graceful termination
api/common/TerminationSignal.kt - Signal data class and scope enum
api/tool/TerminationExceptions.kt - Exception classes

Modified:

core/OperationStatus.kt - Added TERMINATED, AGENT_TERMINATED to ActionStatusCode
core/support/AbstractAgentProcess.kt - Catch termination exceptions in executeAction
core/support/ConcurrentAgentProcess.kt - Handle termination statuses
spi/loop/support/DefaultToolLoop.kt - Check for graceful action termination signal
spi/loop/support/ParallelToolLoop.kt - Capture and propagate termination in parallel execution
spi/support/springai/SpringAiRetryPolicy.kt - Exclude termination from retry


jasperblues · 2026-03-24T23:19:10Z

The termination context is currently string-only (reason: String) across the public API (TerminateAgentException, TerminateActionException, ProcessContext.terminateAgent/terminateAction). If we ever need structured context (error codes, metadata, severity, recovery hints), we'd need to add overloads or break the API.

String-only is probably fine — adding an optional metadata: Map<String, Any> = emptyMap() parameter later is backwards-compatible in Kotlin (default args) and Java (overloads). But wrapping in a TerminationContext type later would be a breaking change, so if there is appetite for structured context (e.g., data class TerminationContext(val reason: String, val errorCode: String? = null, val metadata: Map<String, Any> = emptyMap())) then it's worth introducing the type now rather than migrating callers later.
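
To make the trade-off concrete, here is a minimal sketch of the structured type floated above. The data class shape is the one from the comment; `describe` is a hypothetical helper added only to show how optional fields stay optional for callers.

```kotlin
// The shape suggested in the comment above.
data class TerminationContext(
    val reason: String,
    val errorCode: String? = null,
    val metadata: Map<String, Any> = emptyMap(),
)

// Hypothetical helper: renders the context for a log line.
fun describe(ctx: TerminationContext): String =
    listOfNotNull(ctx.reason, ctx.errorCode?.let { "code=$it" }).joinToString(" ")
```

A caller that only has a reason writes `TerminationContext("down")`, and a caller with more detail adds named arguments, so later fields would not break existing Kotlin call sites. Java callers would only see the extra constructors if the constructor were annotated with `@JvmOverloads`.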

jasperblues · 2026-03-24T23:38:07Z

The termination signal is currently stored on the Blackboard via a magic string key (__termination_signal__), but it seems like purely process-level control flow — written by ProcessContext extensions, read by internal checkpoints (TerminationSignalPolicy, checkForActionTerminationSignal), never consumed externally. Since the Blackboard is the data plane for action inputs/outputs, is there a reason the signal needs to live there rather than on the process itself?

Asking because it leads to a couple of friction points: the Blackboard doesn't support key removal ("objects are immutable and may not be removed"), so clearTerminationSignal() has to set Unit as a sentinel value. And the signal sits alongside domain data with a magic string key.

Given that AbstractAgentProcess already has private mutable fields for process-level state (_status, _failureInfo, _goal, _lastWorldState), a _terminationSignal: TerminationSignal? = null would fit that same pattern — clearing becomes = null, and the BLACKBOARD_KEY / Unit sentinel / clearTerminationSignal all go away. What do you think?
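
A minimal sketch of the nullable-field idea, assuming a hypothetical `ProcessState` class and `Signal` type rather than the real AbstractAgentProcess and TerminationSignal:

```kotlin
// Hypothetical model of a process-level signal held in a nullable field.
enum class Scope { ACTION, AGENT }
data class Signal(val scope: Scope, val reason: String)

class ProcessState {
    private var terminationSignal: Signal? = null

    fun request(signal: Signal) {
        terminationSignal = signal
    }

    // Returns the pending signal for the given scope, if any, and clears it.
    fun consume(scope: Scope): Signal? {
        val pending = terminationSignal?.takeIf { it.scope == scope }
        if (pending != null) terminationSignal = null // clearing is just "= null"
        return pending
    }
}
```

With this shape there is no magic key and no sentinel value: an absent signal is simply `null`.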

jasperblues · 2026-03-25T00:07:48Z

The PR description documents the mechanics of signal vs exception well. The project docs use practical code examples throughout (e.g., the tools reference) — how about enhancing the docs with practical "when would I choose which" supported by examples? The key distinction is about side effects:

// Signal: "Let me finish my work, then stop"
@LlmTool(description = "Save and shutdown")
fun saveAndStop(ctx: ProcessContext): String {
    customerRepository.save(record)  // side effect completes
    ctx.terminateAction("Save complete, no more work needed")
    return "Saved"  // tool finishes normally
}

// Exception: "Stop now, nothing left to do"
@LlmTool(description = "Check service health")
fun checkHealth(): String {
    if (!mcpClient.isConnected("required_service")) {
        throw TerminateActionException("Service unavailable")
        // nothing after this runs
    }
    return "Healthy"
}

It is especially worth noting the key difference between sequential and parallel stop points. In both modes, neither mechanism can stop sibling tools already in flight — that's inherent to parallelism. For example, if deleteCustomer and chargeCard run in the same parallel batch and deleteCustomer signals termination — chargeCard is already executing and will attempt to charge a deleted customer.

Currently there's an additional difference: in ParallelToolLoop, the signal path isn't detected until the next batch, costing an extra LLM round-trip and potentially more side effects:

Signal path (current) — not detected until next batch:

Batch 1: [deleteCustomer, chargeCard] launch in parallel
  ├── deleteCustomer → sets signal, returns normally ───┐
  └── chargeCard     → runs to completion (charges!)    │
                                                         ▼
  propagateControlFlowSignals() finds nothing
  → results sent to LLM
  → LLM responds with Batch 2: [sendConfirmation, ...]
  → checkForActionTerminationSignal() detects signal
  → action stops HERE (one extra LLM round-trip + batch 2 side effects)

Adding a checkForActionTerminationSignal() call right after propagateControlFlowSignals(results) would catch signals set during the batch, eliminating the extra round-trip and making the stop-point behavior consistent between sequential and parallel — the only difference becomes the inherent in-flight sibling behavior.
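
The effect of the extra check can be sketched with plain `CompletableFuture`s. Everything here is a hypothetical model: `runBatch` and its `checkAfterBatch` flag are invented for illustration and are not the ParallelToolLoop code.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model: each tool may set a shared signal while it runs.
// Returns true if another LLM round-trip would follow this batch.
fun runBatch(
    signal: AtomicReference<String?>,
    tools: List<(AtomicReference<String?>) -> String>,
    checkAfterBatch: Boolean,
): Boolean {
    val futures = tools.map { tool -> CompletableFuture.supplyAsync { tool(signal) } }
    futures.forEach { it.join() } // siblings already in flight always finish
    // With the extra check, a signal set during the batch stops the loop now.
    if (checkAfterBatch && signal.get() != null) return false
    return true
}
```

Both tools run in either case, which is the inherent in-flight sibling behavior; the flag only decides whether the loop goes back to the LLM afterwards.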

jasperblues · 2026-03-25T00:28:43Z

Quick question on the retry behavior for terminated actions: hasRun isn't set to true for TERMINATED actions, which means the planner keeps selecting them on every tick. The existing test handles this with an attemptCount that makes the condition transient — but what happens if the termination condition is permanent (e.g., an external service that stays down)?

I wrote a test to check:

val PermanentlyTerminatingActionAgent = agent("PermanentTerminator", description = "Agent that always terminates action") {
    transformation<UserInput, TestPerson>(name = "always_terminating_action") {
        val attemptCount = (it["attemptCount"] as? Int) ?: 0
        it["attemptCount"] = attemptCount + 1
        throw TerminateActionException("Service permanently unavailable")
    }
    // ...goal and other transformations...
}

@Test
fun `permanently terminating action retries until action budget exhausted`() {
    val actionBudget = 5
    val agentProcess = SimpleAgentProcess(
        id = "test-permanent-action-termination",
        agent = PermanentlyTerminatingActionAgent,
        processOptions = ProcessOptions(budget = Budget(actions = actionBudget)),
        // ...
    )

    val result = agentProcess.run()

    // The action was retried on every tick until budget exhausted
    assertThat(blackboard["attemptCount"] as Int).isEqualTo(actionBudget)  // all 5 burned
    assertThat(result.status).isEqualTo(AgentProcessStatusCode.TERMINATED)
}

This passes — the action retries all 5 times before maxActions terminates the process. With the default budget of 50 actions, and a slow action (e.g., network timeout), that could mean a long wait before the process gives up.

Would it make sense to still set hasRun = true on terminated actions, and let actions that genuinely need to retry after termination opt in via canRerun = true? That way it stays consistent with the existing mechanism rather than introducing a special case.
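
The proposal can be modeled in a few lines. This is a sketch with hypothetical names (`SketchAction`, `attemptsWithinBudget`); it only mirrors the hasRun and canRerun flags from the discussion, not the real planner.

```kotlin
// Hypothetical planner model; names mirror the discussion, not the real API.
class SketchAction(val name: String, val canRerun: Boolean = false) {
    var hasRun = false
    var attempts = 0
}

// Counts how often an always-terminating action is attempted within a budget.
fun attemptsWithinBudget(action: SketchAction, budget: Int): Int {
    repeat(budget) {
        val eligible = !action.hasRun || action.canRerun
        if (!eligible) return action.attempts
        action.attempts++
        action.hasRun = true // set even though the action terminated
    }
    return action.attempts
}
```

Under this model a permanently failing action is tried once by default, and burns the whole budget only when it explicitly opts in with `canRerun = true`.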

jasperblues · 2026-03-25T00:34:14Z

In ConcurrentAgentProcess.actionStatusToAgentProcessStatus, AGENT_TERMINATED is checked before FAILED. Since we support concurrent actions, consider two running at the same time:

Action A: tries to connect to a database, gets a NullPointerException → FAILED
Action B: checks a condition and throws TerminateAgentException("shutting down") → AGENT_TERMINATED

The process reports TERMINATED, and the failure from Action A is silently dropped — no log, no failureInfo. Someone debugging why the agent stopped sees "terminated" and might not look for the NPE.

Is that intentional? Could be worth either logging the concurrent failure when AGENT_TERMINATED wins, or returning a richer result that surfaces both signals. What do you think?
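
One possible shape for the "richer result" option, as a sketch only. `Outcome`, `Combined`, and `combine` are hypothetical and much smaller than the real status types:

```kotlin
// Hypothetical, trimmed-down model of combining concurrent action outcomes.
enum class Outcome { SUCCEEDED, FAILED, AGENT_TERMINATED }
data class Combined(val status: Outcome, val alsoFailed: List<String>)

// AGENT_TERMINATED still wins, but concurrent failures are kept for debugging.
fun combine(results: Map<String, Outcome>): Combined {
    val failed = results.filterValues { it == Outcome.FAILED }.keys.toList()
    return when {
        Outcome.AGENT_TERMINATED in results.values -> Combined(Outcome.AGENT_TERMINATED, failed)
        failed.isNotEmpty() -> Combined(Outcome.FAILED, emptyList())
        else -> Combined(Outcome.SUCCEEDED, emptyList())
    }
}
```

The same `alsoFailed` list could feed a log statement instead, if changing the return type is too invasive.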

igordayen · 2026-03-25T01:15:40Z

@jasperblues - thanks for the extensive feedback.

TerminationContext - failureInfo: Any? already supports structured data.

Flow:

  AGENT scope:
    Action sets _terminationRequest(AGENT, "reason")
         ↓
    identifyEarlyTermination() detects
         ↓
    Creates EarlyTermination(reason="reason", policy=TerminationSignalPolicy)
         ↓
    _failureInfo = earlyTermination    ← stored here (existing field)
    _terminationRequest = null         ← cleared
    status = TERMINATED

  ACTION scope:
    Tool sets _terminationRequest(ACTION, "reason")
         ↓
    checkForActionTerminationSignal() detects
         ↓
    _terminationRequest = null         ← cleared
    throw TerminateActionException("reason")
         ↓
    (_failureInfo NOT set - process continues)

Using an AbstractAgentProcess field instead of the Blackboard ==> very good point, agreed
TERMINATED ==> hasRun=true is set (like normal actions)

Actions needing retry must declare canRerun=true
Consistent, explicit, safer ==> agreed

Priority Failure over Terminated:

Arguable - failures are unexpected, termination is intentional
But losing failure info is bad either way
AGENT_TERMINATED is intentional (developer requested it), but FAILED is unexpected
(something went wrong). Both should be visible for debugging. ==> will address

Re: ", eliminating the extra round-trip and making the stop-point behavior consistent between sequential and parallel — the only difference becomes the inherent in-flight sibling behavior." ==> agreed

igordayen · 2026-03-25T02:58:02Z

Review feedback

Items completed:

Moved termination signal to AbstractAgentProcess field - _terminationRequest with
terminationRequest/setTerminationRequest()/resetTerminationRequest() accessors
Updated TerminationExtensions - uses AbstractAgentProcess cast instead of Blackboard
Updated identifyEarlyTermination() - handles AGENT scope and clears stale ACTION signals
Updated tool loops - checkForActionTerminationSignal() uses new field
Removed BLACKBOARD_KEY from TerminationSignal
Used canRerun=true instead of hasRun special case for TERMINATED actions
Added signal check after transformers in both DefaultToolLoop and ParallelToolLoop (performance concern on extra LLM call)
Added concurrent failure logging in ConcurrentAgentProcess

jasperblues · 2026-03-25T03:04:34Z

Looks good!

A couple of small things I noticed:

Stale KDoc in TerminationSignal.kt — The doc still says "When placed on the blackboard" but the signal now lives on the process. Minor, but worth updating so it doesn't confuse anyone reading the API docs.
Hard cast in TerminationExtensions.kt — terminateAgent/terminateAction use agentProcess as AbstractAgentProcess, which would throw a bare ClassCastException if someone has a custom AgentProcess implementation. Maybe a safe cast with a descriptive error? e.g.:
```
val process = agentProcess as? AbstractAgentProcess
    ?: error("Termination signals require AbstractAgentProcess")
```

Also — do you think it would be worth adding some "when would I choose signal vs exception" examples to the user-facing docs? Something like the signal-for-side-effects vs exception-for-immediate-stop pattern. Totally optional, but could help users pick the right mechanism without having to read the internals.

jasperblues

Looks good. A couple of minor nitpicks in my latest comment. I think they're worth tacking on, but I added approval in advance.

igordayen · 2026-03-25T03:17:14Z

@jasperblues - agreed with comments, will address. thank you

igordayen · 2026-03-25T21:34:18Z

Stale KDoc in TerminationSignal.kt and hard cast in TerminationExtensions.kt ==> addressed both.

examples for user guide will be added as per suggestion.

poutsma

Looking good in general, couple of comments.

embabel-agent-api/src/main/kotlin/com/embabel/agent/api/termination/TerminationExtensions.kt

embabel-agent-api/src/main/kotlin/com/embabel/agent/api/tool/TerminationExceptions.kt

- Introduced TerminationException as a parent to Action and Agent Termination Exceptions - Introduced public API terminateAgent/Action on AgentProcess interface - Updated User Guide

igordayen · 2026-03-27T00:35:37Z

Summary of changes:

TerminationException sealed base class
AgentProcess interface with terminateAgent()/terminateAction()
AbstractAgentProcess implementation with private setTerminationRequest
TerminationExtensions simplified to delegate
Tests updated to use public API
Cognitive complexity fixed in SimpleAgentProcess
Checkpoints verified correct in both tool loops
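
As a rough illustration of why a sealed parent helps, here is a sketch with hypothetical class names (`TerminationSketch`, `StopAction`, `StopAgent`); the real TerminationException hierarchy may differ in shape and constructor arguments.

```kotlin
// Hypothetical sketch of a sealed hierarchy; the real classes may differ.
sealed class TerminationSketch(reason: String) : RuntimeException(reason)
class StopAction(reason: String) : TerminationSketch(reason)
class StopAgent(reason: String) : TerminationSketch(reason)

// A sealed parent lets `when` be exhaustive without an `else` branch.
fun processKeepsRunning(e: TerminationSketch): Boolean = when (e) {
    is StopAction -> true
    is StopAgent -> false
}
```

Callers can also catch the parent type once, for example to exclude both kinds from a retry policy, instead of listing each subclass.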


sonarqubecloud · 2026-03-27T12:57:12Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
76.5% Coverage on New Code
0.0% Duplication on New Code


poutsma

Looks good now, thanks!

igordayen mentioned this pull request Mar 24, 2026

Add withTools(<tools>, required=true) to Throw ToolNotFoundException if tools are not initialized #1481

Open

igordayen requested review from alexheifetz, jasperblues, johnsonr and poutsma March 24, 2026 22:50

Addressing Review Feedback

94c236a

jasperblues approved these changes Mar 25, 2026

Fixed KDOC comments and applied smart cast by 'check'

d44bbc2

poutsma reviewed Mar 26, 2026

Reviews feedback - next round:

7b16aab

- Introduced TerminationException as a parent to Action and Agent Termination Exceptions - Introduced public API terminateAgent/Action on AgentProcess interface - Updated User Guide

Reviews feedback - next round -committed:

42b7f19

- Introduced TerminationException as a parent to Action and Agent Termination Exceptions - Introduced public API terminateAgent/Action on AgentProcess interface - Updated User Guide

poutsma approved these changes Mar 27, 2026

igordayen merged commit eb57f0b into main Mar 27, 2026
17 checks passed

igordayen deleted the agent-termination branch March 27, 2026 13:44
