Skip to content

Fix duplicate tool_result error from Anthropic API#2256

Closed
xingyaoww wants to merge 8 commits intomainfrom
fix-duplicate-tool-result-error
Closed

Fix duplicate tool_result error from Anthropic API#2256
xingyaoww wants to merge 8 commits intomainfrom
fix-duplicate-tool-result-error

Conversation

@xingyaoww
Copy link
Copy Markdown
Collaborator

@xingyaoww xingyaoww commented Mar 2, 2026

Summary

Fixes the "each tool_use must have a single result" error from Anthropic API that occurs when multiple observations exist for the same tool_call_id.

Root Cause

The bug occurs when:

  1. An agent invokes a tool (creating an ActionEvent with a tool_call_id)
  2. The runtime restarts while the tool is running
  3. On restart, event_service.start() creates an AgentErrorEvent for the "unmatched" action
  4. The tool actually completes and creates an ObservationEvent with the same tool_call_id
  5. Now there are two tool_result events for the same tool_call_id, which violates Anthropic's API contract

Fix

Added a new view property ToolResultUniquenessProperty that handles duplicate tool_call_id situations at the message conversion layer:

  • Groups all ObservationBaseEvents by their tool_call_id
  • For any tool_call_id with multiple observations, keeps only one based on priority:
    • ObservationEvent (actual tool result) is preferred over AgentErrorEvent
    • If multiple of the same type exist, the later one is kept
  • Removes duplicate observations from the view

This ensures that when the events are converted to LLM messages, each tool_use has exactly one tool_result, satisfying the API contract.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:287dc1b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-287dc1b-python \
  ghcr.io/openhands/agent-server:287dc1b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:287dc1b-golang-amd64
ghcr.io/openhands/agent-server:287dc1b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:287dc1b-golang-arm64
ghcr.io/openhands/agent-server:287dc1b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:287dc1b-java-amd64
ghcr.io/openhands/agent-server:287dc1b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:287dc1b-java-arm64
ghcr.io/openhands/agent-server:287dc1b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:287dc1b-python-amd64
ghcr.io/openhands/agent-server:287dc1b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:287dc1b-python-arm64
ghcr.io/openhands/agent-server:287dc1b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:287dc1b-golang
ghcr.io/openhands/agent-server:287dc1b-java
ghcr.io/openhands/agent-server:287dc1b-python

About Multi-Architecture Support

  • Each variant tag (e.g., 287dc1b-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 287dc1b-python-amd64) are also available if needed

Add ToolResultUniquenessProperty to ensure each tool_call_id has exactly
one tool result in the LLM message context.

The bug occurs when:
1. An agent invokes a tool (creating ActionEvent with tool_call_id)
2. Runtime restarts while tool is running
3. On restart, AgentErrorEvent is created for 'unmatched' action
4. Tool completes and creates ObservationEvent with same tool_call_id
5. Two tool_results for same tool_call_id violates Anthropic API

The fix adds a new view property that deduplicates tool results:
- Groups observations by tool_call_id
- Keeps ObservationEvent over AgentErrorEvent when both exist
- If multiple of same type, keeps the later one

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 2, 2026

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)

============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 2, 2026

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 2, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/event
   base.py98891%52, 63, 75–76, 82, 85–86, 88
TOTAL19088568970% 

…ing one

When both AgentErrorEvent and ObservationEvent exist for the same tool_call_id
(typically from a restart scenario), the error context is now merged into the
observation content. This ensures the LLM has full context about what happened
during tool execution.

Changes:
- Added transform() method to ViewPropertyBase for event transformations
- Implemented transform() in ToolResultUniquenessProperty to merge errors
- Updated View.enforce_properties() to apply transforms before enforcement
- Updated tests to verify merging behavior

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww added the review-this This label triggers a PR review by OpenHands label Mar 2, 2026
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Solves Real Problem, Implementation Could Be Cleaner

Taste Rating: The core approach is sound and solves a real production issue (Anthropic API duplicate tool_result errors). Tests are comprehensive. However, there are performance and type safety issues worth addressing.

Key Insight: This is the right architectural layer to handle this problem - the view layer already enforces properties, and the restart race condition is hard to prevent upstream. The solution pragmatically merges error context into observations rather than losing information.

⚠️ Eval Risk: This PR changes how tool results are presented to LLMs (merging error context into observations). Per repo guidelines, this should have human review and lightweight eval runs before merging, as it could affect agent behavior and benchmark performance.

See inline comments for specific issues.

@xingyaoww
Copy link
Copy Markdown
Collaborator Author

@OpenHands pls fix all review comments here #2256 (review)

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Mar 2, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

- Fix type safety: Use list[TextContent | ImageContent] instead of
  list[TextContent] with type ignore
- Use model_copy() instead of model_dump() for observation merging
- Add UUID suffix to merged event IDs to prevent potential collisions
- Extract _group_observations_by_tool_call() helper to eliminate
  duplicate iteration between transform() and enforce()
- Replace unittest mocks with real ObservationEvent instances in tests

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Mar 2, 2026

Summary

I have successfully addressed all 5 review comments on PR #2256. The changes have been committed (a50cbf6) and pushed to the fix-duplicate-tool-result-error branch.

Changes Made:

Review Comment Fix Applied
Type ignore usage (line 53) Changed to list[TextContent | ImageContent], removed type ignore
Use model_copy() (line 61) Replaced model_dump() + reconstruction with model_copy(update={...})
ID collision risk (line 66) Added UUID suffix: {id}-merged-{uuid.uuid4().hex[:8]}
Duplicate iteration (lines 95-142) Extracted _group_observations_by_tool_call() helper function
Unnecessary mocks in tests (line 192+) Replaced create_autospec() with real ObservationEvent instances

Verification:

  • ✅ Pre-commit checks pass (ruff, pyright, etc.)
  • ✅ All 15 tool_result_uniqueness tests pass
  • ✅ All 2462 SDK tests pass
  • ✅ All 5 review threads replied to and resolved

Files Modified:

  • openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py
  • tests/sdk/context/view/properties/test_tool_result_uniqueness.py

# Create new ObservationEvent with a unique ID
# ID format: "{original_id}-merged-{uuid}" to ensure uniqueness
return ObservationEvent(
id=f"{obs_event.id}-merged-{uuid.uuid4().hex[:8]}",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I’m a bit uncertain about something like this, it will be at the end, which could mean ~far from the action

Then I wonder if maybe we risk running into another Anthropic LLM API error: a tool result has to follow the tool call (nothing between them)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands investigate and tell me if it is actually the case? What if we add a constraint that we only merge two consecutive error events and the observation event? If these two events are separate by a different event,then in that case,we should,even if they have the exact same tool call ID, we should still let them fail as is.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

I investigated the reviewer's concern and confirmed it was valid. The Anthropic API requires that tool_result must follow tool_use without other messages between them. The original implementation could potentially hide bugs or create invalid sequences by handling non-consecutive duplicates.

Changes Made

Implementation (tool_result_uniqueness.py):

  • Renamed _group_observations_by_tool_call()_group_consecutive_observations_by_tool_call()
  • New function only groups observations that appear consecutively in the event list
  • Non-consecutive duplicates with the same tool_call_id are NOT merged or deduplicated
  • Updated docstrings to clearly document the consecutive-only behavior

Tests (test_tool_result_uniqueness.py):

  • Added TestGroupConsecutiveObservations class with 5 tests for the new grouping function
  • Added explicit tests for non-consecutive behavior (ensuring they're NOT handled)
  • Updated existing tests to clarify they work with consecutive events

Verification Checklist

  • ✅ Pre-commit hooks pass on both modified files
  • ✅ All 23 tests pass
  • ✅ Changes directly address reviewer's request for consecutive-only handling
  • ✅ No extraneous changes
  • ✅ Committed and pushed to fix-duplicate-tool-result-error branch

Key Benefit

This safer approach ensures:

  1. Normal restart scenarios (consecutive error + observation) are still handled
  2. Abnormal scenarios (non-consecutive duplicates) fail at the API level, exposing underlying bugs rather than hiding them

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingyaoww You may want to take a look at this, where there were a lot of events between them: #1782 (comment)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enyst that one feels like an edge case, likely a CLI bug where there is two parallel processes writing to the same eventstream, which weve encountered at certain point.
i do think we can fix the consecutive case in this PR first and deal with that edge case when it comes up again, wdyt?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands can you browse OpenHands/OpenHands-CLI issue history and see if theres any report similar to this issue: #2256 (comment)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I investigated the OpenHands/OpenHands-CLI issue history as requested to find reports similar to the duplicate tool_result error being fixed in PR #2256.

Findings

I found 3 related issues with similar Anthropic API errors:

Issue Status Root Cause
#423 OPEN User message inserted between tool_use/tool_result after tab switch
#163 CLOSED /resume events misplaced relative to tool_use/result pairs
#268 CLOSED Race condition - multiple EventLog instances writing duplicate event indices

Key Distinction

The CLI issues are about different bugs:

  • Adjacency violations - messages inserted between tool_use and tool_result
  • Race conditions - parallel processes writing to the same eventstream

PR #2256 addresses a different case: When there are two tool_result events for the same tool_call_id (e.g., AgentErrorEvent from runtime restart + actual ObservationEvent).

Conclusion

The current PR's approach of only handling consecutive duplicate tool results is appropriate because:

  1. Non-consecutive duplicates indicate different bugs that should fail loudly, not be silently merged
  2. The runtime restart scenario (the target of this PR) naturally produces consecutive events
  3. The edge case mentioned in issue Bug: Duplicate ObservationEvent with same tool_call_id causes LLM API error on conversation resume #1782 (many events between duplicates) appears to be a CLI bug where parallel processes wrote to the same eventstream - this is a separate issue that needs a different fix

No code changes were made - this was purely an investigation task to answer the reviewer's question.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can deal with consecutive ones in this PR.

Just for clarity, though: issue 1782 is via Cloud, and it contains repeated cases (I had it too)

This addresses reviewer feedback about non-consecutive duplicates. The property
now only handles duplicate tool_call_id observations when they are consecutive
in the event list. If two observations with the same tool_call_id are separated
by other events, they are NOT merged or deduplicated - the underlying bug should
be exposed via an API error rather than hidden.

This is a safer approach because:
1. The normal restart scenario (error then observation) creates consecutive events
2. Non-consecutive duplicates likely indicate a more serious bug that should fail
3. It avoids potentially creating invalid message sequences

Co-authored-by: openhands <openhands@all-hands.dev>
the current view to enforce the property.
"""

def transform(
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands review this PR, is there a way to do this WITHOUT adding this new transform fucntion?

I remember there's a util function that transform the full list of events to LLM messages, could we do it there?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I addressed the review comment by simplifying the implementation as suggested. The changes have already been pushed (commit b7b1e6b3).

What Changed

Instead of adding a new transform() method to ViewPropertyBase and a new ToolResultUniquenessProperty class, I moved the duplicate observation deduplication directly into the existing events_to_messages() function.

Files Changed

  1. openhands-sdk/openhands/sdk/event/base.py - Added logic to events_to_messages() to deduplicate consecutive observations with the same tool_call_id, plus a helper function _select_best_observation()

  2. tests/sdk/event/test_events_to_messages.py - Added TestDuplicateObservationDeduplication test class with 7 tests covering the deduplication scenarios

  3. Reverted/Removed:

    • openhands-sdk/openhands/sdk/context/view/properties/base.py - reverted (no transform() method)
    • openhands-sdk/openhands/sdk/context/view/properties/__init__.py - reverted
    • openhands-sdk/openhands/sdk/context/view/view.py - reverted
    • openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py - deleted
    • tests/sdk/context/view/properties/test_tool_result_uniqueness.py - deleted

Why This Approach Is Better

  • Existing pattern: events_to_messages() already batches ActionEvents with the same llm_response_id - same pattern now handles observation deduplication
  • No new abstraction: No transform() method added to ViewPropertyBase
  • Right layer: Deduplication at message conversion is where the Anthropic API constraint matters
  • Net reduction: ~484 fewer lines of code (315 added, 799 removed)

Instead of adding a new transform() method to ViewPropertyBase and a
new ToolResultUniquenessProperty class, handle the deduplication of
consecutive observations with the same tool_call_id directly in the
events_to_messages() function.

This is simpler because:
- events_to_messages already handles a similar case (batching ActionEvents
  with the same llm_response_id)
- No new abstraction needed in the view property system
- The deduplication happens at message conversion time, which is where
  the constraint matters (Anthropic API message format)

The fix still handles the original bug: when a runtime restart creates
an AgentErrorEvent for an "unmatched" action, but the tool then
completes and creates an ObservationEvent with the same tool_call_id,
only the ObservationEvent is kept (it has the actual result).

Co-authored-by: openhands <openhands@all-hands.dev>
import uuid
from abc import ABC, abstractmethod
from datetime import datetime
from logging import getLogger
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands you should use logger from SDK if applicable- please /remember it in AGENTS.md. in this cae, i don't think we need loggers in this file.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I've addressed the review comment requesting removal of the logger from base.py and documenting the SDK logger preference. Here's what was done:

Changes Made

1. openhands-sdk/openhands/sdk/event/base.py - Removed unnecessary logger:

  • Removed from logging import getLogger import
  • Removed logger = getLogger(__name__) instance
  • Removed the logger.warning() call when deduplicating observations (the deduplication logic still works, just without logging)

2. AGENTS.md - Added SDK logger guidance:

  • Added new bullet point in the <CODE> section: "When logging is needed, use the SDK's logger from openhands.sdk.logger instead of logging.getLogger(__name__)."

Checklist

  • ✅ Removed logger import and usage from base.py
  • ✅ Added note to AGENTS.md about using SDK logger (openhands.sdk.logger)
  • ✅ Pre-commit checks passed on modified files
  • ✅ Changes committed with proper co-author attribution
  • ✅ Changes pushed to fix-duplicate-tool-result-error branch

The changes are minimal and directly address the reviewer's feedback without any extraneous modifications.

Comment on lines +155 to +161
# Select the best observation to use
if len(batch_observations) > 1:
logger.warning(
f"Found {len(batch_observations)} consecutive observations "
f"for tool_call_id={tool_call_id}. Deduplicating."
)
selected = _select_best_observation(batch_observations)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should NOT select the best, instead of should just: convert all to llm messages, and then merge them together (ie merge the .content field)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OpenHands pls fix it

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on it! xingyaoww can track my progress at all-hands.dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

I addressed the PR review comment from @xingyaoww which requested changing from "selecting the best" observation to "merging all observations together by combining their content fields."

Changes Made:

1. openhands-sdk/openhands/sdk/event/base.py:

  • Replaced _select_best_observation() with _merge_observation_messages()
  • The new function converts all consecutive observations with the same tool_call_id to LLM messages, then combines their .content fields into a single Message
  • Updated docstring to reflect merge behavior

2. tests/sdk/event/test_events_to_messages.py:

  • Updated all test assertions to check that merged content contains information from ALL observations (not just the "best" one)
  • Renamed test class from TestDuplicateObservationDeduplicationTestDuplicateObservationMerging
  • Renamed test methods to reflect merge behavior (e.g., test_consecutive_error_and_observation_merges_content)

Checklist:

  • ✅ Changed from selection to merge approach as requested
  • ✅ All 17 tests pass
  • ✅ Pre-commit checks pass (ruff, pyright, etc.)
  • ✅ Changes are concise and focused only on the requested modification
  • ✅ Committed and pushed to fix-duplicate-tool-result-error branch

Commit:

b99b0bd2 - "Merge duplicate observation content instead of selecting best"

…ence

- Removed logging import and logger usage from event/base.py
- Added note to AGENTS.md about using SDK logger (openhands.sdk.logger) instead of logging.getLogger

Co-authored-by: openhands <openhands@all-hands.dev>
Changed the approach for handling multiple observations with the same
tool_call_id from selecting the "best" one to merging all of them.
Now when consecutive observations share a tool_call_id (e.g., when a
runtime restart creates an error but the tool also completes), their
content is combined into a single tool result message.

This ensures the LLM receives all relevant information from both the
error and the actual result, rather than losing one of them.

Co-authored-by: openhands <openhands@all-hands.dev>
@csmith49
Copy link
Copy Markdown
Collaborator

csmith49 commented Mar 4, 2026

Heads up, I think the problem this PR is fixing is resolved by #2300. But please double-check, I might be misunderstanding this PR!

@xingyaoww
Copy link
Copy Markdown
Collaborator Author

Thanks @csmith49! I do think #2300 is a cleaner fix. Gonna close this PR in favor of that

@xingyaoww xingyaoww closed this Mar 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review-this This label triggers a PR review by OpenHands

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants