Fix duplicate tool_result error from Anthropic API by xingyaoww · Pull Request #2256 · OpenHands/software-agent-sdk

xingyaoww · 2026-03-02T07:31:04Z

Summary

Fixes the "each tool_use must have a single result" error from Anthropic API that occurs when multiple observations exist for the same tool_call_id.

Root Cause

The bug occurs when:

An agent invokes a tool (creating an ActionEvent with a tool_call_id)
The runtime restarts while the tool is running
On restart, event_service.start() creates an AgentErrorEvent for the "unmatched" action
The tool actually completes and creates an ObservationEvent with the same tool_call_id
Now there are two tool_result events for the same tool_call_id, which violates Anthropic's API contract

Fix

Added a new view property ToolResultUniquenessProperty that handles duplicate tool_call_id situations at the message conversion layer:

Groups all ObservationBaseEvents by their tool_call_id
For any tool_call_id with multiple observations, keeps only one based on priority:
- ObservationEvent (actual tool result) is preferred over AgentErrorEvent
- If multiple of the same type exist, the later one is kept
Removes duplicate observations from the view

This ensures that when the events are converted to LLM messages, each tool_use has exactly one tool_result, satisfying the API contract.

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:287dc1b-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-287dc1b-python \
  ghcr.io/openhands/agent-server:287dc1b-python

All tags pushed for this build

ghcr.io/openhands/agent-server:287dc1b-golang-amd64
ghcr.io/openhands/agent-server:287dc1b-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:287dc1b-golang-arm64
ghcr.io/openhands/agent-server:287dc1b-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:287dc1b-java-amd64
ghcr.io/openhands/agent-server:287dc1b-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:287dc1b-java-arm64
ghcr.io/openhands/agent-server:287dc1b-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:287dc1b-python-amd64
ghcr.io/openhands/agent-server:287dc1b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:287dc1b-python-arm64
ghcr.io/openhands/agent-server:287dc1b-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:287dc1b-golang
ghcr.io/openhands/agent-server:287dc1b-java
ghcr.io/openhands/agent-server:287dc1b-python

About Multi-Architecture Support

Each variant tag (e.g., 287dc1b-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 287dc1b-python-amd64) are also available if needed

Add ToolResultUniquenessProperty to ensure each tool_call_id has exactly one tool result in the LLM message context. The bug occurs when: 1. An agent invokes a tool (creating ActionEvent with tool_call_id) 2. Runtime restarts while tool is running 3. On restart, AgentErrorEvent is created for 'unmatched' action 4. Tool completes and creates ObservationEvent with same tool_call_id 5. Two tool_results for same tool_call_id violates Anthropic API The fix adds a new view property that deduplicates tool results: - Groups observations by tool_call_id - Keeps ObservationEvent over AgentErrorEvent when both exist - If multiple of same type, keeps the later one Co-authored-by: openhands <openhands@all-hands.dev>

github-actions · 2026-03-02T07:31:29Z

API breakage checks (Griffe)

Result: Failed

Log excerpt (first 1000 characters)


============================================================
Checking openhands-sdk (openhands.sdk)
============================================================
Comparing openhands-sdk 1.11.5 against 1.11.4
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): load_public_skills
::notice title=openhands-sdk API::Ignoring Field metadata-only change (non-breaking): temperature
::warning file=openhands-sdk/openhands/sdk/llm/llm.py,line=196,title=LLM.top_p::Attribute value was changed: `Field(default=1.0, ge=0, le=1)` -> `Field(default=None, ge=0, le=1, description='Nucleus sampling parameter. Defaults to None (uses provider default). Set to a value between 0 and 1 to control diversity of outputs.')`
::error title=SemVer::Breaking changes detected (1); require at least minor version bump from 1.11.x, but new is 1.11.5

============================================================
Checking openhands-workspace (openhands.workspace)
============================

Action log

github-actions · 2026-03-02T07:31:41Z

Agent server REST API breakage checks (OpenAPI)

Result: Passed

Action log

openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py

github-actions · 2026-03-02T07:33:42Z

Coverage Report •

File	Stmts	Miss	Cover	Missing
openhands-sdk/openhands/sdk/event
base.py	98	8	91%	52, 63, 75–76, 82, 85–86, 88
TOTAL	19088	5689	70%

…ing one When both AgentErrorEvent and ObservationEvent exist for the same tool_call_id (typically from a restart scenario), the error context is now merged into the observation content. This ensures the LLM has full context about what happened during tool execution. Changes: - Added transform() method to ViewPropertyBase for event transformations - Implemented transform() in ToolResultUniquenessProperty to merge errors - Updated View.enforce_properties() to apply transforms before enforcement - Updated tests to verify merging behavior Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

🟡 Acceptable - Solves Real Problem, Implementation Could Be Cleaner

Taste Rating: The core approach is sound and solves a real production issue (Anthropic API duplicate tool_result errors). Tests are comprehensive. However, there are performance and type safety issues worth addressing.

Key Insight: This is the right architectural layer to handle this problem - the view layer already enforces properties, and the restart race condition is hard to prevent upstream. The solution pragmatically merges error context into observations rather than losing information.

⚠️ Eval Risk: This PR changes how tool results are presented to LLMs (merging error context into observations). Per repo guidelines, this should have human review and lightweight eval runs before merging, as it could affect agent behavior and benchmark performance.

See inline comments for specific issues.

openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py

tests/sdk/context/view/properties/test_tool_result_uniqueness.py

xingyaoww · 2026-03-02T07:53:06Z

@OpenHands pls fix all review comments here #2256 (review)

openhands-ai · 2026-03-02T07:53:28Z

I'm on it! xingyaoww can track my progress at all-hands.dev

- Fix type safety: Use list[TextContent | ImageContent] instead of list[TextContent] with type ignore - Use model_copy() instead of model_dump() for observation merging - Add UUID suffix to merged event IDs to prevent potential collisions - Extract _group_observations_by_tool_call() helper to eliminate duplicate iteration between transform() and enforce() - Replace unittest mocks with real ObservationEvent instances in tests Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai · 2026-03-02T08:03:08Z

Summary

I have successfully addressed all 5 review comments on PR #2256. The changes have been committed (a50cbf6) and pushed to the fix-duplicate-tool-result-error branch.

Changes Made:

Review Comment	Fix Applied
Type ignore usage (line 53)	Changed to `list[TextContent \| ImageContent]`, removed type ignore
Use model_copy() (line 61)	Replaced `model_dump()` + reconstruction with `model_copy(update={...})`
ID collision risk (line 66)	Added UUID suffix: `{id}-merged-{uuid.uuid4().hex[:8]}`
Duplicate iteration (lines 95-142)	Extracted `_group_observations_by_tool_call()` helper function
Unnecessary mocks in tests (line 192+)	Replaced `create_autospec()` with real `ObservationEvent` instances

Verification:

✅ Pre-commit checks pass (ruff, pyright, etc.)
✅ All 15 tool_result_uniqueness tests pass
✅ All 2462 SDK tests pass
✅ All 5 review threads replied to and resolved

Files Modified:

openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py
tests/sdk/context/view/properties/test_tool_result_uniqueness.py

enyst · 2026-03-02T08:30:06Z

openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py

+    # Create new ObservationEvent with a unique ID
+    # ID format: "{original_id}-merged-{uuid}" to ensure uniqueness
+    return ObservationEvent(
+        id=f"{obs_event.id}-merged-{uuid.uuid4().hex[:8]}",


🤔 I’m a bit uncertain about something like this, it will be at the end, which could mean ~far from the action

Then I wonder if maybe we risk running into another Anthropic LLM API error: a tool result has to follow the tool call (nothing between them)

@OpenHands investigate and tell me if it is actually the case? What if we add a constraint that we only merge two consecutive error events and the observation event? If these two events are separate by a different event,then in that case,we should,even if they have the exact same tool call ID, we should still let them fail as is.

I'm on it! xingyaoww can track my progress at all-hands.dev

Summary of Changes

I investigated the reviewer's concern and confirmed it was valid. The Anthropic API requires that tool_result must follow tool_use without other messages between them. The original implementation could potentially hide bugs or create invalid sequences by handling non-consecutive duplicates.

Changes Made

Implementation (tool_result_uniqueness.py):

Renamed _group_observations_by_tool_call() → _group_consecutive_observations_by_tool_call()

New function only groups observations that appear consecutively in the event list

Non-consecutive duplicates with the same tool_call_id are NOT merged or deduplicated

Updated docstrings to clearly document the consecutive-only behavior

Tests (test_tool_result_uniqueness.py):

Added TestGroupConsecutiveObservations class with 5 tests for the new grouping function

Added explicit tests for non-consecutive behavior (ensuring they're NOT handled)

Updated existing tests to clarify they work with consecutive events

Verification Checklist

✅ Pre-commit hooks pass on both modified files

✅ All 23 tests pass

✅ Changes directly address reviewer's request for consecutive-only handling

✅ No extraneous changes

✅ Committed and pushed to fix-duplicate-tool-result-error branch

Key Benefit

This safer approach ensures:

Normal restart scenarios (consecutive error + observation) are still handled

Abnormal scenarios (non-consecutive duplicates) fail at the API level, exposing underlying bugs rather than hiding them

@xingyaoww You may want to take a look at this, where there were a lot of events between them: #1782 (comment)

@enyst that one feels like an edge case, likely a CLI bug where there is two parallel processes writing to the same eventstream, which weve encountered at certain point.
i do think we can fix the consecutive case in this PR first and deal with that edge case when it comes up again, wdyt?

@OpenHands can you browse OpenHands/OpenHands-CLI issue history and see if theres any report similar to this issue: #2256 (comment)

I'm on it! xingyaoww can track my progress at all-hands.dev

Summary

I investigated the OpenHands/OpenHands-CLI issue history as requested to find reports similar to the duplicate tool_result error being fixed in PR #2256.

Findings

I found 3 related issues with similar Anthropic API errors:

Issue Status Root Cause

#423 OPEN User message inserted between tool_use/tool_result after tab switch

#163 CLOSED /resume events misplaced relative to tool_use/result pairs

#268 CLOSED Race condition - multiple EventLog instances writing duplicate event indices

Key Distinction

The CLI issues are about different bugs:

Adjacency violations - messages inserted between tool_use and tool_result

Race conditions - parallel processes writing to the same eventstream

PR #2256 addresses a different case: When there are two tool_result events for the same tool_call_id (e.g., AgentErrorEvent from runtime restart + actual ObservationEvent).

Conclusion

The current PR's approach of only handling consecutive duplicate tool results is appropriate because:

Non-consecutive duplicates indicate different bugs that should fail loudly, not be silently merged

The runtime restart scenario (the target of this PR) naturally produces consecutive events

The edge case mentioned in issue Bug: Duplicate ObservationEvent with same tool_call_id causes LLM API error on conversation resume #1782 (many events between duplicates) appears to be a CLI bug where parallel processes wrote to the same eventstream - this is a separate issue that needs a different fix

No code changes were made - this was purely an investigation task to answer the reviewer's question.

Sure, we can deal with consecutive ones in this PR.

Just for clarity, though: issue 1782 is via Cloud, and it contains repeated cases (I had it too)

This addresses reviewer feedback about non-consecutive duplicates. The property now only handles duplicate tool_call_id observations when they are consecutive in the event list. If two observations with the same tool_call_id are separated by other events, they are NOT merged or deduplicated - the underlying bug should be exposed via an API error rather than hidden. This is a safer approach because: 1. The normal restart scenario (error then observation) creates consecutive events 2. Non-consecutive duplicates likely indicate a more serious bug that should fail 3. It avoids potentially creating invalid message sequences Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-03-03T11:06:24Z

openhands-sdk/openhands/sdk/context/view/properties/base.py

            the current view to enforce the property.
        """

+    def transform(


@OpenHands review this PR, is there a way to do this WITHOUT adding this new transform fucntion?

I remember there's a util function that transform the full list of events to LLM messages, could we do it there?

I'm on it! xingyaoww can track my progress at all-hands.dev

Summary

I addressed the review comment by simplifying the implementation as suggested. The changes have already been pushed (commit b7b1e6b3).

What Changed

Instead of adding a new transform() method to ViewPropertyBase and a new ToolResultUniquenessProperty class, I moved the duplicate observation deduplication directly into the existing events_to_messages() function.

Files Changed

openhands-sdk/openhands/sdk/event/base.py - Added logic to events_to_messages() to deduplicate consecutive observations with the same tool_call_id, plus a helper function _select_best_observation()

tests/sdk/event/test_events_to_messages.py - Added TestDuplicateObservationDeduplication test class with 7 tests covering the deduplication scenarios

Reverted/Removed:

openhands-sdk/openhands/sdk/context/view/properties/base.py - reverted (no transform() method)

openhands-sdk/openhands/sdk/context/view/properties/__init__.py - reverted

openhands-sdk/openhands/sdk/context/view/view.py - reverted

openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py - deleted

tests/sdk/context/view/properties/test_tool_result_uniqueness.py - deleted

Why This Approach Is Better

Existing pattern: events_to_messages() already batches ActionEvents with the same llm_response_id - same pattern now handles observation deduplication

No new abstraction: No transform() method added to ViewPropertyBase

Right layer: Deduplication at message conversion is where the Anthropic API constraint matters

Net reduction: ~484 fewer lines of code (315 added, 799 removed)

Instead of adding a new transform() method to ViewPropertyBase and a new ToolResultUniquenessProperty class, handle the deduplication of consecutive observations with the same tool_call_id directly in the events_to_messages() function. This is simpler because: - events_to_messages already handles a similar case (batching ActionEvents with the same llm_response_id) - No new abstraction needed in the view property system - The deduplication happens at message conversion time, which is where the constraint matters (Anthropic API message format) The fix still handles the original bug: when a runtime restart creates an AgentErrorEvent for an "unmatched" action, but the tool then completes and creates an ObservationEvent with the same tool_call_id, only the ObservationEvent is kept (it has the actual result). Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-03-03T11:15:26Z

openhands-sdk/openhands/sdk/event/base.py

 import uuid
 from abc import ABC, abstractmethod
 from datetime import datetime
+from logging import getLogger


@OpenHands you should use logger from SDK if applicable- please /remember it in AGENTS.md. in this cae, i don't think we need loggers in this file.

I'm on it! xingyaoww can track my progress at all-hands.dev

Summary

I've addressed the review comment requesting removal of the logger from base.py and documenting the SDK logger preference. Here's what was done:

Changes Made

1. openhands-sdk/openhands/sdk/event/base.py - Removed unnecessary logger:

Removed from logging import getLogger import

Removed logger = getLogger(__name__) instance

Removed the logger.warning() call when deduplicating observations (the deduplication logic still works, just without logging)

2. AGENTS.md - Added SDK logger guidance:

Added new bullet point in the <CODE> section: "When logging is needed, use the SDK's logger from openhands.sdk.logger instead of logging.getLogger(__name__)."

Checklist

✅ Removed logger import and usage from base.py

✅ Added note to AGENTS.md about using SDK logger (openhands.sdk.logger)

✅ Pre-commit checks passed on modified files

✅ Changes committed with proper co-author attribution

✅ Changes pushed to fix-duplicate-tool-result-error branch

The changes are minimal and directly address the reviewer's feedback without any extraneous modifications.

xingyaoww · 2026-03-03T11:16:07Z

openhands-sdk/openhands/sdk/event/base.py

+                # Select the best observation to use
+                if len(batch_observations) > 1:
+                    logger.warning(
+                        f"Found {len(batch_observations)} consecutive observations "
+                        f"for tool_call_id={tool_call_id}. Deduplicating."
+                    )
+                    selected = _select_best_observation(batch_observations)


We should NOT select the best, instead of should just: convert all to llm messages, and then merge them together (ie merge the .content field)

@OpenHands pls fix it

I'm on it! xingyaoww can track my progress at all-hands.dev

Summary

I addressed the PR review comment from @xingyaoww which requested changing from "selecting the best" observation to "merging all observations together by combining their content fields."

Changes Made:

1. openhands-sdk/openhands/sdk/event/base.py:

Replaced _select_best_observation() with _merge_observation_messages()

The new function converts all consecutive observations with the same tool_call_id to LLM messages, then combines their .content fields into a single Message

Updated docstring to reflect merge behavior

2. tests/sdk/event/test_events_to_messages.py:

Updated all test assertions to check that merged content contains information from ALL observations (not just the "best" one)

Renamed test class from TestDuplicateObservationDeduplication → TestDuplicateObservationMerging

Renamed test methods to reflect merge behavior (e.g., test_consecutive_error_and_observation_merges_content)

Checklist:

✅ Changed from selection to merge approach as requested

✅ All 17 tests pass

✅ Pre-commit checks pass (ruff, pyright, etc.)

✅ Changes are concise and focused only on the requested modification

✅ Committed and pushed to fix-duplicate-tool-result-error branch

Commit:

b99b0bd2 - "Merge duplicate observation content instead of selecting best"

…ence - Removed logging import and logger usage from event/base.py - Added note to AGENTS.md about using SDK logger (openhands.sdk.logger) instead of logging.getLogger Co-authored-by: openhands <openhands@all-hands.dev>

Changed the approach for handling multiple observations with the same tool_call_id from selecting the "best" one to merging all of them. Now when consecutive observations share a tool_call_id (e.g., when a runtime restart creates an error but the tool also completes), their content is combined into a single tool result message. This ensures the LLM receives all relevant information from both the error and the actual result, rather than losing one of them. Co-authored-by: openhands <openhands@all-hands.dev>

csmith49 · 2026-03-04T21:24:24Z

Heads up, I think the problem this PR is fixing is resolved by #2300. But please double-check, I might be misunderstanding this PR!

xingyaoww · 2026-03-05T11:06:06Z

Thanks @csmith49! I do think #2300 is a cleaner fix. Gonna close this PR in favor of that

xingyaoww commented Mar 2, 2026

View reviewed changes

openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py Outdated Show resolved Hide resolved

xingyaoww added the review-this This label triggers a PR review by OpenHands label Mar 2, 2026

all-hands-bot reviewed Mar 2, 2026

View reviewed changes

enyst reviewed Mar 2, 2026

View reviewed changes

xingyaoww mentioned this pull request Mar 3, 2026

fix: deduplicate observations with same tool_call_id #2114

Closed

Merge branch 'main' into fix-duplicate-tool-result-error

d66d74d

xingyaoww commented Mar 3, 2026

View reviewed changes

openhands-agent added 2 commits March 3, 2026 11:17

csmith49 mentioned this pull request Mar 4, 2026

Bug: Duplicate observations for same tool_call_id after crash recovery #2298

Closed

xingyaoww closed this Mar 5, 2026

Issue	Status	Root Cause
#423	OPEN	User message inserted between `tool_use`/`tool_result` after tab switch
#163	CLOSED	`/resume` events misplaced relative to tool_use/result pairs
#268	CLOSED	Race condition - multiple EventLog instances writing duplicate event indices

Conversation

xingyaoww commented Mar 2, 2026 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Fix

Checklist

Uh oh!

github-actions bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

API breakage checks (Griffe)

Uh oh!

github-actions bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Agent server REST API breakage checks (OpenAPI)

Uh oh!

Uh oh!

github-actions bot commented Mar 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

🟡 Acceptable - Solves Real Problem, Implementation Could Be Cleaner

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

xingyaoww commented Mar 2, 2026

Uh oh!

openhands-ai bot commented Mar 2, 2026

Uh oh!

openhands-ai bot commented Mar 2, 2026

Summary

Changes Made:

Verification:

Files Modified:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Mar 2, 2026

Choose a reason for hiding this comment

Summary of Changes

Changes Made

Verification Checklist

Key Benefit

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Mar 2, 2026

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Mar 2, 2026

Choose a reason for hiding this comment

Summary

Findings

Key Distinction

Conclusion

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot Mar 3, 2026

Choose a reason for hiding this comment

Summary

What Changed

xingyaoww commented Mar 2, 2026 •

edited by github-actions bot

Loading

github-actions bot commented Mar 2, 2026 •

edited

Loading

github-actions bot commented Mar 2, 2026 •

edited

Loading

github-actions bot commented Mar 2, 2026 •

edited

Loading