Fix duplicate tool_result error from Anthropic API#2256
Add ToolResultUniquenessProperty to ensure each tool_call_id has exactly one tool result in the LLM message context.

The bug occurs when:

1. An agent invokes a tool (creating an ActionEvent with a tool_call_id)
2. The runtime restarts while the tool is running
3. On restart, an AgentErrorEvent is created for the "unmatched" action
4. The tool completes and creates an ObservationEvent with the same tool_call_id
5. Two tool_results for the same tool_call_id violate the Anthropic API

The fix adds a new view property that deduplicates tool results:

- Groups observations by tool_call_id
- Keeps the ObservationEvent over the AgentErrorEvent when both exist
- If multiple of the same type exist, keeps the later one

Co-authored-by: openhands <openhands@all-hands.dev>
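The deduplication rule described in this commit message can be sketched as follows. This is a minimal illustration only, not the SDK's actual implementation: the `ToolResult` dataclass and the `dedupe_tool_results` helper are hypothetical stand-ins for the real event classes.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    tool_call_id: str
    is_error: bool  # True for an AgentErrorEvent-style result
    index: int      # position in the event stream


def dedupe_tool_results(results: list[ToolResult]) -> list[ToolResult]:
    """Keep exactly one result per tool_call_id.

    A real observation wins over an error; among results of the same
    kind, the later one (higher index) wins.
    """
    best: dict[str, ToolResult] = {}
    for r in results:
        prev = best.get(r.tool_call_id)
        if prev is None:
            best[r.tool_call_id] = r
        elif prev.is_error and not r.is_error:
            best[r.tool_call_id] = r  # observation beats error
        elif prev.is_error == r.is_error and r.index > prev.index:
            best[r.tool_call_id] = r  # later result of the same kind wins
    return list(best.values())
```

With this rule, the restart scenario above (error at index 0, real observation at index 1, same tool_call_id) keeps only the real observation.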
API breakage checks (Griffe): Failed
Agent server REST API breakage checks (OpenAPI): Passed
openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py
…ing one

When both AgentErrorEvent and ObservationEvent exist for the same tool_call_id (typically from a restart scenario), the error context is now merged into the observation content. This ensures the LLM has full context about what happened during tool execution.

Changes:

- Added transform() method to ViewPropertyBase for event transformations
- Implemented transform() in ToolResultUniquenessProperty to merge errors
- Updated View.enforce_properties() to apply transforms before enforcement
- Updated tests to verify merging behavior

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot left a comment:
🟡 Acceptable - Solves Real Problem, Implementation Could Be Cleaner
Taste Rating: The core approach is sound and solves a real production issue (Anthropic API duplicate tool_result errors). Tests are comprehensive. However, there are performance and type safety issues worth addressing.
Key Insight: This is the right architectural layer to handle this problem - the view layer already enforces properties, and the restart race condition is hard to prevent upstream. The solution pragmatically merges error context into observations rather than losing information.
See inline comments for specific issues.
tests/sdk/context/view/properties/test_tool_result_uniqueness.py
@OpenHands pls fix all review comments here #2256 (review)
I'm on it! xingyaoww can track my progress at all-hands.dev |
- Fix type safety: use list[TextContent | ImageContent] instead of list[TextContent] with a type ignore
- Use model_copy() instead of model_dump() for observation merging
- Add a UUID suffix to merged event IDs to prevent potential collisions
- Extract _group_observations_by_tool_call() helper to eliminate duplicate iteration between transform() and enforce()
- Replace unittest mocks with real ObservationEvent instances in tests

Co-authored-by: openhands <openhands@all-hands.dev>
Summary

I have successfully addressed all 5 review comments on PR #2256. The changes have been committed (a50cbf6) and pushed to the branch.

Changes Made:
Verification:
Files Modified:
```python
# Create new ObservationEvent with a unique ID
# ID format: "{original_id}-merged-{uuid}" to ensure uniqueness
return ObservationEvent(
    id=f"{obs_event.id}-merged-{uuid.uuid4().hex[:8]}",
```
🤔 I'm a bit uncertain about something like this: it will be at the end, which could mean ~far from the action.
Then I wonder if we risk running into another Anthropic LLM API error: a tool result has to follow the tool call (nothing between them).
@OpenHands investigate and tell me if it is actually the case. What if we add a constraint that we only merge two consecutive events: the error event and the observation event? If these two events are separated by a different event, then in that case, even if they have the exact same tool call ID, we should still let them fail as is.
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary of Changes
I investigated the reviewer's concern and confirmed it was valid. The Anthropic API requires that tool_result must follow tool_use without other messages between them. The original implementation could potentially hide bugs or create invalid sequences by handling non-consecutive duplicates.
Changes Made
Implementation (tool_result_uniqueness.py):
- Renamed `_group_observations_by_tool_call()` to `_group_consecutive_observations_by_tool_call()`
- The new function only groups observations that appear consecutively in the event list
- Non-consecutive duplicates with the same `tool_call_id` are NOT merged or deduplicated
- Updated docstrings to clearly document the consecutive-only behavior
Tests (test_tool_result_uniqueness.py):
- Added `TestGroupConsecutiveObservations` class with 5 tests for the new grouping function
- Added explicit tests for non-consecutive behavior (ensuring they're NOT handled)
- Updated existing tests to clarify they work with consecutive events
Verification Checklist
- ✅ Pre-commit hooks pass on both modified files
- ✅ All 23 tests pass
- ✅ Changes directly address reviewer's request for consecutive-only handling
- ✅ No extraneous changes
- ✅ Committed and pushed to the `fix-duplicate-tool-result-error` branch
Key Benefit
This safer approach ensures:
- Normal restart scenarios (consecutive error + observation) are still handled
- Abnormal scenarios (non-consecutive duplicates) fail at the API level, exposing underlying bugs rather than hiding them
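The consecutive-only grouping described above can be illustrated with a small sketch. The function name and the `(tool_call_id, payload)` tuple representation here are hypothetical simplifications; the SDK's real helper operates on event objects.

```python
def group_consecutive(
    events: list[tuple[str, str]],
) -> list[list[tuple[str, str]]]:
    """Group adjacent events that share a tool_call_id.

    Events are (tool_call_id, payload) pairs; a run is broken as soon
    as the tool_call_id changes, so non-consecutive duplicates end up
    in separate groups and are left alone.
    """
    groups: list[list[tuple[str, str]]] = []
    for event in events:
        if groups and groups[-1][-1][0] == event[0]:
            groups[-1].append(event)  # extend the current run
        else:
            groups.append([event])    # start a new run
    return groups
```

Under this grouping, a restart's error followed immediately by the real observation forms one run and can be merged, while a duplicate separated by other events stays in its own group and fails loudly at the API.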
@xingyaoww You may want to take a look at this, where there were a lot of events between them: #1782 (comment)
@enyst that one feels like an edge case, likely a CLI bug where there are two parallel processes writing to the same eventstream, which we've encountered at a certain point.
I do think we can fix the consecutive case in this PR first and deal with that edge case when it comes up again, wdyt?
@OpenHands can you browse OpenHands/OpenHands-CLI issue history and see if there's any report similar to this issue: #2256 (comment)
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I investigated the OpenHands/OpenHands-CLI issue history as requested to find reports similar to the duplicate tool_result error being fixed in PR #2256.
Findings
I found 3 related issues with similar Anthropic API errors:
| Issue | Status | Root Cause |
|---|---|---|
| #423 | OPEN | User message inserted between tool_use/tool_result after tab switch |
| #163 | CLOSED | /resume events misplaced relative to tool_use/result pairs |
| #268 | CLOSED | Race condition - multiple EventLog instances writing duplicate event indices |
Key Distinction
The CLI issues are about different bugs:
- Adjacency violations: messages inserted between `tool_use` and `tool_result`
- Race conditions: parallel processes writing to the same eventstream
PR #2256 addresses a different case: When there are two tool_result events for the same tool_call_id (e.g., AgentErrorEvent from runtime restart + actual ObservationEvent).
Conclusion
The current PR's approach of only handling consecutive duplicate tool results is appropriate because:
- Non-consecutive duplicates indicate different bugs that should fail loudly, not be silently merged
- The runtime restart scenario (the target of this PR) naturally produces consecutive events
- The edge case mentioned in issue Bug: Duplicate ObservationEvent with same tool_call_id causes LLM API error on conversation resume #1782 (many events between duplicates) appears to be a CLI bug where parallel processes wrote to the same eventstream - this is a separate issue that needs a different fix
No code changes were made - this was purely an investigation task to answer the reviewer's question.
Sure, we can deal with consecutive ones in this PR.
Just for clarity, though: issue 1782 is via Cloud, and it contains repeated cases (I had it too)
This addresses reviewer feedback about non-consecutive duplicates. The property now only handles duplicate tool_call_id observations when they are consecutive in the event list. If two observations with the same tool_call_id are separated by other events, they are NOT merged or deduplicated; the underlying bug should be exposed via an API error rather than hidden.

This is a safer approach because:

1. The normal restart scenario (error then observation) creates consecutive events
2. Non-consecutive duplicates likely indicate a more serious bug that should fail
3. It avoids potentially creating invalid message sequences

Co-authored-by: openhands <openhands@all-hands.dev>
```python
        the current view to enforce the property.
        """

    def transform(
```
@OpenHands review this PR, is there a way to do this WITHOUT adding this new transform function?
I remember there's a util function that transforms the full list of events to LLM messages, could we do it there?
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I addressed the review comment by simplifying the implementation as suggested. The changes have already been pushed (commit b7b1e6b3).
What Changed
Instead of adding a new transform() method to ViewPropertyBase and a new ToolResultUniquenessProperty class, I moved the duplicate observation deduplication directly into the existing events_to_messages() function.
Files Changed
- `openhands-sdk/openhands/sdk/event/base.py`: added logic to `events_to_messages()` to deduplicate consecutive observations with the same `tool_call_id`, plus a helper function `_select_best_observation()`
- `tests/sdk/event/test_events_to_messages.py`: added `TestDuplicateObservationDeduplication` test class with 7 tests covering the deduplication scenarios
- Reverted/Removed:
  - `openhands-sdk/openhands/sdk/context/view/properties/base.py`: reverted (no `transform()` method)
  - `openhands-sdk/openhands/sdk/context/view/properties/__init__.py`: reverted
  - `openhands-sdk/openhands/sdk/context/view/view.py`: reverted
  - `openhands-sdk/openhands/sdk/context/view/properties/tool_result_uniqueness.py`: deleted
  - `tests/sdk/context/view/properties/test_tool_result_uniqueness.py`: deleted
Why This Approach Is Better
- Existing pattern: `events_to_messages()` already batches `ActionEvent`s with the same `llm_response_id`; the same pattern now handles observation deduplication
- No new abstraction: no `transform()` method added to `ViewPropertyBase`
- Right layer: deduplication at message conversion is where the Anthropic API constraint matters
- Net reduction: ~484 fewer lines of code (315 added, 799 removed)
Instead of adding a new transform() method to ViewPropertyBase and a new ToolResultUniquenessProperty class, handle the deduplication of consecutive observations with the same tool_call_id directly in the events_to_messages() function.

This is simpler because:

- events_to_messages() already handles a similar case (batching ActionEvents with the same llm_response_id)
- No new abstraction is needed in the view property system
- The deduplication happens at message conversion time, which is where the constraint matters (Anthropic API message format)

The fix still handles the original bug: when a runtime restart creates an AgentErrorEvent for an "unmatched" action, but the tool then completes and creates an ObservationEvent with the same tool_call_id, only the ObservationEvent is kept (it has the actual result).

Co-authored-by: openhands <openhands@all-hands.dev>
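The batch-then-deduplicate pattern this commit message describes might look roughly like this. Plain dicts stand in for the SDK's real event objects, and this `events_to_messages` is a hypothetical simplification of the SDK function of the same name, not its actual code.

```python
def events_to_messages(events: list[dict]) -> list[dict]:
    """Convert events to messages, collapsing consecutive duplicate results."""
    messages: list[dict] = []
    i = 0
    while i < len(events):
        batch = [events[i]]
        # Gather consecutive events sharing the same tool_call_id.
        while (
            i + 1 < len(events)
            and events[i + 1]["tool_call_id"] == events[i]["tool_call_id"]
        ):
            i += 1
            batch.append(events[i])
        # Prefer a real observation over an error within the batch.
        chosen = next(
            (e for e in batch if e["kind"] == "observation"), batch[-1]
        )
        messages.append(
            {
                "role": "tool",
                "tool_call_id": chosen["tool_call_id"],
                "content": chosen["content"],
            }
        )
        i += 1
    return messages
```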
```python
import uuid
from abc import ABC, abstractmethod
from datetime import datetime
from logging import getLogger
```
@OpenHands you should use the logger from the SDK if applicable; please /remember it in AGENTS.md. In this case, I don't think we need loggers in this file.
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I've addressed the review comment requesting removal of the logger from base.py and documenting the SDK logger preference. Here's what was done:
Changes Made
1. openhands-sdk/openhands/sdk/event/base.py - Removed unnecessary logger:
- Removed the `from logging import getLogger` import
logger = getLogger(__name__)instance - Removed the
logger.warning()call when deduplicating observations (the deduplication logic still works, just without logging)
2. AGENTS.md - Added SDK logger guidance:
- Added new bullet point in the
<CODE>section: "When logging is needed, use the SDK's logger fromopenhands.sdk.loggerinstead oflogging.getLogger(__name__)."
Checklist
- ✅ Removed logger import and usage from
base.py - ✅ Added note to
AGENTS.mdabout using SDK logger (openhands.sdk.logger) - ✅ Pre-commit checks passed on modified files
- ✅ Changes committed with proper co-author attribution
- ✅ Changes pushed to
fix-duplicate-tool-result-errorbranch
The changes are minimal and directly address the reviewer's feedback without any extraneous modifications.
```python
# Select the best observation to use
if len(batch_observations) > 1:
    logger.warning(
        f"Found {len(batch_observations)} consecutive observations "
        f"for tool_call_id={tool_call_id}. Deduplicating."
    )
selected = _select_best_observation(batch_observations)
```
We should NOT select the best; instead we should just convert all of them to LLM messages, and then merge them together (i.e., merge the .content field).
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I addressed the PR review comment from @xingyaoww which requested changing from "selecting the best" observation to "merging all observations together by combining their content fields."
Changes Made:
1. openhands-sdk/openhands/sdk/event/base.py:
- Replaced `_select_best_observation()` with `_merge_observation_messages()`
- The new function converts all consecutive observations with the same `tool_call_id` to LLM messages, then combines their `.content` fields into a single `Message`
- Updated the docstring to reflect merge behavior
2. tests/sdk/event/test_events_to_messages.py:
- Updated all test assertions to check that merged content contains information from ALL observations (not just the "best" one)
- Renamed test class from `TestDuplicateObservationDeduplication` to `TestDuplicateObservationMerging`
- Renamed test methods to reflect merge behavior (e.g., `test_consecutive_error_and_observation_merges_content`)
Checklist:
- ✅ Changed from selection to merge approach as requested
- ✅ All 17 tests pass
- ✅ Pre-commit checks pass (ruff, pyright, etc.)
- ✅ Changes are concise and focused only on the requested modification
- ✅ Committed and pushed to the `fix-duplicate-tool-result-error` branch
Commit: `b99b0bd2` - "Merge duplicate observation content instead of selecting best"
…ence

- Removed logging import and logger usage from event/base.py
- Added note to AGENTS.md about using SDK logger (openhands.sdk.logger) instead of logging.getLogger

Co-authored-by: openhands <openhands@all-hands.dev>
Changed the approach for handling multiple observations with the same tool_call_id from selecting the "best" one to merging all of them. Now when consecutive observations share a tool_call_id (e.g., when a runtime restart creates an error but the tool also completes), their content is combined into a single tool result message.

This ensures the LLM receives all relevant information from both the error and the actual result, rather than losing one of them.

Co-authored-by: openhands <openhands@all-hands.dev>
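The merge behavior this commit describes can be sketched in a few lines, assuming messages are represented as plain dicts with a `content` list (the real SDK `Message` type is richer, and `merge_observation_messages` is a hypothetical name used only for illustration):

```python
def merge_observation_messages(messages: list[dict]) -> dict:
    """Combine the content of several messages for one tool_call_id.

    Each input dict is assumed to have a "content" list; the merged
    message keeps the first message's metadata but carries all content
    items in order, so nothing (error text or real result) is lost.
    """
    merged_content: list[str] = []
    for msg in messages:
        merged_content.extend(msg["content"])
    return {**messages[0], "content": merged_content}
```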
Heads up, I think the problem this PR is fixing is resolved by #2300. But please double-check, I might be misunderstanding this PR!
Summary
Fixes the `"each tool_use must have a single result"` error from the Anthropic API that occurs when multiple observations exist for the same `tool_call_id`.

Root Cause

The bug occurs when:

1. An agent invokes a tool (creating an `ActionEvent` with a `tool_call_id`)
2. The runtime restarts while the tool is running
3. On restart, `event_service.start()` creates an `AgentErrorEvent` for the "unmatched" action
4. The tool completes and creates an `ObservationEvent` with the same `tool_call_id`
5. Two tool results exist for the same `tool_call_id`, which violates Anthropic's API contract

Fix

Added a new view property `ToolResultUniquenessProperty` that handles duplicate `tool_call_id` situations at the message conversion layer:

- Groups `ObservationBaseEvent`s by their `tool_call_id`
- For each `tool_call_id` with multiple observations, keeps only one based on priority: `ObservationEvent` (the actual tool result) is preferred over `AgentErrorEvent`

This ensures that when the events are converted to LLM messages, each `tool_use` has exactly one `tool_result`, satisfying the API contract.

Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
```shell
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:287dc1b-python
```

Run
All tags pushed for this build
About Multi-Architecture Support
- The variant tag (e.g., 287dc1b-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 287dc1b-python-amd64) are also available if needed