fix(sdk): Use concise error message for tool validation errors#2748
fix(sdk): Use concise error message for tool validation errors#2748VascoSch92 merged 3 commits intomainfrom
Conversation
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
When tool validation fails, the error message now includes only
parameter names (not values) to avoid wasting LLM context window
on large payloads like file_editor's old_str/new_str.
Before:
Error validating args {"command": "view", "old_str": "<huge value>"...} for tool 'file_editor': ...
After:
Error validating tool 'file_editor': ... Parameters provided: ['command', 'path', 'old_str']
For unparseable JSON, the message indicates:
Error validating tool 'file_editor': ... Arguments: unparseable JSON
Fixes #2741
Co-authored-by: openhands <openhands@all-hands.dev>
915a95a to
f4a8219
Compare
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste - Clean solution to a real problem
This is solid engineering. You're solving a real problem (LLM context waste on large tool arguments like file_editor's old_str/new_str) with a straightforward solution: show parameter names, not values.
What's Good:
- Pragmatic: Addresses actual context waste we see in production
- Simple: No unnecessary complexity - just extract keys instead of showing full values
- Well-tested: Tests verify actual error message content, not just mocks
- Clean implementation: Clear variable names, appropriate comments, follows repo patterns
Eval-Risk Flag (Required by Repo Policy):
Per the repo's review guidelines, I cannot approve this PR because it changes tool execution error messages visible to the LLM, which falls under the eval-risk category:
"Do NOT submit an APPROVE review when the PR changes agent behavior or anything that could plausibly affect benchmark/evaluation performance. Examples include: prompt templates, tool calling/execution, ..."
While this change should improve LLM self-correction by reducing noise (as you correctly note), changing LLM-facing error messages can affect agent behavior in subtle ways that need empirical validation.
Required Next Step:
Run lightweight evals (e.g., subset of SWE-bench or GAIA) to verify this maintains or improves agent performance. The new error format is cleaner, but we need data to confirm LLMs self-correct as well as or better than before.
VERDICT: ✅ Worth merging after eval validation
KEY INSIGHT: Clean implementation solving a real problem - just needs empirical validation that the concise error format helps LLMs self-correct effectively.
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean solution to a real problem. Minimal code, clear intent, solves actual LLM context waste. One minor edge case to fix with empty dict handling. 👍
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
The previous commit introduced broken indentation in the except block, causing a SyntaxError (unmatched ')') that broke all CI checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes #2741
When tool validation fails, the error message now includes only parameter names (not values) to avoid wasting LLM context window on large payloads like
file_editor'sold_str/new_str.Changes
Before
After
For unparseable JSON:
Benefits
Testing
Added two tests:
test_validation_error_shows_keys_not_values- Verifies error shows keys but not large valuestest_unparseable_json_error_message- Verifies unparseable JSON is handled gracefullyThis PR was created by an AI agent (OpenHands) on behalf of the user.
@VascoSch92 can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:20379d9-pythonRun
All tags pushed for this build
About Multi-Architecture Support
20379d9-python) is a multi-arch manifest supporting both amd64 and arm6420379d9-python-amd64) are also available if needed