Add configurable file editing toolset support#2077
Conversation
Add support for selecting different file editing tool presets (default, gemini, planning) via --tool-preset argument in integration tests and GitHub Actions workflow. Changes: - Add ToolPresetType and get_tools_for_preset() to tests/integration/base.py - Add tool_preset parameter to BaseIntegrationTest.__init__ - Add --tool-preset argument to tests/integration/run_infer.py - Update all integration tests to use get_tools_for_preset() - Update behavior_helpers.py to support tool presets - Add tool_preset input to integration-runner.yml workflow This enables testing with Gemini-style file editing tools (read_file, write_file, edit, list_directory) instead of the default FileEditorTool. Co-authored-by: openhands <openhands@all-hands.dev>
Adds support for the tool_preset parameter (default, gemini, planning) to the Run Eval workflow, allowing evaluations to be run with different tool presets. Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable - Infrastructure is clean but incomplete
The plumbing for configurable tool presets is straightforward, but this PR has fundamental issues with completeness and testing.
[CRITICAL ISSUES]
🔴 Missing Implementation (tests/integration/base.py:28-54)
The diff shows imports from openhands.tools.preset.gemini, .planning, and .default but these modules are not included in the diff. Where are these implementations?
- If they exist elsewhere, they should be in this PR for proper review
- If they don't exist yet, this PR is incomplete
- This violates the "show me the code" principle - we're reviewing infrastructure for something we can't see
🔴 No Error Handling (tests/integration/base.py:28-48)
What happens if someone adds a new preset to ToolPresetType but forgets to create the module? This will throw ImportError at runtime.
At minimum, wrap the imports:
try:
from openhands.tools.preset.gemini import get_gemini_tools
return get_gemini_tools(enable_browser=enable_browser)
except ImportError as e:
raise ValueError(
f"Tool preset '{preset}' is not available. "
f"Make sure openhands.tools.preset.{preset} is installed."
) from e🔴 No Unit Tests (tests/integration/base.py:40)
This is new core functionality with zero test coverage. What validates that:
- Each preset returns the expected tools?
- Invalid presets are handled gracefully?
- The
enable_browserflag works correctly?
Add tests/integration/test_tool_presets.py with tests for each preset.
[IMPROVEMENT OPPORTUNITIES]
🟠 Inconsistent API (tests/integration/base.py:43)
The planning preset comment says "doesn't support browser tools" but the function signature suggests all presets do. This is confusing.
Make the contract clear:
- All presets support
enable_browser(even if ignored), OR - Remove the parameter and let each preset decide, OR
- Validate that presets respect the parameter
Code that says one thing and does another creates maintenance problems.
🟡 Documentation Gap (tests/integration/run_infer.py:459-465)
The CLI help text explains presets, but where should developers look to understand when to use gemini vs planning vs default? Add this to docstrings or README.
🟡 Default Consistency (.github/workflows/integration-runner.yml:34-43)
The workflow and code both default to "default". Consider defining a constant to prevent drift:
DEFAULT_TOOL_PRESET: ToolPresetType = "default"[POSITIVE NOTES]
🟢 Clean Refactoring (tests/integration/tests/t01_fix_simple_typo.py:37)
Good pattern - centralizing tool selection instead of duplicating registration code in every test.
VERDICT
❌ Needs rework: The infrastructure is solid, but this PR is incomplete without:
- The actual preset module implementations (or at least showing they exist)
- Error handling for missing presets
- Unit tests for the preset selection logic
KEY INSIGHT
You're adding a factory pattern for tool selection, which is the right design. But a factory that can fail at runtime with no validation or error handling is a time bomb. Add the safety rails before merging.
|
Update: I found the preset module implementations in This changes my assessment: Revised Verdict: 🟡 Acceptable with required fixes The core implementation is complete. However, the following issues remain critical:
These can be addressed in follow-up fixes, but should be done before this feature is considered production-ready. |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable plumbing, but incomplete without preset implementations.
The infrastructure changes are clean and mechanical, but the PR imports modules that aren't shown in the diff. The previous review flagged this - it remains unresolved.
Co-authored-by: Vasco Schiavo <115561717+VascoSch92@users.noreply.github.com>
|
@OpenHands take a look at the unresolved review comments and reflect them if necessary. Once all of the necessary review comments are reflected, mark them as resolved using the graphql API. Then check GitHub CI and make sure CI passes |
|
I'm on it! neubig can track my progress at all-hands.dev |
- Add 'gpt5' option to ToolPresetType and get_tools_for_preset() - Update workflow files (integration-runner.yml, run-eval.yml) to include gpt5 preset - Fix docstring in BaseIntegrationTest to match implementation - Add comprehensive tests for get_tools_for_preset() function Co-authored-by: openhands <openhands@all-hands.dev>
SummaryI have addressed all the unresolved review comments on PR #2077 and pushed the changes (commit Changes Made
Review Threads ResolvedAll 6 unresolved threads were replied to and resolved via GraphQL API:
CI StatusAll critical checks are passing:
Only slow Agent Server Docker builds (amd64) are still pending, which are non-blocking. |
…cutor hang Browser tools (BrowserToolSet) cause integration tests to hang indefinitely when running with ProcessPoolExecutor. The browser cleanup during atexit handlers in worker processes doesn't complete properly, causing the executor to wait forever for workers to exit. This was introduced in PR #2077 which changed tests to use get_tools_for_preset() with enable_browser=True. Previously, tests manually specified only TerminalTool and FileEditorTool without browser tools. Fixes #2124 Co-authored-by: openhands <openhands@all-hands.dev>
Summary
This PR adds support for configurable file editing toolsets, allowing the SDK to use different tool presets for file editing operations. The primary use case is supporting gemini-style file editing tools (read_file, write_file, edit, list_directory) as an alternative to the default FileEditorTool.
Changes
Core SDK Changes
ToolPresetenum with values:default,gemini,planningopenhands.tools.preset.geminimodule with:get_gemini_tools()- Returns gemini-style file editing toolsget_gemini_condenser()- Returns default condenser for gemini presetget_gemini_agent()- Convenience function to create agent with gemini toolsToolPresetTypetype alias for type hintsIntegration Testing Changes
--tool-presetargument to integration test runnerRun Eval Workflow Changes
tool_presetinput parameter to the Run Eval workflowTesting
get_tools_for_preset('gemini')returns the correct gemini tools:terminal,read_file,write_file,edit,list_directory,task_trackerRelated PRs
configurable-tool-presetbranch - Adds--tool-presetargument to run_infer.pyconfigurable-tool-presetbranch - Passes tool_preset through the evaluation workflow@neubig can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:769ba7c-pythonRun
All tags pushed for this build
About Multi-Architecture Support
769ba7c-python) is a multi-arch manifest supporting both amd64 and arm64769ba7c-python-amd64) are also available if needed