
Add configurable file editing toolset support#2077

Merged
neubig merged 5 commits into main from configurable-file-editing-toolset
Feb 18, 2026

Conversation


@neubig neubig commented Feb 14, 2026

Summary

This PR adds support for configurable file editing toolsets, allowing the SDK to use different tool presets for file editing operations. The primary use case is supporting gemini-style file editing tools (read_file, write_file, edit, list_directory) as an alternative to the default FileEditorTool.

Changes

Core SDK Changes

  • Added ToolPreset enum with values: default, gemini, planning
  • Added openhands.tools.preset.gemini module with:
    • get_gemini_tools() - Returns gemini-style file editing tools
    • get_gemini_condenser() - Returns default condenser for gemini preset
    • get_gemini_agent() - Convenience function to create agent with gemini tools
  • Added ToolPresetType type alias for type hints
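The enum and type alias described above might look roughly like this (a sketch based on the names listed in this PR; the exact class layout in the SDK may differ):

```python
from enum import Enum
from typing import Literal

# Sketch of the ToolPreset enum with the values named above.
class ToolPreset(str, Enum):
    DEFAULT = "default"
    GEMINI = "gemini"
    PLANNING = "planning"

# ToolPresetType alias for type hints, as added by the PR.
ToolPresetType = Literal["default", "gemini", "planning"]

print([p.value for p in ToolPreset])  # ['default', 'gemini', 'planning']
```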

Integration Testing Changes

  • Added --tool-preset argument to integration test runner
  • Integration tests can now be run with different tool presets via workflow dispatch

Run Eval Workflow Changes

  • Added tool_preset input parameter to the Run Eval workflow
  • The parameter is passed through to the evaluation job

Testing

  • Integration tests have been triggered with the gemini toolset to verify the implementation works correctly (workflow run #22016547199)
  • Local verification confirmed that get_tools_for_preset('gemini') returns the correct gemini tools: terminal, read_file, write_file, edit, list_directory, task_tracker

Related PRs

  • OpenHands/benchmarks: configurable-tool-preset branch - Adds --tool-preset argument to run_infer.py
  • OpenHands/evaluation: configurable-tool-preset branch - Passes tool_preset through the evaluation workflow



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image |
|---------|---------------|------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22 |
| golang  | amd64, arm64  | golang:1.21-bookworm |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:769ba7c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-769ba7c-python \
  ghcr.io/openhands/agent-server:769ba7c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:769ba7c-golang-amd64
ghcr.io/openhands/agent-server:769ba7c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:769ba7c-golang-arm64
ghcr.io/openhands/agent-server:769ba7c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:769ba7c-java-amd64
ghcr.io/openhands/agent-server:769ba7c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:769ba7c-java-arm64
ghcr.io/openhands/agent-server:769ba7c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:769ba7c-python-amd64
ghcr.io/openhands/agent-server:769ba7c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:769ba7c-python-arm64
ghcr.io/openhands/agent-server:769ba7c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:769ba7c-golang
ghcr.io/openhands/agent-server:769ba7c-java
ghcr.io/openhands/agent-server:769ba7c-python

About Multi-Architecture Support

  • Each variant tag (e.g., 769ba7c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 769ba7c-python-amd64) are also available if needed

Add support for selecting different file editing tool presets (default,
gemini, planning) via --tool-preset argument in integration tests and
GitHub Actions workflow.

Changes:
- Add ToolPresetType and get_tools_for_preset() to tests/integration/base.py
- Add tool_preset parameter to BaseIntegrationTest.__init__
- Add --tool-preset argument to tests/integration/run_infer.py
- Update all integration tests to use get_tools_for_preset()
- Update behavior_helpers.py to support tool presets
- Add tool_preset input to integration-runner.yml workflow

This enables testing with Gemini-style file editing tools (read_file,
write_file, edit, list_directory) instead of the default FileEditorTool.

Co-authored-by: openhands <openhands@all-hands.dev>
Adds support for the tool_preset parameter (default, gemini, planning) to
the Run Eval workflow, allowing evaluations to be run with different tool
presets.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig added the review-this label (triggers a PR review by OpenHands) Feb 15, 2026 — with OpenHands AI

@all-hands-bot all-hands-bot left a comment


Taste Rating: 🟡 Acceptable - Infrastructure is clean but incomplete

The plumbing for configurable tool presets is straightforward, but this PR has fundamental issues with completeness and testing.


[CRITICAL ISSUES]

🔴 Missing Implementation (tests/integration/base.py:28-54)
The diff shows imports from openhands.tools.preset.gemini, .planning, and .default but these modules are not included in the diff. Where are these implementations?

  • If they exist elsewhere, they should be in this PR for proper review
  • If they don't exist yet, this PR is incomplete
  • This violates the "show me the code" principle - we're reviewing infrastructure for something we can't see

🔴 No Error Handling (tests/integration/base.py:28-48)
What happens if someone adds a new preset to ToolPresetType but forgets to create the module? This will throw ImportError at runtime.

At minimum, wrap the imports:

try:
    from openhands.tools.preset.gemini import get_gemini_tools
    return get_gemini_tools(enable_browser=enable_browser)
except ImportError as e:
    raise ValueError(
        f"Tool preset '{preset}' is not available. "
        f"Make sure openhands.tools.preset.{preset} is installed."
    ) from e

🔴 No Unit Tests (tests/integration/base.py:40)
This is new core functionality with zero test coverage. What validates that:

  • Each preset returns the expected tools?
  • Invalid presets are handled gracefully?
  • The enable_browser flag works correctly?

Add tests/integration/test_tool_presets.py with tests for each preset.
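A minimal version of the suggested test file might look like this (the stand-in factory below replaces the real import from tests/integration/base.py; tool names other than the gemini set listed in this PR are placeholders):

```python
# Sketch of tests/integration/test_tool_presets.py (file name taken from
# the review suggestion). Stand-in for:
#   from tests.integration.base import get_tools_for_preset

def get_tools_for_preset(preset: str, enable_browser: bool = True) -> list[str]:
    presets = {
        "default": ["terminal", "file_editor"],
        "gemini": ["terminal", "read_file", "write_file", "edit",
                   "list_directory", "task_tracker"],
    }
    try:
        return list(presets[preset])
    except KeyError as e:
        raise ValueError(f"Unknown tool preset: {preset!r}") from e

def test_each_preset_returns_expected_tools():
    assert "read_file" in get_tools_for_preset("gemini")
    assert "file_editor" in get_tools_for_preset("default")

def test_invalid_preset_raises_value_error():
    try:
        get_tools_for_preset("nope")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown preset")

if __name__ == "__main__":
    test_each_preset_returns_expected_tools()
    test_invalid_preset_raises_value_error()
    print("ok")
```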


[IMPROVEMENT OPPORTUNITIES]

🟠 Inconsistent API (tests/integration/base.py:43)
The planning preset comment says "doesn't support browser tools" but the function signature suggests all presets do. This is confusing.

Make the contract clear:

  1. All presets support enable_browser (even if ignored), OR
  2. Remove the parameter and let each preset decide, OR
  3. Validate that presets respect the parameter

Code that says one thing and does another creates maintenance problems.

🟡 Documentation Gap (tests/integration/run_infer.py:459-465)
The CLI help text explains presets, but where should developers look to understand when to use gemini vs planning vs default? Add this to docstrings or README.

🟡 Default Consistency (.github/workflows/integration-runner.yml:34-43)
The workflow and code both default to "default". Consider defining a constant to prevent drift:

DEFAULT_TOOL_PRESET: ToolPresetType = "default"
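Such a constant could also anchor the CLI side, so the argparse default and the allowed choices come from one place (the --tool-preset argument name comes from this PR; the constant names are illustrative):

```python
import argparse

# Single source of truth for presets and the default, to prevent drift
# between the CLI, the code, and the workflow files (names illustrative).
TOOL_PRESETS = ("default", "gemini", "planning")
DEFAULT_TOOL_PRESET = "default"

parser = argparse.ArgumentParser()
parser.add_argument(
    "--tool-preset",
    choices=TOOL_PRESETS,
    default=DEFAULT_TOOL_PRESET,
    help="Tool preset for file editing (default: %(default)s)",
)

args = parser.parse_args(["--tool-preset", "gemini"])
print(args.tool_preset)  # gemini
```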

[POSITIVE NOTES]

🟢 Clean Refactoring (tests/integration/tests/t01_fix_simple_typo.py:37)
Good pattern - centralizing tool selection instead of duplicating registration code in every test.


VERDICT

Needs rework: The infrastructure is solid, but this PR is incomplete without:

  1. The actual preset module implementations (or at least showing they exist)
  2. Error handling for missing presets
  3. Unit tests for the preset selection logic

KEY INSIGHT

You're adding a factory pattern for tool selection, which is the right design. But a factory that can fail at runtime with no validation or error handling is a time bomb. Add the safety rails before merging.

@all-hands-bot

Update: I found the preset module implementations in openhands-tools/openhands/tools/preset/ - they exist but weren't included in the diff provided for review.

This changes my assessment:

Revised Verdict: 🟡 Acceptable with required fixes

The core implementation is complete. However, the following issues remain critical:

  1. Error handling: get_tools_for_preset() needs try-except to handle import failures gracefully
  2. Unit tests: The preset selection logic needs test coverage
  3. API consistency: Clarify whether all presets must respect enable_browser

These can be addressed in follow-up fixes, but should be done before this feature is considered production-ready.

@neubig neubig marked this pull request as ready for review February 17, 2026 04:19

@all-hands-bot all-hands-bot left a comment


Taste Rating: 🟡 Acceptable plumbing, but incomplete without preset implementations.

The infrastructure changes are clean and mechanical, but the PR imports modules that aren't shown in the diff. The previous review flagged this - it remains unresolved.

Co-authored-by: Vasco Schiavo <115561717+VascoSch92@users.noreply.github.com>

neubig commented Feb 17, 2026

@OpenHands take a look at the unresolved review comments and reflect them if necessary. Once all of the necessary review comments are reflected, mark them as resolved using the graphql API. Then check GitHub CI and make sure CI passes


openhands-ai bot commented Feb 17, 2026

I'm on it! neubig can track my progress at all-hands.dev

- Add 'gpt5' option to ToolPresetType and get_tools_for_preset()
- Update workflow files (integration-runner.yml, run-eval.yml) to include gpt5 preset
- Fix docstring in BaseIntegrationTest to match implementation
- Add comprehensive tests for get_tools_for_preset() function

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai bot commented Feb 17, 2026

Summary

I have addressed all the unresolved review comments on PR #2077 and pushed the changes (commit e9b42660).

Changes Made

  1. Added GPT-5 apply_patch preset (addressing @enyst's question):

    • Added gpt5 to ToolPresetType literal type
    • Added gpt5 case in get_tools_for_preset() function
    • Updated integration-runner.yml and run-eval.yml workflows with the gpt5 option
  2. Fixed docstring (addressing bot comment):

    • Updated BaseIntegrationTest docstring to correctly reflect that presets are passed via constructor parameter
  3. Added unit tests (addressing bot comment):

    • Created tests/integration/test_tool_presets.py with 7 tests covering all presets and edge cases

Review Threads Resolved

All 6 unresolved threads were replied to and resolved via GraphQL API:

  • ✅ Missing implementations → Modules exist in openhands-tools/openhands/tools/preset/
  • ✅ Error handling → Already addressed with match/case + ValueError
  • ✅ Planning preset → Intentionally read-only, no browser support
  • ✅ Missing tests → Added comprehensive tests
  • ✅ Docstring mismatch → Fixed
  • ✅ GPT-5 apply_patch tool → Added gpt5 preset

CI Status

All critical checks are passing:

  • ✅ Pre-commit checks
  • ✅ All test suites (sdk, tools, agent-server, cross)
  • ✅ Review Thread Gate
  • ✅ API breakage checks
  • ✅ Deprecation deadlines

Only slow Agent Server Docker builds (amd64) are still pending, which are non-blocking.

@neubig neubig merged commit a2b442e into main Feb 18, 2026
37 of 38 checks passed
@neubig neubig deleted the configurable-file-editing-toolset branch February 18, 2026 02:45
neubig pushed a commit that referenced this pull request Feb 20, 2026
…cutor hang

Browser tools (BrowserToolSet) cause integration tests to hang indefinitely
when running with ProcessPoolExecutor. The browser cleanup during atexit
handlers in worker processes doesn't complete properly, causing the executor
to wait forever for workers to exit.

This was introduced in PR #2077 which changed tests to use get_tools_for_preset()
with enable_browser=True. Previously, tests manually specified only TerminalTool
and FileEditorTool without browser tools.

Fixes #2124

Co-authored-by: openhands <openhands@all-hands.dev>
