
Add configurable file editing toolset support#2077

Merged
neubig merged 5 commits into main from configurable-file-editing-toolset
Feb 18, 2026

Conversation


@neubig neubig commented Feb 14, 2026

Summary

This PR adds support for configurable file editing toolsets, allowing the SDK to use different tool presets for file editing operations. The primary use case is supporting gemini-style file editing tools (read_file, write_file, edit, list_directory) as an alternative to the default FileEditorTool.

Changes

Core SDK Changes

  • Added ToolPreset enum with values: default, gemini, planning
  • Added openhands.tools.preset.gemini module with:
    • get_gemini_tools() - Returns gemini-style file editing tools
    • get_gemini_condenser() - Returns default condenser for gemini preset
    • get_gemini_agent() - Convenience function to create agent with gemini tools
  • Added ToolPresetType type alias for type hints
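The enum and type alias described above might look roughly like this (a sketch based on the names listed in this PR; the exact class layout in the SDK may differ):

```python
from enum import Enum
from typing import Literal

# Sketch of the ToolPreset enum with the values named above.
class ToolPreset(str, Enum):
    DEFAULT = "default"
    GEMINI = "gemini"
    PLANNING = "planning"

# ToolPresetType alias for type hints, as added by the PR.
ToolPresetType = Literal["default", "gemini", "planning"]

print([p.value for p in ToolPreset])  # ['default', 'gemini', 'planning']
```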

Integration Testing Changes

  • Added --tool-preset argument to integration test runner
  • Integration tests can now be run with different tool presets via workflow dispatch

Run Eval Workflow Changes

  • Added tool_preset input parameter to the Run Eval workflow
  • The parameter is passed through to the evaluation job

Testing

  • Integration tests have been triggered with the gemini toolset to verify the implementation works correctly (workflow run #22016547199)
  • Local verification confirmed that get_tools_for_preset('gemini') returns the correct gemini tools: terminal, read_file, write_file, edit, list_directory, task_tracker

Related PRs

  • OpenHands/benchmarks: configurable-tool-preset branch - Adds --tool-preset argument to run_infer.py
  • OpenHands/evaluation: configurable-tool-preset branch - Passes tool_preset through the evaluation workflow



Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image |
|---------|---------------|------------|
| java    | amd64, arm64  | eclipse-temurin:17-jdk |
| python  | amd64, arm64  | nikolaik/python-nodejs:python3.12-nodejs22 |
| golang  | amd64, arm64  | golang:1.21-bookworm |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:769ba7c-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-769ba7c-python \
  ghcr.io/openhands/agent-server:769ba7c-python

All tags pushed for this build

ghcr.io/openhands/agent-server:769ba7c-golang-amd64
ghcr.io/openhands/agent-server:769ba7c-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:769ba7c-golang-arm64
ghcr.io/openhands/agent-server:769ba7c-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:769ba7c-java-amd64
ghcr.io/openhands/agent-server:769ba7c-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:769ba7c-java-arm64
ghcr.io/openhands/agent-server:769ba7c-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:769ba7c-python-amd64
ghcr.io/openhands/agent-server:769ba7c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:769ba7c-python-arm64
ghcr.io/openhands/agent-server:769ba7c-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:769ba7c-golang
ghcr.io/openhands/agent-server:769ba7c-java
ghcr.io/openhands/agent-server:769ba7c-python

About Multi-Architecture Support

  • Each variant tag (e.g., 769ba7c-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 769ba7c-python-amd64) are also available if needed

Add support for selecting different file editing tool presets (default,
gemini, planning) via --tool-preset argument in integration tests and
GitHub Actions workflow.

Changes:
- Add ToolPresetType and get_tools_for_preset() to tests/integration/base.py
- Add tool_preset parameter to BaseIntegrationTest.__init__
- Add --tool-preset argument to tests/integration/run_infer.py
- Update all integration tests to use get_tools_for_preset()
- Update behavior_helpers.py to support tool presets
- Add tool_preset input to integration-runner.yml workflow

This enables testing with Gemini-style file editing tools (read_file,
write_file, edit, list_directory) instead of the default FileEditorTool.

Co-authored-by: openhands <openhands@all-hands.dev>
Adds support for the tool_preset parameter (default, gemini, planning) to
the Run Eval workflow, allowing evaluations to be run with different tool
presets.

Co-authored-by: openhands <openhands@all-hands.dev>
@neubig neubig added the review-this label (triggers a PR review by OpenHands) Feb 15, 2026 — with OpenHands AI

@all-hands-bot all-hands-bot left a comment


Taste Rating: 🟡 Acceptable - Infrastructure is clean but incomplete

The plumbing for configurable tool presets is straightforward, but this PR has fundamental issues with completeness and testing.


[CRITICAL ISSUES]

🔴 Missing Implementation (tests/integration/base.py:28-54)
The diff shows imports from openhands.tools.preset.gemini, .planning, and .default but these modules are not included in the diff. Where are these implementations?

  • If they exist elsewhere, they should be in this PR for proper review
  • If they don't exist yet, this PR is incomplete
  • This violates the "show me the code" principle - we're reviewing infrastructure for something we can't see

🔴 No Error Handling (tests/integration/base.py:28-48)
What happens if someone adds a new preset to ToolPresetType but forgets to create the module? This will throw ImportError at runtime.

At minimum, wrap the imports:

try:
    from openhands.tools.preset.gemini import get_gemini_tools
    return get_gemini_tools(enable_browser=enable_browser)
except ImportError as e:
    raise ValueError(
        f"Tool preset '{preset}' is not available. "
        f"Make sure openhands.tools.preset.{preset} is installed."
    ) from e

🔴 No Unit Tests (tests/integration/base.py:40)
This is new core functionality with zero test coverage. What validates that:

  • Each preset returns the expected tools?
  • Invalid presets are handled gracefully?
  • The enable_browser flag works correctly?

Add tests/integration/test_tool_presets.py with tests for each preset.
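A minimal version of the suggested test file might look like this (the stand-in factory below replaces the real import from tests/integration/base.py; tool names other than the gemini set listed in this PR are placeholders):

```python
# Sketch of tests/integration/test_tool_presets.py (file name taken from
# the review suggestion). Stand-in for:
#   from tests.integration.base import get_tools_for_preset

def get_tools_for_preset(preset: str, enable_browser: bool = True) -> list[str]:
    presets = {
        "default": ["terminal", "file_editor"],
        "gemini": ["terminal", "read_file", "write_file", "edit",
                   "list_directory", "task_tracker"],
    }
    try:
        return list(presets[preset])
    except KeyError as e:
        raise ValueError(f"Unknown tool preset: {preset!r}") from e

def test_each_preset_returns_expected_tools():
    assert "read_file" in get_tools_for_preset("gemini")
    assert "file_editor" in get_tools_for_preset("default")

def test_invalid_preset_raises_value_error():
    try:
        get_tools_for_preset("nope")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for unknown preset")

if __name__ == "__main__":
    test_each_preset_returns_expected_tools()
    test_invalid_preset_raises_value_error()
    print("ok")
```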


[IMPROVEMENT OPPORTUNITIES]

🟠 Inconsistent API (tests/integration/base.py:43)
The planning preset comment says "doesn't support browser tools" but the function signature suggests all presets do. This is confusing.

Make the contract clear:

  1. All presets support enable_browser (even if ignored), OR
  2. Remove the parameter and let each preset decide, OR
  3. Validate that presets respect the parameter

Code that says one thing and does another creates maintenance problems.

🟡 Documentation Gap (tests/integration/run_infer.py:459-465)
The CLI help text explains presets, but where should developers look to understand when to use gemini vs planning vs default? Add this to docstrings or README.

🟡 Default Consistency (.github/workflows/integration-runner.yml:34-43)
The workflow and code both default to "default". Consider defining a constant to prevent drift:

DEFAULT_TOOL_PRESET: ToolPresetType = "default"
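Such a constant could also anchor the CLI side, so the argparse default and the allowed choices come from one place (the --tool-preset argument name comes from this PR; the constant names are illustrative):

```python
import argparse

# Single source of truth for presets and the default, to prevent drift
# between the CLI, the code, and the workflow files (names illustrative).
TOOL_PRESETS = ("default", "gemini", "planning")
DEFAULT_TOOL_PRESET = "default"

parser = argparse.ArgumentParser()
parser.add_argument(
    "--tool-preset",
    choices=TOOL_PRESETS,
    default=DEFAULT_TOOL_PRESET,
    help="Tool preset for file editing (default: %(default)s)",
)

args = parser.parse_args(["--tool-preset", "gemini"])
print(args.tool_preset)  # gemini
```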

[POSITIVE NOTES]

🟢 Clean Refactoring (tests/integration/tests/t01_fix_simple_typo.py:37)
Good pattern - centralizing tool selection instead of duplicating registration code in every test.


VERDICT

Needs rework: The infrastructure is solid, but this PR is incomplete without:

  1. The actual preset module implementations (or at least showing they exist)
  2. Error handling for missing presets
  3. Unit tests for the preset selection logic

KEY INSIGHT

You're adding a factory pattern for tool selection, which is the right design. But a factory that can fail at runtime with no validation or error handling is a time bomb. Add the safety rails before merging.

@all-hands-bot

Update: I found the preset module implementations in openhands-tools/openhands/tools/preset/ - they exist but weren't included in the diff provided for review.

This changes my assessment:

Revised Verdict: 🟡 Acceptable with required fixes

The core implementation is complete. However, the following issues remain critical:

  1. Error handling: get_tools_for_preset() needs try-except to handle import failures gracefully
  2. Unit tests: The preset selection logic needs test coverage
  3. API consistency: Clarify whether all presets must respect enable_browser

These can be addressed in follow-up fixes, but should be done before this feature is considered production-ready.

@neubig neubig marked this pull request as ready for review February 17, 2026 04:19

@all-hands-bot all-hands-bot left a comment


Taste Rating: 🟡 Acceptable plumbing, but incomplete without preset implementations.

The infrastructure changes are clean and mechanical, but the PR imports modules that aren't shown in the diff. The previous review flagged this - it remains unresolved.

Co-authored-by: Vasco Schiavo <115561717+VascoSch92@users.noreply.github.com>

neubig commented Feb 17, 2026

@OpenHands take a look at the unresolved review comments and reflect them if necessary. Once all of the necessary review comments are reflected, mark them as resolved using the graphql API. Then check GitHub CI and make sure CI passes


openhands-ai bot commented Feb 17, 2026

I'm on it! neubig can track my progress at all-hands.dev

- Add 'gpt5' option to ToolPresetType and get_tools_for_preset()
- Update workflow files (integration-runner.yml, run-eval.yml) to include gpt5 preset
- Fix docstring in BaseIntegrationTest to match implementation
- Add comprehensive tests for get_tools_for_preset() function

Co-authored-by: openhands <openhands@all-hands.dev>

openhands-ai bot commented Feb 17, 2026

Summary

I have addressed all the unresolved review comments on PR #2077 and pushed the changes (commit e9b42660).

Changes Made

  1. Added GPT-5 apply_patch preset (addressing @enyst's question):

    • Added gpt5 to ToolPresetType literal type
    • Added gpt5 case in get_tools_for_preset() function
    • Updated integration-runner.yml and run-eval.yml workflows with the gpt5 option
  2. Fixed docstring (addressing bot comment):

    • Updated BaseIntegrationTest docstring to correctly reflect that presets are passed via constructor parameter
  3. Added unit tests (addressing bot comment):

    • Created tests/integration/test_tool_presets.py with 7 tests covering all presets and edge cases

Review Threads Resolved

All 6 unresolved threads were replied to and resolved via GraphQL API:

  • ✅ Missing implementations → Modules exist in openhands-tools/openhands/tools/preset/
  • ✅ Error handling → Already addressed with match/case + ValueError
  • ✅ Planning preset → Intentionally read-only, no browser support
  • ✅ Missing tests → Added comprehensive tests
  • ✅ Docstring mismatch → Fixed
  • ✅ GPT-5 apply_patch tool → Added gpt5 preset

CI Status

All critical checks are passing:

  • ✅ Pre-commit checks
  • ✅ All test suites (sdk, tools, agent-server, cross)
  • ✅ Review Thread Gate
  • ✅ API breakage checks
  • ✅ Deprecation deadlines

Only slow Agent Server Docker builds (amd64) are still pending, which are non-blocking.

@neubig neubig merged commit a2b442e into main Feb 18, 2026
37 of 38 checks passed
@neubig neubig deleted the configurable-file-editing-toolset branch February 18, 2026 02:45
neubig pushed a commit that referenced this pull request Feb 20, 2026
…cutor hang

Browser tools (BrowserToolSet) cause integration tests to hang indefinitely
when running with ProcessPoolExecutor. The browser cleanup during atexit
handlers in worker processes doesn't complete properly, causing the executor
to wait forever for workers to exit.

This was introduced in PR #2077 which changed tests to use get_tools_for_preset()
with enable_browser=True. Previously, tests manually specified only TerminalTool
and FileEditorTool without browser tools.

Fixes #2124

Co-authored-by: openhands <openhands@all-hands.dev>
