fix(tools): merge subagents metrics (TaskToolSet) #2222
Merged
Commits (13 total; this view shows changes from 7 commits)
- 7037d55 add built-in agents (VascoSch92)
- d833cce add model key (VascoSch92)
- ac534fd update (VascoSch92)
- a9d8e83 test: streamline builtin agent coverage (openhands-agent)
- 9d18814 Merge branch 'main' into vasco/issue-2051 (VascoSch92)
- 89d1c8d fix example and add logging (VascoSch92)
- 2a73f53 fix bug with tokens (VascoSch92)
- eaef838 Revert "test: streamline builtin agent coverage" (VascoSch92)
- 2c1045f Revert "update" (VascoSch92)
- 94a3f9a Revert "add model key" (VascoSch92)
- cf45d8a Revert "add built-in agents" (VascoSch92)
- 81c6030 revert (VascoSch92)
- d16996e Merge branch 'main' into vasco/issue-2180-task-tool (VascoSch92)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,36 @@ | ||
| --- | ||
| name: bash | ||
| model: inherit | ||
| description: >- | ||
|   Command execution specialist (terminal only). | ||
|   <example>Run a shell command</example> | ||
|   <example>Execute a build or test script</example> | ||
|   <example>Check system information or process status</example> | ||
| tools: | ||
|   - terminal | ||
| --- | ||
| | ||
| You are a command-line execution specialist. Your sole interface is the | ||
| terminal — use it to run shell commands on behalf of the caller. | ||
| | ||
| ## Core capabilities | ||
| | ||
| - Execute arbitrary shell commands (bash/sh). | ||
| - Run builds, tests, linters, formatters, and other development tooling. | ||
| - Inspect system state: processes, disk usage, environment variables, network. | ||
| - Perform git operations (commit, push, rebase, etc.). | ||
| | ||
| ## Guidelines | ||
| | ||
| 1. **Be precise.** Run exactly what was requested. Do not add extra flags or | ||
|    steps unless they are necessary for correctness. | ||
| 2. **Check before destroying.** For destructive operations (`rm -rf`, `git | ||
|    reset --hard`, `DROP TABLE`, etc.), confirm the intent and scope before | ||
|    executing. | ||
| 3. **Report results clearly.** After running a command, summarize the outcome — | ||
|    exit code, key output lines, and any errors. | ||
| 4. **Chain when appropriate.** Use `&&` to chain dependent commands so later | ||
|    steps only run if earlier ones succeed. | ||
| 5. **Avoid interactive commands.** Do not run commands that require interactive | ||
|    input (e.g., `vim`, `less`, `git rebase -i`). Use non-interactive | ||
|    alternatives instead. | ||
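
For readers following the diff: the agent definition above is a Markdown file whose header sits between two `---` lines, followed by the prompt text. Below is a minimal sketch of separating those two parts. The helper name `split_front_matter` is hypothetical and is not the SDK's loader, whose implementation does not appear in this diff; the sketch only splits the text and does not parse the YAML.

```python
def split_front_matter(text: str) -> tuple[str, str]:
    """Split a definition file into (front_matter, body).

    Hypothetical helper for illustration only. Returns an empty
    front matter when the file does not start with a '---' line
    or when the closing '---' line is missing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:]).strip()
            return header, body
    # No closing delimiter: treat the whole file as body.
    return "", text


SAMPLE = """---
name: bash
model: inherit
tools:
  - terminal
---

You are a command-line execution specialist.
"""

header, body = split_front_matter(SAMPLE)
print(header.splitlines()[0])
print(body)
```

The header string could then be handed to a YAML parser to obtain `name`, `model`, `description`, and `tools`.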
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| --- | ||
| name: explore | ||
| model: inherit | ||
| description: >- | ||
|   Fast codebase exploration agent (read-only). | ||
|   <example>Find files matching a pattern</example> | ||
|   <example>Search code for a keyword or symbol</example> | ||
|   <example>Understand how a module or feature is implemented</example> | ||
| tools: | ||
|   - terminal | ||
| --- | ||
| | ||
| You are a codebase exploration specialist. You excel at rapidly navigating, | ||
| searching, and understanding codebases. Your role is strictly **read-only** — | ||
| you never create, modify, or delete files. | ||
| | ||
| ## Core capabilities | ||
| | ||
| - **File discovery** — find files by name, extension, or glob pattern. | ||
| - **Content search** — locate code, symbols, and text with regex patterns. | ||
| - **Code reading** — read and analyze source files to answer questions. | ||
| | ||
| ## Constraints | ||
| | ||
| - Do **not** create, modify, move, copy, or delete any file. | ||
| - Do **not** run commands that change system state (installs, builds, writes). | ||
| - When using the terminal, restrict yourself to read-only commands: | ||
|   `ls`, `find`, `cat`, `head`, `tail`, `wc`, `git status`, `git log`, | ||
|   `git diff`, `git show`, `git blame`, `tree`, `file`, `stat`, `which`, | ||
|   `echo`, `pwd`, `env`, `printenv`, `grep`, `glob`. | ||
| - Never use redirect operators (`>`, `>>`) or pipe to write commands. | ||
| | ||
| ## Workflow guidelines | ||
| | ||
| 1. Start broad, then narrow down. Use glob patterns to locate candidate files | ||
|    before reading them. | ||
| 2. Prefer `grep` for content searches and `glob` for file-name searches. | ||
| 3. When exploring an unfamiliar area, check directory structure first (`ls`, | ||
|    `tree`, or glob `**/*`) before diving into individual files. | ||
| 4. Spawn parallel tool calls whenever possible — e.g., grep for a symbol in | ||
|    multiple directories at once — to return results quickly. | ||
| 5. Provide concise, structured answers. Summarize findings with file paths and | ||
|    line numbers so the caller can act on them immediately. | ||
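
In this diff the read-only constraint is stated only in the prompt; nothing shown here enforces it in code. As a rough illustration of what a programmatic check could look like, here is a conservative sketch. The function `is_read_only` and the constant names are hypothetical, the allowlist is copied from the prompt above, and the check is deliberately stricter than the prompt: it rejects every pipe, not just pipes into write commands. It is also incomplete, since it does not inspect flags or arguments (for example `find -delete` or `env rm` would pass).

```python
import shlex

# Allowlist copied from the explore prompt (assumption: first token only).
READ_ONLY_COMMANDS = {
    "ls", "find", "cat", "head", "tail", "wc", "tree", "file", "stat",
    "which", "echo", "pwd", "env", "printenv", "grep", "glob",
}
READ_ONLY_GIT_SUBCOMMANDS = {"status", "log", "diff", "show", "blame"}

# Characters that enable redirection, piping, chaining, or substitution.
# Rejecting them outright is stricter than the prompt requires.
FORBIDDEN_CHARS = set(">|;&`$")


def is_read_only(command: str) -> bool:
    """Return True only if the command looks like an allowlisted read."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    if not tokens:
        return False
    if tokens[0] == "git":
        return len(tokens) > 1 and tokens[1] in READ_ONLY_GIT_SUBCOMMANDS
    return tokens[0] in READ_ONLY_COMMANDS


for cmd in ("git log --oneline", "git push", "ls > out.txt", "rm -rf build"):
    print(cmd, is_read_only(cmd))
```

A prompt-level rule and a code-level check are complementary: the prompt guides the model, while a check like this would reject a command before it runs.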
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| """Tests for SDK built-in agent definitions (default, explore, bash).""" | ||
| | ||
| from collections.abc import Iterator | ||
| | ||
| import pytest | ||
| from pydantic import SecretStr | ||
| | ||
| from openhands.sdk import LLM, Agent | ||
| from openhands.sdk.subagent.load import load_agents_from_dir | ||
| from openhands.sdk.subagent.registry import ( | ||
|     BUILTINS_DIR, | ||
|     _reset_registry_for_tests, | ||
|     get_agent_factory, | ||
|     register_agent, | ||
|     register_builtins_agents, | ||
| ) | ||
| | ||
| | ||
| @pytest.fixture(autouse=True) | ||
| def _clean_registry() -> Iterator[None]: | ||
|     """Reset the agent registry before and after every test.""" | ||
|     _reset_registry_for_tests() | ||
|     yield | ||
|     _reset_registry_for_tests() | ||
| | ||
| | ||
| def _make_test_llm() -> LLM: | ||
|     return LLM(model="gpt-4o", api_key=SecretStr("test-key"), usage_id="test-llm") | ||
| | ||
| | ||
| def test_builtins_contains_expected_agents() -> None: | ||
|     md_files = {f.stem for f in BUILTINS_DIR.glob("*.md")} | ||
|     assert {"default", "explore", "bash"}.issubset(md_files) | ||
| | ||
| | ||
| def test_load_all_builtins() -> None: | ||
|     """Every .md file in builtins/ should parse without errors.""" | ||
|     agents = load_agents_from_dir(BUILTINS_DIR) | ||
|     names = {a.name for a in agents} | ||
|     assert {"default", "explore", "bash"}.issubset(names) | ||
| | ||
| | ||
| def test_register_builtins_agents_registers_expected_factories() -> None: | ||
|     register_builtins_agents() | ||
| | ||
|     llm = _make_test_llm() | ||
|     agent_tool_names: dict[str, list[str]] = {} | ||
|     for name in ("default", "explore", "bash"): | ||
|         factory = get_agent_factory(name) | ||
|         agent = factory.factory_func(llm) | ||
|         assert isinstance(agent, Agent) | ||
|         agent_tool_names[name] = [t.name for t in agent.tools] | ||
| | ||
|     assert agent_tool_names["default"] == [ | ||
|         "terminal", | ||
|         "file_editor", | ||
|         "task_tracker", | ||
|         "browser_tool_set", | ||
|     ] | ||
|     assert agent_tool_names["explore"] == ["terminal"] | ||
|     assert agent_tool_names["bash"] == ["terminal"] | ||
| | ||
| | ||
| def test_builtins_do_not_overwrite_programmatic() -> None: | ||
|     """Programmatic registrations take priority over builtins.""" | ||
| | ||
|     def custom_factory(llm: LLM) -> Agent: | ||
|         return Agent(llm=llm, tools=[]) | ||
| | ||
|     register_agent( | ||
|         name="explore", | ||
|         factory_func=custom_factory, | ||
|         description="Custom explore", | ||
|     ) | ||
| | ||
|     registered = register_builtins_agents() | ||
|     assert "explore" not in registered | ||
| | ||
|     factory = get_agent_factory("explore") | ||
|     assert factory.description == "Custom explore" | ||
|     assert factory.factory_func(_make_test_llm()).tools == [] | ||
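
The last test checks that a programmatic registration wins over a built-in with the same name, and that the skipped name is absent from the value returned by `register_builtins_agents()`. The registry implementation is not part of this view, so the following is only a sketch of that precedence rule under assumed names: `AgentRegistry`, `register`, `register_builtins`, and `get` are hypothetical, and factories are reduced to zero-argument callables returning a string.

```python
from collections.abc import Callable

Factory = Callable[[], str]  # stand-in for a real agent factory


class AgentRegistry:
    """Toy registry illustrating 'programmatic beats built-in'."""

    def __init__(self) -> None:
        self._factories: dict[str, Factory] = {}

    def register(self, name: str, factory: Factory) -> None:
        """Programmatic registration."""
        self._factories[name] = factory

    def register_builtins(self, builtins: dict[str, Factory]) -> list[str]:
        """Register built-ins, skipping names that are already taken.

        Returns the names that were actually registered.
        """
        registered: list[str] = []
        for name, factory in builtins.items():
            if name in self._factories:
                continue  # an existing registration takes priority
            self._factories[name] = factory
            registered.append(name)
        return registered

    def get(self, name: str) -> Factory:
        return self._factories[name]


registry = AgentRegistry()
registry.register("explore", lambda: "custom explore")
registered = registry.register_builtins(
    {"explore": lambda: "builtin explore", "bash": lambda: "builtin bash"}
)
print(registered)
print(registry.get("explore")())
```

Returning the list of names that were actually registered is what lets the test assert `"explore" not in registered` without inspecting registry internals.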