feat(pr-review): add sub-agent delegation for file-level reviews#164
Add experimental support for splitting PR reviews across multiple sub-agents using the SDK DelegateTool.

When USE_SUB_AGENTS=true:
- The main agent becomes a review coordinator
- It spawns file_reviewer sub-agents (one per changed file)
- Each sub-agent reviews its file and returns structured findings
- The coordinator consolidates results and posts a single PR review

New components:
- SUB_AGENT_PROMPT: coordinator prompt template in prompt.py
- FILE_REVIEWER_SKILL: sub-agent persona/instructions
- _create_file_reviewer_agent: factory for file_reviewer agents
- _register_sub_agents: registers the agent type and DelegateTool

The feature is opt-in via the use-sub-agents action input (default: false). All existing behavior is preserved when the flag is not set.

Co-authored-by: openhands <openhands@all-hands.dev>
Point the pr-review workflow at the extensions sub-agent branch (OpenHands/extensions#164) and enable use-sub-agents: true to experiment with file-level review delegation in this repo. Revert to @main once the experiment concludes and the extensions PR is merged. Co-authored-by: openhands <openhands@all-hands.dev>
| from openhands.sdk.plugin import PluginSource | ||
| from openhands.sdk.subagent import register_agent | ||
| from openhands.sdk.tool import register_tool | ||
| from openhands.tools.delegate import DelegateTool, DelegationVisualizer |
The DelegateTool is the deprecated one.
TaskToolSet is the current SDK tool for sub-agent delegation:
- Sequential/blocking execution (one file review at a time)
- Single 'task' tool call with prompt, subagent_type, description
- Auto-registers on import (no register_tool() needed)
- Built-in resumption via task IDs

DelegateTool is the older parallel variant (spawn + delegate).

Co-authored-by: openhands <openhands@all-hands.dev>
Add missing stubs to test_pr_review_review_context.py for:
- sdk.Tool (added to agent_script imports)
- openhands.sdk.context.Skill
- openhands.sdk.plugin.PluginSource
- openhands.tools.delegate (DelegationVisualizer, register_agent)
- openhands.tools.task (TaskToolSet)

Also fix a prompt module cache collision: clear sys.modules['prompt'] before loading agent_script.py so it picks up the correct prompt.py from pr-review/scripts/ instead of release-notes/scripts/.

Co-authored-by: openhands <openhands@all-hands.dev>
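The cache collision described in this commit comes from Python caching imported modules by name in `sys.modules`, so a second `prompt.py` in a different directory is never loaded unless the cached entry is dropped first. The sketch below reproduces the idea with two throwaway directories; `load_prompt_from` and `_make_dir` are hypothetical helpers for illustration, not code from this PR.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def load_prompt_from(script_dir: str):
    """Import `prompt` from script_dir, discarding any cached module of that name."""
    # Drop the stale entry so the import system searches sys.path again.
    sys.modules.pop("prompt", None)
    importlib.invalidate_caches()
    sys.path.insert(0, script_dir)
    try:
        return importlib.import_module("prompt")
    finally:
        sys.path.remove(script_dir)


def _make_dir(marker: str) -> str:
    """Create a temp directory holding a prompt.py that defines NAME."""
    directory = tempfile.mkdtemp()
    Path(directory, "prompt.py").write_text(f"NAME = {marker!r}\n")
    return directory


# Stand-ins for release-notes/scripts/ and pr-review/scripts/.
release_dir = _make_dir("release-notes")
review_dir = _make_dir("pr-review")

first = load_prompt_from(release_dir)
second = load_prompt_from(review_dir)
```

Without the `sys.modules.pop` call, the second import would return the module cached by the first one, which matches the symptom the commit message describes.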
|
@OpenHands please merge from main, and /verify until this PR is approved |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
all-hands-bot
left a comment
🟡 Acceptable - Addresses a real problem (agent context limits on large PRs) with an experimental opt-in feature, but the unresolved deprecation issue must be addressed before merge.
| from openhands.sdk.conversation import get_agent_final_response | ||
| from openhands.sdk.git.utils import run_git_command | ||
| from openhands.sdk.plugin import PluginSource | ||
| from openhands.tools.delegate import DelegationVisualizer, register_agent |
🔴 Critical: This addresses the unresolved review thread. The previous reviewer noted that DelegateTool is deprecated, but this code still imports register_agent and DelegationVisualizer from openhands.tools.delegate.
Questions to resolve:
- Does `TaskToolSet` provide its own agent registration API that should be used instead of `register_agent`?
- Is there a TaskToolSet-native visualizer that replaces `DelegationVisualizer`?
- If these imports from the delegate module are still the correct API, please document why they're still valid despite the deprecation concern.
Without clarity on the correct API pattern, this could break when the deprecated module is eventually removed.
| TaskToolSet auto-registers on import, so no explicit | ||
| ``register_tool()`` call is needed. | ||
| """ | ||
| register_agent( |
🟠 Important: Related to the deprecation concern - verify that register_agent from openhands.tools.delegate is the correct API for TaskToolSet-based delegation. If TaskToolSet has its own registration mechanism (e.g., TaskToolSet.register_agent()), that should be used instead.
| description. | ||
| - `description`: a short label like `"Review src/utils.py"` | ||
|
|
||
| 2. **Collect results** — each task tool call returns the sub-agent's findings. |
🟠 Important: The coordinator relies entirely on the LLM to parse and merge JSON responses from sub-agents with no code-level validation. This could be fragile:
- What happens if a sub-agent returns malformed JSON?
- What if the LLM fails to properly merge findings or loses information?
- How do we handle errors in the delegation flow?
Consider adding explicit error handling in the code layer, or at minimum, document these limitations in the README as known constraints of the experimental feature.
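One way to add the code-level validation suggested here is a small parser that accepts only well-formed findings and reports everything else. This is a minimal sketch, not part of the PR: `parse_findings` is a hypothetical helper, and the field names and severity values are taken from the FILE_REVIEWER_SKILL text quoted elsewhere in this thread.

```python
import json
from typing import Any

# Field names and severities as listed in FILE_REVIEWER_SKILL.
REQUIRED_FIELDS = {"path": str, "line": int, "severity": str, "body": str}
SEVERITIES = {"critical", "major", "minor", "nit"}


def parse_findings(raw: str) -> tuple[list[dict[str, Any]], list[str]]:
    """Return (valid_findings, errors) for one sub-agent response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"malformed JSON: {exc.msg}"]
    if not isinstance(data, list):
        return [], ["expected a JSON array of findings"]

    valid: list[dict[str, Any]] = []
    errors: list[str] = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            errors.append(f"item {i}: not an object")
            continue
        # Exact type match, so a JSON boolean is not accepted as a line number.
        bad = [k for k, t in REQUIRED_FIELDS.items() if type(item.get(k)) is not t]
        if bad:
            errors.append(f"item {i}: missing or mistyped {sorted(bad)}")
        elif item["severity"] not in SEVERITIES:
            errors.append(f"item {i}: unknown severity {item['severity']!r}")
        else:
            valid.append(item)
    return valid, errors
```

A coordinator could post the valid findings and list the errors in the review body, so a single malformed response does not silently drop a file from the review.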
|
|
||
| You have access to the **task** tool (TaskToolSet). Follow these steps: | ||
|
|
||
| 1. **Delegate file reviews** — for each changed file (or small group of |
🟡 Suggestion: The coordinator prompt has significant responsibilities:
- Parse the diff and split by file
- Delegate to sub-agents with correct parameters
- Parse JSON responses
- Merge and deduplicate findings
- Map findings to correct line numbers
- Post the consolidated review
For a production feature, consider whether steps 1, 3, and 4 should be handled in code rather than relying on LLM prompt engineering. For an experimental opt-in feature this is acceptable, but worth documenting as a known limitation.
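As an illustration of moving step 1 into code, the sketch below splits `git diff` output into per-file chunks using the `diff --git` header lines. `split_diff_by_file` is a hypothetical helper, and it assumes paths do not themselves contain the substring ` b/`, which a production version would need to handle.

```python
import re

# Matches the per-file header that `git diff` emits, e.g.
# "diff --git a/src/app.py b/src/app.py".
_FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Map each changed file's new path to its chunk of a `git diff` output."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in diff_text.splitlines():
        match = _FILE_HEADER.match(line)
        if match:
            current = match.group(2)
            chunks[current] = []
        if current is not None:
            chunks[current].append(line)
    return {path: "\n".join(lines) for path, lines in chunks.items()}
```

Each value can then be passed as the diff portion of a sub-agent prompt, which removes one parsing task from the coordinator LLM.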
| assert "Analyze the changes and post your review" in prompt | ||
|
|
||
|
|
||
| def test_format_prompt_uses_sub_agent_prompt_when_enabled(): |
🟡 Suggestion: Current tests only verify prompt formatting. For a more robust experimental feature, consider adding at least a smoke test that verifies:
- The agent registration completes without errors
- The TaskToolSet is properly added to the agent's tools
- Basic mocking of a delegation flow to ensure the plumbing works
This would catch integration issues earlier than waiting for manual workflow testing. However, acceptable for an initial experimental feature if documented as a testing gap.
| | `llm-base-url` | No | `''` | Custom LLM endpoint URL | | ||
| | `review-style` | No | `roasted` | **[DEPRECATED]** Previously chose between `standard` and `roasted` review styles. Now ignored — the styles have been merged into a single unified skill. | | ||
| | `require-evidence` | No | `'false'` | Require the reviewer to enforce an `Evidence` section in the PR description with end-to-end proof: screenshots/videos for frontend work, commands and runtime output for backend or scripts, and an agent conversation link when applicable. Test output alone does not qualify. | | ||
| | `use-sub-agents` | No | `'false'` | **(Experimental)** Enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that delegates per-file review work to `file_reviewer` sub-agents via the SDK TaskToolSet, then consolidates findings into a single PR review. Useful for large PRs with many changed files. | |
🟢 Nit: Consider adding a Known Limitations subsection under the experimental feature description that mentions:
- JSON parsing is LLM-driven (no code validation)
- Requires manual testing via workflow (no integration tests yet)
- Potential for information loss during finding consolidation
This sets appropriate expectations for users trying the experimental feature.
…docs
- Move register_agent import from openhands.tools.delegate to openhands.sdk where it actually lives (fixes critical review feedback)
- Update test stub to match the corrected import path
- Add 'Known Limitations: Sub-Agent Delegation' section to README documenting experimental constraints (LLM-driven JSON parsing, potential consolidation info loss, no integration tests yet)

Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed review feedback in 7dcf8b6:
Ready for another look. This comment was posted by an AI assistant (OpenHands) on behalf of a user. |
all-hands-bot
left a comment
🟡 Acceptable - Solid experimental feature with good opt-in design and documentation. One critical unresolved issue must be fixed: DelegationVisualizer is still imported from the deprecated module.
| from openhands.sdk.context.skills import load_project_skills | ||
| from openhands.sdk.conversation import get_agent_final_response | ||
| from openhands.sdk.git.utils import run_git_command | ||
| from openhands.sdk.plugin import PluginSource |
🔴 Critical: Partially unresolved deprecation issue. While register_agent has been correctly moved to openhands.sdk, DelegationVisualizer is still imported from the deprecated openhands.tools.delegate module.
Verify:
- Does `TaskToolSet` or `openhands.sdk` provide an updated `DelegationVisualizer` API?
- If not, is this visualizer still compatible with TaskToolSet-based delegation?
- Document why using the deprecated module is acceptable if it's the only option.
The previous critical review thread specifically flagged this import as problematic.
| skill_content = get_file_reviewer_skill_content(review_style) | ||
|
|
||
| skills = [ | ||
| Skill( |
🟠 Important: Verify the factory function signature matches register_agent's expectations.
The function accepts llm: LLM but it's unclear if this is the correct signature for the factory_func parameter. Check the SDK docs or type hints for register_agent to confirm the expected signature.
If the signature is wrong, agent instantiation will fail at runtime with no static type checking to catch it.
| - `path`: the file path | ||
| - `line`: the diff line number (use the NEW file line number) | ||
| - `severity`: one of `critical`, `major`, `minor`, `nit` | ||
| - `body`: a concise description of the issue with a suggested fix when possible |
🟠 Important: The FILE_REVIEWER_SKILL lacks a concrete JSON schema example. The coordinator will parse LLM-generated JSON with no validation.
Suggest adding an explicit example to reduce parsing ambiguity:
| - `body`: a concise description of the issue with a suggested fix when possible | |
| You are a **file-level code reviewer**. You will receive a diff for one or more | |
| files from a pull request together with PR metadata. | |
| Review style: {review_style_description} | |
| For each issue you find, return a JSON object with: | |
| - `path`: the file path | |
| - `line`: the diff line number (use the NEW file line number) | |
| - `severity`: one of `critical`, `major`, `minor`, `nit` | |
| - `body`: a concise description of the issue with a suggested fix when possible | |
| Return your findings as a JSON array. If you find no issues, return `[]`. | |
| **Example output:** | |
| ```json | |
| [ | |
| {"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"}, | |
| {"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"} | |
| ] |
Do NOT post anything to the GitHub API — the coordinator agent will handle that.
This addresses the fragility concern raised in the previous review.
| 1. **Delegate file reviews** — for each changed file (or small group of | ||
| closely related files), call the task tool with: | ||
| - `subagent_type`: `"file_reviewer"` | ||
| - `prompt`: the diff chunk for the file(s), together with the PR context |
🟡 Suggestion: The coordinator has complex responsibilities with no explicit error handling guidance. Consider adding a step for handling malformed sub-agent responses:
| - `prompt`: the diff chunk for the file(s), together with the PR context | |
| You have access to the **task** tool (TaskToolSet). Follow these steps: | |
| 1. **Delegate file reviews** — for each changed file (or small group of | |
| closely related files), call the task tool with: | |
| - `subagent_type`: `"file_reviewer"` | |
| - `prompt`: the diff chunk for the file(s), together with the PR context | |
| (title, description, base/head branch). Ask it to return a structured | |
| list of findings with severity, file path, line number, and a short | |
| description. | |
| - `description`: a short label like `"Review src/utils.py"` | |
| 2. **Collect results** — each task tool call returns the sub-agent's findings. | |
| If a sub-agent returns malformed JSON or fails, log the error and continue | |
| with other files. Merge all valid findings together. De-duplicate and drop | |
| low-signal noise. | |
| 3. **Post the review** — use the GitHub API (as described by /github-pr-review) | |
| to submit a single PR review with inline comments on the relevant lines. | |
| Keep the top-level review body brief. |
This provides explicit guidance for handling failures, which is critical given the LLM-driven parsing noted in Known Limitations.
| assert "Analyze the changes and post your review" in prompt | ||
|
|
||
|
|
||
| def test_format_prompt_uses_sub_agent_prompt_when_enabled(): |
🟡 Suggestion: Current tests only verify prompt formatting. For an experimental feature, consider adding at least one integration smoke test that:
- Verifies `_register_sub_agents()` completes without errors
- Confirms `TaskToolSet` is added to agent tools when `use_sub_agents=True`
- Validates the `_create_file_reviewer_agent()` factory returns a valid Agent
This was flagged in the previous review and is still unaddressed. The Known Limitations section acknowledges this, but a basic smoke test would catch API signature mismatches before deployment.
| The `use-sub-agents` feature is **experimental** and has the following known constraints: | ||
|
|
||
| - **LLM-driven JSON parsing**: The coordinator agent relies on the LLM to parse and merge JSON responses from sub-agents. There is no code-level validation of sub-agent output, so malformed responses may cause incomplete reviews. | ||
| - **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues. |
🟢 Nit: Typo - "analyse" should be "analyze" for consistency with US English used elsewhere in the codebase.
| - **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues. | |
| - **Sub-agents have no tools**: File reviewer sub-agents analyze the diff in their context window only — they cannot run commands or query the GitHub API. |
- Add JSON schema example to FILE_REVIEWER_SKILL for less ambiguous output
- Add error handling guidance to coordinator prompt for malformed sub-agent responses
- Fix typo: 'analyse' -> 'analyze' in README
- Add smoke tests for _register_sub_agents() and _create_file_reviewer_agent()
- Improve test stubs to accept keyword arguments (Skill, Agent, etc.)

Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed second round of review feedback in 199020e:
Ready for another look. This comment was posted by an AI assistant (OpenHands) on behalf of a user. |
all-hands-bot
left a comment
🟡 Acceptable - Solid experimental feature with good opt-in design. The critical deprecation concern from previous reviews is actually a non-issue, but there are a few minor items to address.
| from openhands.sdk.context.skills import load_project_skills | ||
| from openhands.sdk.conversation import get_agent_final_response | ||
| from openhands.sdk.git.utils import run_git_command | ||
| from openhands.sdk.plugin import PluginSource |
🟢 Clarification on previous review: The concern about DelegationVisualizer being deprecated appears to be unfounded. I tested the import with deprecation warnings enabled and no warning was raised. The confusion likely stems from DelegateTool being deprecated (which you correctly replaced with TaskToolSet), not DelegationVisualizer. The import from openhands.tools.delegate is valid.
Note: register_agent was correctly moved to openhands.sdk as you've done.
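The check described in this comment (importing with deprecation warnings enabled) can be reproduced with the standard `warnings` module. The sketch below is one way to do it; `deprecation_warnings_on_import` is a hypothetical helper, and it only detects warnings emitted while the named module itself executes at import time, so a class that warns on instantiation, or a submodule that is already cached, would not be caught.

```python
import importlib
import sys
import warnings


def deprecation_warnings_on_import(module_name: str) -> list[str]:
    """Import module_name fresh and return any deprecation warning messages."""
    # Force the module-level code to run again even if it was imported before.
    sys.modules.pop(module_name, None)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        importlib.import_module(module_name)
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (DeprecationWarning, PendingDeprecationWarning))
    ]
```

With the SDK installed, calling `deprecation_warnings_on_import("openhands.tools.delegate")` and getting an empty list would be consistent with the result reported above.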
| return pr_diff, commit_id, review_context | ||
|
|
||
|
|
||
| def _create_file_reviewer_agent(llm: LLM) -> Agent: |
🟢 Clarification on previous review: The factory function signature is correct. I verified that register_agent expects Callable[[LLM], Agent], which matches _create_file_reviewer_agent(llm: LLM) -> Agent. This concern has been properly addressed.
| all GitHub API interaction. | ||
| """ | ||
| # review_style is read at registration time from the environment | ||
| review_style = os.getenv("REVIEW_STYLE", "standard").lower() |
🟡 Suggestion: The deprecated REVIEW_STYLE env var is still being read here for sub-agents, even though action.yml says it's "deprecated and ignored." While this works fine (proper defaults exist), it's potentially confusing.
Consider either:
- Updating the action.yml description to clarify it's only deprecated for the main reviewer, not sub-agents
- Or removing this usage and hardcoding to "standard" if sub-agents should always use standard style
Not blocking, but worth clarifying the intended behavior.
| ] | ||
| return Agent( | ||
| llm=llm, | ||
| tools=[], # sub-agents only analyse; coordinator posts the review |
🟢 Nit: Spelling consistency - "analyse" (British) vs "analyze" (US). The README was updated to use US English ("analyze"), so this comment should match.
| tools=[], # sub-agents only analyse; coordinator posts the review | |
| tools=[], # sub-agents only analyze; coordinator posts the review |
|
|
||
| You have access to the **task** tool (TaskToolSet). Follow these steps: | ||
|
|
||
| 1. **Delegate file reviews** — for each changed file (or small group of |
🟢 Previous concern addressed: The coordinator prompt now includes explicit error handling guidance in step 2 ("If a sub-agent returns malformed output... skip its results and note the file in the review body"). Well done.
| | `github-token` | Yes | - | GitHub token for API access | | ||
| | `lmnr-api-key` | No | `''` | Laminar API key for observability | | ||
|
|
||
| ## Known Limitations: Sub-Agent Delegation |
🟢 Previous concern addressed: The Known Limitations section clearly documents the experimental nature and constraints (LLM-driven JSON parsing, potential information loss, no integration tests yet, sub-agents have no tools). This transparency is excellent for an experimental feature.
- Fix spelling: 'analyse' -> 'analyze' in code comment - Clarify REVIEW_STYLE deprecation scope in sub-agent factory comment Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed remaining nits in 0ef1afc:
This comment was posted by an AI assistant (OpenHands) on behalf of a user. |
all-hands-bot
left a comment
🟡 Acceptable - Well-designed experimental feature with solid opt-in defaults and clear documentation. Just one tiny spelling inconsistency to fix.
| skill_content = get_file_reviewer_skill_content(review_style) | ||
|
|
||
| skills = [ | ||
| Skill( |
🟢 Nit: Spelling consistency - use US English to match the rest of the codebase (README uses "analyze").
| Skill( | |
| tools=[], # sub-agents only analyze; coordinator posts the review |
| """ | ||
| # REVIEW_STYLE is deprecated for the main reviewer (styles are merged), | ||
| # but still used here to configure sub-agent tone. Defaults to "standard". | ||
| review_style = os.getenv("REVIEW_STYLE", "standard").lower() |
🟢 Acceptable: Using the deprecated REVIEW_STYLE for sub-agent tone is a pragmatic choice - it's confusing from a naming perspective but functionally correct. The default is sensible and the main reviewer correctly ignores this env var per the deprecation notice.
Summary
The user's request had two parts: merge from main and /verify until the PR is approved. Here's the status:
✅ Merge from main
✅ /verify loop completed (3 iterations)
Round 1: Bot requested changes (critical import bug):
Round 2: Bot requested changes (improvements):
Round 3: Bot left
Checklist
Conciseness
All changes directly address either the merge request or review feedback from the bot. No extraneous changes were introduced. The 4 commits on top of the merge are: (1) critical import fix + docs, (2) schema example + error handling + smoke tests, (3) spelling nits, all driven by specific bot review comments.
|
all-hands-bot
left a comment
🟡 Acceptable - Well-designed experimental feature with solid opt-in defaults. Previous review concerns (spelling consistency, error handling guidance, limitations documentation) have been addressed. Two minor observations below that haven't been discussed yet, but neither blocks merge.
KEY INSIGHT: Clean separation of concerns - coordinator handles API interaction, sub-agents analyze only. This constraint eliminates a whole class of potential issues.
| - **Base Branch**: {base_branch} | ||
| - **Head Branch**: {head_branch} | ||
| - **PR Number**: {pr_number} | ||
| - **Commit ID**: {commit_id} |
🟢 Nit: "De-duplicate and drop low-signal noise" is subjective guidance for an LLM. One model's "noise" might be a useful finding to a human. For an experimental feature this is acceptable, but consider either:
- Being more specific (e.g., "drop duplicate findings for the same line")
- Erring on the side of keeping findings (let humans decide what's noise)
Not a blocker given the experimental flag and documented limitations.
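A deterministic version of the more specific guidance could drop only findings that repeat the same path, line, and text, which errs on the side of keeping anything distinct. A minimal sketch, assuming findings use the `path`, `line`, and `body` fields from FILE_REVIEWER_SKILL; `dedupe_findings` is a hypothetical helper, not code from this PR.

```python
def dedupe_findings(findings: list[dict]) -> list[dict]:
    """Drop findings that repeat the same path, line, and normalized body."""
    seen: set[tuple[str, int, str]] = set()
    unique: list[dict] = []
    for finding in findings:
        # Collapse case and whitespace so trivially different copies match.
        body_key = " ".join(finding["body"].lower().split())
        key = (finding["path"], finding["line"], body_key)
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique
```

Two different observations about the same line both survive, leaving the judgment about what counts as noise to the human reader.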
| use_sub_agents = config.get("use_sub_agents", False) | ||
| if use_sub_agents: | ||
| _register_sub_agents() | ||
| tools.append(Tool(name=TaskToolSet.name)) |
🟢 Acceptable: No validation that sub-agent registration succeeded or that TaskToolSet.name is valid. If registration silently fails, the coordinator would try to use a non-existent tool.
For an experimental feature with manual workflow testing, this is acceptable. If/when this graduates from experimental, consider adding validation:
try:
    _register_sub_agents()
    tools.append(Tool(name=TaskToolSet.name))
    logger.info(f"Sub-agent delegation enabled — {TaskToolSet.name} registered")
except Exception as e:
    logger.error(f"Sub-agent registration failed: {e}")
    raise
@OpenHands please send the prompt to the OpenHands/software-agent-SDK and update the review script there to use the sub-agent mode defined here. Note that you should not merge the current PR. The code review skill in that repo should select this code review skill from the feature branch. |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Point pr-review workflow at the extensions feature branch (feat/pr-review-sub-agent-delegation) and enable use-sub-agents: true. This enables the experimental sub-agent delegation mode where the main review agent acts as a coordinator, delegating per-file reviews to file_reviewer sub-agents via the TaskToolSet, then consolidating findings into a single PR review. Companion to OpenHands/extensions#164. Co-authored-by: openhands <openhands@all-hands.dev>
|
Companion PR opened in the software-agent-SDK repo: OpenHands/software-agent-sdk#2839 Changes made:
The SDK workflow will now use the sub-agent coordinator prompt and file_reviewer sub-agents defined in this PR. Once this extensions PR is merged to main, the SDK PR should be updated to point back to This comment was posted by an AI assistant (OpenHands) on behalf of a user. |
Summary of Work
What was done
The PR comment requested two things: (1) send the prompt/review configuration to the
Checklist
Are the changes concise?
Yes, the SDK PR contains exactly one file change (
No extraneous changes were made. The PR description clearly notes that the extensions ref should be reverted to
| - {commit_id} - The HEAD commit SHA | ||
| - {review_context} - Previous review comments and thread resolution status | ||
|
|
||
| When sub-agent delegation is enabled, the main agent acts as a coordinator |
Could we have "smart activation" of delegation? Prompt the main agent so that it decides when to delegate vs when not to delegate. And maybe also add an override that forces the mode?
Addressed in 4f1e5da: added auto mode for use-sub-agents.
Tri-state modes:
- `false` (default): single-agent review, no delegation
- `true`: force delegation (coordinator + file_reviewer sub-agents)
- `auto`: smart activation; the agent gets the TaskToolSet and decides at runtime whether to delegate based on diff size/complexity (roughly 4+ files or 500+ changed lines → delegate, otherwise review directly)
The auto mode uses a dedicated AUTO_DELEGATION_PROMPT that gives the agent heuristics for when delegation is worthwhile vs overhead.
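The tri-state parsing and the rough thresholds above can be written down as a small sketch. Note that in this PR the `auto` decision is made by the agent from prompt heuristics, not by code; `parse_sub_agents_mode` and `should_delegate` are hypothetical names, and raising on unknown values is an assumption rather than the behavior of the PR's `_get_sub_agents_mode()`.

```python
from typing import Optional

VALID_MODES = {"false", "true", "auto"}


def parse_sub_agents_mode(raw: Optional[str]) -> str:
    """Normalize the use-sub-agents input to 'false', 'true', or 'auto'."""
    value = (raw or "false").strip().lower()
    if value not in VALID_MODES:
        raise ValueError(f"use-sub-agents must be one of {sorted(VALID_MODES)}, got {raw!r}")
    return value


def should_delegate(mode: str, files_changed: int, lines_changed: int) -> bool:
    """Code rendering of the heuristic: 4+ files or 500+ changed lines."""
    if mode == "true":
        return True
    if mode == "false":
        return False
    return files_changed >= 4 or lines_changed >= 500
```

Under these thresholds a 1-file, 3-line change in `auto` mode would be reviewed directly.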
This comment was posted by an AI assistant (OpenHands) on behalf of a user.
| ] | ||
| return Agent( | ||
| llm=llm, | ||
| tools=[], # sub-agents only analyze; coordinator posts the review |
Isn't this a limitation? Shouldn't the sub-agent have access to the terminal, if only for read operations?
Addressed in 4f1e5da: file_reviewer sub-agents now get terminal and file_editor tools.
They can use cat, grep, and file_editor view to inspect surrounding code context in the PR repo — so they're not limited to just the diff snippet. The coordinator still handles all GitHub API interaction.
Updated both the agent factory (_create_file_reviewer_agent) and the FILE_REVIEWER_SKILL prompt to document the available tools.
This comment was posted by an AI assistant (OpenHands) on behalf of a user.
Address two review comments on PR #164:
1. Smart activation (simonrosenberg): Add 'auto' mode for use-sub-agents. In auto mode the agent gets the TaskToolSet but decides at runtime whether to delegate based on diff size/complexity, vs reviewing directly for small PRs. 'true' forces delegation, 'false' disables it.
2. Sub-agent terminal access (VascoSch92): Give file_reviewer sub-agents terminal and file_editor tools so they can inspect surrounding code context (cat, grep, view) instead of relying only on the diff snippet. The coordinator still handles all GitHub API interaction.

Changes:
- action.yml: document 'auto'/'true'/'false' tri-state for use-sub-agents
- agent_script.py: add _get_sub_agents_mode() for tri-state parsing, give sub-agents terminal + file_editor tools
- prompt.py: add AUTO_DELEGATION_PROMPT with delegation heuristics, update FILE_REVIEWER_SKILL to mention tool access, update format_prompt to handle str|bool use_sub_agents
- tests: add 4 new tests for auto mode and tool access

All 11 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
Drop the tri-state (false/true/auto) in favour of a simple boolean:
- false (default): single-agent review, no delegation
- true: smart activation; the agent gets the TaskToolSet and decides at runtime whether to delegate based on diff size and complexity

The forced-coordinator mode (SUB_AGENT_PROMPT) is removed; there is now a single DELEGATION_PROMPT that includes heuristics for when delegation is worthwhile vs overhead.

Sub-agent tool access (terminal + file_editor) is kept from the previous commit so file_reviewer sub-agents can inspect surrounding code context.

All 8 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
Instead of a full duplicate DELEGATION_PROMPT, append a short _DELEGATION_SUFFIX to the base PROMPT when use_sub_agents=True. The main agent gets the same review prompt it always had, plus a section explaining the task tool is available for large diffs.

Rewrote FILE_REVIEWER_SKILL with:
- clear section structure (task, tools, review style, output format)
- explicit JSON schema table with field types and descriptions
- severity guide (critical/major/minor/nit)
- example output and empty-array case

Removed redundant system_message_suffix from _create_file_reviewer_agent since the skill content now covers everything.

All 8 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
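The suffix approach in this commit can be sketched as follows. The prompt text, `BASE_PROMPT`, `DELEGATION_SUFFIX`, and `format_prompt_sketch` are illustrative placeholders, not the actual contents of `prompt.py`; only the pattern of appending a short section to the unchanged base prompt is taken from the commit message.

```python
# Placeholder texts: the real PROMPT and _DELEGATION_SUFFIX live in prompt.py.
BASE_PROMPT = "Review PR #{pr_number}: {title}\n\n{diff}"
DELEGATION_SUFFIX = (
    "\n\n## Optional delegation\n"
    'For large diffs, you may call the task tool with subagent_type "file_reviewer".'
)


def format_prompt_sketch(
    pr_number: int, title: str, diff: str, use_sub_agents: bool = False
) -> str:
    """Format the base prompt, then append the delegation section if enabled."""
    prompt = BASE_PROMPT.format(pr_number=pr_number, title=title, diff=diff)
    if use_sub_agents:
        # Appended after formatting, so the suffix needs no brace escaping.
        prompt += DELEGATION_SUFFIX
    return prompt
```

Because the suffix is added after `str.format` runs, the single-agent prompt stays byte-for-byte identical when the flag is off, which keeps the existing prompt tests meaningful.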
✅ Sub-agent delegation is working: evidence from SDK PR #2890
We triggered the reviewer bot CI on software-agent-sdk#2890 (the PR that adds
1. Extensions checked out from the feature branch ✅
2. Sub-agent code path activated ✅
{"message": "Reviewing PR #2890: fix(ci): pass extensions-version so sub-agent review script is used"}
{"message": "Sub-agent delegation: True"}
{"message": "Sub-agent delegation enabled — TaskToolSet added"}
Compare with the previous run on PR #2886 (before the
3. Smart activation: agent reviewed directly (correct for small diff) ✅
PR #2890 is a 1-file, 3-line change, well below the delegation threshold (4+ files / 500+ lines). The agent correctly chose to review it directly instead of spawning sub-agents. The review was posted in 1m44s:
Summary
This comment was posted by an AI assistant (OpenHands) on behalf of a user. |
Review styles (standard/roasted) were already merged into a single unified style for the main agent. The sub-agent still had the old style_descriptions dict and REVIEW_STYLE env var dispatch.
- Bake unified review description directly into FILE_REVIEWER_SKILL
- Simplify get_file_reviewer_skill_content() to take no arguments
- Remove REVIEW_STYLE env var usage from _create_file_reviewer_agent
- Merge two style-specific tests into one unified test

All 7 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
…sh tool
- Delete get_file_reviewer_skill_content(); callers now import the FILE_REVIEWER_SKILL constant directly
- Sub-agent returns its JSON findings via the finish tool instead of a vague 'return ONLY the JSON array' instruction

Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
🟡 Acceptable design with one critical documentation bug that must be fixed. The experimental sub-agent delegation feature is well-structured with solid opt-in defaults and clear separation of concerns. Most previous review concerns have been addressed.
Review Summary
Taste Rating: 🟡 Acceptable - Well-designed experimental feature with one documentation bug that must be fixed before merge.
What Works Well
✅ Clean Architecture:
✅ Smart Opt-In Design:
✅ Sub-Agents Have Context Access:
✅ Clear Output Schema:
✅ Previous Concerns Addressed:
Critical Issue
🔴 Documentation Bug (README.md:159): The Known Limitations section states: "Sub-agents have no tools". This is factually incorrect. The code in
return Agent(
    llm=llm,
    tools=[
        Tool(name="terminal"),
        Tool(name="file_editor"),
    ],
    agent_context=AgentContext(skills=skills),
)
And the FILE_REVIEWER_SKILL explicitly documents:
Must fix: Update the README to reflect that sub-agents have limited (read-only) tools, not zero tools.
[RISK ASSESSMENT]
Risk Level Justification:
Mitigating Factors:
Recommendation:
KEY INSIGHT
The coordinator/sub-agent separation is the right architectural choice: by giving sub-agents read-only tools (terminal, file_editor) for context inspection while keeping all GitHub API interaction in the coordinator, you eliminate coordination complexity and potential race conditions. The documentation just needs to accurately reflect this design.
This review was created by an AI agent (OpenHands) on behalf of a user.
|
@OpenHands /iterate until the pr-review bot is happy |
|
I'm on it! xingyaoww can track my progress at all-hands.dev |
Sub-agents DO have terminal and file_editor tools for reading source files and context — the README incorrectly stated they had no tools. Co-authored-by: openhands <openhands@all-hands.dev>
|
Addressed the critical documentation bug in 2b1fcf3: corrected the Known Limitation to accurately state that sub-agents have This comment was posted by an AI assistant (OpenHands) on behalf of a user. |
all-hands-bot
left a comment
🟡 Acceptable - Well-designed experimental feature addressing agent context limits on large PRs. Previous critical documentation bug has been resolved. The opt-in default and clear documentation of limitations make this safe to merge.
KEY INSIGHT: Smart design choice to use LLM-driven delegation decision rather than hardcoded heuristics—aligns well with the experimental nature and allows the model to make context-aware choices.
[RISK ASSESSMENT]
Factors considered:
- ✅ Opt-in experimental feature (no impact on default behavior)
- ✅ Clean separation: coordinator handles API, sub-agents analyze only
- ✅ Previous critical issue (documentation) was fixed in 2b1fcf3
- ⚠️ LLM-driven JSON parsing without validation (documented limitation)
- ⚠️ Limited integration test coverage (acceptable for experimental)
- ✅ Transparent documentation of constraints
The medium risk is acceptable for an opt-in experimental feature with clear documentation. No auto-merge blockers.
Summary
The
Checklist
Conciseness
The change is minimal: a single line edit in
Final State
PR #164 is APPROVED, MERGEABLE, and all CI checks pass on SHA
Why
Per-PR code reviews can get too large for a single agent to handle well — the agent may get lazy and only look through part of the code. By delegating file-level reviews to sub-agents, we can:
This is the first step toward experimenting with sub-agent-based code review.
Summary
- `USE_SUB_AGENTS` env var / `use-sub-agents` action input (default: `false`)
- `file_reviewer` sub-agents via the SDK `DelegateTool`, delegates per-file review work, and consolidates findings into a single PR review
- `SUB_AGENT_PROMPT` (coordinator prompt) and `FILE_REVIEWER_SKILL` (sub-agent persona) in `prompt.py`
- `file_reviewer` agent type and `DelegateTool` in `agent_script.py`
How to Test
Unit tests — the prompt tests pass:
All 7 tests pass (2 existing + 5 new).
Integration: to test end-to-end, enable `use-sub-agents: true` in a workflow pointing to this branch:
Notes
- `use-sub-agents` input defaults to `false`
- `file_reviewer` sub-agents have `terminal` and `file_editor` for read-only source inspection; the coordinator handles all GitHub API interaction
- `OpenHands/software-agent-sdk` to point its PR review workflow at this branch for testing
This PR was created by an AI assistant (OpenHands) on behalf of a user.