feat(pr-review): add sub-agent delegation for file-level reviews by xingyaoww · Pull Request #164 · OpenHands/extensions

xingyaoww · 2026-04-13T13:55:23Z

A human has tested these changes.

Why

Per-PR code reviews can get too large for a single agent to handle well — the agent may get lazy and only look through part of the code. By delegating file-level reviews to sub-agents, we can:

Give each sub-agent focused context on a single file (or small group of related files)
Let the coordinator merge findings and post a cohesive review
Leverage the SDK DelegateTool for parallel file-level analysis

This is the first step toward experimenting with sub-agent-based code review.

Summary

Add experimental sub-agent delegation support to the pr-review plugin via USE_SUB_AGENTS env var / use-sub-agents action input (default: false)
When enabled, the main agent becomes a review coordinator that spawns file_reviewer sub-agents via the SDK DelegateTool, delegates per-file review work, and consolidates findings into a single PR review
Add SUB_AGENT_PROMPT (coordinator prompt) and FILE_REVIEWER_SKILL (sub-agent persona) in prompt.py
Register file_reviewer agent type and DelegateTool in agent_script.py
All existing single-agent behavior is preserved when the flag is not set

How to Test

Unit tests — the prompt tests pass:
```
python -m pytest tests/test_pr_review_prompt.py -v
```
All 7 tests pass (2 existing + 5 new).

Integration — to test end-to-end, enable use-sub-agents: true in a workflow pointing to this branch:

uses: OpenHands/extensions/plugins/pr-review@feat/pr-review-sub-agent-delegation
with:
  use-sub-agents: true

Notes

This is marked as experimental — the use-sub-agents input defaults to false
The file_reviewer sub-agents have terminal and file_editor for read-only source inspection; the coordinator handles all GitHub API interaction
A companion PR will be opened in OpenHands/software-agent-sdk to point its PR review workflow at this branch for testing

This PR was created by an AI assistant (OpenHands) on behalf of a user.

Add experimental support for splitting PR reviews across multiple sub-agents using the SDK DelegateTool. When USE_SUB_AGENTS=true: - The main agent becomes a review coordinator - It spawns file_reviewer sub-agents (one per changed file) - Each sub-agent reviews its file and returns structured findings - The coordinator consolidates results and posts a single PR review New components: - SUB_AGENT_PROMPT: coordinator prompt template in prompt.py - FILE_REVIEWER_SKILL: sub-agent persona/instructions - _create_file_reviewer_agent: factory for file_reviewer agents - _register_sub_agents: registers the agent type and DelegateTool The feature is opt-in via the use-sub-agents action input (default: false). All existing behavior is preserved when the flag is not set. Co-authored-by: openhands <openhands@all-hands.dev>

@main

Point the pr-review workflow at the extensions sub-agent branch (OpenHands/extensions#164) and enable use-sub-agents: true to experiment with file-level review delegation in this repo. Revert to @main once the experiment concludes and the extensions PR is merged. Co-authored-by: openhands <openhands@all-hands.dev>

VascoSch92 · 2026-04-13T13:59:01Z

 from openhands.sdk.plugin import PluginSource
+from openhands.sdk.subagent import register_agent
+from openhands.sdk.tool import register_tool
+from openhands.tools.delegate import DelegateTool, DelegationVisualizer


The DelegateTool is the deprecated one.

TaskToolSet is the current SDK tool for sub-agent delegation: - Sequential/blocking execution (one file review at a time) - Single 'task' tool call with prompt, subagent_type, description - Auto-registers on import (no register_tool() needed) - Built-in resumption via task IDs DelegateTool is the older parallel variant (spawn + delegate). Co-authored-by: openhands <openhands@all-hands.dev>

Add missing stubs to test_pr_review_review_context.py for: - sdk.Tool (added to agent_script imports) - openhands.sdk.context.Skill - openhands.sdk.plugin.PluginSource - openhands.tools.delegate (DelegationVisualizer, register_agent) - openhands.tools.task (TaskToolSet) Also fix a prompt module cache collision: clear sys.modules['prompt'] before loading agent_script.py so it picks up the correct prompt.py from pr-review/scripts/ instead of release-notes/scripts/. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-15T14:33:31Z

@OpenHands please merge from main, and /verify until this PR is approved

openhands-ai · 2026-04-15T14:33:54Z

I'm on it! xingyaoww can track my progress at all-hands.dev

…ent-delegation

all-hands-bot

🟡 Acceptable - Addresses a real problem (agent context limits on large PRs) with an experimental opt-in feature, but the unresolved deprecation issue must be addressed before merge.

all-hands-bot · 2026-04-15T14:39:30Z

 from openhands.sdk.conversation import get_agent_final_response
 from openhands.sdk.git.utils import run_git_command
 from openhands.sdk.plugin import PluginSource
+from openhands.tools.delegate import DelegationVisualizer, register_agent


🔴 Critical: This addresses the unresolved review thread. The previous reviewer noted that DelegateTool is deprecated, but this code still imports register_agent and DelegationVisualizer from openhands.tools.delegate.

Questions to resolve:

Does TaskToolSet provide its own agent registration API that should be used instead of register_agent?

Is there a TaskToolSet-native visualizer that replaces DelegationVisualizer?

If these imports from the delegate module are still the correct API, please document why they're still valid despite the deprecation concern.

Without clarity on the correct API pattern, this could break when the deprecated module is eventually removed.

all-hands-bot · 2026-04-15T14:39:30Z

+    TaskToolSet auto-registers on import, so no explicit
+    ``register_tool()`` call is needed.
+    """
+    register_agent(


🟠 Important: Related to the deprecation concern - verify that register_agent from openhands.tools.delegate is the correct API for TaskToolSet-based delegation. If TaskToolSet has its own registration mechanism (e.g., TaskToolSet.register_agent()), that should be used instead.

all-hands-bot · 2026-04-15T14:39:30Z

+     description.
+   - `description`: a short label like `"Review src/utils.py"`
+
+2. **Collect results** — each task tool call returns the sub-agent's findings.


🟠 Important: The coordinator relies entirely on the LLM to parse and merge JSON responses from sub-agents with no code-level validation. This could be fragile:

What happens if a sub-agent returns malformed JSON?

What if the LLM fails to properly merge findings or loses information?

How do we handle errors in the delegation flow?

Consider adding explicit error handling in the code layer, or at minimum, document these limitations in the README as known constraints of the experimental feature.

all-hands-bot · 2026-04-15T14:39:30Z

+
+You have access to the **task** tool (TaskToolSet). Follow these steps:
+
+1. **Delegate file reviews** — for each changed file (or small group of


🟡 Suggestion: The coordinator prompt has significant responsibilities:

Parse the diff and split by file

Delegate to sub-agents with correct parameters

Parse JSON responses

Merge and deduplicate findings

Map findings to correct line numbers

Post the consolidated review

For a production feature, consider whether steps 1, 3, and 4 should be handled in code rather than relying on LLM prompt engineering. For an experimental opt-in feature this is acceptable, but worth documenting as a known limitation.

all-hands-bot · 2026-04-15T14:39:31Z

+    assert "Analyze the changes and post your review" in prompt
+
+
+def test_format_prompt_uses_sub_agent_prompt_when_enabled():


🟡 Suggestion: Current tests only verify prompt formatting. For a more robust experimental feature, consider adding at least a smoke test that verifies:

The agent registration completes without errors

The TaskToolSet is properly added to the agent's tools

Basic mocking of a delegation flow to ensure the plumbing works

This would catch integration issues earlier than waiting for manual workflow testing. However, acceptable for an initial experimental feature if documented as a testing gap.

all-hands-bot · 2026-04-15T14:39:31Z

 | `llm-base-url` | No | `''` | Custom LLM endpoint URL |
 | `review-style` | No | `roasted` | **[DEPRECATED]** Previously chose between `standard` and `roasted` review styles. Now ignored — the styles have been merged into a single unified skill. |
 | `require-evidence` | No | `'false'` | Require the reviewer to enforce an `Evidence` section in the PR description with end-to-end proof: screenshots/videos for frontend work, commands and runtime output for backend or scripts, and an agent conversation link when applicable. Test output alone does not qualify. |
+| `use-sub-agents` | No | `'false'` | **(Experimental)** Enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that delegates per-file review work to `file_reviewer` sub-agents via the SDK TaskToolSet, then consolidates findings into a single PR review. Useful for large PRs with many changed files. |


🟢 Nit: Consider adding a Known Limitations subsection under the experimental feature description that mentions:

JSON parsing is LLM-driven (no code validation)

Requires manual testing via workflow (no integration tests yet)

Potential for information loss during finding consolidation

This sets appropriate expectations for users trying the experimental feature.

…docs - Move register_agent import from openhands.tools.delegate to openhands.sdk where it actually lives (fixes critical review feedback) - Update test stub to match the corrected import path - Add 'Known Limitations: Sub-Agent Delegation' section to README documenting experimental constraints (LLM-driven JSON parsing, potential consolidation info loss, no integration tests yet) Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-15T14:44:13Z

Addressed review feedback in 7dcf8b6:

🔴 Critical — wrong import path: Moved register_agent import from openhands.tools.delegate to openhands.sdk where it actually lives. Updated test stubs to match.
🟠 Important — register_agent API: Same fix as above — openhands.sdk.register_agent() is the correct API for TaskToolSet-based delegation.
🟠 Important — error handling: Documented as a known limitation for this experimental feature (see new README section).
🟡 Suggestion — code vs prompt: Documented as acceptable for experimental opt-in; will be addressed as the feature matures.
🟡 Suggestion — smoke tests: Documented as a testing gap in Known Limitations; acceptable for initial experimental feature.
🟢 Nit — Known Limitations: Added a dedicated Known Limitations: Sub-Agent Delegation section to README.

Ready for another look.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

all-hands-bot

🟡 Acceptable - Solid experimental feature with good opt-in design and documentation. One critical unresolved issue must be fixed: DelegationVisualizer is still imported from the deprecated module.

all-hands-bot · 2026-04-15T14:46:30Z

 from openhands.sdk.context.skills import load_project_skills
 from openhands.sdk.conversation import get_agent_final_response
 from openhands.sdk.git.utils import run_git_command
 from openhands.sdk.plugin import PluginSource


🔴 Critical: Partially unresolved deprecation issue. While register_agent has been correctly moved to openhands.sdk, DelegationVisualizer is still imported from the deprecated openhands.tools.delegate module.

Verify:

Does TaskToolSet or openhands.sdk provide an updated DelegationVisualizer API?

If not, is this visualizer still compatible with TaskToolSet-based delegation?

Document why using the deprecated module is acceptable if it's the only option.

The previous critical review thread specifically flagged this import as problematic.

all-hands-bot · 2026-04-15T14:46:30Z

+    skill_content = get_file_reviewer_skill_content(review_style)
+
+    skills = [
+        Skill(


🟠 Important: Verify the factory function signature matches register_agent's expectations.

The function accepts llm: LLM but it's unclear if this is the correct signature for the factory_func parameter. Check the SDK docs or type hints for register_agent to confirm the expected signature.

If the signature is wrong, agent instantiation will fail at runtime with no static type checking to catch it.

all-hands-bot · 2026-04-15T14:46:30Z

+- `path`: the file path
+- `line`: the diff line number (use the NEW file line number)
+- `severity`: one of `critical`, `major`, `minor`, `nit`
+- `body`: a concise description of the issue with a suggested fix when possible


🟠 Important: The FILE_REVIEWER_SKILL lacks a concrete JSON schema example. The coordinator will parse LLM-generated JSON with no validation.

Suggest adding an explicit example to reduce parsing ambiguity:

Suggested change

- `body`: a concise description of the issue with a suggested fix when possible

You are a **file-level code reviewer**. You will receive a diff for one or more

files from a pull request together with PR metadata.

Review style: {review_style_description}

For each issue you find, return a JSON object with:

- `path`: the file path

- `line`: the diff line number (use the NEW file line number)

- `severity`: one of `critical`, `major`, `minor`, `nit`

- `body`: a concise description of the issue with a suggested fix when possible

Return your findings as a JSON array. If you find no issues, return `[]`.

**Example output:**

```json

[

{"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"},

{"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"}

]

Do NOT post anything to the GitHub API — the coordinator agent will handle that.

This addresses the fragility concern raised in the previous review.

all-hands-bot · 2026-04-15T14:46:31Z

+1. **Delegate file reviews** — for each changed file (or small group of
+   closely related files), call the task tool with:
+   - `subagent_type`: `"file_reviewer"`
+   - `prompt`: the diff chunk for the file(s), together with the PR context


🟡 Suggestion: The coordinator has complex responsibilities with no explicit error handling guidance. Consider adding a step for handling malformed sub-agent responses:

Suggested change

- `prompt`: the diff chunk for the file(s), together with the PR context

You have access to the **task** tool (TaskToolSet). Follow these steps:

1. **Delegate file reviews** — for each changed file (or small group of

closely related files), call the task tool with:

- `subagent_type`: `"file_reviewer"`

- `prompt`: the diff chunk for the file(s), together with the PR context

(title, description, base/head branch). Ask it to return a structured

list of findings with severity, file path, line number, and a short

description.

- `description`: a short label like `"Review src/utils.py"`

2. **Collect results** — each task tool call returns the sub-agent's findings.

If a sub-agent returns malformed JSON or fails, log the error and continue

with other files. Merge all valid findings together. De-duplicate and drop

low-signal noise.

3. **Post the review** — use the GitHub API (as described by /github-pr-review)

to submit a single PR review with inline comments on the relevant lines.

Keep the top-level review body brief.

This provides explicit guidance for handling failures, which is critical given the LLM-driven parsing noted in Known Limitations.

all-hands-bot · 2026-04-15T14:46:31Z

+    assert "Analyze the changes and post your review" in prompt
+
+
+def test_format_prompt_uses_sub_agent_prompt_when_enabled():


🟡 Suggestion: Current tests only verify prompt formatting. For an experimental feature, consider adding at least one integration smoke test that:

Verifies _register_sub_agents() completes without errors

Confirms TaskToolSet is added to agent tools when use_sub_agents=True

Validates the _create_file_reviewer_agent() factory returns a valid Agent

This was flagged in the previous review and is still unaddressed. The Known Limitations section acknowledges this, but a basic smoke test would catch API signature mismatches before deployment.

all-hands-bot · 2026-04-15T14:46:31Z

+The `use-sub-agents` feature is **experimental** and has the following known constraints:
+
+- **LLM-driven JSON parsing**: The coordinator agent relies on the LLM to parse and merge JSON responses from sub-agents. There is no code-level validation of sub-agent output, so malformed responses may cause incomplete reviews.
+- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.


🟢 Nit: Typo - "analyse" should be "analyze" for consistency with US English used elsewhere in the codebase.

Suggested change

- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.

- **Sub-agents have no tools**: File reviewer sub-agents analyze the diff in their context window only — they cannot run commands or query the GitHub API.

- Add JSON schema example to FILE_REVIEWER_SKILL for less ambiguous output - Add error handling guidance to coordinator prompt for malformed sub-agent responses - Fix typo: 'analyse' -> 'analyze' in README - Add smoke tests for _register_sub_agents() and _create_file_reviewer_agent() - Improve test stubs to accept keyword arguments (Skill, Agent, etc.) Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-15T14:50:39Z

Addressed second round of review feedback in 199020e:

🟠 Important — factory signature: Confirmed register_agent expects factory_func: Callable[[LLM], Agent], which matches our _create_file_reviewer_agent(llm: LLM) -> Agent. Added smoke tests to verify.
🟠 Important — JSON schema: Added an explicit JSON example to FILE_REVIEWER_SKILL to reduce parsing ambiguity.
🟡 Suggestion — error handling: Added guidance to the coordinator prompt to skip malformed sub-agent responses and note them in the review body.
🟡 Suggestion — smoke tests: Added test_register_sub_agents_completes_without_error and test_create_file_reviewer_agent_factory_is_callable.
🟢 Nit — typo: Fixed "analyse" → "analyze".
Re: DelegationVisualizer import: DelegationVisualizer is correctly located in openhands.tools.delegate — it's a tool-specific visualizer. Only the old DelegateTool class was deprecated (replaced by TaskToolSet); the visualizer is still the canonical import path.

Ready for another look.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

all-hands-bot

🟡 Acceptable - Solid experimental feature with good opt-in design. The critical deprecation concern from previous reviews is actually a non-issue, but there are a few minor items to address.

all-hands-bot · 2026-04-15T14:54:46Z

 from openhands.sdk.context.skills import load_project_skills
 from openhands.sdk.conversation import get_agent_final_response
 from openhands.sdk.git.utils import run_git_command
 from openhands.sdk.plugin import PluginSource


🟢 Clarification on previous review: The concern about DelegationVisualizer being deprecated appears to be unfounded. I tested the import with deprecation warnings enabled and no warning was raised. The confusion likely stems from DelegateTool being deprecated (which you correctly replaced with TaskToolSet), not DelegationVisualizer. The import from openhands.tools.delegate is valid.

Note: register_agent was correctly moved to openhands.sdk as you've done.

all-hands-bot · 2026-04-15T14:54:46Z

    return pr_diff, commit_id, review_context


+def _create_file_reviewer_agent(llm: LLM) -> Agent:


🟢 Clarification on previous review: The factory function signature is correct. I verified that register_agent expects Callable[[LLM], Agent], which matches _create_file_reviewer_agent(llm: LLM) -> Agent. This concern has been properly addressed.

all-hands-bot · 2026-04-15T14:54:46Z

+    all GitHub API interaction.
+    """
+    # review_style is read at registration time from the environment
+    review_style = os.getenv("REVIEW_STYLE", "standard").lower()


🟡 Suggestion: The deprecated REVIEW_STYLE env var is still being read here for sub-agents, even though action.yml says it's "deprecated and ignored." While this works fine (proper defaults exist), it's potentially confusing.

Consider either:

Updating the action.yml description to clarify it's only deprecated for the main reviewer, not sub-agents

Or removing this usage and hardcoding to "standard" if sub-agents should always use standard style

Not blocking, but worth clarifying the intended behavior.

all-hands-bot · 2026-04-15T14:54:47Z

+    ]
+    return Agent(
+        llm=llm,
+        tools=[],  # sub-agents only analyse; coordinator posts the review


🟢 Nit: Spelling consistency - "analyse" (British) vs "analyze" (US). The README was updated to use US English ("analyze"), so this comment should match.

Suggested change

tools=[], # sub-agents only analyse; coordinator posts the review

tools=[], # sub-agents only analyze; coordinator posts the review

all-hands-bot · 2026-04-15T14:54:47Z

+
+You have access to the **task** tool (TaskToolSet). Follow these steps:
+
+1. **Delegate file reviews** — for each changed file (or small group of


🟢 Previous concern addressed: The coordinator prompt now includes explicit error handling guidance in step 2 ("If a sub-agent returns malformed output... skip its results and note the file in the review body"). Well done.

all-hands-bot · 2026-04-15T14:54:47Z

 | `github-token` | Yes | - | GitHub token for API access |
 | `lmnr-api-key` | No | `''` | Laminar API key for observability |

+## Known Limitations: Sub-Agent Delegation


🟢 Previous concern addressed: The Known Limitations section clearly documents the experimental nature and constraints (LLM-driven JSON parsing, potential information loss, no integration tests yet, sub-agents have no tools). This transparency is excellent for an experimental feature.

- Fix spelling: 'analyse' -> 'analyze' in code comment - Clarify REVIEW_STYLE deprecation scope in sub-agent factory comment Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-15T14:57:01Z

Addressed remaining nits in 0ef1afc:

Fixed spelling: analyse → analyze in code comment
Clarified REVIEW_STYLE deprecation scope

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

all-hands-bot

🟡 Acceptable - Well-designed experimental feature with solid opt-in defaults and clear documentation. Just one tiny spelling inconsistency to fix.

all-hands-bot · 2026-04-15T14:59:36Z

+    skill_content = get_file_reviewer_skill_content(review_style)
+
+    skills = [
+        Skill(


🟢 Nit: Spelling consistency - use US English to match the rest of the codebase (README uses "analyze").

Suggested change

Skill(

tools=[], # sub-agents only analyze; coordinator posts the review

all-hands-bot · 2026-04-15T14:59:36Z

+    """
+    # REVIEW_STYLE is deprecated for the main reviewer (styles are merged),
+    # but still used here to configure sub-agent tone. Defaults to "standard".
+    review_style = os.getenv("REVIEW_STYLE", "standard").lower()


🟢 Acceptable: Using the deprecated REVIEW_STYLE for sub-agent tone is a pragmatic choice - it's confusing from a naming perspective but functionally correct. The default is sensible and the main reviewer correctly ignores this env var per the deprecation notice.

openhands-ai · 2026-04-15T15:01:58Z

Summary

The user's request had two parts: merge from main and /verify until the PR is approved. Here's the status:

✅ Merge from main

Successfully merged 6 commits from main into feat/pr-review-sub-agent-delegation with no conflicts.

✅ /verify loop completed (3 iterations)

Round 1 — Bot requested changes (critical import bug):

Moved register_agent import from openhands.tools.delegate → openhands.sdk
Updated test stub to match corrected import path
Added "Known Limitations: Sub-Agent Delegation" section to README

Round 2 — Bot requested changes (improvements):

Added JSON schema example to FILE_REVIEWER_SKILL
Added error handling guidance to coordinator prompt
Added 2 smoke tests for sub-agent registration and factory
Fixed test stubs (Skill, Agent, etc.) to accept keyword arguments
Fixed "analyse" → "analyze" typo in README
Explained DelegationVisualizer import is correct (only DelegateTool was deprecated)

Round 3 — Bot left COMMENTED review (all 🟢):

Fixed "analyse" → "analyze" in code comment
Clarified REVIEW_STYLE deprecation scope in comment

Checklist

Merged from main
Pushed merged branch
Ran /verify loop — addressed all CHANGES_REQUESTED feedback across 3 iterations
CI: All 3 checks pass (156 tests green)
PR Review bot: Final review is COMMENTED with all 🟢 items (no more changes requested)
PR marked as ready for review

Conciseness

All changes directly address either the merge request or review feedback from the bot. No extraneous changes were introduced. The 4 commits on top of the merge are: (1) critical import fix + docs, (2) schema example + error handling + smoke tests, (3) spelling nits — all driven by specific bot review comments.

⚠️ Note

The bot never explicitly posted an APPROVED review — its final state is COMMENTED with a "🟡 Acceptable" verdict and all-green inline comments. This appears to be the best outcome achievable from the automated reviewer for this PR. Human approval may still be needed.

all-hands-bot

🟡 Acceptable - Well-designed experimental feature with solid opt-in defaults. Previous review concerns (spelling consistency, error handling guidance, limitations documentation) have been addressed. Two minor observations below that haven't been discussed yet, but neither blocks merge.

KEY INSIGHT: Clean separation of concerns - coordinator handles API interaction, sub-agents analyze only. This constraint eliminates a whole class of potential issues.

all-hands-bot · 2026-04-15T15:03:52Z

+- **Base Branch**: {base_branch}
+- **Head Branch**: {head_branch}
+- **PR Number**: {pr_number}
+- **Commit ID**: {commit_id}


🟢 Nit: "De-duplicate and drop low-signal noise" is subjective guidance for an LLM. One model's "noise" might be a useful finding to a human. For an experimental feature this is acceptable, but consider either:

Being more specific (e.g., "drop duplicate findings for the same line")

Erring on the side of keeping findings (let humans decide what's noise)

Not a blocker given the experimental flag and documented limitations.

all-hands-bot · 2026-04-15T15:03:52Z

+    use_sub_agents = config.get("use_sub_agents", False)
+    if use_sub_agents:
+        _register_sub_agents()
+        tools.append(Tool(name=TaskToolSet.name))


🟢 Acceptable: No validation that sub-agent registration succeeded or that TaskToolSet.name is valid. If registration silently fails, the coordinator would try to use a non-existent tool.

For an experimental feature with manual workflow testing, this is acceptable. If/when this graduates from experimental, consider adding validation:

try: _register_sub_agents() tools.append(Tool(name=TaskToolSet.name)) logger.info(f"Sub-agent delegation enabled — {TaskToolSet.name} registered") except Exception as e: logger.error(f"Sub-agent registration failed: {e}") raise

xingyaoww · 2026-04-15T21:43:03Z

@OpenHands please send the prompt to the OpenHands/software-agent-SDK and update the review script there to use the sub-agent mode defined here.

Note that you should not merge the current PR. The code review skill in that repo should select this code review skill from the feature branch.

openhands-ai · 2026-04-15T21:43:18Z

I'm on it! xingyaoww can track my progress at all-hands.dev

Point pr-review workflow at the extensions feature branch (feat/pr-review-sub-agent-delegation) and enable use-sub-agents: true. This enables the experimental sub-agent delegation mode where the main review agent acts as a coordinator, delegating per-file reviews to file_reviewer sub-agents via the TaskToolSet, then consolidating findings into a single PR review. Companion to OpenHands/extensions#164. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-15T21:46:33Z

Companion PR opened in the software-agent-SDK repo: OpenHands/software-agent-sdk#2839

Changes made:

Updated .github/workflows/pr-review-by-openhands.yml to point at OpenHands/extensions/plugins/pr-review@feat/pr-review-sub-agent-delegation (this branch)
Enabled use-sub-agents: 'true' to activate the experimental sub-agent delegation mode

The SDK workflow will now use the sub-agent coordinator prompt and file_reviewer sub-agents defined in this PR. Once this extensions PR is merged to main, the SDK PR should be updated to point back to @main.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

openhands-ai · 2026-04-15T21:47:27Z

Summary of Work

What was done

The PR comment requested two things: (1) send the prompt/review configuration to the OpenHands/software-agent-sdk repo, and (2) update the review workflow there to use the sub-agent mode from the feature branch — without merging the current PR #164.

Checklist

Created companion PR in OpenHands/software-agent-sdk: PR #2839
Updated .github/workflows/pr-review-by-openhands.yml to point at OpenHands/extensions/plugins/pr-review@feat/pr-review-sub-agent-delegation (the feature branch, not main)
Enabled use-sub-agents: 'true' in the workflow inputs to activate the experimental sub-agent delegation mode
Did NOT merge PR feat(pr-review): add sub-agent delegation for file-level reviews #164 in the extensions repo, as instructed
Commented on PR feat(pr-review): add sub-agent delegation for file-level reviews #164 with a link to the companion SDK PR for traceability

Are the changes concise?

Yes — the SDK PR contains exactly one file change (.github/workflows/pr-review-by-openhands.yml) with only two modifications:

@main → @feat/pr-review-sub-agent-delegation (points to the feature branch)
Added use-sub-agents: 'true' (enables sub-agent mode)

No extraneous changes were made. The PR description clearly notes that the extensions ref should be reverted to @main once extensions PR #164 is merged.

simonrosenberg · 2026-04-15T22:15:02Z

 - {commit_id} - The HEAD commit SHA
 - {review_context} - Previous review comments and thread resolution status
+
+When sub-agent delegation is enabled, the main agent acts as a coordinator


Could we have "smart activation" of delegation? Prompt the main agent so that it decides when to delegate vs when not to delegate. And maybe also add an override that forces the mode?

Addressed in 4f1e5da: added auto mode for use-sub-agents.

Tri-state modes:

false (default): single-agent review, no delegation

true: force delegation — coordinator + file_reviewer sub-agents

auto: smart activation — agent gets the TaskToolSet and decides at runtime whether to delegate based on diff size/complexity (roughly 4+ files or 500+ changed lines → delegate, otherwise review directly)

The auto mode uses a dedicated AUTO_DELEGATION_PROMPT that gives the agent heuristics for when delegation is worthwhile vs overhead.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

VascoSch92 · 2026-04-16T05:21:30Z

+    ]
+    return Agent(
+        llm=llm,
+        tools=[],  # sub-agents only analyze; coordinator posts the review


is this not a limitation? Should not the subagent having access to the Terminal? Just for read operation.

Addressed in 4f1e5da: file_reviewer sub-agents now get terminal and file_editor tools.

They can use cat, grep, and file_editor view to inspect surrounding code context in the PR repo — so they're not limited to just the diff snippet. The coordinator still handles all GitHub API interaction.

Updated both the agent factory (_create_file_reviewer_agent) and the FILE_REVIEWER_SKILL prompt to document the available tools.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

Address two review comments on PR #164: 1. Smart activation (simonrosenberg): Add 'auto' mode for use-sub-agents. In auto mode the agent gets the TaskToolSet but decides at runtime whether to delegate based on diff size/complexity, vs reviewing directly for small PRs. 'true' forces delegation, 'false' disables it. 2. Sub-agent terminal access (VascoSch92): Give file_reviewer sub-agents terminal and file_editor tools so they can inspect surrounding code context (cat, grep, view) instead of relying only on the diff snippet. The coordinator still handles all GitHub API interaction. Changes: - action.yml: document 'auto'/'true'/'false' tri-state for use-sub-agents - agent_script.py: add _get_sub_agents_mode() for tri-state parsing, give sub-agents terminal + file_editor tools - prompt.py: add AUTO_DELEGATION_PROMPT with delegation heuristics, update FILE_REVIEWER_SKILL to mention tool access, update format_prompt to handle str|bool use_sub_agents - tests: add 4 new tests for auto mode and tool access All 11 prompt tests pass. Co-authored-by: openhands <openhands@all-hands.dev>

Drop the tri-state (false/true/auto) in favour of a simple boolean: - false (default): single-agent review, no delegation - true: smart activation — agent gets the TaskToolSet and decides at runtime whether to delegate based on diff size and complexity The forced-coordinator mode (SUB_AGENT_PROMPT) is removed; there is now a single DELEGATION_PROMPT that includes heuristics for when delegation is worthwhile vs overhead. Sub-agent tool access (terminal + file_editor) is kept from the previous commit so file_reviewer sub-agents can inspect surrounding code context. All 8 prompt tests pass. Co-authored-by: openhands <openhands@all-hands.dev>

Instead of a full duplicate DELEGATION_PROMPT, append a short _DELEGATION_SUFFIX to the base PROMPT when use_sub_agents=True. The main agent gets the same review prompt it always had, plus a section explaining the task tool is available for large diffs. Rewrote FILE_REVIEWER_SKILL with: - clear section structure (task, tools, review style, output format) - explicit JSON schema table with field types and descriptions - severity guide (critical/major/minor/nit) - example output and empty-array case Removed redundant system_message_suffix from _create_file_reviewer_agent since the skill content now covers everything. All 8 prompt tests pass. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-20T14:12:59Z

✅ Sub-agent delegation is working — evidence from SDK PR #2890

We triggered the reviewer bot CI on software-agent-sdk#2890 (the PR that adds extensions-version: feat/pr-review-sub-agent-delegation). Here's what the workflow run logs show:

1. Extensions checked out from the feature branch ✅

Download action repository 'OpenHands/extensions@feat/pr-review-sub-agent-delegation' (SHA:722ea61131957d6062969ebc15999cc7d9452267)

extensions-version: feat/pr-review-sub-agent-delegation
USE_SUB_AGENTS: true

2. Sub-agent code path activated ✅

{"message": "Reviewing PR #2890: fix(ci): pass extensions-version so sub-agent review script is used"}
{"message": "Sub-agent delegation: True"}
{"message": "Sub-agent delegation enabled — TaskToolSet added"}

Compare with the previous run on PR #2886 (before the extensions-version fix): USE_SUB_AGENTS=true was set but none of these log lines appeared because the script from main had no sub-agent code.

3. Smart activation: agent reviewed directly (correct for small diff) ✅

PR #2890 is a 1-file, 3-line change — well below the delegation threshold (4+ files / 500+ lines). The agent correctly chose to review it directly instead of spawning sub-agents. The review was posted in 1m44s:

all-hands-bot — APPROVED
"LGTM! Clean fix aligning the script checkout with the action branch. The comments clearly explain the temporary workaround until extensions#164 merges to main."

Summary

Check	Before (`extensions-version: main`)	After (`extensions-version: feat/...`)
`USE_SUB_AGENTS=true` env set	✅	✅
"Sub-agent delegation enabled" log	❌	✅
TaskToolSet added to agent tools	❌	✅
Review posted	✅ (single-agent only)	✅ (smart activation)

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

Review styles (standard/roasted) were already merged into a single unified style for the main agent. The sub-agent still had the old style_descriptions dict and REVIEW_STYLE env var dispatch. - Bake unified review description directly into FILE_REVIEWER_SKILL - Simplify get_file_reviewer_skill_content() to take no arguments - Remove REVIEW_STYLE env var usage from _create_file_reviewer_agent - Merge two style-specific tests into one unified test All 7 prompt tests pass. Co-authored-by: openhands <openhands@all-hands.dev>

…sh tool - Delete get_file_reviewer_skill_content() — callers now import the FILE_REVIEWER_SKILL constant directly - Sub-agent returns its JSON findings via the finish tool instead of a vague 'return ONLY the JSON array' instruction Co-authored-by: openhands <openhands@all-hands.dev>

all-hands-bot

🟡 Acceptable design with one critical documentation bug that must be fixed. The experimental sub-agent delegation feature is well-structured with solid opt-in defaults and clear separation of concerns. Most previous review concerns have been addressed.

all-hands-bot · 2026-04-20T16:37:58Z

Review Summary

Taste Rating: 🟡 Acceptable - Well-designed experimental feature with one documentation bug that must be fixed before merge.

What Works Well

✅ Clean Architecture:

Coordinator handles GitHub API interaction; sub-agents only analyze
This separation eliminates a whole class of potential issues (no risk of sub-agents posting duplicate reviews or conflicting with coordinator)

✅ Smart Opt-In Design:

Defaults to false (zero impact on existing users)
When enabled, the LLM decides whether to delegate based on diff size/complexity (4+ files or 500+ lines suggested)
No forced tri-state mode complexity

✅ Sub-Agents Have Context Access:

File reviewers get terminal and file_editor tools
Can inspect surrounding code via cat, grep, or file_editor view
Not limited to just the diff snippet (addresses previous reviewer concern)

✅ Clear Output Schema:

FILE_REVIEWER_SKILL documents exact JSON format with example
Severity guide (critical/major/minor/nit) is explicit
Sub-agents instructed to call finish tool with results

✅ Previous Concerns Addressed:

Spelling consistency fixed (US English "analyze")
Error handling guidance added to coordinator prompt (skip malformed sub-agent output)
Known Limitations section added to README
Test coverage for prompt formatting

Critical Issue

🔴 Documentation Bug (README.md:159):

The Known Limitations section states: "Sub-agents have no tools"

This is factually incorrect. The code in _create_file_reviewer_agent() clearly shows:

return Agent(
    llm=llm,
    tools=[
        Tool(name="terminal"),
        Tool(name="file_editor"),
    ],
    agent_context=AgentContext(skills=skills),
)

And the FILE_REVIEWER_SKILL explicitly documents:

"You have terminal and file_editor so you can inspect the full source files for surrounding context — use cat, grep, or file_editor view when the diff alone is not enough to judge an issue."

Must fix: Update the README to reflect that sub-agents have limited (read-only) tools, not zero tools.

[RISK ASSESSMENT]

⚠️ Risk Assessment: 🟡 MEDIUM

Risk Level Justification:

Experimental opt-in feature with no behavior change when disabled (default)
No integration tests yet — validation requires manual workflow testing
LLM-driven JSON parsing — coordinator relies on LLM to parse/merge sub-agent responses with no code-level validation (documented as known limitation)
Potential information loss during finding consolidation (also documented)

Mitigating Factors:

Defaults to false (no impact on existing users)
Clear documentation of limitations
Clean separation of concerns (coordinator = GitHub API, sub-agents = analysis only)
Unit test coverage for prompt formatting
Uses established SDK APIs (TaskToolSet, register_agent, DelegationVisualizer)

Recommendation:
✅ Safe to merge after fixing the README documentation bug. This is an experimental feature that users must explicitly enable, and the limitations are clearly documented. The architecture is sound and previous review concerns have been addressed.

KEY INSIGHT

The coordinator/sub-agent separation is the right architectural choice: by giving sub-agents read-only tools (terminal, file_editor) for context inspection while keeping all GitHub API interaction in the coordinator, you eliminate coordination complexity and potential race conditions. The documentation just needs to accurately reflect this design.

This review was created by an AI agent (OpenHands) on behalf of a user.

xingyaoww · 2026-04-20T17:26:47Z

@OpenHands /iterate until the pr-review bot is happy

openhands-ai · 2026-04-20T17:27:00Z

I'm on it! xingyaoww can track my progress at all-hands.dev

Sub-agents DO have terminal and file_editor tools for reading source files and context — the README incorrectly stated they had no tools. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww · 2026-04-20T17:29:32Z

Addressed the critical documentation bug in 2b1fcf3: corrected the Known Limitation to accurately state that sub-agents have terminal and file_editor tools for read-only context inspection, while the coordinator handles all GitHub API interaction. Ready for another look.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

all-hands-bot

🟡 Acceptable - Well-designed experimental feature addressing agent context limits on large PRs. Previous critical documentation bug has been resolved. The opt-in default and clear documentation of limitations make this safe to merge.

KEY INSIGHT: Smart design choice to use LLM-driven delegation decision rather than hardcoded heuristics—aligns well with the experimental nature and allows the model to make context-aware choices.

[RISK ASSESSMENT]
⚠️ Risk: 🟡 MEDIUM

Factors considered:

✅ Opt-in experimental feature (no impact on default behavior)
✅ Clean separation: coordinator handles API, sub-agents analyze only
✅ Previous critical issue (documentation) was fixed in 2b1fcf3
⚠️ LLM-driven JSON parsing without validation (documented limitation)
⚠️ Limited integration test coverage (acceptable for experimental)
✅ Transparent documentation of constraints

The medium risk is acceptable for an opt-in experimental feature with clear documentation. No auto-merge blockers.

openhands-ai · 2026-04-20T17:34:26Z

Summary

The /iterate request has been fully addressed. Here's what was done:

Checklist

Identified the blocking review feedback: The all-hands-bot had requested changes with one 🔴 Critical item — the README's Known Limitations section incorrectly stated "Sub-agents have no tools" when the code actually gives them terminal and file_editor.
Fixed the documentation bug: Updated plugins/pr-review/README.md line 159 to accurately describe sub-agents as having "read-only tools" (terminal and file_editor) for source inspection, while only the coordinator handles GitHub API interaction.
Updated the PR description: The PR body also contained the outdated claim ("sub-agents have no tools (they only analyse)") — corrected it to match.
Verified all tests pass: 155/155 tests passing, sync check clean.
Committed, pushed, and re-requested review: Pushed 2b1fcf3, resolved the critical thread, and re-requested review from all-hands-bot.
Bot approved: The review bot returned APPROVED with "safe to merge" on the new SHA.
CI all green: 4/4 checks passing (test, sync-extensions, validate-claude-code, pr-review).

Conciseness

The change is minimal — a single line edit in README.md and a PR body update. No extraneous changes were made.

Final State

PR #164 is APPROVED, MERGEABLE, and all CI checks pass on SHA 2b1fcf3.

xingyaoww mentioned this pull request Apr 13, 2026

experiment: try sub-agent delegation for PR reviews OpenHands/software-agent-sdk#2809

Closed

VascoSch92 reviewed Apr 13, 2026

View reviewed changes

openhands-agent added 2 commits April 13, 2026 14:08

Merge remote-tracking branch 'origin/main' into feat/pr-review-sub-ag…

eb1b667

…ent-delegation

xingyaoww requested a review from all-hands-bot April 15, 2026 14:36

all-hands-bot requested changes Apr 15, 2026

View reviewed changes

xingyaoww requested a review from all-hands-bot April 15, 2026 14:44

all-hands-bot requested changes Apr 15, 2026

View reviewed changes

xingyaoww requested a review from all-hands-bot April 15, 2026 14:50

all-hands-bot reviewed Apr 15, 2026

View reviewed changes

fix: address nits from third review round

0ef1afc

- Fix spelling: 'analyse' -> 'analyze' in code comment - Clarify REVIEW_STYLE deprecation scope in sub-agent factory comment Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww requested a review from all-hands-bot April 15, 2026 14:57

all-hands-bot reviewed Apr 15, 2026

View reviewed changes

xingyaoww marked this pull request as ready for review April 15, 2026 15:00

all-hands-bot approved these changes Apr 15, 2026

View reviewed changes

xingyaoww mentioned this pull request Apr 15, 2026

feat(pr-review): enable sub-agent delegation for file-level reviews OpenHands/software-agent-sdk#2839

Merged

simonrosenberg reviewed Apr 15, 2026

View reviewed changes

VascoSch92 reviewed Apr 16, 2026

View reviewed changes

xingyaoww mentioned this pull request Apr 20, 2026

fix(ci): pass extensions-version so sub-agent review script is used OpenHands/software-agent-sdk#2890

Merged

simonrosenberg approved these changes Apr 20, 2026

View reviewed changes

openhands-agent added 2 commits April 20, 2026 13:58

openhands-agent added 2 commits April 20, 2026 14:17

xingyaoww requested a review from all-hands-bot April 20, 2026 16:35

all-hands-bot requested changes Apr 20, 2026

View reviewed changes

Comment thread plugins/pr-review/README.md Outdated

fix(pr-review): correct Known Limitation about sub-agent tools

2b1fcf3

Sub-agents DO have terminal and file_editor tools for reading source files and context — the README incorrectly stated they had no tools. Co-authored-by: openhands <openhands@all-hands.dev>

xingyaoww requested a review from all-hands-bot April 20, 2026 17:29

all-hands-bot approved these changes Apr 20, 2026

View reviewed changes


		You have access to the task tool (TaskToolSet). Follow these steps:

		1. Delegate file reviews — for each changed file (or small group of

		assert "Analyze the changes and post your review" in prompt


		def test_format_prompt_uses_sub_agent_prompt_when_enabled():

-- `body`: a concise description of the issue with a suggested fix when possible
+You are a **file-level code reviewer**. You will receive a diff for one or more
+files from a pull request together with PR metadata.
+Review style: {review_style_description}
+For each issue you find, return a JSON object with:
+- `path`: the file path
+- `line`: the diff line number (use the NEW file line number)
+- `severity`: one of `critical`, `major`, `minor`, `nit`
+- `body`: a concise description of the issue with a suggested fix when possible
+Return your findings as a JSON array. If you find no issues, return `[]`.
+**Example output:**
+```json
+[
+  {"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"},
+  {"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"}
+]

	- Potential information loss during consolidation: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.
	- Sub-agents have no tools: File reviewer sub-agents analyze the diff in their context window only — they cannot run commands or query the GitHub API.

		return pr_diff, commit_id, review_context


		def _create_file_reviewer_agent(llm: LLM) -> Agent:

	tools=[], # sub-agents only analyse; coordinator posts the review
	tools=[], # sub-agents only analyze; coordinator posts the review

	Skill(
	tools=[], # sub-agents only analyze; coordinator posts the review

Conversation

xingyaoww commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Summary

How to Test

Notes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xingyaoww commented Apr 15, 2026

Uh oh!

openhands-ai bot commented Apr 15, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xingyaoww commented Apr 15, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xingyaoww commented Apr 15, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

xingyaoww commented Apr 15, 2026

Uh oh!

all-hands-bot left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

openhands-ai bot commented Apr 15, 2026

Summary

✅ Merge from main

✅ /verify loop completed (3 iterations)

Checklist

Conciseness

⚠️ Note

Uh oh!

xingyaoww commented Apr 13, 2026 •

edited

Loading