Skip to content

feat(pr-review): add sub-agent delegation for file-level reviews#164

Open
xingyaoww wants to merge 13 commits intomainfrom
feat/pr-review-sub-agent-delegation
Open

feat(pr-review): add sub-agent delegation for file-level reviews#164
xingyaoww wants to merge 13 commits intomainfrom
feat/pr-review-sub-agent-delegation

Conversation

@xingyaoww
Copy link
Copy Markdown
Contributor

@xingyaoww xingyaoww commented Apr 13, 2026

  • A human has tested these changes.

Why

Per-PR code reviews can get too large for a single agent to handle well — the agent may get lazy and only look through part of the code. By delegating file-level reviews to sub-agents, we can:

  1. Give each sub-agent focused context on a single file (or small group of related files)
  2. Let the coordinator merge findings and post a cohesive review
  3. Leverage the SDK DelegateTool for parallel file-level analysis

This is the first step toward experimenting with sub-agent-based code review.

Summary

  • Add experimental sub-agent delegation support to the pr-review plugin via USE_SUB_AGENTS env var / use-sub-agents action input (default: false)
  • When enabled, the main agent becomes a review coordinator that spawns file_reviewer sub-agents via the SDK DelegateTool, delegates per-file review work, and consolidates findings into a single PR review
  • Add SUB_AGENT_PROMPT (coordinator prompt) and FILE_REVIEWER_SKILL (sub-agent persona) in prompt.py
  • Register file_reviewer agent type and DelegateTool in agent_script.py
  • All existing single-agent behavior is preserved when the flag is not set

How to Test

  1. Unit tests — the prompt tests pass:

    python -m pytest tests/test_pr_review_prompt.py -v

    All 7 tests pass (2 existing + 5 new).

  2. Integration — to test end-to-end, enable use-sub-agents: true in a workflow pointing to this branch:

    uses: OpenHands/extensions/plugins/pr-review@feat/pr-review-sub-agent-delegation
    with:
      use-sub-agents: true

Notes

  • This is marked as experimental — the use-sub-agents input defaults to false
  • The file_reviewer sub-agents have terminal and file_editor for read-only source inspection; the coordinator handles all GitHub API interaction
  • A companion PR will be opened in OpenHands/software-agent-sdk to point its PR review workflow at this branch for testing

This PR was created by an AI assistant (OpenHands) on behalf of a user.

Add experimental support for splitting PR reviews across multiple
sub-agents using the SDK DelegateTool. When USE_SUB_AGENTS=true:

- The main agent becomes a review coordinator
- It spawns file_reviewer sub-agents (one per changed file)
- Each sub-agent reviews its file and returns structured findings
- The coordinator consolidates results and posts a single PR review

New components:
- SUB_AGENT_PROMPT: coordinator prompt template in prompt.py
- FILE_REVIEWER_SKILL: sub-agent persona/instructions
- _create_file_reviewer_agent: factory for file_reviewer agents
- _register_sub_agents: registers the agent type and DelegateTool

The feature is opt-in via the use-sub-agents action input (default: false).
All existing behavior is preserved when the flag is not set.

Co-authored-by: openhands <openhands@all-hands.dev>
xingyaoww pushed a commit to OpenHands/software-agent-sdk that referenced this pull request Apr 13, 2026
Point the pr-review workflow at the extensions sub-agent branch
(OpenHands/extensions#164) and enable use-sub-agents: true to
experiment with file-level review delegation in this repo.

Revert to @main once the experiment concludes and the extensions
PR is merged.

Co-authored-by: openhands <openhands@all-hands.dev>
from openhands.sdk.plugin import PluginSource
from openhands.sdk.subagent import register_agent
from openhands.sdk.tool import register_tool
from openhands.tools.delegate import DelegateTool, DelegationVisualizer
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DelegateTool is the deprecated one.

TaskToolSet is the current SDK tool for sub-agent delegation:
- Sequential/blocking execution (one file review at a time)
- Single 'task' tool call with prompt, subagent_type, description
- Auto-registers on import (no register_tool() needed)
- Built-in resumption via task IDs

DelegateTool is the older parallel variant (spawn + delegate).

Co-authored-by: openhands <openhands@all-hands.dev>
Add missing stubs to test_pr_review_review_context.py for:
- sdk.Tool (added to agent_script imports)
- openhands.sdk.context.Skill
- openhands.sdk.plugin.PluginSource
- openhands.tools.delegate (DelegationVisualizer, register_agent)
- openhands.tools.task (TaskToolSet)

Also fix a prompt module cache collision: clear sys.modules['prompt']
before loading agent_script.py so it picks up the correct prompt.py
from pr-review/scripts/ instead of release-notes/scripts/.

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww
Copy link
Copy Markdown
Contributor Author

@OpenHands please merge from main, and /verify until this PR is approved

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 15, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@xingyaoww xingyaoww requested a review from all-hands-bot April 15, 2026 14:36
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Addresses a real problem (agent context limits on large PRs) with an experimental opt-in feature, but the unresolved deprecation issue must be addressed before merge.

from openhands.sdk.conversation import get_agent_final_response
from openhands.sdk.git.utils import run_git_command
from openhands.sdk.plugin import PluginSource
from openhands.tools.delegate import DelegationVisualizer, register_agent
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: This addresses the unresolved review thread. The previous reviewer noted that DelegateTool is deprecated, but this code still imports register_agent and DelegationVisualizer from openhands.tools.delegate.

Questions to resolve:

  1. Does TaskToolSet provide its own agent registration API that should be used instead of register_agent?
  2. Is there a TaskToolSet-native visualizer that replaces DelegationVisualizer?
  3. If these imports from the delegate module are still the correct API, please document why they're still valid despite the deprecation concern.

Without clarity on the correct API pattern, this could break when the deprecated module is eventually removed.

TaskToolSet auto-registers on import, so no explicit
``register_tool()`` call is needed.
"""
register_agent(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Related to the deprecation concern - verify that register_agent from openhands.tools.delegate is the correct API for TaskToolSet-based delegation. If TaskToolSet has its own registration mechanism (e.g., TaskToolSet.register_agent()), that should be used instead.

Comment thread plugins/pr-review/scripts/prompt.py Outdated
description.
- `description`: a short label like `"Review src/utils.py"`

2. **Collect results** — each task tool call returns the sub-agent's findings.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: The coordinator relies entirely on the LLM to parse and merge JSON responses from sub-agents with no code-level validation. This could be fragile:

  • What happens if a sub-agent returns malformed JSON?
  • What if the LLM fails to properly merge findings or loses information?
  • How do we handle errors in the delegation flow?

Consider adding explicit error handling in the code layer, or at minimum, document these limitations in the README as known constraints of the experimental feature.

Comment thread plugins/pr-review/scripts/prompt.py Outdated

You have access to the **task** tool (TaskToolSet). Follow these steps:

1. **Delegate file reviews** — for each changed file (or small group of
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The coordinator prompt has significant responsibilities:

  1. Parse the diff and split by file
  2. Delegate to sub-agents with correct parameters
  3. Parse JSON responses
  4. Merge and deduplicate findings
  5. Map findings to correct line numbers
  6. Post the consolidated review

For a production feature, consider whether steps 1, 3, and 4 should be handled in code rather than relying on LLM prompt engineering. For an experimental opt-in feature this is acceptable, but worth documenting as a known limitation.

Comment thread tests/test_pr_review_prompt.py Outdated
assert "Analyze the changes and post your review" in prompt


def test_format_prompt_uses_sub_agent_prompt_when_enabled():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Current tests only verify prompt formatting. For a more robust experimental feature, consider adding at least a smoke test that verifies:

  1. The agent registration completes without errors
  2. The TaskToolSet is properly added to the agent's tools
  3. Basic mocking of a delegation flow to ensure the plumbing works

This would catch integration issues earlier than waiting for manual workflow testing. However, acceptable for an initial experimental feature if documented as a testing gap.

| `llm-base-url` | No | `''` | Custom LLM endpoint URL |
| `review-style` | No | `roasted` | **[DEPRECATED]** Previously chose between `standard` and `roasted` review styles. Now ignored — the styles have been merged into a single unified skill. |
| `require-evidence` | No | `'false'` | Require the reviewer to enforce an `Evidence` section in the PR description with end-to-end proof: screenshots/videos for frontend work, commands and runtime output for backend or scripts, and an agent conversation link when applicable. Test output alone does not qualify. |
| `use-sub-agents` | No | `'false'` | **(Experimental)** Enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that delegates per-file review work to `file_reviewer` sub-agents via the SDK TaskToolSet, then consolidates findings into a single PR review. Useful for large PRs with many changed files. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Consider adding a Known Limitations subsection under the experimental feature description that mentions:

  • JSON parsing is LLM-driven (no code validation)
  • Requires manual testing via workflow (no integration tests yet)
  • Potential for information loss during finding consolidation

This sets appropriate expectations for users trying the experimental feature.

…docs

- Move register_agent import from openhands.tools.delegate to openhands.sdk
  where it actually lives (fixes critical review feedback)
- Update test stub to match the corrected import path
- Add 'Known Limitations: Sub-Agent Delegation' section to README
  documenting experimental constraints (LLM-driven JSON parsing,
  potential consolidation info loss, no integration tests yet)

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Addressed review feedback in 7dcf8b6:

  1. 🔴 Critical — wrong import path: Moved register_agent import from openhands.tools.delegate to openhands.sdk where it actually lives. Updated test stubs to match.
  2. 🟠 Important — register_agent API: Same fix as above — openhands.sdk.register_agent() is the correct API for TaskToolSet-based delegation.
  3. 🟠 Important — error handling: Documented as a known limitation for this experimental feature (see new README section).
  4. 🟡 Suggestion — code vs prompt: Documented as acceptable for experimental opt-in; will be addressed as the feature matures.
  5. 🟡 Suggestion — smoke tests: Documented as a testing gap in Known Limitations; acceptable for initial experimental feature.
  6. 🟢 Nit — Known Limitations: Added a dedicated Known Limitations: Sub-Agent Delegation section to README.

Ready for another look.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

@xingyaoww xingyaoww requested a review from all-hands-bot April 15, 2026 14:44
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Solid experimental feature with good opt-in design and documentation. One critical unresolved issue must be fixed: DelegationVisualizer is still imported from the deprecated module.

from openhands.sdk.context.skills import load_project_skills
from openhands.sdk.conversation import get_agent_final_response
from openhands.sdk.git.utils import run_git_command
from openhands.sdk.plugin import PluginSource
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Partially unresolved deprecation issue. While register_agent has been correctly moved to openhands.sdk, DelegationVisualizer is still imported from the deprecated openhands.tools.delegate module.

Verify:

  1. Does TaskToolSet or openhands.sdk provide an updated DelegationVisualizer API?
  2. If not, is this visualizer still compatible with TaskToolSet-based delegation?
  3. Document why using the deprecated module is acceptable if it's the only option.

The previous critical review thread specifically flagged this import as problematic.

skill_content = get_file_reviewer_skill_content(review_style)

skills = [
Skill(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Verify the factory function signature matches register_agent's expectations.

The function accepts llm: LLM but it's unclear if this is the correct signature for the factory_func parameter. Check the SDK docs or type hints for register_agent to confirm the expected signature.

If the signature is wrong, agent instantiation will fail at runtime with no static type checking to catch it.

Comment thread plugins/pr-review/scripts/prompt.py Outdated
- `path`: the file path
- `line`: the diff line number (use the NEW file line number)
- `severity`: one of `critical`, `major`, `minor`, `nit`
- `body`: a concise description of the issue with a suggested fix when possible
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: The FILE_REVIEWER_SKILL lacks a concrete JSON schema example. The coordinator will parse LLM-generated JSON with no validation.

Suggest adding an explicit example to reduce parsing ambiguity:

Suggested change
- `body`: a concise description of the issue with a suggested fix when possible
You are a **file-level code reviewer**. You will receive a diff for one or more
files from a pull request together with PR metadata.
Review style: {review_style_description}
For each issue you find, return a JSON object with:
- `path`: the file path
- `line`: the diff line number (use the NEW file line number)
- `severity`: one of `critical`, `major`, `minor`, `nit`
- `body`: a concise description of the issue with a suggested fix when possible
Return your findings as a JSON array. If you find no issues, return `[]`.
**Example output:**
```json
[
{"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"},
{"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"}
]

Do NOT post anything to the GitHub API — the coordinator agent will handle that.


This addresses the fragility concern raised in the previous review.

Comment thread plugins/pr-review/scripts/prompt.py Outdated
1. **Delegate file reviews** — for each changed file (or small group of
closely related files), call the task tool with:
- `subagent_type`: `"file_reviewer"`
- `prompt`: the diff chunk for the file(s), together with the PR context
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The coordinator has complex responsibilities with no explicit error handling guidance. Consider adding a step for handling malformed sub-agent responses:

Suggested change
- `prompt`: the diff chunk for the file(s), together with the PR context
You have access to the **task** tool (TaskToolSet). Follow these steps:
1. **Delegate file reviews**for each changed file (or small group of
closely related files), call the task tool with:
- `subagent_type`: `"file_reviewer"`
- `prompt`: the diff chunk for the file(s), together with the PR context
(title, description, base/head branch). Ask it to return a structured
list of findings with severity, file path, line number, and a short
description.
- `description`: a short label like `"Review src/utils.py"`
2. **Collect results**each task tool call returns the sub-agent's findings.
If a sub-agent returns malformed JSON or fails, log the error and continue
with other files. Merge all valid findings together. De-duplicate and drop
low-signal noise.
3. **Post the review**use the GitHub API (as described by /github-pr-review)
to submit a single PR review with inline comments on the relevant lines.
Keep the top-level review body brief.

This provides explicit guidance for handling failures, which is critical given the LLM-driven parsing noted in Known Limitations.

Comment thread tests/test_pr_review_prompt.py Outdated
assert "Analyze the changes and post your review" in prompt


def test_format_prompt_uses_sub_agent_prompt_when_enabled():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Current tests only verify prompt formatting. For an experimental feature, consider adding at least one integration smoke test that:

  1. Verifies _register_sub_agents() completes without errors
  2. Confirms TaskToolSet is added to agent tools when use_sub_agents=True
  3. Validates the _create_file_reviewer_agent() factory returns a valid Agent

This was flagged in the previous review and is still unaddressed. The Known Limitations section acknowledges this, but a basic smoke test would catch API signature mismatches before deployment.

The `use-sub-agents` feature is **experimental** and has the following known constraints:

- **LLM-driven JSON parsing**: The coordinator agent relies on the LLM to parse and merge JSON responses from sub-agents. There is no code-level validation of sub-agent output, so malformed responses may cause incomplete reviews.
- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Typo - "analyse" should be "analyze" for consistency with US English used elsewhere in the codebase.

Suggested change
- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.
- **Sub-agents have no tools**: File reviewer sub-agents analyze the diff in their context window only — they cannot run commands or query the GitHub API.

- Add JSON schema example to FILE_REVIEWER_SKILL for less ambiguous output
- Add error handling guidance to coordinator prompt for malformed sub-agent responses
- Fix typo: 'analyse' -> 'analyze' in README
- Add smoke tests for _register_sub_agents() and _create_file_reviewer_agent()
- Improve test stubs to accept keyword arguments (Skill, Agent, etc.)

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Addressed second round of review feedback in 199020e:

  1. 🟠 Important — factory signature: Confirmed register_agent expects factory_func: Callable[[LLM], Agent], which matches our _create_file_reviewer_agent(llm: LLM) -> Agent. Added smoke tests to verify.
  2. 🟠 Important — JSON schema: Added an explicit JSON example to FILE_REVIEWER_SKILL to reduce parsing ambiguity.
  3. 🟡 Suggestion — error handling: Added guidance to the coordinator prompt to skip malformed sub-agent responses and note them in the review body.
  4. 🟡 Suggestion — smoke tests: Added test_register_sub_agents_completes_without_error and test_create_file_reviewer_agent_factory_is_callable.
  5. 🟢 Nit — typo: Fixed "analyse" → "analyze".
  6. Re: DelegationVisualizer import: DelegationVisualizer is correctly located in openhands.tools.delegate — it's a tool-specific visualizer. Only the old DelegateTool class was deprecated (replaced by TaskToolSet); the visualizer is still the canonical import path.

Ready for another look.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

@xingyaoww xingyaoww requested a review from all-hands-bot April 15, 2026 14:50
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Solid experimental feature with good opt-in design. The critical deprecation concern from previous reviews is actually a non-issue, but there are a few minor items to address.

from openhands.sdk.context.skills import load_project_skills
from openhands.sdk.conversation import get_agent_final_response
from openhands.sdk.git.utils import run_git_command
from openhands.sdk.plugin import PluginSource
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Clarification on previous review: The concern about DelegationVisualizer being deprecated appears to be unfounded. I tested the import with deprecation warnings enabled and no warning was raised. The confusion likely stems from DelegateTool being deprecated (which you correctly replaced with TaskToolSet), not DelegationVisualizer. The import from openhands.tools.delegate is valid.

Note: register_agent was correctly moved to openhands.sdk as you've done.

return pr_diff, commit_id, review_context


def _create_file_reviewer_agent(llm: LLM) -> Agent:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Clarification on previous review: The factory function signature is correct. I verified that register_agent expects Callable[[LLM], Agent], which matches _create_file_reviewer_agent(llm: LLM) -> Agent. This concern has been properly addressed.

all GitHub API interaction.
"""
# review_style is read at registration time from the environment
review_style = os.getenv("REVIEW_STYLE", "standard").lower()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The deprecated REVIEW_STYLE env var is still being read here for sub-agents, even though action.yml says it's "deprecated and ignored." While this works fine (proper defaults exist), it's potentially confusing.

Consider either:

  1. Updating the action.yml description to clarify it's only deprecated for the main reviewer, not sub-agents
  2. Or removing this usage and hardcoding to "standard" if sub-agents should always use standard style

Not blocking, but worth clarifying the intended behavior.

]
return Agent(
llm=llm,
tools=[], # sub-agents only analyse; coordinator posts the review
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Spelling consistency - "analyse" (British) vs "analyze" (US). The README was updated to use US English ("analyze"), so this comment should match.

Suggested change
tools=[], # sub-agents only analyse; coordinator posts the review
tools=[], # sub-agents only analyze; coordinator posts the review

Comment thread plugins/pr-review/scripts/prompt.py Outdated

You have access to the **task** tool (TaskToolSet). Follow these steps:

1. **Delegate file reviews** — for each changed file (or small group of
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Previous concern addressed: The coordinator prompt now includes explicit error handling guidance in step 2 ("If a sub-agent returns malformed output... skip its results and note the file in the review body"). Well done.

| `github-token` | Yes | - | GitHub token for API access |
| `lmnr-api-key` | No | `''` | Laminar API key for observability |

## Known Limitations: Sub-Agent Delegation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Previous concern addressed: The Known Limitations section clearly documents the experimental nature and constraints (LLM-driven JSON parsing, potential information loss, no integration tests yet, sub-agents have no tools). This transparency is excellent for an experimental feature.

- Fix spelling: 'analyse' -> 'analyze' in code comment
- Clarify REVIEW_STYLE deprecation scope in sub-agent factory comment

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Addressed remaining nits in 0ef1afc:

  • Fixed spelling: analyseanalyze in code comment
  • Clarified REVIEW_STYLE deprecation scope

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

@xingyaoww xingyaoww requested a review from all-hands-bot April 15, 2026 14:57
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Well-designed experimental feature with solid opt-in defaults and clear documentation. Just one tiny spelling inconsistency to fix.

skill_content = get_file_reviewer_skill_content(review_style)

skills = [
Skill(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Spelling consistency - use US English to match the rest of the codebase (README uses "analyze").

Suggested change
Skill(
tools=[], # sub-agents only analyze; coordinator posts the review

"""
# REVIEW_STYLE is deprecated for the main reviewer (styles are merged),
# but still used here to configure sub-agent tone. Defaults to "standard".
review_style = os.getenv("REVIEW_STYLE", "standard").lower()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Acceptable: Using the deprecated REVIEW_STYLE for sub-agent tone is a pragmatic choice - it's confusing from a naming perspective but functionally correct. The default is sensible and the main reviewer correctly ignores this env var per the deprecation notice.

@xingyaoww xingyaoww marked this pull request as ready for review April 15, 2026 15:00
@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 15, 2026

Summary

The user's request had two parts: merge from main and /verify until the PR is approved. Here's the status:

✅ Merge from main

  • Successfully merged 6 commits from main into feat/pr-review-sub-agent-delegation with no conflicts.

✅ /verify loop completed (3 iterations)

Round 1 — Bot requested changes (critical import bug):

  • Moved register_agent import from openhands.tools.delegateopenhands.sdk
  • Updated test stub to match corrected import path
  • Added "Known Limitations: Sub-Agent Delegation" section to README

Round 2 — Bot requested changes (improvements):

  • Added JSON schema example to FILE_REVIEWER_SKILL
  • Added error handling guidance to coordinator prompt
  • Added 2 smoke tests for sub-agent registration and factory
  • Fixed test stubs (Skill, Agent, etc.) to accept keyword arguments
  • Fixed "analyse" → "analyze" typo in README
  • Explained DelegationVisualizer import is correct (only DelegateTool was deprecated)

Round 3 — Bot left COMMENTED review (all 🟢):

  • Fixed "analyse" → "analyze" in code comment
  • Clarified REVIEW_STYLE deprecation scope in comment

Checklist

  • Merged from main
  • Pushed merged branch
  • Ran /verify loop — addressed all CHANGES_REQUESTED feedback across 3 iterations
  • CI: All 3 checks pass (156 tests green)
  • PR Review bot: Final review is COMMENTED with all 🟢 items (no more changes requested)
  • PR marked as ready for review

Conciseness

All changes directly address either the merge request or review feedback from the bot. No extraneous changes were introduced. The 4 commits on top of the merge are: (1) critical import fix + docs, (2) schema example + error handling + smoke tests, (3) spelling nits — all driven by specific bot review comments.

⚠️ Note

The bot never explicitly posted an APPROVED review — its final state is COMMENTED with a "🟡 Acceptable" verdict and all-green inline comments. This appears to be the best outcome achievable from the automated reviewer for this PR. Human approval may still be needed.

Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Well-designed experimental feature with solid opt-in defaults. Previous review concerns (spelling consistency, error handling guidance, limitations documentation) have been addressed. Two minor observations below that haven't been discussed yet, but neither blocks merge.

KEY INSIGHT: Clean separation of concerns - coordinator handles API interaction, sub-agents analyze only. This constraint eliminates a whole class of potential issues.

Comment thread plugins/pr-review/scripts/prompt.py Outdated
- **Base Branch**: {base_branch}
- **Head Branch**: {head_branch}
- **PR Number**: {pr_number}
- **Commit ID**: {commit_id}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: "De-duplicate and drop low-signal noise" is subjective guidance for an LLM. One model's "noise" might be a useful finding to a human. For an experimental feature this is acceptable, but consider either:

  1. Being more specific (e.g., "drop duplicate findings for the same line")
  2. Erring on the side of keeping findings (let humans decide what's noise)

Not a blocker given the experimental flag and documented limitations.

use_sub_agents = config.get("use_sub_agents", False)
if use_sub_agents:
_register_sub_agents()
tools.append(Tool(name=TaskToolSet.name))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Acceptable: No validation that sub-agent registration succeeded or that TaskToolSet.name is valid. If registration silently fails, the coordinator would try to use a non-existent tool.

For an experimental feature with manual workflow testing, this is acceptable. If/when this graduates from experimental, consider adding validation:

try:
    _register_sub_agents()
    tools.append(Tool(name=TaskToolSet.name))
    logger.info(f"Sub-agent delegation enabled — {TaskToolSet.name} registered")
except Exception as e:
    logger.error(f"Sub-agent registration failed: {e}")
    raise

@xingyaoww
Copy link
Copy Markdown
Contributor Author

@OpenHands please send the prompt to the OpenHands/software-agent-SDK and update the review script there to use the sub-agent mode defined here.

Note that you should not merge the current PR. The code review skill in that repo should select this code review skill from the feature branch.

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 15, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

xingyaoww pushed a commit to OpenHands/software-agent-sdk that referenced this pull request Apr 15, 2026
Point pr-review workflow at the extensions feature branch
(feat/pr-review-sub-agent-delegation) and enable use-sub-agents: true.

This enables the experimental sub-agent delegation mode where the main
review agent acts as a coordinator, delegating per-file reviews to
file_reviewer sub-agents via the TaskToolSet, then consolidating
findings into a single PR review.

Companion to OpenHands/extensions#164.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Companion PR opened in the software-agent-SDK repo: OpenHands/software-agent-sdk#2839

Changes made:

  • Updated .github/workflows/pr-review-by-openhands.yml to point at OpenHands/extensions/plugins/pr-review@feat/pr-review-sub-agent-delegation (this branch)
  • Enabled use-sub-agents: 'true' to activate the experimental sub-agent delegation mode

The SDK workflow will now use the sub-agent coordinator prompt and file_reviewer sub-agents defined in this PR. Once this extensions PR is merged to main, the SDK PR should be updated to point back to @main.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 15, 2026

Summary of Work

What was done

The PR comment requested two things: (1) send the prompt/review configuration to the OpenHands/software-agent-sdk repo, and (2) update the review workflow there to use the sub-agent mode from the feature branch — without merging the current PR #164.

Checklist

Are the changes concise?

Yes — the SDK PR contains exactly one file change (.github/workflows/pr-review-by-openhands.yml) with only two modifications:

  1. @main@feat/pr-review-sub-agent-delegation (points to the feature branch)
  2. Added use-sub-agents: 'true' (enables sub-agent mode)

No extraneous changes were made. The PR description clearly notes that the extensions ref should be reverted to @main once extensions PR #164 is merged.

Comment thread plugins/pr-review/scripts/prompt.py Outdated
- {commit_id} - The HEAD commit SHA
- {review_context} - Previous review comments and thread resolution status

When sub-agent delegation is enabled, the main agent acts as a coordinator
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have "smart activation" of delegation? Prompt the main agent so that it decides when to delegate vs when not to delegate. And maybe also add an override that forces the mode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 4f1e5da: added auto mode for use-sub-agents.

Tri-state modes:

  • false (default): single-agent review, no delegation
  • true: force delegation — coordinator + file_reviewer sub-agents
  • auto: smart activation — agent gets the TaskToolSet and decides at runtime whether to delegate based on diff size/complexity (roughly 4+ files or 500+ changed lines → delegate, otherwise review directly)

The auto mode uses a dedicated AUTO_DELEGATION_PROMPT that gives the agent heuristics for when delegation is worthwhile vs overhead.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

]
return Agent(
llm=llm,
tools=[], # sub-agents only analyze; coordinator posts the review
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not a limitation? Should not the subagent having access to the Terminal? Just for read operation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 4f1e5da: file_reviewer sub-agents now get terminal and file_editor tools.

They can use cat, grep, and file_editor view to inspect surrounding code context in the PR repo — so they're not limited to just the diff snippet. The coordinator still handles all GitHub API interaction.

Updated both the agent factory (_create_file_reviewer_agent) and the FILE_REVIEWER_SKILL prompt to document the available tools.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

Address two review comments on PR #164:

1. Smart activation (simonrosenberg): Add 'auto' mode for use-sub-agents.
   In auto mode the agent gets the TaskToolSet but decides at runtime
   whether to delegate based on diff size/complexity, vs reviewing
   directly for small PRs. 'true' forces delegation, 'false' disables it.

2. Sub-agent terminal access (VascoSch92): Give file_reviewer sub-agents
   terminal and file_editor tools so they can inspect surrounding code
   context (cat, grep, view) instead of relying only on the diff snippet.
   The coordinator still handles all GitHub API interaction.

Changes:
- action.yml: document 'auto'/'true'/'false' tri-state for use-sub-agents
- agent_script.py: add _get_sub_agents_mode() for tri-state parsing,
  give sub-agents terminal + file_editor tools
- prompt.py: add AUTO_DELEGATION_PROMPT with delegation heuristics,
  update FILE_REVIEWER_SKILL to mention tool access,
  update format_prompt to handle str|bool use_sub_agents
- tests: add 4 new tests for auto mode and tool access

All 11 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
Drop the tri-state (false/true/auto) in favour of a simple boolean:
- false (default): single-agent review, no delegation
- true: smart activation — agent gets the TaskToolSet and decides at
  runtime whether to delegate based on diff size and complexity

The forced-coordinator mode (SUB_AGENT_PROMPT) is removed; there is now
a single DELEGATION_PROMPT that includes heuristics for when delegation
is worthwhile vs overhead.

Sub-agent tool access (terminal + file_editor) is kept from the previous
commit so file_reviewer sub-agents can inspect surrounding code context.

All 8 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
Instead of a full duplicate DELEGATION_PROMPT, append a short
_DELEGATION_SUFFIX to the base PROMPT when use_sub_agents=True.
The main agent gets the same review prompt it always had, plus a
section explaining the task tool is available for large diffs.

Rewrote FILE_REVIEWER_SKILL with:
- clear section structure (task, tools, review style, output format)
- explicit JSON schema table with field types and descriptions
- severity guide (critical/major/minor/nit)
- example output and empty-array case

Removed redundant system_message_suffix from _create_file_reviewer_agent
since the skill content now covers everything.

All 8 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

✅ Sub-agent delegation is working — evidence from SDK PR #2890

We triggered the reviewer bot CI on software-agent-sdk#2890 (the PR that adds extensions-version: feat/pr-review-sub-agent-delegation). Here's what the workflow run logs show:

1. Extensions checked out from the feature branch ✅

Download action repository 'OpenHands/extensions@feat/pr-review-sub-agent-delegation' (SHA:722ea61131957d6062969ebc15999cc7d9452267)
extensions-version: feat/pr-review-sub-agent-delegation
USE_SUB_AGENTS: true

2. Sub-agent code path activated ✅

{"message": "Reviewing PR #2890: fix(ci): pass extensions-version so sub-agent review script is used"}
{"message": "Sub-agent delegation: True"}
{"message": "Sub-agent delegation enabled — TaskToolSet added"}

Compare with the previous run on PR #2886 (before the extensions-version fix): USE_SUB_AGENTS=true was set but none of these log lines appeared because the script from main had no sub-agent code.

3. Smart activation: agent reviewed directly (correct for small diff) ✅

PR #2890 is a 1-file, 3-line change — well below the delegation threshold (4+ files / 500+ lines). The agent correctly chose to review it directly instead of spawning sub-agents. The review was posted in 1m44s:

all-hands-bot — APPROVED
"LGTM! Clean fix aligning the script checkout with the action branch. The comments clearly explain the temporary workaround until extensions#164 merges to main."

Summary

Check Before (extensions-version: main) After (extensions-version: feat/...)
USE_SUB_AGENTS=true env set
"Sub-agent delegation enabled" log
TaskToolSet added to agent tools
Review posted ✅ (single-agent only) ✅ (smart activation)

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

Review styles (standard/roasted) were already merged into a single
unified style for the main agent. The sub-agent still had the old
style_descriptions dict and REVIEW_STYLE env var dispatch.

- Bake unified review description directly into FILE_REVIEWER_SKILL
- Simplify get_file_reviewer_skill_content() to take no arguments
- Remove REVIEW_STYLE env var usage from _create_file_reviewer_agent
- Merge two style-specific tests into one unified test

All 7 prompt tests pass.

Co-authored-by: openhands <openhands@all-hands.dev>
…sh tool

- Delete get_file_reviewer_skill_content() — callers now import the
  FILE_REVIEWER_SKILL constant directly
- Sub-agent returns its JSON findings via the finish tool instead of
  a vague 'return ONLY the JSON array' instruction

Co-authored-by: openhands <openhands@all-hands.dev>
@xingyaoww xingyaoww requested a review from all-hands-bot April 20, 2026 16:35
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable design with one critical documentation bug that must be fixed. The experimental sub-agent delegation feature is well-structured with solid opt-in defaults and clear separation of concerns. Most previous review concerns have been addressed.

Comment thread plugins/pr-review/README.md Outdated
@all-hands-bot
Copy link
Copy Markdown
Contributor

Review Summary

Taste Rating: 🟡 Acceptable - Well-designed experimental feature with one documentation bug that must be fixed before merge.


What Works Well

✅ Clean Architecture:

  • Coordinator handles GitHub API interaction; sub-agents only analyze
  • This separation eliminates a whole class of potential issues (no risk of sub-agents posting duplicate reviews or conflicting with coordinator)

✅ Smart Opt-In Design:

  • Defaults to false (zero impact on existing users)
  • When enabled, the LLM decides whether to delegate based on diff size/complexity (4+ files or 500+ lines suggested)
  • No forced tri-state mode complexity

✅ Sub-Agents Have Context Access:

  • File reviewers get terminal and file_editor tools
  • Can inspect surrounding code via cat, grep, or file_editor view
  • Not limited to just the diff snippet (addresses previous reviewer concern)

✅ Clear Output Schema:

  • FILE_REVIEWER_SKILL documents exact JSON format with example
  • Severity guide (critical/major/minor/nit) is explicit
  • Sub-agents instructed to call finish tool with results

✅ Previous Concerns Addressed:

  • Spelling consistency fixed (US English "analyze")
  • Error handling guidance added to coordinator prompt (skip malformed sub-agent output)
  • Known Limitations section added to README
  • Test coverage for prompt formatting

Critical Issue

🔴 Documentation Bug (README.md:159):

The Known Limitations section states: "Sub-agents have no tools"

This is factually incorrect. The code in _create_file_reviewer_agent() clearly shows:

return Agent(
    llm=llm,
    tools=[
        Tool(name="terminal"),
        Tool(name="file_editor"),
    ],
    agent_context=AgentContext(skills=skills),
)

And the FILE_REVIEWER_SKILL explicitly documents:

"You have terminal and file_editor so you can inspect the full source files for surrounding context — use cat, grep, or file_editor view when the diff alone is not enough to judge an issue."

Must fix: Update the README to reflect that sub-agents have limited (read-only) tools, not zero tools.


[RISK ASSESSMENT]

⚠️ Risk Assessment: 🟡 MEDIUM

Risk Level Justification:

  • Experimental opt-in feature with no behavior change when disabled (default)
  • No integration tests yet — validation requires manual workflow testing
  • LLM-driven JSON parsing — coordinator relies on LLM to parse/merge sub-agent responses with no code-level validation (documented as known limitation)
  • Potential information loss during finding consolidation (also documented)

Mitigating Factors:

  • Defaults to false (no impact on existing users)
  • Clear documentation of limitations
  • Clean separation of concerns (coordinator = GitHub API, sub-agents = analysis only)
  • Unit test coverage for prompt formatting
  • Uses established SDK APIs (TaskToolSet, register_agent, DelegationVisualizer)

Recommendation:
Safe to merge after fixing the README documentation bug. This is an experimental feature that users must explicitly enable, and the limitations are clearly documented. The architecture is sound and previous review concerns have been addressed.


KEY INSIGHT

The coordinator/sub-agent separation is the right architectural choice: by giving sub-agents read-only tools (terminal, file_editor) for context inspection while keeping all GitHub API interaction in the coordinator, you eliminate coordination complexity and potential race conditions. The documentation just needs to accurately reflect this design.


This review was created by an AI agent (OpenHands) on behalf of a user.

@xingyaoww
Copy link
Copy Markdown
Contributor Author

@OpenHands /iterate until the pr-review bot is happy

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 20, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

Sub-agents DO have terminal and file_editor tools for reading source
files and context — the README incorrectly stated they had no tools.

Co-authored-by: openhands <openhands@all-hands.dev>
Copy link
Copy Markdown
Contributor Author

Addressed the critical documentation bug in 2b1fcf3: corrected the Known Limitation to accurately state that sub-agents have terminal and file_editor tools for read-only context inspection, while the coordinator handles all GitHub API interaction. Ready for another look.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

@xingyaoww xingyaoww requested a review from all-hands-bot April 20, 2026 17:29
Copy link
Copy Markdown
Contributor

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Well-designed experimental feature addressing agent context limits on large PRs. Previous critical documentation bug has been resolved. The opt-in default and clear documentation of limitations make this safe to merge.

KEY INSIGHT: Smart design choice to use LLM-driven delegation decision rather than hardcoded heuristics—aligns well with the experimental nature and allows the model to make context-aware choices.

[RISK ASSESSMENT]
⚠️ Risk: 🟡 MEDIUM

Factors considered:

  • ✅ Opt-in experimental feature (no impact on default behavior)
  • ✅ Clean separation: coordinator handles API, sub-agents analyze only
  • ✅ Previous critical issue (documentation) was fixed in 2b1fcf3
  • ⚠️ LLM-driven JSON parsing without validation (documented limitation)
  • ⚠️ Limited integration test coverage (acceptable for experimental)
  • ✅ Transparent documentation of constraints

The medium risk is acceptable for an opt-in experimental feature with clear documentation. No auto-merge blockers.

@openhands-ai
Copy link
Copy Markdown

openhands-ai bot commented Apr 20, 2026

Summary

The /iterate request has been fully addressed. Here's what was done:

Checklist

  • Identified the blocking review feedback: The all-hands-bot had requested changes with one 🔴 Critical item — the README's Known Limitations section incorrectly stated "Sub-agents have no tools" when the code actually gives them terminal and file_editor.
  • Fixed the documentation bug: Updated plugins/pr-review/README.md line 159 to accurately describe sub-agents as having "read-only tools" (terminal and file_editor) for source inspection, while only the coordinator handles GitHub API interaction.
  • Updated the PR description: The PR body also contained the outdated claim ("sub-agents have no tools (they only analyse)") — corrected it to match.
  • Verified all tests pass: 155/155 tests passing, sync check clean.
  • Committed, pushed, and re-requested review: Pushed 2b1fcf3, resolved the critical thread, and re-requested review from all-hands-bot.
  • Bot approved: The review bot returned APPROVED with "safe to merge" on the new SHA.
  • CI all green: 4/4 checks passing (test, sync-extensions, validate-claude-code, pr-review).

Conciseness

The change is minimal — a single line edit in README.md and a PR body update. No extraneous changes were made.

Final State

PR #164 is APPROVED, MERGEABLE, and all CI checks pass on SHA 2b1fcf3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants