OpenHands · xingyaoww · Apr 13, 2026 · Apr 13, 2026 · Apr 13, 2026 · Apr 15, 2026
diff --git a/plugins/pr-review/README.md b/plugins/pr-review/README.md
@@ -24,6 +24,7 @@ Then configure the required secrets (see [Installation](#installation) below).
 - **A/B Testing**: Support for testing multiple LLM models
 - **Review Context Awareness**: Considers previous reviews and unresolved threads
 - **Evidence Enforcement**: Optional check that PR descriptions include concrete end-to-end proof the code works, not just test output
+- **Sub-Agent Delegation** *(Experimental)*: Split large PR reviews across multiple sub-agents, one per file, then consolidate findings (see [Known Limitations](#known-limitations-sub-agent-delegation))
 - **Observability**: Optional Laminar integration for tracing and evaluation
 
 ## Plugin Contents
@@ -141,12 +142,24 @@ PR reviews are automatically triggered when:
 | `llm-base-url` | No | `''` | Custom LLM endpoint URL |
 | `review-style` | No | `roasted` | **[DEPRECATED]** Previously chose between `standard` and `roasted` review styles. Now ignored — the styles have been merged into a single unified skill. |
 | `require-evidence` | No | `'false'` | Require the reviewer to enforce an `Evidence` section in the PR description with end-to-end proof: screenshots/videos for frontend work, commands and runtime output for backend or scripts, and an agent conversation link when applicable. Test output alone does not qualify. |
+| `use-sub-agents` | No | `'false'` | **(Experimental)** Enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that delegates per-file review work to `file_reviewer` sub-agents via the SDK TaskToolSet, then consolidates findings into a single PR review. Useful for large PRs with many changed files. |
 | `extensions-repo` | No | `OpenHands/extensions` | Extensions repository |
 | `extensions-version` | No | `main` | Git ref (tag, branch, or SHA) |
 | `llm-api-key` | Yes | - | LLM API key |
 | `github-token` | Yes | - | GitHub token for API access |
 | `lmnr-api-key` | No | `''` | Laminar API key for observability |
 
+## Known Limitations: Sub-Agent Delegation
+
+The `use-sub-agents` feature is **experimental** and has the following known constraints:
+
+- **LLM-driven JSON parsing**: The coordinator agent relies on the LLM to parse and merge JSON responses from sub-agents. There is no code-level validation of sub-agent output, so malformed responses may cause incomplete reviews.
+- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.
- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.
+- **Sub-agents have no tools**: File reviewer sub-agents analyze the diff in their context window only — they cannot run commands or query the GitHub API.
- **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues.
+- **Sub-agents have no tools**: File reviewer sub-agents analyze the diff in their context window only — they cannot run commands or query the GitHub API.
+- **No integration tests yet**: Current test coverage verifies prompt formatting only. End-to-end validation of the delegation flow requires manual workflow testing.
+- **Sub-agents have no tools**: File reviewer sub-agents analyse the diff in their context window only — they cannot run commands or query the GitHub API.
+
+These limitations are acceptable for an opt-in experimental feature and will be addressed as the feature matures.
+
 ## A/B Testing Multiple Models
 
 Test different LLM models by providing a comma-separated list:

diff --git a/plugins/pr-review/action.yml b/plugins/pr-review/action.yml
@@ -27,6 +27,10 @@ inputs:
         description: "When true, require the reviewer to check the PR description for an Evidence section proving the code works end-to-end (screenshots/videos for frontend changes; commands and runtime output for backend, CLI, or script changes; conversation link when agent-generated). Test output alone does not count."
         required: false
         default: 'false'
+    use-sub-agents:
+        description: "When true, enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that delegates per-file review work to file_reviewer sub-agents via the TaskToolSet, then consolidates findings into a single PR review. Experimental."
+        required: false
+        default: 'false'
     extensions-repo:
         description: GitHub repository for extensions (owner/repo)
         required: false
@@ -125,6 +129,7 @@ runs:
               LLM_BASE_URL: ${{ inputs.llm-base-url }}
               REVIEW_STYLE: ${{ inputs.review-style }}
               REQUIRE_EVIDENCE: ${{ inputs.require-evidence }}
+              USE_SUB_AGENTS: ${{ inputs.use-sub-agents }}
               LLM_API_KEY: ${{ inputs.llm-api-key }}
               GITHUB_TOKEN: ${{ inputs.github-token }}
               LMNR_PROJECT_API_KEY: ${{ inputs.lmnr-api-key }}

diff --git a/plugins/pr-review/scripts/agent_script.py b/plugins/pr-review/scripts/agent_script.py
@@ -32,6 +32,11 @@
     REPO_NAME: Repository name in format owner/repo (required)
     REQUIRE_EVIDENCE: Whether to require PR description evidence showing the code
         works ('true'/'false', default: 'false')
+    USE_SUB_AGENTS: Enable sub-agent delegation for file-level reviews
+        ('true'/'false', default: 'false'). When enabled, the main agent acts
+        as a coordinator that delegates per-file review work to
+        file_reviewer sub-agents via the TaskToolSet, then consolidates
+        findings into a single GitHub PR review.
 
 For setup instructions, usage examples, and GitHub Actions integration,
 see README.md in this directory.
@@ -50,18 +55,29 @@
 from typing import Any
 
 from lmnr import Laminar
-from openhands.sdk import LLM, Agent, AgentContext, Conversation, get_logger
+from openhands.sdk import (
+    LLM,
+    Agent,
+    AgentContext,
+    Conversation,
+    Tool,
+    get_logger,
+    register_agent,
+)
+from openhands.sdk.context import Skill
 from openhands.sdk.context.skills import load_project_skills
 from openhands.sdk.conversation import get_agent_final_response
 from openhands.sdk.git.utils import run_git_command
 from openhands.sdk.plugin import PluginSource
+from openhands.tools.delegate import DelegationVisualizer
 from openhands.tools.preset.default import get_default_condenser, get_default_tools
+from openhands.tools.task import TaskToolSet
 
 # Add the script directory to Python path so we can import prompt.py
 script_dir = Path(__file__).parent
 sys.path.insert(0, str(script_dir))
 
-from prompt import format_prompt  # noqa: E402
+from prompt import format_prompt, get_file_reviewer_skill_content  # noqa: E402
 
 logger = get_logger(__name__)
 
@@ -728,6 +744,7 @@ def validate_environment() -> dict[str, Any]:
         "model": os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"),
         "base_url": os.getenv("LLM_BASE_URL"),
         "require_evidence": _get_bool_env("REQUIRE_EVIDENCE"),
+        "use_sub_agents": _get_bool_env("USE_SUB_AGENTS"),
         "pr_info": {
             "number": os.getenv("PR_NUMBER"),
             "title": os.getenv("PR_TITLE"),
@@ -763,6 +780,53 @@ def fetch_pr_context(pr_number: str) -> tuple[str, str, str]:
     return pr_diff, commit_id, review_context
 
 
+def _create_file_reviewer_agent(llm: LLM) -> Agent:
+    """Factory for file_reviewer sub-agents used during delegation.
+
+    Each sub-agent receives a skill that defines its review persona and
+    expected output format.  It has no tools — the coordinator handles
+    all GitHub API interaction.
+    """
+    # review_style is read at registration time from the environment
+    review_style = os.getenv("REVIEW_STYLE", "standard").lower()
+    skill_content = get_file_reviewer_skill_content(review_style)
+
+    skills = [
+        Skill(
-        Skill(
+        tools=[],  # sub-agents only analyze; coordinator posts the review
-        Skill(
+        tools=[],  # sub-agents only analyze; coordinator posts the review
+            name="file_review_instructions",
+            content=skill_content,
+            trigger=None,
+        ),
+    ]
+    return Agent(
+        llm=llm,
+        tools=[],  # sub-agents only analyse; coordinator posts the review
-        tools=[],  # sub-agents only analyse; coordinator posts the review
+        tools=[],  # sub-agents only analyze; coordinator posts the review
-        tools=[],  # sub-agents only analyse; coordinator posts the review
+        tools=[],  # sub-agents only analyze; coordinator posts the review
+        agent_context=AgentContext(
+            skills=skills,
+            system_message_suffix=(
+                "You are a file-level code reviewer sub-agent. "
+                "Return findings as a JSON array. Do NOT call the GitHub API."
+            ),
+        ),
+    )
+
+
+def _register_sub_agents() -> None:
+    """Register the file_reviewer agent type.
+
+    TaskToolSet auto-registers on import, so no explicit
+    ``register_tool()`` call is needed.
+    """
+    register_agent(
+        name="file_reviewer",
+        factory_func=_create_file_reviewer_agent,
+        description=(
+            "Reviews one or more files from a PR diff and returns structured "
+            "findings as a JSON array."
+        ),
+    )
+
+
 def create_conversation(
     config: dict[str, Any],
     secrets: dict[str, str],
@@ -773,6 +837,9 @@ def create_conversation(
     handles wiring skills, MCP config, and hooks automatically.
     Project-specific skills from the workspace are loaded separately.
 
+    When ``config["use_sub_agents"]`` is True the coordinator agent is
+    given the TaskToolSet so it can delegate to file_reviewer sub-agents.
+
     Args:
         config: Configuration dictionary from validate_environment()
         secrets: Secrets to mask in output
@@ -804,9 +871,17 @@ def create_conversation(
         skills=project_skills,
     )
 
+    tools = get_default_tools(enable_browser=False)
+
+    use_sub_agents = config.get("use_sub_agents", False)
+    if use_sub_agents:
+        _register_sub_agents()
+        tools.append(Tool(name=TaskToolSet.name))
+        logger.info("Sub-agent delegation enabled — TaskToolSet added")
+
     agent = Agent(
         llm=llm,
-        tools=get_default_tools(enable_browser=False),
+        tools=tools,
         agent_context=agent_context,
         system_prompt_kwargs={"cli_mode": True},
         condenser=get_default_condenser(
@@ -816,12 +891,18 @@ def create_conversation(
 
     # The plugin directory is the parent of the scripts/ directory
     plugin_dir = script_dir.parent  # plugins/pr-review/
-    return Conversation(
-        agent=agent,
-        workspace=cwd,
-        secrets=secrets,
-        plugins=[PluginSource(source=str(plugin_dir))],
-    )
+    conversation_kwargs: dict[str, Any] = {
+        "agent": agent,
+        "workspace": cwd,
+        "secrets": secrets,
+        "plugins": [PluginSource(source=str(plugin_dir))],
+    }
+    if use_sub_agents:
+        conversation_kwargs["visualizer"] = DelegationVisualizer(
+            name="PR Review Coordinator"
+        )
+
+    return Conversation(**conversation_kwargs)
 
 
 def run_review(
@@ -930,9 +1011,11 @@ def main():
     config = validate_environment()
     pr_info = config["pr_info"]
     require_evidence = config["require_evidence"]
+    use_sub_agents = config["use_sub_agents"]
 
     logger.info(f"Reviewing PR #{pr_info['number']}: {pr_info['title']}")
     logger.info(f"Require PR evidence: {require_evidence}")
+    logger.info(f"Sub-agent delegation: {use_sub_agents}")
 
     try:
         pr_diff, commit_id, review_context = fetch_pr_context(pr_info["number"])
@@ -952,6 +1035,7 @@ def main():
             diff=pr_diff,
             review_context=review_context,
             require_evidence=require_evidence,
+            use_sub_agents=use_sub_agents,
         )
 
         secrets = {}

diff --git a/plugins/pr-review/scripts/prompt.py b/plugins/pr-review/scripts/prompt.py
@@ -11,6 +11,11 @@
 - {pr_number} - The PR number
 - {commit_id} - The HEAD commit SHA
 - {review_context} - Previous review comments and thread resolution status
+
+When sub-agent delegation is enabled, the main agent acts as a coordinator
+that splits the diff by file and delegates individual file reviews to
+sub-agents via the TaskToolSet, then consolidates results and posts the
+final review.
 """
 
 # Template for when there is review context available
@@ -75,6 +80,76 @@
 Analyze the changes and post your review using the GitHub API.
 """
 
+# Prompt for the main coordinator agent when sub-agent delegation is enabled.
+# The coordinator splits the diff into per-file chunks and delegates each
+# to a "file_reviewer" sub-agent via the TaskToolSet, then consolidates
+# and posts the review.
+SUB_AGENT_PROMPT = """{skill_trigger}
+/github-pr-review
+
+You are a **review coordinator**. Your job is to delegate the actual file-level
+review work to sub-agents and then consolidate their findings into a single
+GitHub PR review.
+
+## Pull Request Information
+
+- **Title**: {title}
+- **Description**: {body}
+- **Repository**: {repo_name}
+- **Base Branch**: {base_branch}
+- **Head Branch**: {head_branch}
+- **PR Number**: {pr_number}
+- **Commit ID**: {commit_id}
+
+{review_context_section}{evidence_requirements_section}
+
+## Instructions
+
+You have access to the **task** tool (TaskToolSet). Follow these steps:
+
+1. **Delegate file reviews** — for each changed file (or small group of
+   closely related files), call the task tool with:
+   - `subagent_type`: `"file_reviewer"`
+   - `prompt`: the diff chunk for the file(s), together with the PR context
-   - `prompt`: the diff chunk for the file(s), together with the PR context
+You have access to the **task** tool (TaskToolSet). Follow these steps:
+
+1. **Delegate file reviews** — for each changed file (or small group of
+   closely related files), call the task tool with:
+   - `subagent_type`: `"file_reviewer"`
+   - `prompt`: the diff chunk for the file(s), together with the PR context
+     (title, description, base/head branch). Ask it to return a structured
+     list of findings with severity, file path, line number, and a short
+     description.
+   - `description`: a short label like `"Review src/utils.py"`
+
+2. **Collect results** — each task tool call returns the sub-agent's findings.
+   If a sub-agent returns malformed JSON or fails, log the error and continue
+   with other files. Merge all valid findings together. De-duplicate and drop
+   low-signal noise.
+
+3. **Post the review** — use the GitHub API (as described by /github-pr-review)
+   to submit a single PR review with inline comments on the relevant lines.
+   Keep the top-level review body brief.
-   - `prompt`: the diff chunk for the file(s), together with the PR context
+You have access to the **task** tool (TaskToolSet). Follow these steps:
+
+1. **Delegate file reviews** — for each changed file (or small group of
+   closely related files), call the task tool with:
+   - `subagent_type`: `"file_reviewer"`
+   - `prompt`: the diff chunk for the file(s), together with the PR context
+     (title, description, base/head branch). Ask it to return a structured
+     list of findings with severity, file path, line number, and a short
+     description.
+   - `description`: a short label like `"Review src/utils.py"`
+
+2. **Collect results** — each task tool call returns the sub-agent's findings.
+   If a sub-agent returns malformed JSON or fails, log the error and continue
+   with other files. Merge all valid findings together. De-duplicate and drop
+   low-signal noise.
+
+3. **Post the review** — use the GitHub API (as described by /github-pr-review)
+   to submit a single PR review with inline comments on the relevant lines.
+   Keep the top-level review body brief.
+     (title, description, base/head branch). Ask it to return a structured
+     list of findings with severity, file path, line number, and a short
+     description.
+   - `description`: a short label like `"Review src/utils.py"`
+
+2. **Collect results** — each task tool call returns the sub-agent's findings.
+   Merge them all together. De-duplicate and drop low-signal noise.
+
+3. **Post the review** — use the GitHub API (as described by /github-pr-review)
+   to submit a single PR review with inline comments on the relevant lines.
+   Keep the top-level review body brief.
+
+## Full Diff
+
+The complete diff is provided below. Split it by file when delegating.
+
+```diff
+{diff}
+```
+"""
+
+# System-level instruction injected into each file_reviewer sub-agent so it
+# knows its role, the review style, and the expected output format.
+FILE_REVIEWER_SKILL = """\
+You are a **file-level code reviewer**. You will receive a diff for one or more
+files from a pull request together with PR metadata.
+
+Review style: {review_style_description}
+
+For each issue you find, return a JSON object with:
+- `path`: the file path
+- `line`: the diff line number (use the NEW file line number)
+- `severity`: one of `critical`, `major`, `minor`, `nit`
+- `body`: a concise description of the issue with a suggested fix when possible
- `body`: a concise description of the issue with a suggested fix when possible
+You are a **file-level code reviewer**. You will receive a diff for one or more
+files from a pull request together with PR metadata.
+
+Review style: {review_style_description}
+
+For each issue you find, return a JSON object with:
+- `path`: the file path
+- `line`: the diff line number (use the NEW file line number)
+- `severity`: one of `critical`, `major`, `minor`, `nit`
+- `body`: a concise description of the issue with a suggested fix when possible
+
+Return your findings as a JSON array. If you find no issues, return `[]`.
+
+**Example output:**
+```json
+[
+  {"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"},
+  {"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"}
+]
- `body`: a concise description of the issue with a suggested fix when possible
+You are a **file-level code reviewer**. You will receive a diff for one or more
+files from a pull request together with PR metadata.
+
+Review style: {review_style_description}
+
+For each issue you find, return a JSON object with:
+- `path`: the file path
+- `line`: the diff line number (use the NEW file line number)
+- `severity`: one of `critical`, `major`, `minor`, `nit`
+- `body`: a concise description of the issue with a suggested fix when possible
+
+Return your findings as a JSON array. If you find no issues, return `[]`.
+
+**Example output:**
+```json
+[
+  {"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"},
+  {"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"}
+]
+
+Return your findings as a JSON array. If you find no issues, return `[]`.
+Do NOT post anything to the GitHub API — the coordinator agent will handle that.
+"""
+
 
 def format_prompt(
     skill_trigger: str,
@@ -88,6 +163,7 @@ def format_prompt(
     diff: str,
     review_context: str = "",
     require_evidence: bool = False,
+    use_sub_agents: bool = False,
 ) -> str:
     """Format the PR review prompt with all parameters.
 
@@ -105,6 +181,9 @@ def format_prompt(
                         the review context section is omitted from the prompt.
         require_evidence: Whether to instruct the reviewer to enforce PR description
                           evidence showing the code works.
+        use_sub_agents: When True, use the sub-agent coordinator prompt instead of
+                        the single-agent prompt. The coordinator will delegate
+                        file-level reviews to sub-agents and consolidate results.
 
     Returns:
         Formatted prompt string
@@ -121,7 +200,9 @@ def format_prompt(
         _EVIDENCE_REQUIREMENT_SECTION if require_evidence else ""
     )
 
-    return PROMPT.format(
+    template = SUB_AGENT_PROMPT if use_sub_agents else PROMPT
+
+    return template.format(
         skill_trigger=skill_trigger,
         title=title,
         body=body,
@@ -134,3 +215,26 @@ def format_prompt(
         evidence_requirements_section=evidence_requirements_section,
         diff=diff,
     )
+
+
+def get_file_reviewer_skill_content(review_style: str = "standard") -> str:
+    """Return the file_reviewer sub-agent skill content.
+
+    Args:
+        review_style: 'standard' or 'roasted'
+
+    Returns:
+        Formatted skill content string for the file_reviewer agent type
+    """
+    style_descriptions = {
+        "standard": (
+            "Balanced review covering correctness, style, readability, "
+            "and security. Be constructive."
+        ),
+        "roasted": (
+            "Linus Torvalds-style brutally honest review. Focus on data "
+            "structures, simplicity, and pragmatism. No hand-holding."
+        ),
+    }
+    description = style_descriptions.get(review_style, style_descriptions["standard"])
+    return FILE_REVIEWER_SKILL.format(review_style_description=description)