Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions plugins/pr-review/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ Then configure the required secrets (see [Installation](#installation) below).
- **A/B Testing**: Support for testing multiple LLM models
- **Review Context Awareness**: Considers previous reviews and unresolved threads
- **Evidence Enforcement**: Optional check that PR descriptions include concrete end-to-end proof the code works, not just test output
- **Sub-Agent Delegation** *(Experimental)*: Split large PR reviews across multiple sub-agents, one per file, then consolidate findings
- **Observability**: Optional Laminar integration for tracing and evaluation

## Plugin Contents
Expand Down Expand Up @@ -143,6 +144,7 @@ PR reviews are automatically triggered when:
| `llm-base-url` | No | `''` | Custom LLM endpoint URL |
| `review-style` | No | `roasted` | Review style: `standard` or `roasted` |
| `require-evidence` | No | `'false'` | Require the reviewer to enforce an `Evidence` section in the PR description with end-to-end proof: screenshots/videos for frontend work, commands and runtime output for backend or scripts, and an agent conversation link when applicable. Test output alone does not qualify. |
| `use-sub-agents` | No | `'false'` | **(Experimental)** Enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that spawns `file_reviewer` sub-agents via the SDK DelegateTool, delegates per-file review work, and consolidates findings into a single PR review. Useful for large PRs with many changed files. |
| `extensions-repo` | No | `OpenHands/extensions` | Extensions repository |
| `extensions-version` | No | `main` | Git ref (tag, branch, or SHA) |
| `llm-api-key` | Yes | - | LLM API key |
Expand Down
5 changes: 5 additions & 0 deletions plugins/pr-review/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,10 @@ inputs:
description: "When true, require the reviewer to check the PR description for an Evidence section proving the code works end-to-end (screenshots/videos for frontend changes; commands and runtime output for backend, CLI, or script changes; conversation link when agent-generated). Test output alone does not count."
required: false
default: 'false'
use-sub-agents:
description: "When true, enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that spawns file_reviewer sub-agents via the DelegateTool, delegates per-file review work, and consolidates findings into a single PR review. Experimental."
required: false
default: 'false'
extensions-repo:
description: GitHub repository for extensions (owner/repo)
required: false
Expand Down Expand Up @@ -125,6 +129,7 @@ runs:
LLM_BASE_URL: ${{ inputs.llm-base-url }}
REVIEW_STYLE: ${{ inputs.review-style }}
REQUIRE_EVIDENCE: ${{ inputs.require-evidence }}
USE_SUB_AGENTS: ${{ inputs.use-sub-agents }}
LLM_API_KEY: ${{ inputs.llm-api-key }}
GITHUB_TOKEN: ${{ inputs.github-token }}
LMNR_PROJECT_API_KEY: ${{ inputs.lmnr-api-key }}
Expand Down
92 changes: 83 additions & 9 deletions plugins/pr-review/scripts/agent_script.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,11 @@
REVIEW_STYLE: Review style ('standard' or 'roasted', default: 'standard')
REQUIRE_EVIDENCE: Whether to require PR description evidence showing the code
works ('true'/'false', default: 'false')
USE_SUB_AGENTS: Enable sub-agent delegation for file-level reviews
('true'/'false', default: 'false'). When enabled, the main agent acts
as a coordinator that spawns file_reviewer sub-agents via the
DelegateTool, delegates per-file review work, and consolidates
findings into a single GitHub PR review.

For setup instructions, usage examples, and GitHub Actions integration,
see README.md in this directory.
Expand All @@ -53,18 +58,22 @@
from typing import Any

from lmnr import Laminar
from openhands.sdk import LLM, Agent, AgentContext, Conversation, get_logger
from openhands.sdk import LLM, Agent, AgentContext, Conversation, Tool, get_logger
from openhands.sdk.context import Skill
from openhands.sdk.context.skills import load_project_skills
from openhands.sdk.conversation import get_agent_final_response
from openhands.sdk.git.utils import run_git_command
from openhands.sdk.plugin import PluginSource
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Critical: Partially unresolved deprecation issue. While register_agent has been correctly moved to openhands.sdk, DelegationVisualizer is still imported from the deprecated openhands.tools.delegate module.

Verify:

  1. Does TaskToolSet or openhands.sdk provide an updated DelegationVisualizer API?
  2. If not, is this visualizer still compatible with TaskToolSet-based delegation?
  3. Document why using the deprecated module is acceptable if it's the only option.

The previous critical review thread specifically flagged this import as problematic.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Clarification on previous review: The concern about DelegationVisualizer being deprecated appears to be unfounded. I tested the import with deprecation warnings enabled and no warning was raised. The confusion likely stems from DelegateTool being deprecated (which you correctly replaced with TaskToolSet), not DelegationVisualizer. The import from openhands.tools.delegate is valid.

Note: register_agent was correctly moved to openhands.sdk as you've done.

from openhands.sdk.subagent import register_agent
from openhands.sdk.tool import register_tool
from openhands.tools.delegate import DelegateTool, DelegationVisualizer
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DelegateTool is the deprecated one.

from openhands.tools.preset.default import get_default_condenser, get_default_tools

# Add the script directory to Python path so we can import prompt.py
script_dir = Path(__file__).parent
sys.path.insert(0, str(script_dir))

from prompt import format_prompt # noqa: E402
from prompt import format_prompt, get_file_reviewer_skill_content # noqa: E402

logger = get_logger(__name__)

Expand Down Expand Up @@ -737,6 +746,7 @@ def validate_environment() -> dict[str, Any]:
"base_url": os.getenv("LLM_BASE_URL"),
"review_style": review_style,
"require_evidence": _get_bool_env("REQUIRE_EVIDENCE"),
"use_sub_agents": _get_bool_env("USE_SUB_AGENTS"),
"pr_info": {
"number": os.getenv("PR_NUMBER"),
"title": os.getenv("PR_TITLE"),
Expand Down Expand Up @@ -772,6 +782,50 @@ def fetch_pr_context(pr_number: str) -> tuple[str, str, str]:
return pr_diff, commit_id, review_context


def _create_file_reviewer_agent(llm: LLM) -> Agent:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Clarification on previous review: The factory function signature is correct. I verified that register_agent expects Callable[[LLM], Agent], which matches _create_file_reviewer_agent(llm: LLM) -> Agent. This concern has been properly addressed.

"""Factory for file_reviewer sub-agents used during delegation.

Each sub-agent receives a skill that defines its review persona and
expected output format. It has no tools — the coordinator handles
all GitHub API interaction.
"""
# review_style is read at registration time from the environment
review_style = os.getenv("REVIEW_STYLE", "standard").lower()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: The deprecated REVIEW_STYLE env var is still being read here for sub-agents, even though action.yml says it's "deprecated and ignored." While this works fine (proper defaults exist), it's potentially confusing.

Consider either:

  1. Updating the action.yml description to clarify it's only deprecated for the main reviewer, not sub-agents
  2. Or removing this usage and hardcoding to "standard" if sub-agents should always use standard style

Not blocking, but worth clarifying the intended behavior.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Acceptable: Using the deprecated REVIEW_STYLE for sub-agent tone is a pragmatic choice - it's confusing from a naming perspective but functionally correct. The default is sensible and the main reviewer correctly ignores this env var per the deprecation notice.

skill_content = get_file_reviewer_skill_content(review_style)

skills = [
Skill(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Verify the factory function signature matches register_agent's expectations.

The function accepts llm: LLM but it's unclear if this is the correct signature for the factory_func parameter. Check the SDK docs or type hints for register_agent to confirm the expected signature.

If the signature is wrong, agent instantiation will fail at runtime with no static type checking to catch it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Spelling consistency - use US English to match the rest of the codebase (README uses "analyze").

Suggested change
Skill(
tools=[], # sub-agents only analyze; coordinator posts the review

name="file_review_instructions",
content=skill_content,
trigger=None,
),
]
return Agent(
llm=llm,
tools=[], # sub-agents only analyse; coordinator posts the review
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: Spelling consistency - "analyse" (British) vs "analyze" (US). The README was updated to use US English ("analyze"), so this comment should match.

Suggested change
tools=[], # sub-agents only analyse; coordinator posts the review
tools=[], # sub-agents only analyze; coordinator posts the review

agent_context=AgentContext(
skills=skills,
system_message_suffix=(
"You are a file-level code reviewer sub-agent. "
"Return findings as a JSON array. Do NOT call the GitHub API."
),
),
)


def _register_sub_agents() -> None:
"""Register the file_reviewer agent type and the DelegateTool."""
register_agent(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: Related to the deprecation concern - verify that register_agent from openhands.tools.delegate is the correct API for TaskToolSet-based delegation. If TaskToolSet has its own registration mechanism (e.g., TaskToolSet.register_agent()), that should be used instead.

name="file_reviewer",
factory_func=_create_file_reviewer_agent,
description=(
"Reviews one or more files from a PR diff and returns structured "
"findings as a JSON array."
),
)
register_tool("DelegateTool", DelegateTool)


def create_conversation(
config: dict[str, Any],
secrets: dict[str, str],
Expand All @@ -782,6 +836,9 @@ def create_conversation(
handles wiring skills, MCP config, and hooks automatically.
Project-specific skills from the workspace are loaded separately.

When ``config["use_sub_agents"]`` is True the coordinator agent is
given the DelegateTool so it can spawn file_reviewer sub-agents.

Args:
config: Configuration dictionary from validate_environment()
secrets: Secrets to mask in output
Expand Down Expand Up @@ -813,9 +870,17 @@ def create_conversation(
skills=project_skills,
)

tools = get_default_tools(enable_browser=False)

use_sub_agents = config.get("use_sub_agents", False)
if use_sub_agents:
_register_sub_agents()
tools.append(Tool(name=DelegateTool.name))
logger.info("Sub-agent delegation enabled — DelegateTool added")

agent = Agent(
llm=llm,
tools=get_default_tools(enable_browser=False),
tools=tools,
agent_context=agent_context,
system_prompt_kwargs={"cli_mode": True},
condenser=get_default_condenser(
Expand All @@ -825,12 +890,18 @@ def create_conversation(

# The plugin directory is the parent of the scripts/ directory
plugin_dir = script_dir.parent # plugins/pr-review/
return Conversation(
agent=agent,
workspace=cwd,
secrets=secrets,
plugins=[PluginSource(source=str(plugin_dir))],
)
conversation_kwargs: dict[str, Any] = {
"agent": agent,
"workspace": cwd,
"secrets": secrets,
"plugins": [PluginSource(source=str(plugin_dir))],
}
if use_sub_agents:
conversation_kwargs["visualizer"] = DelegationVisualizer(
name="PR Review Coordinator"
)

return Conversation(**conversation_kwargs)


def run_review(
Expand Down Expand Up @@ -943,10 +1014,12 @@ def main():
pr_info = config["pr_info"]
review_style = config["review_style"]
require_evidence = config["require_evidence"]
use_sub_agents = config["use_sub_agents"]

logger.info(f"Reviewing PR #{pr_info['number']}: {pr_info['title']}")
logger.info(f"Review style: {review_style}")
logger.info(f"Require PR evidence: {require_evidence}")
logger.info(f"Sub-agent delegation: {use_sub_agents}")

try:
pr_diff, commit_id, review_context = fetch_pr_context(pr_info["number"])
Expand All @@ -968,6 +1041,7 @@ def main():
diff=pr_diff,
review_context=review_context,
require_evidence=require_evidence,
use_sub_agents=use_sub_agents,
)

secrets = {}
Expand Down
104 changes: 103 additions & 1 deletion plugins/pr-review/scripts/prompt.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@
- {pr_number} - The PR number
- {commit_id} - The HEAD commit SHA
- {review_context} - Previous review comments and thread resolution status

When sub-agent delegation is enabled, the main agent acts as a coordinator
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have "smart activation" of delegation? Prompt the main agent so that it decides when to delegate vs when not to delegate. And maybe also add an override that forces the mode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 4f1e5da: added auto mode for use-sub-agents.

Tri-state modes:

  • false (default): single-agent review, no delegation
  • true: force delegation — coordinator + file_reviewer sub-agents
  • auto: smart activation — agent gets the TaskToolSet and decides at runtime whether to delegate based on diff size/complexity (roughly 4+ files or 500+ changed lines → delegate, otherwise review directly)

The auto mode uses a dedicated AUTO_DELEGATION_PROMPT that gives the agent heuristics for when delegation is worthwhile vs overhead.

This comment was posted by an AI assistant (OpenHands) on behalf of a user.

that splits the diff by file and delegates individual file reviews to
sub-agents, then consolidates results and posts the final review.
"""

# Template for when there is review context available
Expand Down Expand Up @@ -75,6 +79,75 @@
Analyze the changes and post your review using the GitHub API.
"""

# Prompt for the main coordinator agent when sub-agent delegation is enabled.
# The coordinator splits the diff into per-file chunks and delegates each
# to a "file_reviewer" sub-agent, then consolidates and posts the review.
SUB_AGENT_PROMPT = """{skill_trigger}
/github-pr-review

You are a **review coordinator**. Your job is to delegate the actual file-level
review work to sub-agents and then consolidate their findings into a single
GitHub PR review.

## Pull Request Information

- **Title**: {title}
- **Description**: {body}
- **Repository**: {repo_name}
- **Base Branch**: {base_branch}
- **Head Branch**: {head_branch}
- **PR Number**: {pr_number}
- **Commit ID**: {commit_id}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Nit: "De-duplicate and drop low-signal noise" is subjective guidance for an LLM. One model's "noise" might be a useful finding to a human. For an experimental feature this is acceptable, but consider either:

  1. Being more specific (e.g., "drop duplicate findings for the same line")
  2. Erring on the side of keeping findings (let humans decide what's noise)

Not a blocker given the experimental flag and documented limitations.


{review_context_section}{evidence_requirements_section}

## Instructions

You have access to the **DelegateTool**. Follow these steps:

1. **Spawn sub-agents** — one `file_reviewer` sub-agent per changed file (or
small group of closely related files). Use `spawn` with descriptive IDs
based on the file paths (e.g. `"review_src_utils"`, `"review_tests"`).

2. **Delegate** — send each sub-agent the diff chunk for its file(s) together
with the PR context (title, description, base/head branch). Ask it to
return a structured list of findings with severity, file path, line number,
and a short description.

3. **Collect results** — after all sub-agents respond, merge their findings.
De-duplicate and drop low-signal noise.

4. **Post the review** — use the GitHub API (as described by /github-pr-review)
to submit a single PR review with inline comments on the relevant lines.
Keep the top-level review body brief.

## Full Diff

The complete diff is provided below. Split it by file when delegating.

```diff
{diff}
```
"""

# System-level instruction injected into each file_reviewer sub-agent so it
# knows its role, the review style, and the expected output format.
FILE_REVIEWER_SKILL = """\
You are a **file-level code reviewer**. You will receive a diff for one or more
files from a pull request together with PR metadata.

Review style: {review_style_description}

For each issue you find, return a JSON object with:
- `path`: the file path
- `line`: the diff line number (use the NEW file line number)
- `severity`: one of `critical`, `major`, `minor`, `nit`
- `body`: a concise description of the issue with a suggested fix when possible
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Important: The FILE_REVIEWER_SKILL lacks a concrete JSON schema example. The coordinator will parse LLM-generated JSON with no validation.

Suggest adding an explicit example to reduce parsing ambiguity:

Suggested change
- `body`: a concise description of the issue with a suggested fix when possible
You are a **file-level code reviewer**. You will receive a diff for one or more
files from a pull request together with PR metadata.
Review style: {review_style_description}
For each issue you find, return a JSON object with:
- `path`: the file path
- `line`: the diff line number (use the NEW file line number)
- `severity`: one of `critical`, `major`, `minor`, `nit`
- `body`: a concise description of the issue with a suggested fix when possible
Return your findings as a JSON array. If you find no issues, return `[]`.
**Example output:**
```json
[
{"path": "src/app.py", "line": 42, "severity": "major", "body": "Unchecked null dereference"},
{"path": "src/utils.js", "line": 15, "severity": "nit", "body": "Consider const instead of let"}
]

Do NOT post anything to the GitHub API — the coordinator agent will handle that.


This addresses the fragility concern raised in the previous review.


Return your findings as a JSON array. If you find no issues, return `[]`.
Do NOT post anything to the GitHub API — the coordinator agent will handle that.
"""


def format_prompt(
skill_trigger: str,
Expand All @@ -88,6 +161,7 @@ def format_prompt(
diff: str,
review_context: str = "",
require_evidence: bool = False,
use_sub_agents: bool = False,
) -> str:
"""Format the PR review prompt with all parameters.

Expand All @@ -105,6 +179,9 @@ def format_prompt(
the review context section is omitted from the prompt.
require_evidence: Whether to instruct the reviewer to enforce PR description
evidence showing the code works.
use_sub_agents: When True, use the sub-agent coordinator prompt instead of
the single-agent prompt. The coordinator will delegate
file-level reviews to sub-agents and consolidate results.

Returns:
Formatted prompt string
Expand All @@ -121,7 +198,9 @@ def format_prompt(
_EVIDENCE_REQUIREMENT_SECTION if require_evidence else ""
)

return PROMPT.format(
template = SUB_AGENT_PROMPT if use_sub_agents else PROMPT

return template.format(
skill_trigger=skill_trigger,
title=title,
body=body,
Expand All @@ -134,3 +213,26 @@ def format_prompt(
evidence_requirements_section=evidence_requirements_section,
diff=diff,
)


def get_file_reviewer_skill_content(review_style: str = "standard") -> str:
"""Return the file_reviewer sub-agent skill content.

Args:
review_style: 'standard' or 'roasted'

Returns:
Formatted skill content string for the file_reviewer agent type
"""
style_descriptions = {
"standard": (
"Balanced review covering correctness, style, readability, "
"and security. Be constructive."
),
"roasted": (
"Linus Torvalds-style brutally honest review. Focus on data "
"structures, simplicity, and pragmatism. No hand-holding."
),
}
description = style_descriptions.get(review_style, style_descriptions["standard"])
return FILE_REVIEWER_SKILL.format(review_style_description=description)
Loading
Loading