-
Notifications
You must be signed in to change notification settings - Fork 26
feat(pr-review): add sub-agent delegation for file-level reviews #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
df860db
e0f7681
6669953
eb1b667
7dcf8b6
199020e
0ef1afc
4f1e5da
3226964
722ea61
bc425c4
c0447ca
2b1fcf3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -24,6 +24,7 @@ Then configure the required secrets (see [Installation](#installation) below). | |||||
| - **A/B Testing**: Support for testing multiple LLM models | ||||||
| - **Review Context Awareness**: Considers previous reviews and unresolved threads | ||||||
| - **Evidence Enforcement**: Optional check that PR descriptions include concrete end-to-end proof the code works, not just test output | ||||||
| - **Sub-Agent Delegation** *(Experimental)*: Split large PR reviews across multiple sub-agents, one per file, then consolidate findings (see [Known Limitations](#known-limitations-sub-agent-delegation)) | ||||||
| - **Observability**: Optional Laminar integration for tracing and evaluation | ||||||
|
|
||||||
| ## Plugin Contents | ||||||
|
|
@@ -141,12 +142,24 @@ PR reviews are automatically triggered when: | |||||
| | `llm-base-url` | No | `''` | Custom LLM endpoint URL | | ||||||
| | `review-style` | No | `roasted` | **[DEPRECATED]** Previously chose between `standard` and `roasted` review styles. Now ignored — the styles have been merged into a single unified skill. | | ||||||
| | `require-evidence` | No | `'false'` | Require the reviewer to enforce an `Evidence` section in the PR description with end-to-end proof: screenshots/videos for frontend work, commands and runtime output for backend or scripts, and an agent conversation link when applicable. Test output alone does not qualify. | | ||||||
| | `use-sub-agents` | No | `'false'` | **(Experimental)** Enable sub-agent delegation for file-level reviews. The main agent acts as a coordinator that delegates per-file review work to `file_reviewer` sub-agents via the SDK TaskToolSet, then consolidates findings into a single PR review. Useful for large PRs with many changed files. | | ||||||
| | `extensions-repo` | No | `OpenHands/extensions` | Extensions repository | | ||||||
| | `extensions-version` | No | `main` | Git ref (tag, branch, or SHA) | | ||||||
| | `llm-api-key` | Yes | - | LLM API key | | ||||||
| | `github-token` | Yes | - | GitHub token for API access | | ||||||
| | `lmnr-api-key` | No | `''` | Laminar API key for observability | | ||||||
|
|
||||||
| ## Known Limitations: Sub-Agent Delegation | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Previous concern addressed: The Known Limitations section clearly documents the experimental nature and constraints (LLM-driven JSON parsing, potential information loss, no integration tests yet, sub-agents have no tools). This transparency is excellent for an experimental feature. |
||||||
|
|
||||||
| The `use-sub-agents` feature is **experimental** and has the following known constraints: | ||||||
|
|
||||||
| - **LLM-driven JSON parsing**: The coordinator agent relies on the LLM to parse and merge JSON responses from sub-agents. There is no code-level validation of sub-agent output, so malformed responses may cause incomplete reviews. | ||||||
| - **Potential information loss during consolidation**: When merging findings from multiple sub-agents, the coordinator may lose or deduplicate findings imperfectly, especially for cross-file issues. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Nit: Typo - "analyse" should be "analyze" for consistency with US English used elsewhere in the codebase.
Suggested change
|
||||||
| - **No integration tests yet**: Current test coverage verifies prompt formatting only. End-to-end validation of the delegation flow requires manual workflow testing. | ||||||
| - **Sub-agents have read-only tools**: File reviewer sub-agents have access to `terminal` and `file_editor` for inspecting full source files and surrounding context, but they cannot query the GitHub API or post reviews — only the coordinator handles GitHub interaction. | ||||||
|
|
||||||
| These limitations are acceptable for an opt-in experimental feature and will be addressed as the feature matures. | ||||||
|
|
||||||
| ## A/B Testing Multiple Models | ||||||
|
|
||||||
| Test different LLM models by providing a comma-separated list: | ||||||
|
|
||||||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -32,6 +32,11 @@ | |||||
| REPO_NAME: Repository name in format owner/repo (required) | ||||||
| REQUIRE_EVIDENCE: Whether to require PR description evidence showing the code | ||||||
| works ('true'/'false', default: 'false') | ||||||
| USE_SUB_AGENTS: Enable sub-agent delegation for file-level reviews | ||||||
| ('true'/'false', default: 'false'). When enabled, the main agent acts | ||||||
| as a coordinator that delegates per-file review work to | ||||||
| file_reviewer sub-agents via the TaskToolSet, then consolidates | ||||||
| findings into a single GitHub PR review. | ||||||
|
|
||||||
| For setup instructions, usage examples, and GitHub Actions integration, | ||||||
| see README.md in this directory. | ||||||
|
|
@@ -50,18 +55,29 @@ | |||||
| from typing import Any | ||||||
|
|
||||||
| from lmnr import Laminar | ||||||
| from openhands.sdk import LLM, Agent, AgentContext, Conversation, get_logger | ||||||
| from openhands.sdk import ( | ||||||
| LLM, | ||||||
| Agent, | ||||||
| AgentContext, | ||||||
| Conversation, | ||||||
| Tool, | ||||||
| get_logger, | ||||||
| register_agent, | ||||||
| ) | ||||||
| from openhands.sdk.context import Skill | ||||||
| from openhands.sdk.context.skills import load_project_skills | ||||||
| from openhands.sdk.conversation import get_agent_final_response | ||||||
| from openhands.sdk.git.utils import run_git_command | ||||||
| from openhands.sdk.plugin import PluginSource | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔴 Critical: Partially unresolved deprecation issue. While Verify:
The previous critical review thread specifically flagged this import as problematic.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Clarification on previous review: The concern about Note: |
||||||
| from openhands.tools.delegate import DelegationVisualizer | ||||||
| from openhands.tools.preset.default import get_default_condenser, get_default_tools | ||||||
| from openhands.tools.task import TaskToolSet | ||||||
|
|
||||||
| # Add the script directory to Python path so we can import prompt.py | ||||||
| script_dir = Path(__file__).parent | ||||||
| sys.path.insert(0, str(script_dir)) | ||||||
|
|
||||||
| from prompt import format_prompt # noqa: E402 | ||||||
| from prompt import FILE_REVIEWER_SKILL, format_prompt # noqa: E402 | ||||||
|
|
||||||
| logger = get_logger(__name__) | ||||||
|
|
||||||
|
|
@@ -728,6 +744,7 @@ def validate_environment() -> dict[str, Any]: | |||||
| "model": os.getenv("LLM_MODEL", "anthropic/claude-sonnet-4-5-20250929"), | ||||||
| "base_url": os.getenv("LLM_BASE_URL"), | ||||||
| "require_evidence": _get_bool_env("REQUIRE_EVIDENCE"), | ||||||
| "use_sub_agents": _get_bool_env("USE_SUB_AGENTS"), | ||||||
| "pr_info": { | ||||||
| "number": os.getenv("PR_NUMBER"), | ||||||
| "title": os.getenv("PR_TITLE"), | ||||||
|
|
@@ -763,6 +780,47 @@ def fetch_pr_context(pr_number: str) -> tuple[str, str, str]: | |||||
| return pr_diff, commit_id, review_context | ||||||
|
|
||||||
|
|
||||||
| def _create_file_reviewer_agent(llm: LLM) -> Agent: | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Clarification on previous review: The factory function signature is correct. I verified that |
||||||
| """Factory for file_reviewer sub-agents used during delegation. | ||||||
|
|
||||||
| Each sub-agent receives a skill that defines its review persona and | ||||||
| expected output format. It has read-only terminal and file_editor | ||||||
| access so it can inspect surrounding code context in the PR repo, | ||||||
| but the coordinator handles all GitHub API interaction. | ||||||
| """ | ||||||
| skills = [ | ||||||
| Skill( | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 Important: Verify the factory function signature matches The function accepts If the signature is wrong, agent instantiation will fail at runtime with no static type checking to catch it.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Nit: Spelling consistency - use US English to match the rest of the codebase (README uses "analyze").
Suggested change
|
||||||
| name="file_review_instructions", | ||||||
| content=FILE_REVIEWER_SKILL, | ||||||
| trigger=None, | ||||||
| ), | ||||||
| ] | ||||||
| return Agent( | ||||||
| llm=llm, | ||||||
| tools=[ | ||||||
| Tool(name="terminal"), | ||||||
| Tool(name="file_editor"), | ||||||
| ], | ||||||
| agent_context=AgentContext(skills=skills), | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def _register_sub_agents() -> None: | ||||||
| """Register the file_reviewer agent type. | ||||||
|
|
||||||
| TaskToolSet auto-registers on import, so no explicit | ||||||
| ``register_tool()`` call is needed. | ||||||
| """ | ||||||
| register_agent( | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 Important: Related to the deprecation concern - verify that |
||||||
| name="file_reviewer", | ||||||
| factory_func=_create_file_reviewer_agent, | ||||||
| description=( | ||||||
| "Reviews one or more files from a PR diff and returns structured " | ||||||
| "findings as a JSON array." | ||||||
| ), | ||||||
| ) | ||||||
|
|
||||||
|
|
||||||
| def create_conversation( | ||||||
| config: dict[str, Any], | ||||||
| secrets: dict[str, str], | ||||||
|
|
@@ -773,6 +831,9 @@ def create_conversation( | |||||
| handles wiring skills, MCP config, and hooks automatically. | ||||||
| Project-specific skills from the workspace are loaded separately. | ||||||
|
|
||||||
| When ``config["use_sub_agents"]`` is True the coordinator agent is | ||||||
| given the TaskToolSet so it can delegate to file_reviewer sub-agents. | ||||||
|
|
||||||
| Args: | ||||||
| config: Configuration dictionary from validate_environment() | ||||||
| secrets: Secrets to mask in output | ||||||
|
|
@@ -804,9 +865,17 @@ def create_conversation( | |||||
| skills=project_skills, | ||||||
| ) | ||||||
|
|
||||||
| tools = get_default_tools(enable_browser=False) | ||||||
|
|
||||||
| use_sub_agents = config.get("use_sub_agents", False) | ||||||
| if use_sub_agents: | ||||||
| _register_sub_agents() | ||||||
| tools.append(Tool(name=TaskToolSet.name)) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟢 Acceptable: No validation that sub-agent registration succeeded or that TaskToolSet.name is valid. If registration silently fails, the coordinator would try to use a non-existent tool. For an experimental feature with manual workflow testing, this is acceptable. If/when this graduates from experimental, consider adding validation: try:
_register_sub_agents()
tools.append(Tool(name=TaskToolSet.name))
logger.info(f"Sub-agent delegation enabled — {TaskToolSet.name} registered")
except Exception as e:
logger.error(f"Sub-agent registration failed: {e}")
raise |
||||||
| logger.info("Sub-agent delegation enabled — TaskToolSet added") | ||||||
|
|
||||||
| agent = Agent( | ||||||
| llm=llm, | ||||||
| tools=get_default_tools(enable_browser=False), | ||||||
| tools=tools, | ||||||
| agent_context=agent_context, | ||||||
| system_prompt_kwargs={"cli_mode": True}, | ||||||
| condenser=get_default_condenser( | ||||||
|
|
@@ -816,12 +885,18 @@ def create_conversation( | |||||
|
|
||||||
| # The plugin directory is the parent of the scripts/ directory | ||||||
| plugin_dir = script_dir.parent # plugins/pr-review/ | ||||||
| return Conversation( | ||||||
| agent=agent, | ||||||
| workspace=cwd, | ||||||
| secrets=secrets, | ||||||
| plugins=[PluginSource(source=str(plugin_dir))], | ||||||
| ) | ||||||
| conversation_kwargs: dict[str, Any] = { | ||||||
| "agent": agent, | ||||||
| "workspace": cwd, | ||||||
| "secrets": secrets, | ||||||
| "plugins": [PluginSource(source=str(plugin_dir))], | ||||||
| } | ||||||
| if use_sub_agents: | ||||||
| conversation_kwargs["visualizer"] = DelegationVisualizer( | ||||||
| name="PR Review Coordinator" | ||||||
| ) | ||||||
|
|
||||||
| return Conversation(**conversation_kwargs) | ||||||
|
|
||||||
|
|
||||||
| def run_review( | ||||||
|
|
@@ -930,9 +1005,11 @@ def main(): | |||||
| config = validate_environment() | ||||||
| pr_info = config["pr_info"] | ||||||
| require_evidence = config["require_evidence"] | ||||||
| use_sub_agents = config["use_sub_agents"] | ||||||
|
|
||||||
| logger.info(f"Reviewing PR #{pr_info['number']}: {pr_info['title']}") | ||||||
| logger.info(f"Require PR evidence: {require_evidence}") | ||||||
| logger.info(f"Sub-agent delegation: {use_sub_agents}") | ||||||
|
|
||||||
| try: | ||||||
| pr_diff, commit_id, review_context = fetch_pr_context(pr_info["number"]) | ||||||
|
|
@@ -952,6 +1029,7 @@ def main(): | |||||
| diff=pr_diff, | ||||||
| review_context=review_context, | ||||||
| require_evidence=require_evidence, | ||||||
| use_sub_agents=use_sub_agents, | ||||||
| ) | ||||||
|
|
||||||
| secrets = {} | ||||||
|
|
||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟢 Nit: Consider adding a Known Limitations subsection under the experimental feature description that mentions:
This sets appropriate expectations for users trying the experimental feature.