Skip to content

Commit 25acda5

Browse files
committed
feat: add self-reflection step before presenting roadmap
Implements the self-reflection pattern from issue #6, where the agent reviews its own output before presenting it to users. This catches issues like missing files, generic advice, or unclear reasoning. Changes: - Add reflect_on_roadmap node that evaluates the generated roadmap - Add conditional routing to retry roadmap generation if issues found - Limit retries to 2 iterations to prevent infinite loops - Add --no-reflection flag to skip reflection for faster results - Update draft_roadmap to incorporate feedback on retries The reflection step checks for: - Completeness (all files mentioned) - Logical review order - Specificity (not generic boilerplate) - Accuracy of summaries Closes #6
1 parent dbe26a2 commit 25acda5

File tree

9 files changed

+550
-14
lines changed

9 files changed

+550
-14
lines changed

README.md

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@ A CLI tool that uses LLMs (Claude) to generate a structured, human-friendly road
77
- **Topology Analysis**: Groups changed files into logical components (e.g., API, DB, Frontend).
88
- **Deep Linking**: Generates links to specific lines of code in the PR.
99
- **Review Guidance**: Suggests a logical order for reviewing files.
10+
- **Self-Reflection**: Reviews its own output before presenting, catching issues and improving quality.
1011
- **Integration**: Fetches PR metadata, diffs, and existing comments from GitHub.
1112

1213
## Installation
@@ -66,6 +67,7 @@ review_roadmap {PR link in the form owner/repo/pr_number or just a URL to the PR
6667
|--------|-------------|
6768
| `--output`, `-o` | Save the roadmap to a file instead of printing to stdout |
6869
| `--post`, `-p` | Post the roadmap as a comment directly on the PR |
70+
| `--no-reflection` | Skip the self-reflection step for faster results |
6971

7072
You can use both `-o` and `-p` together—the roadmap will be generated once and saved to both the file and the PR comment.
7173

@@ -91,6 +93,12 @@ Generate and both save to file and post to PR:
9193
review_roadmap https://github.com/llamastack/llama-stack/pull/3674 -o roadmap.md -p
9294
```
9395

96+
Generate quickly without self-reflection (faster but may have lower quality):
97+
98+
```bash
99+
review_roadmap https://github.com/llamastack/llama-stack/pull/3674 --no-reflection
100+
```
101+
94102
## Development
95103

96104
```bash
@@ -109,5 +117,8 @@ pytest -v
109117
The tool uses **LangGraph** to orchestrate the workflow:
110118

111119
1. **Analyze Structure**: LLM analyzes file paths to understand component groups.
112-
2. **Context Expansion**: (Planned) Fetches additional file content if diffs are ambiguous.
120+
2. **Context Expansion**: Fetches additional file content if diffs are ambiguous.
113121
3. **Draft Roadmap**: Synthesizes metadata, diffs, and comments into a coherent guide.
122+
4. **Self-Reflection**: Reviews the generated roadmap for completeness and accuracy, retrying if needed.
123+
124+
The self-reflection step implements the [self-review pattern](https://github.com/jeremyeder/reference/blob/main/docs/patterns/self-review-reflection.md), where the agent evaluates its own output before presenting it to users. This catches issues like missing files, generic advice, or unclear reasoning—improving quality without manual review.

review_roadmap/agent/graph.py

Lines changed: 73 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -8,16 +8,63 @@
88
from langgraph.graph.state import CompiledStateGraph
99

1010
from review_roadmap.agent.state import ReviewState
11-
from review_roadmap.agent.nodes import analyze_structure, context_expansion, draft_roadmap
11+
from review_roadmap.agent.nodes import (
12+
analyze_structure,
13+
context_expansion,
14+
draft_roadmap,
15+
reflect_on_roadmap,
16+
)
17+
from review_roadmap.agent.prompts import MAX_REFLECTION_ITERATIONS
18+
from review_roadmap.logging import get_logger
19+
20+
logger = get_logger(__name__)
21+
22+
23+
def _should_reflect(state: ReviewState) -> str:
24+
"""Determine whether to run reflection or skip to end.
25+
26+
Args:
27+
state: Current workflow state.
28+
29+
Returns:
30+
'reflect' to run reflection, 'end' to skip.
31+
"""
32+
if state.skip_reflection:
33+
logger.info("reflection_skipped", reason="disabled by user")
34+
return "end"
35+
return "reflect"
36+
37+
38+
def _after_reflection(state: ReviewState) -> str:
39+
"""Determine whether to retry roadmap generation or finish.
40+
41+
Args:
42+
state: Current workflow state with reflection results.
43+
44+
Returns:
45+
'retry' to regenerate roadmap, 'end' to finish.
46+
"""
47+
if state.reflection_passed:
48+
return "end"
49+
if state.reflection_iterations >= MAX_REFLECTION_ITERATIONS:
50+
logger.warning("max_reflection_iterations_reached",
51+
iterations=state.reflection_iterations)
52+
return "end"
53+
return "retry"
1254

1355

1456
def build_graph() -> CompiledStateGraph:
1557
"""Build and compile the LangGraph workflow for review roadmap generation.
1658
17-
The workflow consists of three sequential nodes:
59+
The workflow consists of four nodes with conditional routing:
1860
1. analyze_structure: Groups changed files into logical components
1961
2. context_expansion: Optionally fetches additional file content for context
2062
3. draft_roadmap: Generates the final Markdown roadmap
63+
4. reflect_on_roadmap: Self-reviews the roadmap and may trigger a retry
64+
65+
The reflection step can be skipped by setting skip_reflection=True in the
66+
initial state. If reflection fails, the workflow loops back to draft_roadmap
67+
with feedback, up to MAX_REFLECTION_ITERATIONS times.
2168
2269
Returns:
2370
A compiled LangGraph that can be invoked with a ReviewState containing
@@ -27,18 +74,41 @@ def build_graph() -> CompiledStateGraph:
2774
>>> graph = build_graph()
2875
>>> result = graph.invoke({"pr_context": pr_context})
2976
>>> roadmap = result["roadmap"]
77+
78+
# Skip reflection:
79+
>>> result = graph.invoke({"pr_context": pr_context, "skip_reflection": True})
3080
"""
3181
workflow = StateGraph(ReviewState)
3282

3383
# Add Nodes
3484
workflow.add_node("analyze_structure", analyze_structure)
3585
workflow.add_node("context_expansion", context_expansion)
3686
workflow.add_node("draft_roadmap", draft_roadmap)
87+
workflow.add_node("reflect_on_roadmap", reflect_on_roadmap)
3788

3889
# Define Edges
3990
workflow.set_entry_point("analyze_structure")
4091
workflow.add_edge("analyze_structure", "context_expansion")
4192
workflow.add_edge("context_expansion", "draft_roadmap")
42-
workflow.add_edge("draft_roadmap", END)
93+
94+
# After draft_roadmap, decide whether to reflect or skip
95+
workflow.add_conditional_edges(
96+
"draft_roadmap",
97+
_should_reflect,
98+
{
99+
"reflect": "reflect_on_roadmap",
100+
"end": END,
101+
}
102+
)
103+
104+
# After reflection, decide whether to retry or finish
105+
workflow.add_conditional_edges(
106+
"reflect_on_roadmap",
107+
_after_reflection,
108+
{
109+
"retry": "draft_roadmap",
110+
"end": END,
111+
}
112+
)
43113

44114
return workflow.compile()

review_roadmap/agent/nodes.py

Lines changed: 84 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -112,6 +112,7 @@ def _get_llm_instance() -> BaseChatModel:
112112
ANALYZE_STRUCTURE_SYSTEM_PROMPT,
113113
CONTEXT_EXPANSION_SYSTEM_PROMPT,
114114
DRAFT_ROADMAP_SYSTEM_PROMPT,
115+
REFLECT_ON_ROADMAP_SYSTEM_PROMPT,
115116
)
116117
from review_roadmap.agent.tools import read_file
117118

@@ -318,13 +319,17 @@ def draft_roadmap(state: ReviewState) -> Dict[str, Any]:
318319
file analysis, topology, comments, fetched content) into a structured
319320
roadmap with deep links to guide the reviewer.
320321
322+
If reflection feedback is present (from a previous iteration), it is
323+
included in the prompt to guide improvements.
324+
321325
Args:
322326
state: Current workflow state with all accumulated context.
323327
324328
Returns:
325329
Dict with 'roadmap' key containing the Markdown roadmap string.
326330
"""
327-
logger.info("node_started", node="draft_roadmap")
331+
logger.info("node_started", node="draft_roadmap",
332+
iteration=state.reflection_iterations)
328333

329334
files_context = _build_files_context(state)
330335
comments_context = _build_comments_context(state)
@@ -333,6 +338,15 @@ def draft_roadmap(state: ReviewState) -> Dict[str, Any]:
333338
repo_url = state.pr_context.metadata.repo_url
334339
pr_number = state.pr_context.metadata.number
335340

341+
# Include reflection feedback if this is a retry
342+
feedback_section = ""
343+
if state.reflection_feedback:
344+
feedback_section = f"""
345+
346+
## Self-Review Feedback (address these issues in your revision)
347+
{state.reflection_feedback}
348+
"""
349+
336350
context_str = f"""
337351
Title: {state.pr_context.metadata.title}
338352
Description: {state.pr_context.metadata.description}
@@ -349,6 +363,7 @@ def draft_roadmap(state: ReviewState) -> Dict[str, Any]:
349363
Existing Comments:
350364
{chr(10).join(comments_context) if comments_context else "No comments found."}
351365
{fetched_context_str}
366+
{feedback_section}
352367
"""
353368

354369
prompt = ChatPromptTemplate.from_messages([
@@ -360,3 +375,71 @@ def draft_roadmap(state: ReviewState) -> Dict[str, Any]:
360375
response = chain.invoke({"context": context_str})
361376

362377
return {"roadmap": response.content}
378+
379+
380+
def reflect_on_roadmap(state: ReviewState) -> Dict[str, Any]:
381+
"""Self-review the generated roadmap before presenting to user.
382+
383+
Evaluates the roadmap against quality criteria and either approves it
384+
or provides specific feedback for improvement. This implements the
385+
self-reflection pattern to catch issues before humans see them.
386+
387+
Args:
388+
state: Current workflow state with the generated roadmap.
389+
390+
Returns:
391+
Dict with reflection results:
392+
- reflection_passed: Whether the roadmap passed review
393+
- reflection_feedback: Specific feedback if failed
394+
- reflection_iterations: Incremented iteration count
395+
"""
396+
logger.info("node_started", node="reflect_on_roadmap",
397+
iteration=state.reflection_iterations)
398+
399+
# Build context for reflection
400+
files_list = "\n".join([f"- {f.path}" for f in state.pr_context.files])
401+
402+
context_str = f"""## PR Context
403+
Title: {state.pr_context.metadata.title}
404+
Changed Files:
405+
{files_list}
406+
407+
## Generated Roadmap
408+
{state.roadmap}
409+
410+
## Previous Feedback (if any)
411+
{state.reflection_feedback or "None - first review"}
412+
"""
413+
414+
prompt = ChatPromptTemplate.from_messages([
415+
("system", REFLECT_ON_ROADMAP_SYSTEM_PROMPT),
416+
("human", "{context}")
417+
])
418+
419+
chain = prompt | _get_llm_instance()
420+
response = chain.invoke({"context": context_str})
421+
422+
# Parse response (with fallback for non-JSON responses)
423+
import json
424+
try:
425+
result = json.loads(response.content)
426+
passed = result.get("passed", False)
427+
feedback = result.get("feedback", "")
428+
notes = result.get("notes", "")
429+
except json.JSONDecodeError:
430+
# If LLM didn't return valid JSON, assume it passed
431+
logger.warning("reflection_response_not_json", content=response.content[:200])
432+
passed = True
433+
feedback = ""
434+
notes = "Self-review: completed (non-JSON response)"
435+
436+
if passed:
437+
logger.info("reflection_passed", notes=notes)
438+
else:
439+
logger.info("reflection_failed", feedback=feedback)
440+
441+
return {
442+
"reflection_passed": passed,
443+
"reflection_feedback": feedback,
444+
"reflection_iterations": state.reflection_iterations + 1,
445+
}

review_roadmap/agent/prompts.py

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,8 @@
11
# System Prompts for the Agent
22

3+
# Maximum number of reflection iterations before accepting the roadmap
4+
MAX_REFLECTION_ITERATIONS = 2
5+
36
ANALYZE_STRUCTURE_SYSTEM_PROMPT = """You are a Senior Software Architect.
47
58
Analyze the list of changed files and group them into logical components (e.g., 'Backend API', 'Frontend Components', 'Database Schema', 'Configuration').
@@ -42,3 +45,30 @@
4245
4346
Do not be generic. Be specific to the file paths and names provided.
4447
"""
48+
49+
REFLECT_ON_ROADMAP_SYSTEM_PROMPT = """You are a Senior Staff Engineer reviewing a PR review roadmap before it's shown to a human reviewer.
50+
51+
## Your Task
52+
Critically evaluate this roadmap from the perspective of someone who will use it to review the PR.
53+
54+
## Checklist
55+
1. **Completeness**: Are all changed files mentioned? Is anything important missing?
56+
2. **Logical Order**: Does the suggested review order make sense? Would a reviewer get confused?
57+
3. **Specificity**: Are the "watch outs" specific to THIS PR, or generic boilerplate?
58+
4. **Deep Links**: Are file references actionable (include links where provided)?
59+
5. **Accuracy**: Do the summaries match the actual file changes described?
60+
6. **Assumptions**: Are there unstated assumptions that should be made explicit?
61+
62+
## Response Format
63+
If the roadmap passes review, respond with EXACTLY this JSON:
64+
```json
65+
{"passed": true, "notes": "Self-review: [brief note on quality]"}
66+
```
67+
68+
If issues need fixing, respond with EXACTLY this JSON:
69+
```json
70+
{"passed": false, "feedback": "[specific issues to fix, be concise]"}
71+
```
72+
73+
Be rigorous but not pedantic. Only fail roadmaps with genuine issues that would confuse or mislead a reviewer.
74+
"""

review_roadmap/agent/state.py

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,18 @@ class ReviewState(BaseModel):
2020
2. analyze_structure: populates topology
2121
3. context_expansion: populates fetched_content (if needed)
2222
4. draft_roadmap: populates roadmap (final output)
23+
5. reflect_on_roadmap: self-reviews and optionally triggers retry
2324
2425
Attributes:
2526
pr_context: Input PR data including metadata, files, and comments.
2627
topology: Analysis of file groupings from the structure analysis node.
2728
required_context: File paths identified for fetching (intermediate).
2829
fetched_content: Additional file contents fetched for context.
2930
roadmap: The final generated Markdown roadmap (output).
31+
reflection_feedback: Feedback from self-reflection step for improvements.
32+
reflection_passed: Whether the roadmap passed self-review.
33+
reflection_iterations: Number of reflection iterations completed.
34+
skip_reflection: Whether to skip the self-reflection step entirely.
3035
"""
3136

3237
# Input
@@ -45,3 +50,17 @@ class ReviewState(BaseModel):
4550

4651
# Output
4752
roadmap: str = ""
53+
54+
# Reflection
55+
reflection_feedback: str = Field(
56+
default="", description="Feedback from self-reflection step"
57+
)
58+
reflection_passed: bool = Field(
59+
default=False, description="Whether the roadmap passed self-review"
60+
)
61+
reflection_iterations: int = Field(
62+
default=0, description="Number of reflection iterations completed"
63+
)
64+
skip_reflection: bool = Field(
65+
default=False, description="Whether to skip the self-reflection step"
66+
)

review_roadmap/main.py

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -36,7 +36,8 @@ def format_pr_comment(roadmap_content: str) -> str:
3636
def generate(
3737
pr_url: str = typer.Argument(..., help="GitHub PR URL (e.g., owner/repo/123) or 'owner/repo/123' string"),
3838
output: str = typer.Option(None, "--output", "-o", help="Output file for the roadmap"),
39-
post: bool = typer.Option(False, "--post", "-p", help="Post the roadmap as a comment on the PR")
39+
post: bool = typer.Option(False, "--post", "-p", help="Post the roadmap as a comment on the PR"),
40+
no_reflection: bool = typer.Option(False, "--no-reflection", help="Skip the self-reflection step")
4041
):
4142
"""Generates a review roadmap for a given Pull Request."""
4243

@@ -87,9 +88,13 @@ def generate(
8788
console.print(f"[green]Found PR: {pr_context.metadata.title} (Changed files: {len(pr_context.files)})[/green]")
8889

8990
# 2. Run LangGraph
90-
console.print("[bold purple]Generating Roadmap (this may take a minute)...[/bold purple]")
91+
if no_reflection:
92+
console.print("[bold purple]Generating Roadmap (reflection disabled)...[/bold purple]")
93+
else:
94+
console.print("[bold purple]Generating Roadmap with self-reflection (this may take a minute)...[/bold purple]")
95+
9196
graph = build_graph()
92-
initial_state = {"pr_context": pr_context}
97+
initial_state = {"pr_context": pr_context, "skip_reflection": no_reflection}
9398

9499
result = graph.invoke(initial_state)
95100
roadmap_content = result.get("roadmap", "")

0 commit comments

Comments
 (0)