
Commit ff6441d

Replace regex-based documentation with LLM-based code documentation review
The old approach copied text verbatim from design docs into comments, which was redundant. The new approach uses an LLM review step (Stage 5) that reads both the generated code and design doc, then writes insightful comments explaining invariants, protocol steps, and design rationale.

- Remove ~500 lines of regex documentation methods from PCodePostProcessor
- Add GenerationService.review_code_documentation() as new LLM review step
- Add review_code_documentation.txt instruction prompt
- Wire into all 4 MCP generation tools and all workflow steps
- Update tests and CLAUDE.md pipeline documentation

Made-with: Cursor
1 parent fb4cc7a commit ff6441d


8 files changed (+375 / -13 lines)


CLAUDE.md

Lines changed: 24 additions & 2 deletions
@@ -341,6 +341,21 @@ Located in `GenerationService.review_spec_correctness()` (`Src/PeasyAI/src/core/
 
 The review can fix the spec file and, rarely, other files (e.g., types/events). Fixed files are returned in the `spec_fixes` field of the MCP response.
 
+#### Stage 5 — LLM Code Documentation Review (all files)
+
+For all generated files, an LLM-based review step adds insightful documentation comments. Unlike the previous regex-based approach (which copied text verbatim from the design doc), this step asks the LLM to write contextual comments that explain:
+
+- **Why** the code is structured the way it is (not just what it does)
+- What **invariants** are maintained by each machine/spec
+- What **protocol step** each event handler implements
+- What each **variable** tracks and why it's needed
+- What **safety property** each assertion checks
+- Non-obvious **design decisions** and tradeoffs
+
+Located in `GenerationService.review_code_documentation()` (`Src/PeasyAI/src/core/services/generation.py`), using the prompt `Src/PeasyAI/resources/instructions/review_code_documentation.txt`.
+
+The response parser (`_parse_documentation_review_response`) validates that the LLM didn't drop any machine/spec declarations from the code. If the LLM call fails or the response is malformed, the original code is returned unchanged.
+
 #### Pipeline Data Flow
 
 The `ValidationPipeline` is the **single place** where post-processing and validation happen. Both the MCP tool path and the workflow step path call it. `GenerationService._extract_p_code()` does NOT run any post-processing — it only extracts code from LLM responses.
@@ -382,14 +397,21 @@ LLM generates code
 
 
 ┌─────────────────────────────┐
+│ Stage 5: LLM Doc Review     │ all files — adds insightful documentation comments
+│ (services/generation.py)    │ explains invariants, protocol steps, design rationale
+│ review_code_documentation() │ prompt: review_code_documentation.txt
+└─────────────┬───────────────┘
+
+
+┌─────────────────────────────┐
 │ PipelineResult + fixes      │ is_valid, fixed_code, issues[], fixes_applied[]
 │ .to_review_dict()           │ + wiring_fixes / spec_fixes for cross-file changes
 └─────────────────────────────┘
 ```
 
 Two call sites invoke the pipeline:
-- **MCP tools**: `_review_generated_code()` in `tools/generation.py` — returns `to_review_dict()` for the MCP response. For test files, `review_test_wiring()` runs as Stage 3. For spec files, `review_spec_correctness()` runs as Stage 4.
-- **Workflow steps**: `_run_validation_pipeline()` in `workflow/p_steps.py` — returns the fixed code string
+- **MCP tools**: `_review_generated_code()` in `tools/generation.py` — returns `to_review_dict()` for the MCP response. For test files, `review_test_wiring()` runs as Stage 3. For spec files, `review_spec_correctness()` runs as Stage 4. For all files, `review_code_documentation()` runs as Stage 5.
+- **Workflow steps**: `_run_validation_pipeline()` in `workflow/p_steps.py` — returns the fixed code string. `_run_documentation_review()` runs Stage 5 afterward.
 
 #### MCP Response Severity
 
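
The parser contract described in the Stage 5 text (extract the `<documented_code>` block, reject an empty block, reject output that drops a `machine`/`spec` declaration) can be exercised on its own. Below is a minimal standalone sketch that reuses the two regexes from `_parse_documentation_review_response` in this commit; the function name `parse_documented` and the sample P snippets are hypothetical, and logging is left out.

```python
import re
from typing import Optional

# Same two patterns as the committed parser: tag extraction and machine/spec names.
TAG_RE = re.compile(r"<documented_code>(.*?)</documented_code>", re.DOTALL)
DECL_RE = re.compile(r"\b(?:machine|spec)\s+(\w+)")


def parse_documented(content: str, original_code: str) -> Optional[str]:
    """Return the documented code, or None if the response should be rejected."""
    match = TAG_RE.search(content)
    if not match:
        return None  # response did not follow the <documented_code> format
    documented = match.group(1).strip()
    if not documented:
        return None  # tags present but nothing inside them
    orig = set(DECL_RE.findall(original_code))
    doc = set(DECL_RE.findall(documented))
    if orig and not orig.issubset(doc):
        return None  # a machine/spec declaration went missing
    return documented


original = "machine Coordinator {\n  start state Init {}\n}"
good = (
    "<documented_code>\n"
    "// Serializes client requests.\n"
    "machine Coordinator {\n  start state Init {}\n}\n"
    "</documented_code>"
)

print(parse_documented(good, original) is not None)  # True
print(parse_documented("no tags here", original))  # None
print(parse_documented("<documented_code>\n</documented_code>", original))  # None
```

Returning `None` instead of raising lets every caller apply the same fallback: keep the code it already has.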

Src/PeasyAI/resources/instructions/review_code_documentation.txt

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+You are adding documentation comments to a generated P program file.
+
+You are given:
+1. The generated P code (already syntactically correct).
+2. The design document that describes the system being modeled.
+3. Other project files for cross-reference context.
+
+Your job is to add insightful `//` comments that help a developer understand and maintain the code long-term. Do NOT just copy text from the design document — the developer can read the design doc themselves. Instead, explain the *why* behind the code: what invariant is being maintained, what protocol step is being executed, what the tricky parts are.
+
+## COMMENT GUIDELINES
+
+### File Header
+Add a 2-4 line header at the top of the file:
+- Name the system and file's role (e.g., "Coordinator for the Two Phase Commit protocol")
+- One sentence summarizing the key responsibility or invariant this file maintains
+
+### Machine / Spec Declarations
+Above each `machine` or `spec` declaration, add a brief block comment (3-6 lines) explaining:
+- What role this component plays in the protocol
+- What key invariant or safety property it maintains or contributes to
+- Any non-obvious design decisions (e.g., "serializes concurrent requests to avoid conflicting prepares")
+
+### State Declarations
+Above each `state` declaration, add 1-2 lines explaining:
+- What phase of the protocol this state represents
+- What the machine is waiting for or doing in this state
+- Any deferred/ignored events and WHY they are deferred/ignored (not just that they are)
+
+### Variable Declarations
+Add inline comments on `var` declarations explaining:
+- What the variable tracks and WHY it's needed (not just restating the type)
+- For collections: what the keys/values represent in protocol terms
+
+### Event Handlers (`on ... do`)
+Above each event handler, add 1-2 lines explaining:
+- What protocol step this handler implements
+- What the expected outcome is (e.g., "accumulates votes; triggers commit/abort decision when all received")
+- Any non-obvious logic (e.g., "uses choose() to model non-deterministic participant failure")
+
+### Send Statements
+For important `send` statements (especially those that drive protocol transitions), add a brief inline or above-line comment explaining:
+- What protocol action this message represents
+- Why it's sent at this point
+
+### Assertions
+Above `assert` statements, explain:
+- What safety property is being checked
+- Under what conditions it could be violated
+
+### DO NOT
+- Do NOT add comments that just restate the code (e.g., `// send prepare request` above `send p, ePrepareReq`)
+- Do NOT add comments on every single line — focus on non-obvious logic
+- Do NOT change any code — only add `//` comments
+- Do NOT add comments inside type/event declaration files (Enums_Types_Events.p) beyond the file header — the type names and field names are self-documenting
+- Do NOT remove any existing comments
+
+## RESPONSE FORMAT
+
+Return the complete file with comments added, wrapped in the following format:
+
+<documented_code>
+... the full P code with comments added ...
+</documented_code>
+
+Return ONLY the documented code. Do not include analysis or explanation outside the tags.
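
The prompt asks the model to "only add `//` comments", but the committed parser checks only that `machine`/`spec` names survive. If a stricter mechanical check were wanted, one option (not part of this commit) is to compare both versions with `//` comments, blank lines and indentation removed. A rough sketch, assuming `//` never appears inside a P string literal and ignoring `/* */` block comments; `code_lines` and `only_comments_added` are hypothetical helper names.

```python
import re

# Rough assumption: "//" never occurs inside a string literal in the P files.
LINE_COMMENT_RE = re.compile(r"//.*$")


def code_lines(source: str) -> list[str]:
    """Return non-blank lines with // comments and surrounding whitespace removed."""
    lines = []
    for line in source.splitlines():
        stripped = LINE_COMMENT_RE.sub("", line).strip()
        if stripped:
            lines.append(stripped)
    return lines


def only_comments_added(original: str, documented: str) -> bool:
    """True if the documented file differs from the original only in // comments."""
    return code_lines(original) == code_lines(documented)


original = "machine M {\n  start state Init {}\n}\n"
documented = "// Header comment.\nmachine M {\n  // Initial phase.\n  start state Init {}\n}\n"
changed = "machine M {\n  start state Ready {}\n}\n"

print(only_comments_added(original, documented))  # True
print(only_comments_added(original, changed))     # False
```

Because indentation is ignored, a pure re-indent by the model would pass; a dropped or renamed state would not.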

Src/PeasyAI/src/core/compilation/p_post_processor.py

Lines changed: 11 additions & 4 deletions
@@ -7,7 +7,7 @@
 
 import re
 import logging
-from typing import List, Tuple, Dict, Optional
+from typing import List, Optional
 from dataclasses import dataclass
 
 logger = logging.getLogger(__name__)
@@ -37,7 +37,12 @@ def __init__(self):
         self.fixes_applied: List[str] = []
         self.warnings: List[str] = []
 
-    def process(self, code: str, filename: str = "", is_test_file: bool = False) -> PostProcessResult:
+    def process(
+        self,
+        code: str,
+        filename: str = "",
+        is_test_file: bool = False,
+    ) -> PostProcessResult:
         """
         Process P code and fix common issues.
 
@@ -72,7 +77,7 @@ def process(self, code: str, filename: str = "", is_test_file: bool = False) ->
         if is_test_file:
             code = self._warn_timer_wired_to_this(code, filename)
             code = self._ensure_test_declarations(code, filename)
-
+
         if code != original_code:
             logger.info(f"Post-processing applied {len(self.fixes_applied)} fix(es) to {filename or 'code'}")
 
@@ -81,7 +86,9 @@ def process(self, code: str, filename: str = "", is_test_file: bool = False) ->
             fixes_applied=self.fixes_applied,
             warnings=self.warnings
         )
-
+
+    # ── Syntax fixes ──────────────────────────────────────────────────
+
     def _fix_trailing_comma_in_params(self, code: str) -> str:
         """
         Remove trailing commas from function, entry, and handler parameter lists.

Src/PeasyAI/src/core/services/generation.py

Lines changed: 85 additions & 0 deletions
@@ -762,6 +762,91 @@ def review_spec_correctness(
 
         return self._parse_wiring_review_response(response.content)
 
+    # ── LLM-based code documentation review ─────────────────────────
+
+    def review_code_documentation(
+        self,
+        code: str,
+        design_doc: str,
+        context_files: Optional[Dict[str, str]] = None,
+    ) -> Optional[str]:
+        """
+        Use the LLM to add insightful documentation comments to generated P code.
+
+        Unlike regex-based approaches that copy text verbatim from the design
+        doc, this asks the LLM to write contextual comments explaining *why*
+        the code is structured the way it is, what invariants are maintained,
+        and what protocol steps are being implemented.
+
+        Returns the documented code string, or None if the LLM call fails.
+        """
+        self._status("Adding documentation comments via LLM…")
+
+        instruction = self._load_static_instruction("review_code_documentation.txt")
+        messages: List[Message] = []
+
+        if context_files:
+            messages.extend(self._compact_context_messages(context_files))
+
+        messages.append(Message(
+            role=MessageRole.USER,
+            content=f"<design_document>\n{self._compact_design_doc(design_doc)}\n</design_document>",
+        ))
+
+        messages.append(Message(
+            role=MessageRole.USER,
+            content=f"<code_to_document>\n{code}\n</code_to_document>",
+        ))
+
+        messages.append(Message(
+            role=MessageRole.USER,
+            content=instruction,
+        ))
+
+        system_prompt = (
+            "You are an expert P language developer adding documentation "
+            "comments to generated code. Write comments that explain the "
+            "reasoning, invariants, and protocol semantics — not comments "
+            "that merely restate what the code does."
+        )
+        config = LLMConfig(max_tokens=8192)
+        try:
+            response = self.llm.complete(messages, config, system_prompt)
+        except Exception as e:
+            logger.warning(f"Documentation review LLM call failed: {e}")
+            return None
+
+        return self._parse_documentation_review_response(response.content, code)
+
+    @staticmethod
+    def _parse_documentation_review_response(
+        content: str, original_code: str
+    ) -> Optional[str]:
+        """Extract documented code from the LLM response."""
+        match = re.search(
+            r"<documented_code>(.*?)</documented_code>", content, re.DOTALL
+        )
+        if not match:
+            logger.warning("Documentation review response missing <documented_code> tags")
+            return None
+
+        documented = match.group(1).strip()
+        if not documented:
+            return None
+
+        # Sanity check: the documented code should contain the same machine/spec
+        # declarations as the original (the LLM should not have changed the code)
+        orig_machines = set(re.findall(r'\b(?:machine|spec)\s+(\w+)', original_code))
+        doc_machines = set(re.findall(r'\b(?:machine|spec)\s+(\w+)', documented))
+        if orig_machines and not orig_machines.issubset(doc_machines):
+            logger.warning(
+                f"Documentation review dropped declarations: "
+                f"expected {orig_machines}, got {doc_machines}"
+            )
+            return None
+
+        return documented
+
     def generate_machines_parallel(
         self,
         machine_names: List[str],
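
Because the declaration check in `_parse_documentation_review_response` runs a word-level regex over the whole text, it also matches `machine` or `spec` followed by a word inside `//` comments. The sketch below uses hypothetical P snippets to show one consequence of the regex as committed: if the model drops `machine Participant` but adds a comment that mentions "machine Participant", the subset check still passes.

```python
import re

DECL_RE = r"\b(?:machine|spec)\s+(\w+)"  # pattern used by the committed check

original = """
machine Coordinator { start state Init {} }
machine Participant { start state Init {} }
"""

# Hypothetical LLM output: Participant's declaration is gone, but a new
# comment mentions it, so the regex still "finds" the name.
documented = """
// Coordinator drives the protocol; each machine Participant votes once.
machine Coordinator { start state Init {} }
"""

orig_names = set(re.findall(DECL_RE, original))
doc_names = set(re.findall(DECL_RE, documented))

print(sorted(orig_names))  # ['Coordinator', 'Participant']
print(sorted(doc_names))   # ['Coordinator', 'Participant']
print(orig_names.issubset(doc_names))  # True: the dropped machine goes unnoticed
```

Stripping `//` comments before running the regex would close this particular gap.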

Src/PeasyAI/src/core/validation/pipeline.py

Lines changed: 2 additions & 1 deletion
@@ -198,7 +198,8 @@ def validate(
         from ..compilation.p_post_processor import PCodePostProcessor
         processor = PCodePostProcessor()
         pp_result = processor.process(
-            current_code, filename, is_test_file=is_test_file
+            current_code, filename,
+            is_test_file=is_test_file,
         )
         current_code = pp_result.code
         fixes_applied.extend(pp_result.fixes_applied)

Src/PeasyAI/src/core/workflow/p_steps.py

Lines changed: 43 additions & 3 deletions
@@ -51,6 +51,30 @@ def _run_validation_pipeline(
     return code
 
 
+def _run_documentation_review(
+    service: 'GenerationService',
+    code: str,
+    design_doc: str,
+    context_files: Optional[Dict[str, str]] = None,
+) -> str:
+    """Run LLM-based documentation review and return the documented code.
+
+    Falls back to the original code if the LLM call fails.
+    """
+    try:
+        documented = service.review_code_documentation(
+            code=code,
+            design_doc=design_doc,
+            context_files=context_files,
+        )
+        if documented:
+            logger.info("Documentation review added comments")
+            return documented
+    except Exception as e:
+        logger.warning(f"Documentation review failed: {e}")
+    return code
+
+
 class CreateProjectStructureStep(WorkflowStep):
     """Step to create P project directory structure."""
 
@@ -124,7 +148,11 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
         if result.success:
             filename = result.filename or "Enums_Types_Events.p"
             code = _run_validation_pipeline(
-                result.code, filename, project_path
+                result.code, filename, project_path,
+            )
+            code = _run_documentation_review(
+                self.service, code, design_doc,
+                context_files=context.get("context_files"),
             )
             return StepResult.success(
                 output={
@@ -211,7 +239,11 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
         if result.success:
             filename = result.filename or f"{self.machine_name}.p"
             code = _run_validation_pipeline(
-                result.code, filename, project_path
+                result.code, filename, project_path,
+            )
+            code = _run_documentation_review(
+                self.service, code, design_doc,
+                context_files=context_files,
             )
             return StepResult.success(
                 output={
@@ -282,7 +314,11 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
         if result.success:
             filename = result.filename or f"{self.spec_name}.p"
             code = _run_validation_pipeline(
-                result.code, filename, project_path
+                result.code, filename, project_path,
+            )
+            code = _run_documentation_review(
+                self.service, code, design_doc,
+                context_files=context_files,
             )
             return StepResult.success(
                 output={
@@ -381,6 +417,10 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
                 result.code, filename, project_path,
                 is_test_file=True,
             )
+            code = _run_documentation_review(
+                self.service, code, design_doc,
+                context_files=context_files,
+            )
             return StepResult.success(
                 output={
                     f"test_code_{self.test_name}": code,
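
`_run_documentation_review()` treats both a `None` result and an exception from the service as "keep the code you already have". The following is a self-contained sketch of that contract. `run_documentation_review` copies the committed helper's logic, while `StubService` is a hypothetical stand-in for `GenerationService`; only the keyword arguments of `review_code_documentation` are taken from the diff.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


def run_documentation_review(
    service, code: str, design_doc: str,
    context_files: Optional[Dict[str, str]] = None,
) -> str:
    """Copy of _run_documentation_review: fall back to `code` on a falsy result or an error."""
    try:
        documented = service.review_code_documentation(
            code=code, design_doc=design_doc, context_files=context_files,
        )
        if documented:
            logger.info("Documentation review added comments")
            return documented
    except Exception as e:  # documentation problems never fail the step
        logger.warning(f"Documentation review failed: {e}")
    return code


class StubService:
    """Hypothetical stand-in for GenerationService with a scripted outcome."""

    def __init__(self, outcome):
        self.outcome = outcome

    def review_code_documentation(self, code, design_doc, context_files=None):
        if isinstance(self.outcome, Exception):
            raise self.outcome
        return self.outcome


code = "machine M { start state Init {} }"
print(run_documentation_review(StubService("// doc\n" + code), code, "design"))
print(run_documentation_review(StubService(None), code, "design") == code)  # True
print(run_documentation_review(StubService(RuntimeError("timeout")), code, "design") == code)  # True
```

Because the check is `if documented:`, an empty string from the service also falls back to the original code.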
