Replace regex-based documentation with LLM-based code documentation review

ankushdesai · ankushdesai · commit ff6441d5d9f1 · 2026-02-27T13:11:55.000-08:00
The old approach copied text verbatim from the design doc into comments,
which was redundant. The new approach adds an LLM review step (Stage 5)
that reads both the generated code and the design doc, then writes
comments explaining invariants, protocol steps, and design rationale.

- Remove ~500 lines of regex documentation methods from PCodePostProcessor
- Add GenerationService.review_code_documentation() as new LLM review step
- Add review_code_documentation.txt instruction prompt
- Wire into all 4 MCP generation tools and all workflow steps
- Update tests and CLAUDE.md pipeline documentation

Made-with: Cursor
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -341,6 +341,21 @@ Located in `GenerationService.review_spec_correctness()` (`Src/PeasyAI/src/core/
 
 The review can fix the spec file and, rarely, other files (e.g., types/events). Fixed files are returned in the `spec_fixes` field of the MCP response.
 
+#### Stage 5 — LLM Code Documentation Review (all files)
+
+For all generated files, an LLM-based review step adds insightful documentation comments. Unlike the previous regex-based approach (which copied text verbatim from the design doc), this step asks the LLM to write contextual comments that explain:
+
+- **Why** the code is structured the way it is (not just what it does)
+- What **invariants** are maintained by each machine/spec
+- What **protocol step** each event handler implements
+- What each **variable** tracks and why it's needed
+- What **safety property** each assertion checks
+- Non-obvious **design decisions** and tradeoffs
+
+Located in `GenerationService.review_code_documentation()` (`Src/PeasyAI/src/core/services/generation.py`), using the prompt `Src/PeasyAI/resources/instructions/review_code_documentation.txt`.
+
+The response parser (`_parse_documentation_review_response`) checks that the LLM didn't drop any machine/spec declarations from the code. If the LLM call fails or the response is malformed, `review_code_documentation()` returns `None` and the caller is expected to keep the original code unchanged.
+
 #### Pipeline Data Flow
 
 The `ValidationPipeline` is the **single place** where post-processing and validation happen. Both the MCP tool path and the workflow step path call it. `GenerationService._extract_p_code()` does NOT run any post-processing — it only extracts code from LLM responses.
@@ -382,14 +397,21 @@ LLM generates code
               │
               ▼
 ┌─────────────────────────────┐
+│ Stage 5: LLM Doc Review     │  all files — adds insightful documentation comments
+│  (services/generation.py)    │  explains invariants, protocol steps, design rationale
+│  review_code_documentation() │  prompt: review_code_documentation.txt
+└─────────────┬───────────────┘
+              │
+              ▼
+┌─────────────────────────────┐
 │ PipelineResult + fixes       │  is_valid, fixed_code, issues[], fixes_applied[]
 │  .to_review_dict()           │  + wiring_fixes / spec_fixes for cross-file changes
 └─────────────────────────────┘
 ```
 
 Two call sites invoke the pipeline:
-- **MCP tools**: `_review_generated_code()` in `tools/generation.py` — returns `to_review_dict()` for the MCP response. For test files, `review_test_wiring()` runs as Stage 3. For spec files, `review_spec_correctness()` runs as Stage 4.
-- **Workflow steps**: `_run_validation_pipeline()` in `workflow/p_steps.py` — returns the fixed code string
+- **MCP tools**: `_review_generated_code()` in `tools/generation.py` — returns `to_review_dict()` for the MCP response. For test files, `review_test_wiring()` runs as Stage 3. For spec files, `review_spec_correctness()` runs as Stage 4. For all files, `review_code_documentation()` runs as Stage 5.
+- **Workflow steps**: `_run_validation_pipeline()` in `workflow/p_steps.py` — returns the fixed code string. `_run_documentation_review()` runs Stage 5 afterward.
 
 #### MCP Response Severity
 
diff --git a/Src/PeasyAI/resources/instructions/review_code_documentation.txt b/Src/PeasyAI/resources/instructions/review_code_documentation.txt
@@ -0,0 +1,65 @@
+You are adding documentation comments to a generated P program file.
+
+You are given:
+1. The generated P code (already syntactically correct).
+2. The design document that describes the system being modeled.
+3. Other project files for cross-reference context.
+
+Your job is to add insightful `//` comments that help a developer understand and maintain the code long-term. Do NOT just copy text from the design document — the developer can read the design doc themselves. Instead, explain the *why* behind the code: what invariant is being maintained, what protocol step is being executed, what the tricky parts are.
+
+## COMMENT GUIDELINES
+
+### File Header
+Add a 2-4 line header at the top of the file:
+- Name the system and file's role (e.g., "Coordinator for the Two Phase Commit protocol")
+- One sentence summarizing the key responsibility or invariant this file maintains
+
+### Machine / Spec Declarations
+Above each `machine` or `spec` declaration, add a brief block of `//` comments (3-6 lines) explaining:
+- What role this component plays in the protocol
+- What key invariant or safety property it maintains or contributes to
+- Any non-obvious design decisions (e.g., "serializes concurrent requests to avoid conflicting prepares")
+
+### State Declarations
+Above each `state` declaration, add 1-2 lines explaining:
+- What phase of the protocol this state represents
+- What the machine is waiting for or doing in this state
+- Any deferred/ignored events and WHY they are deferred/ignored (not just that they are)
+
+### Variable Declarations
+Add inline comments on `var` declarations explaining:
+- What the variable tracks and WHY it's needed (not just restating the type)
+- For collections: what the keys/values represent in protocol terms
+
+### Event Handlers (`on ... do`)
+Above each event handler, add 1-2 lines explaining:
+- What protocol step this handler implements
+- What the expected outcome is (e.g., "accumulates votes; triggers commit/abort decision when all received")
+- Any non-obvious logic (e.g., "uses choose() to model non-deterministic participant failure")
+
+### Send Statements
+For important `send` statements (especially those that drive protocol transitions), add a brief inline or above-line comment explaining:
+- What protocol action this message represents
+- Why it's sent at this point
+
+### Assertions
+Above `assert` statements, explain:
+- What safety property is being checked
+- Under what conditions it could be violated
+
+### DO NOT
+- Do NOT add comments that just restate the code (e.g., `// send prepare request` above `send p, ePrepareReq`)
+- Do NOT add comments on every single line — focus on non-obvious logic
+- Do NOT change any code — only add `//` comments
+- Do NOT add comments inside type/event declaration files (Enums_Types_Events.p) beyond the file header — the type names and field names are self-documenting
+- Do NOT remove any existing comments
+
+## RESPONSE FORMAT
+
+Return the complete file with comments added, wrapped in the following format:
+
+<documented_code>
+... the full P code with comments added ...
+</documented_code>
+
+Return ONLY the documented code. Do not include analysis or explanation outside the tags.
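The prompt above asks the model to add only `//` comments and leave the code untouched. A caller could check that rule directly by comparing the two versions with comments stripped. The sketch below is not part of this commit: `strip_line_comments` and `only_comments_added` are hypothetical helpers, the sample P snippet is made up, and the stripping is deliberately naive (a `//` inside a string literal would be treated as a comment).

```python
import re


def strip_line_comments(code: str) -> str:
    """Drop `//` comments and blank lines, then normalize whitespace.

    Naive on purpose: a `//` inside a string literal is treated as a comment.
    """
    kept = []
    for line in code.splitlines():
        line = re.sub(r"//.*$", "", line)   # remove trailing // comment
        line = " ".join(line.split())       # collapse runs of whitespace
        if line:
            kept.append(line)
    return "\n".join(kept)


def only_comments_added(original: str, documented: str) -> bool:
    """True if both versions are identical once comments are removed."""
    return strip_line_comments(original) == strip_line_comments(documented)


# Made-up sample input, for illustration only.
original = "machine Coordinator {\n  var pending: int;\n}\n"
documented = (
    "// Coordinator for the Two Phase Commit protocol.\n"
    "machine Coordinator {\n"
    "  var pending: int; // votes still outstanding\n"
    "}\n"
)
tampered = documented.replace("int", "bool")  # simulates a code change

print(only_comments_added(original, documented))  # True
print(only_comments_added(original, tampered))    # False
```

A check like this is stricter than comparing declaration names, at the cost of rejecting harmless reformatting that changes tokens on a line.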
diff --git a/Src/PeasyAI/src/core/compilation/p_post_processor.py b/Src/PeasyAI/src/core/compilation/p_post_processor.py
@@ -7,7 +7,7 @@
 
 import re
 import logging
-from typing import List, Tuple, Dict, Optional
+from typing import List, Optional
 from dataclasses import dataclass
 
 logger = logging.getLogger(__name__)
@@ -37,7 +37,12 @@ def __init__(self):
         self.fixes_applied: List[str] = []
         self.warnings: List[str] = []
     
-    def process(self, code: str, filename: str = "", is_test_file: bool = False) -> PostProcessResult:
+    def process(
+        self,
+        code: str,
+        filename: str = "",
+        is_test_file: bool = False,
+    ) -> PostProcessResult:
         """
         Process P code and fix common issues.
         
@@ -72,7 +77,7 @@ def process(self, code: str, filename: str = "", is_test_file: bool = False) ->
         if is_test_file:
             code = self._warn_timer_wired_to_this(code, filename)
             code = self._ensure_test_declarations(code, filename)
-        
+
         if code != original_code:
             logger.info(f"Post-processing applied {len(self.fixes_applied)} fix(es) to {filename or 'code'}")
         
@@ -81,7 +86,9 @@ def process(self, code: str, filename: str = "", is_test_file: bool = False) ->
             fixes_applied=self.fixes_applied,
             warnings=self.warnings
         )
-    
+
+    # ── Syntax fixes ──────────────────────────────────────────────────
+
     def _fix_trailing_comma_in_params(self, code: str) -> str:
         """
         Remove trailing commas from function, entry, and handler parameter lists.
diff --git a/Src/PeasyAI/src/core/services/generation.py b/Src/PeasyAI/src/core/services/generation.py
@@ -762,6 +762,91 @@ def review_spec_correctness(
 
         return self._parse_wiring_review_response(response.content)
 
+    # ── LLM-based code documentation review ─────────────────────────
+
+    def review_code_documentation(
+        self,
+        code: str,
+        design_doc: str,
+        context_files: Optional[Dict[str, str]] = None,
+    ) -> Optional[str]:
+        """
+        Use the LLM to add insightful documentation comments to generated P code.
+
+        Unlike regex-based approaches that copy text verbatim from the design
+        doc, this asks the LLM to write contextual comments explaining *why*
+        the code is structured the way it is, what invariants are maintained,
+        and what protocol steps are being implemented.
+
+        Returns the documented code, or None if the call fails or the response is rejected.
+        """
+        self._status("Adding documentation comments via LLM…")
+
+        instruction = self._load_static_instruction("review_code_documentation.txt")
+        messages: List[Message] = []
+
+        if context_files:
+            messages.extend(self._compact_context_messages(context_files))
+
+        messages.append(Message(
+            role=MessageRole.USER,
+            content=f"<design_document>\n{self._compact_design_doc(design_doc)}\n</design_document>",
+        ))
+
+        messages.append(Message(
+            role=MessageRole.USER,
+            content=f"<code_to_document>\n{code}\n</code_to_document>",
+        ))
+
+        messages.append(Message(
+            role=MessageRole.USER,
+            content=instruction,
+        ))
+
+        system_prompt = (
+            "You are an expert P language developer adding documentation "
+            "comments to generated code. Write comments that explain the "
+            "reasoning, invariants, and protocol semantics — not comments "
+            "that merely restate what the code does."
+        )
+        config = LLMConfig(max_tokens=8192)
+        try:
+            response = self.llm.complete(messages, config, system_prompt)
+        except Exception as e:
+            logger.warning(f"Documentation review LLM call failed: {e}")
+            return None
+
+        return self._parse_documentation_review_response(response.content, code)
+
+    @staticmethod
+    def _parse_documentation_review_response(
+        content: str, original_code: str
+    ) -> Optional[str]:
+        """Extract documented code from the LLM response, or None if it fails validation."""
+        match = re.search(
+            r"<documented_code>(.*?)</documented_code>", content, re.DOTALL
+        )
+        if not match:
+            logger.warning("Documentation review response missing <documented_code> tags")
+            return None
+
+        documented = match.group(1).strip()
+        if not documented:
+            return None
+
+        # Sanity check: every machine/spec declared in the original must still be
+        # present (this catches dropped declarations, not other code edits)
+        orig_machines = set(re.findall(r'\b(?:machine|spec)\s+(\w+)', original_code))
+        doc_machines = set(re.findall(r'\b(?:machine|spec)\s+(\w+)', documented))
+        if orig_machines and not orig_machines.issubset(doc_machines):
+            logger.warning(
+                f"Documentation review dropped declarations: "
+                f"expected {orig_machines}, got {doc_machines}"
+            )
+            return None
+
+        return documented
+
     def generate_machines_parallel(
         self,
         machine_names: List[str],
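To illustrate how the response parser added in this hunk behaves, here is a standalone copy of its logic run against a few made-up responses. The name `parse_documented` and all sample strings are illustrative assumptions; the real method is a static method on `GenerationService` and also logs warnings.

```python
import re
from typing import Optional

# Same declaration pattern as in the diff above.
DECL = re.compile(r"\b(?:machine|spec)\s+(\w+)")


def parse_documented(content: str, original_code: str) -> Optional[str]:
    """Standalone copy of the parser's logic, without logging."""
    match = re.search(
        r"<documented_code>(.*?)</documented_code>", content, re.DOTALL
    )
    if not match:
        return None
    documented = match.group(1).strip()
    if not documented:
        return None
    orig = set(DECL.findall(original_code))
    if orig and not orig.issubset(set(DECL.findall(documented))):
        return None
    return documented


original = "machine Client {}\nspec Safety observes eDone {}"

ok = (
    "<documented_code>\n// Issues requests.\nmachine Client {}\n"
    "spec Safety observes eDone {}\n</documented_code>"
)
no_tags = "machine Client {}\nspec Safety observes eDone {}"
dropped = "<documented_code>machine Client {}</documented_code>"
# Body changed but declaration names kept: the name check does not catch this.
edited = (
    "<documented_code>machine Client { var x: int; }\n"
    "spec Safety observes eDone {}</documented_code>"
)

print(parse_documented(ok, original) is not None)      # True
print(parse_documented(no_tags, original))             # None
print(parse_documented(dropped, original))             # None
print(parse_documented(edited, original) is not None)  # True
```

The last case shows the limit of the sanity check: it guards against dropped declarations only, so a response that alters code inside a machine still passes.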
diff --git a/Src/PeasyAI/src/core/validation/pipeline.py b/Src/PeasyAI/src/core/validation/pipeline.py
@@ -198,7 +198,8 @@ def validate(
                 from ..compilation.p_post_processor import PCodePostProcessor
                 processor = PCodePostProcessor()
                 pp_result = processor.process(
-                    current_code, filename, is_test_file=is_test_file
+                    current_code, filename,
+                    is_test_file=is_test_file,
                 )
                 current_code = pp_result.code
                 fixes_applied.extend(pp_result.fixes_applied)
diff --git a/Src/PeasyAI/src/core/workflow/p_steps.py b/Src/PeasyAI/src/core/workflow/p_steps.py
@@ -51,6 +51,30 @@ def _run_validation_pipeline(
         return code
 
 
+def _run_documentation_review(
+    service: 'GenerationService',
+    code: str,
+    design_doc: str,
+    context_files: Optional[Dict[str, str]] = None,
+) -> str:
+    """Run LLM-based documentation review and return the documented code.
+
+    Falls back to the original code if the review raises or returns nothing.
+    """
+    try:
+        documented = service.review_code_documentation(
+            code=code,
+            design_doc=design_doc,
+            context_files=context_files,
+        )
+        if documented:
+            logger.info("Documentation review added comments")
+            return documented
+    except Exception as e:
+        logger.warning(f"Documentation review failed: {e}")
+    return code
+
+
 class CreateProjectStructureStep(WorkflowStep):
     """Step to create P project directory structure."""
     
@@ -124,7 +148,11 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
             if result.success:
                 filename = result.filename or "Enums_Types_Events.p"
                 code = _run_validation_pipeline(
-                    result.code, filename, project_path
+                    result.code, filename, project_path,
+                )
+                code = _run_documentation_review(
+                    self.service, code, design_doc,
+                    context_files=context.get("context_files"),
                 )
                 return StepResult.success(
                     output={
@@ -211,7 +239,11 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
             if result.success:
                 filename = result.filename or f"{self.machine_name}.p"
                 code = _run_validation_pipeline(
-                    result.code, filename, project_path
+                    result.code, filename, project_path,
+                )
+                code = _run_documentation_review(
+                    self.service, code, design_doc,
+                    context_files=context_files,
                 )
                 return StepResult.success(
                     output={
@@ -282,7 +314,11 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
             if result.success:
                 filename = result.filename or f"{self.spec_name}.p"
                 code = _run_validation_pipeline(
-                    result.code, filename, project_path
+                    result.code, filename, project_path,
+                )
+                code = _run_documentation_review(
+                    self.service, code, design_doc,
+                    context_files=context_files,
                 )
                 return StepResult.success(
                     output={
@@ -381,6 +417,10 @@ def execute(self, context: Dict[str, Any]) -> StepResult:
                     result.code, filename, project_path,
                     is_test_file=True,
                 )
+                code = _run_documentation_review(
+                    self.service, code, design_doc,
+                    context_files=context_files,
+                )
                 return StepResult.success(
                     output={
                         f"test_code_{self.test_name}": code,
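The fallback behavior of `_run_documentation_review()` in `workflow/p_steps.py` can be exercised with a stub service. The sketch below copies the wrapper's logic (minus logging) into `run_documentation_review`; `StubService` and its `mode` argument are hypothetical stand-ins for `GenerationService`, not part of the codebase.

```python
from typing import Dict, Optional


class StubService:
    """Hypothetical stand-in for GenerationService."""

    def __init__(self, mode: str) -> None:
        self.mode = mode

    def review_code_documentation(
        self,
        code: str,
        design_doc: str,
        context_files: Optional[Dict[str, str]] = None,
    ) -> Optional[str]:
        if self.mode == "raise":
            raise RuntimeError("LLM unavailable")
        if self.mode == "none":
            return None  # e.g. malformed response or dropped declarations
        return "// documented\n" + code


def run_documentation_review(
    service: StubService,
    code: str,
    design_doc: str,
    context_files: Optional[Dict[str, str]] = None,
) -> str:
    """Same control flow as the wrapper in the diff, without logging."""
    try:
        documented = service.review_code_documentation(
            code=code, design_doc=design_doc, context_files=context_files
        )
        if documented:
            return documented
    except Exception:
        pass
    return code


code = "machine Client {}"
for mode in ("ok", "none", "raise"):
    result = run_documentation_review(StubService(mode), code, "design doc")
    print(mode, result == code)
# ok False
# none True
# raise True
```

In all three cases the workflow step receives a usable code string, so a failed documentation pass never blocks generation.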
diff --git a/Src/PeasyAI/src/ui/mcp/tools/generation.py b/Src/PeasyAI/src/ui/mcp/tools/generation.py
diff --git a/Src/PeasyAI/tests/test_validation.py b/Src/PeasyAI/tests/test_validation.py