Complete E23-T4 pr-review verification coverage (#109)

dmoliveira · web-flow · commit 02b29df73505 · 2026-02-14T10:20:21.000+11:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -151,6 +151,7 @@ All notable changes to this project are documented in this file.
 - Moved `plan_execution` runtime persistence out of `opencode.json` into `~/.config/opencode/my_opencode/runtime/plan_execution.json` to prevent OpenCode startup failures caused by unrecognized top-level config keys.
 - Added selftest coverage for `/pr-review` analyzer missing-evidence and blocker-evidence decision paths, and marked Task 23.2 complete in the roadmap.
 - Integrated `pr-review` checks into unified `/doctor`, updated installer self-check/hints for PR review workflows, and marked Task 23.3 complete in the roadmap.
+- Expanded `pr-review` verification coverage for risk-detection false-positive control and missing-evidence behavior, and marked Epic 23 Task 23.4 and its exit criteria complete in the roadmap.
 
 ## v0.2.0 - 2026-02-12
 
diff --git a/IMPLEMENTATION_ROADMAP.md b/IMPLEMENTATION_ROADMAP.md
@@ -59,7 +59,7 @@ Use this map to avoid overlapping implementations.
 | E20 | Execution Budget Guardrails | done | High | E2, E11 | bd-63f | Bound time/tool/token usage for autonomous runs |
 | E21 | Bounded Loop Mode Presets | merged | Medium | E22, E28 | TBD | Merged into E22/E28 loop controls |
 | E22 | Autoflow Unified Orchestration Command | done | High | E14, E15, E17, E19, E20 | TBD | One command for plan-run-resume-report lifecycle |
-| E23 | PR Review Copilot | in_progress | High | E3 | bd-1hc | Pre-PR quality, output, and risk review automation |
+| E23 | PR Review Copilot | done | High | E3 | bd-u6t | Pre-PR quality, output, and risk review automation |
 | E24 | Release Train Assistant | planned | High | E14, E23 | TBD | Validate, draft, and gate releases reliably |
 | E25 | Incident Hotfix Mode | planned | Medium | E20, E22 | TBD | Constrained emergency workflow with strict safety |
 | E26 | Repo Health Score and Drift Monitor | planned | Medium | E9, E12, E20 | TBD | Operational visibility and continuous diagnostics |
@@ -809,7 +809,7 @@ Every command-oriented epic must ship all of the following:
 
 ## Epic 23 - PR Review Copilot
 
-**Status:** `in_progress`
+**Status:** `done`
 **Priority:** High
 **Goal:** Add a command that reviews pending PR changes for risk, quality, and release readiness before merge.
 **Depends on:** Epic 3
@@ -829,12 +829,13 @@ Every command-oriented epic must ship all of the following:
   - [x] Subtask 23.3.2: Integrate with pre-merge checklist and doctor output
   - [x] Subtask 23.3.3: Document triage flow for warnings vs blockers
   - [x] Notes: Added `scripts/pr_review_command.py` command surface with concise/JSON output and `checklist`/`doctor` subcommands, wired aliases in `opencode.json`, integrated `pr-review` into unified doctor checks, and documented blocker-vs-warning triage guidance in README.
-- [ ] Task 23.4: Verification
-  - [ ] Subtask 23.4.1: Add tests for risk detection and false positive control
-  - [ ] Subtask 23.4.2: Add tests for missing-evidence behavior
-  - [ ] Subtask 23.4.3: Add install-test smoke checks
-- [ ] Exit criteria: copilot catches high-risk omissions before merge
-- [ ] Exit criteria: outputs are actionable and low-noise in default mode
+- [x] Task 23.4: Verification
+  - [x] Subtask 23.4.1: Add tests for risk detection and false positive control
+  - [x] Subtask 23.4.2: Add tests for missing-evidence behavior
+  - [x] Subtask 23.4.3: Add install-test smoke checks
+  - [x] Notes: Expanded `scripts/selftest.py` with docs-only false-positive guard assertions and tested-source-change missing-evidence checks; installer smoke now exercises the `/pr-review`, `/pr-review checklist`, and `/pr-review doctor` workflows.
+- [x] Exit criteria: copilot catches high-risk omissions before merge
+- [x] Exit criteria: outputs are actionable and low-noise in default mode
 
 ---
 
diff --git a/README.md b/README.md
@@ -255,6 +255,12 @@ Examples:
 /pr-review doctor --json
 ```
 
+Task 23.4 verification notes:
+
+- selftest validates blocker detection for hard-evidence security findings and missing-evidence recommendation behavior.
+- selftest validates false-positive control for docs-only diffs (`recommendation=approve`, no findings).
+- install smoke validates `/pr-review`, `/pr-review checklist`, and `/pr-review doctor` command paths.
+
 ## Installed plugin stack 🔌
 
 - `@mohak34/opencode-notifier@latest` - desktop and sound alerts for completion, errors, and permission prompts.
diff --git a/scripts/selftest.py b/scripts/selftest.py
@@ -1160,6 +1160,92 @@ def run_bg(*args: str) -> subprocess.CompletedProcess[str]:
             "pr-review doctor should confirm analyzer readiness",
         )
 
+        analyzer_docs_only_diff = tmp / "pr_review_docs_only.diff"
+        analyzer_docs_only_diff.write_text(
+            """diff --git a/README.md b/README.md
+index 1111111..2222222 100644
+--- a/README.md
++++ b/README.md
+@@ -10,0 +11,2 @@
++## Notes
++Updated documentation only.
+""",
+            encoding="utf-8",
+        )
+        analyzer_docs_only = subprocess.run(
+            [
+                sys.executable,
+                str(PR_REVIEW_ANALYZER_SCRIPT),
+                "analyze",
+                "--diff-file",
+                str(analyzer_docs_only_diff),
+                "--json",
+            ],
+            capture_output=True,
+            text=True,
+            env=refactor_env,
+            check=False,
+            cwd=REPO_ROOT,
+        )
+        expect(
+            analyzer_docs_only.returncode == 0,
+            "pr-review analyzer should parse docs-only diff",
+        )
+        analyzer_docs_only_report = parse_json_output(analyzer_docs_only.stdout)
+        expect(
+            analyzer_docs_only_report.get("recommendation") == "approve",
+            "pr-review analyzer should avoid false positives for docs-only changes",
+        )
+        expect(
+            not analyzer_docs_only_report.get("findings"),
+            "pr-review analyzer should keep docs-only default output low-noise",
+        )
+
+        analyzer_tested_change_diff = tmp / "pr_review_tested_change.diff"
+        analyzer_tested_change_diff.write_text(
+            """diff --git a/scripts/calc.py b/scripts/calc.py
+index 1111111..2222222 100644
+--- a/scripts/calc.py
++++ b/scripts/calc.py
+@@ -1,0 +1,2 @@
++def calc_total(values):
++    return sum(values)
+diff --git a/tests/test_calc.py b/tests/test_calc.py
+index 3333333..4444444 100644
+--- a/tests/test_calc.py
++++ b/tests/test_calc.py
+@@ -1,0 +1,2 @@
++def test_calc_total():
++    assert True
+""",
+            encoding="utf-8",
+        )
+        analyzer_tested_change = subprocess.run(
+            [
+                sys.executable,
+                str(PR_REVIEW_ANALYZER_SCRIPT),
+                "analyze",
+                "--diff-file",
+                str(analyzer_tested_change_diff),
+                "--json",
+            ],
+            capture_output=True,
+            text=True,
+            env=refactor_env,
+            check=False,
+            cwd=REPO_ROOT,
+        )
+        expect(
+            analyzer_tested_change.returncode == 0,
+            "pr-review analyzer should parse tested source-change diff",
+        )
+        analyzer_tested_change_report = parse_json_output(analyzer_tested_change.stdout)
+        expect(
+            "tests"
+            not in set(analyzer_tested_change_report.get("missing_evidence", [])),
+            "pr-review analyzer should not report missing tests when test files changed",
+        )
+
         result = subprocess.run(
             [
                 sys.executable,