Commit ff0dc3a

Balanced: soft candidates + strict P2 validator filtering
1 parent 33bd45e commit ff0dc3a

2 files changed: +12 -1 lines changed

src/create-prompt/templates/review-candidates-prompt.ts

Lines changed: 1 addition & 0 deletions
@@ -67,6 +67,7 @@ Precomputed data files:
 - Offset/cursor/pagination semantic mismatches (off-by-one, prev/next behavior, commit semantics)
 - Do NOT duplicate comments already in \`${commentsPath}\`.
 - Only flag issues you are confident about—avoid speculative or stylistic nitpicks.
+- **Confidence calibration:** For each finding, honestly assess how certain you are. Mark findings as P0 only if you are virtually certain of a crash/exploit. Mark as P1 for high-confidence correctness/security issues. Use P2 for findings where the bug is plausible but you cannot fully verify the trigger path from the available context. This severity rating will be used downstream for filtering.
 </review_guidelines>
 
 <triage_phase>
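
The calibration rule added to the candidates prompt can be read as a small decision table. The sketch below is only an illustration of that mapping. The commit changes prompt text, not code, so the names `Confidence`, `Impact`, `Priority`, and `calibratePriority` are hypothetical and do not come from the repository.

```typescript
// Hypothetical types for illustration; not part of the repository.
type Confidence = "virtually-certain" | "high" | "plausible";
type Impact = "crash-or-exploit" | "correctness-or-security";
type Priority = "P0" | "P1" | "P2";

// Maps a reviewer's self-assessment to the priority levels named in the prompt.
function calibratePriority(confidence: Confidence, impact: Impact): Priority {
  // P0: only when virtually certain of a crash or exploit.
  if (confidence === "virtually-certain" && impact === "crash-or-exploit") {
    return "P0";
  }
  // P2: the bug is plausible but the trigger path cannot be fully verified.
  if (confidence === "plausible") {
    return "P2";
  }
  // P1: the remaining high-confidence correctness or security issues.
  return "P1";
}
```

Under this reading, a likely but unconfirmed crash stays at P1 or P2, which matters because the validator treats P2 findings much more strictly.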

src/create-prompt/templates/review-validator-prompt.ts

Lines changed: 11 additions & 1 deletion
@@ -100,12 +100,22 @@ Apply the same Reporting Gate as review:
 * Data corruption/loss
 * Breaking contract change (discoverable in code/tests)
 
-Reject if:
+Reject if ANY of these are true:
 * It's speculative / "might" without a concrete trigger
 * It's stylistic / naming / formatting
 * It's not anchored to a valid changed line
 * It's already reported (dedupe against existing comments)
 * The anchor (path/side/line/startLine) would need to change to make the suggestion work — reject instead
+* It flags missing error handling / try-catch for a code path that won't crash in practice (e.g., the caller already handles the error, or the input is validated upstream)
+* It describes a hypothetical race condition or timing issue without identifying the specific concurrent access pattern that triggers it
+* It's about code that appears in the diff but is not part of the PR's primary change — e.g., adjacent functions, unrelated files in a multi-subsystem PR, or code from a different PR's changes that happen to be visible in context
+
+### Confidence-based filtering
+
+Pay attention to the candidate's priority level:
+- **P0 findings**: Approve if the trigger path checks out. These should be definite crashes/exploits.
+- **P1 findings**: Approve if you can verify the logic error or security issue is real.
+- **P2 findings**: Reject by default. Only approve a P2 finding if ALL of these are true: (1) you can independently verify the bug exists by examining the code, (2) the bug has a concrete trigger that a user or caller could realistically hit, and (3) the finding is NOT about edge cases, defensive coding, or style. When in doubt about a P2, reject it.
 
 ### Deduplication (STRICT)
 
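
The validator changes combine two gates: an "any of these" rejection list, then a per-priority approval rule with P2 rejected by default. A minimal sketch of that logic follows, assuming the validator's conclusions are already available as booleans. In the actual system the model applies these rules from the prompt text, so `ValidatorFindings`, `RejectReason`, and `shouldApprove` are hypothetical names, not repository code.

```typescript
// Hypothetical types for illustration; not part of the repository.
type CandidatePriority = "P0" | "P1" | "P2";

// One label per "Reject if ANY of these are true" bullet in the prompt.
type RejectReason =
  | "speculative"
  | "stylistic"
  | "not-anchored-to-changed-line"
  | "already-reported"
  | "anchor-would-need-to-change"
  | "error-handling-for-non-crashing-path"
  | "race-without-concrete-access-pattern"
  | "outside-primary-change";

interface ValidatorFindings {
  priority: CandidatePriority;
  rejectReasons: RejectReason[];      // every rejection bullet that applies
  triggerPathVerified: boolean;       // P0: the trigger path checks out
  issueVerifiedReal: boolean;         // P1: logic or security issue verified
  independentlyVerified: boolean;     // P2 condition (1)
  realisticTrigger: boolean;          // P2 condition (2)
  edgeCaseDefensiveOrStyle: boolean;  // P2 condition (3), must be false
}

function shouldApprove(f: ValidatorFindings): boolean {
  // Gate 1: a single matching rejection reason is enough to reject.
  if (f.rejectReasons.length > 0) {
    return false;
  }
  // Gate 2: confidence-based filtering by priority level.
  switch (f.priority) {
    case "P0":
      return f.triggerPathVerified;
    case "P1":
      return f.issueVerifiedReal;
    case "P2":
      // Rejected by default; approved only if all three conditions hold.
      return (
        f.independentlyVerified &&
        f.realisticTrigger &&
        !f.edgeCaseDefensiveOrStyle
      );
  }
}
```

The asymmetry is deliberate in this reading: P0 and P1 each need one verification to pass, while a P2 has to clear three conditions and fails if any one is in doubt.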
