Skip to content

Commit e8aceef

Browse files
committed
fix: adopt superpowers patterns — review reception, debugging, self-review, adversarial framing
- Add code-review-reception rule (verify-before-implementing, YAGNI checks, no performative agreement) - Enhance debugging with defense-in-depth, root-cause tracing, condition-based waiting - Add self-review checklist (step 9) to spec-implement workflow - Add adversarial framing to all 3 spec-reviewer agents - Expand testing anti-patterns (test-only methods, mocking without understanding) - Add CSO/description trap guidance to learn command - Add conditional brainstorming step (1.3b) to spec-plan workflow
1 parent 325de9b commit e8aceef

File tree

12 files changed

+172
-22
lines changed

12 files changed

+172
-22
lines changed

pilot/agents/spec-reviewer-compliance.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,10 @@ The orchestrator provides:
2121
- `test_framework_constraints` (optional): What the test framework can/cannot test
2222
- `plan_risks`: Risks and mitigations from the plan
2323

24+
## ⛔ Adversarial Posture
25+
26+
Plan tasks may be marked done without fully meeting their Definition of Done criteria. Do not trust self-reported completion status — verify every DoD criterion independently against the actual code. The implementation may be incomplete or optimistic.
27+
2428
## Verification Workflow (FOLLOW THIS ORDER EXACTLY)
2529

2630
1. **Read the plan file completely** - This is your source of truth

pilot/agents/spec-reviewer-goal.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,10 @@ Detect the project language from:
3939
- If changed files include `.sh`, `.bash`, `.py` hook scripts → apply Shell/Python patterns
4040
- Mixed projects: apply patterns per-file
4141

42+
#### ⛔ Adversarial Posture
43+
44+
For code changes: artifacts may exist but be stubs, disconnected, or only partially wired together. Verify that components are substantive and actually connected — not just present. For Markdown-only changes: verify content is substantive and placed correctly — wiring checks are not applicable.
45+
4246
### Step 1: Goal-Backward Truth Derivation
4347

4448
Read the plan file completely. Then follow this preference order:

pilot/agents/spec-reviewer-quality.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,10 @@ permissionMode: plan
1111

1212
You review implementation code for quality, security, testing, performance, and error handling. Your job is to find real issues that would cause problems in production: bugs, security vulnerabilities, poor error handling, missing tests, and performance problems.
1313

14+
## ⛔ Adversarial Posture
15+
16+
Code that passes tests can still have quality, security, or performance issues. Do not trust passing tests as proof of quality — verify independently that error handling, edge cases, and security boundaries are sound. The implementation may look correct while hiding real problems.
17+
1418
## ⛔ MANDATORY FIRST STEP: Read ALL Rules
1519

1620
**Before reviewing ANY code, you MUST read all rule files. This is NON-NEGOTIABLE.**

pilot/commands/learn.md

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
description: Extract reusable knowledge into skills - online learning system
2+
description: Use after significant debugging, workarounds, or multi-step workflows worth standardizing for future sessions
33
model: sonnet
44
---
55

@@ -71,9 +71,9 @@ Skills must be:
7171
name: descriptive-kebab-case-name
7272
description: |
7373
[CRITICAL: This determines when the skill triggers. Include:]
74-
- What the skill does
74+
- When to use it (trigger conditions, scenarios — NOT what the skill does)
7575
- Specific trigger conditions (exact error messages, symptoms)
76-
- When to use it (contexts, scenarios)
76+
- Contexts where it applies
7777
author: Claude Code
7878
version: 1.0.0
7979
---
@@ -106,11 +106,13 @@ version: 1.0.0
106106
```
107107

108108
<details>
109-
<summary>Writing Effective Descriptions</summary>
109+
<summary>Writing Effective Descriptions (⚠️ The Description Trap)</summary>
110110

111-
The description field is CRITICAL for skill discovery:
111+
The description field is CRITICAL for skill discovery. **Describe WHEN to use the skill, NEVER summarize HOW it works.**
112112

113-
**Good:**
113+
**Why this matters:** If the description summarizes the workflow/process, Claude follows the short description as a shortcut instead of reading the full SKILL.md. The skill's detailed steps, examples, and nuances get ignored.
114+
115+
**Good** (trigger conditions):
114116

115117
```yaml
116118
description: |
@@ -119,10 +121,10 @@ description: |
119121
not in packages, (3) symlinked dependencies cause failures.
120122
```
121123
122-
**Bad:**
124+
**Bad** (workflow summary — Claude will follow this instead of reading the skill):
123125
124126
```yaml
125-
description: Helps with npm problems in monorepos.
127+
description: Extract and organize npm monorepo fixes by analyzing symlinks and paths.
126128
```
127129
128130
</details>

pilot/commands/spec-implement.md

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -206,14 +206,19 @@ TaskCreate: "Task 4: Add documentation" → id=4, addBlockedBy: [2]
206206
6. **Run actual program** - Use the plan's Runtime Environment section to start the service/program. Show real output with sample data. Check port availability first: `lsof -i :<port>`. **If using `playwright-cli`, use session isolation:** `playwright-cli -s="${PILOT_SESSION_ID:-default}" open <url>` (see `playwright-cli.md`). Always close the session after verification.
207207
7. **Check diagnostics** - Must be zero errors
208208
8. **Validate Definition of Done** - Check all criteria from plan
209-
9. **Per-task commit (worktree mode only)** - If `Worktree: Yes` in the plan, commit task changes immediately:
209+
9. **Self-Review (Fresh Eyes)** - Before committing, re-examine your changes as if seeing them for the first time:
210+
- **Completeness:** Did I miss any requirements from the task objective?
211+
- **Quality:** Are names clear? Is the code readable without comments explaining what it does?
212+
- **Discipline:** Did I add anything not in the task scope? (YAGNI — remove it)
213+
- **Testing:** Do tests verify behavior (inputs → outputs), not implementation details (mock calls)?
214+
10. **Per-task commit (worktree mode only)** - If `Worktree: Yes` in the plan, commit task changes immediately:
210215
```bash
211216
git add <task-specific-files> # Stage only files related to this task
212217
git commit -m "{type}(spec): {task-name}"
213218
```
214219
Use `feat(spec):` for new features, `fix(spec):` for bug fixes, `test(spec):` for test-only tasks, `refactor(spec):` for refactoring. Skip this step when `Worktree: No` (normal git rules apply).
215-
10. **Mark task as `completed`** - `TaskUpdate(taskId="<id>", status="completed")`
216-
11. **UPDATE PLAN FILE IMMEDIATELY** (see Step 2.4)
220+
11. **Mark task as `completed`** - `TaskUpdate(taskId="<id>", status="completed")`
221+
12. **UPDATE PLAN FILE IMMEDIATELY** (see Step 2.4)
217222

218223
**⚠️ NEVER SKIP TASKS:**
219224

pilot/commands/spec-plan.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -300,6 +300,36 @@ If the task is clear and unambiguous with no meaningful gray areas, skip directl
300300
3. Identify integration points and potential risks
301301
4. Note reusable patterns
302302
303+
### Step 1.3b: Present Findings & Brainstorm (Scope Selection) — CONDITIONAL
304+
305+
**Only do this step when exploration revealed multiple possible directions or the scope is ambiguous.** Skip for straightforward tasks where the scope is clear from the user's request (e.g., "fix this bug", "add a logout button", "refactor the auth module").
306+
307+
**Triggers for this step:**
308+
- Task is exploratory or open-ended ("analyze X and see what we can improve", "find gaps in our system")
309+
- Exploration uncovered significantly more opportunities than the user likely expected
310+
- Multiple valid approaches exist and user preference matters for scoping
311+
312+
**Process:**
313+
314+
1. **Send notification** so the user knows their input is needed:
315+
316+
```bash
317+
~/.pilot/bin/pilot notify plan_approval "Findings Ready" "<plan_name> — review discovered items and select what to include" --plan-path "<plan_path>" 2>/dev/null || true
318+
```
319+
320+
2. **List ALL discovered gaps/opportunities** with a brief assessment for each (1-2 sentences: what it is, why it matters or doesn't, effort level).
321+
322+
3. **Use AskUserQuestion with `multiSelect: true`** to let the user pick which items to include:
323+
324+
```
325+
Question: "Which of these items should we include in the plan?"
326+
Header: "Scope"
327+
multiSelect: true
328+
Options: [each discovered item as an option with description]
329+
```
330+
331+
4. **Only proceed to Step 1.4 with the user's selected scope.** Items not selected go into the plan's "Out of Scope" or "Deferred Ideas" section.
332+
303333
### Step 1.4: Design Decisions
304334
305335
**Present findings and gather all design decisions (Question Batch 2).**
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
## Code Review Reception
2+
3+
When receiving code review feedback — from users, spec-reviewer agents, or external tools like CodeRabbit — apply these guidelines.
4+
5+
### Response Sequence
6+
7+
1. **Read** — Complete feedback without reacting
8+
2. **Understand** — Restate requirement in own words (or ask)
9+
3. **Verify** — Check against codebase reality
10+
4. **Evaluate** — Technically sound for THIS codebase?
11+
5. **Respond** — Technical acknowledgment or reasoned pushback
12+
6. **Implement** — One item at a time, test each
13+
14+
If any item is unclear: **STOP** — do not implement anything yet. Ask for clarification on all unclear items first. Partial understanding = wrong implementation.
15+
16+
### Source-Specific Handling
17+
18+
| Source | Approach |
19+
|--------|----------|
20+
| **User feedback** | Trusted — implement after understanding. Still ask if scope unclear. Skip to action or technical acknowledgment. |
21+
| **External reviewers** | Verify first: (1) technically correct for THIS codebase? (2) breaks existing functionality? (3) reason for current implementation? (4) conflicts with user's prior decisions? If conflicts → stop and discuss with user first. |
22+
| **Spec-reviewer agents** | `must_fix` and `should_fix` → fix immediately. `suggestion` → implement if quick. No discussion needed. |
23+
24+
### YAGNI Check
25+
26+
When a reviewer suggests adding or "properly implementing" a feature:
27+
28+
1. Search codebase for actual usage (`vexor`, `Grep`, or LSP `findReferences`)
29+
2. If unused → push back: "This isn't called anywhere. Remove it (YAGNI)?"
30+
3. If used → implement properly
31+
32+
### Implementation Order (Multi-Item Feedback)
33+
34+
1. Clarify anything unclear **first**
35+
2. Blocking issues (breaks, security)
36+
3. Simple fixes (typos, imports, naming)
37+
4. Complex fixes (refactoring, logic changes)
38+
5. Test each fix individually, verify no regressions
39+
40+
### Forbidden Responses
41+
42+
| Never Say | Instead |
43+
|-----------|---------|
44+
| "You're absolutely right!" | State the technical requirement |
45+
| "Great point!" / "Excellent feedback!" | Just start working — actions > words |
46+
| "Let me implement that now" (before verification) | Verify against codebase first |
47+
| "Thanks for catching that!" | "Fixed. [Brief description of what changed]" |
48+
49+
### When to Push Back
50+
51+
Push back with technical reasoning when: suggestion breaks existing functionality, reviewer lacks full context, violates YAGNI, technically incorrect for this stack, or conflicts with user's architectural decisions.
52+
53+
If you pushed back and were wrong: state the correction factually and move on. No apologies or over-explaining.

pilot/rules/development-practices.md

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,43 @@
3030

3131
**Meta-Debugging:** Treat your own code as foreign. Your mental model is a guess — the code's behavior is truth.
3232

33+
#### Defense-in-Depth & Root-Cause Tracing
34+
35+
**After fixing a bug, make it structurally impossible — not just patched.**
36+
37+
When a bug is caused by invalid data flowing through multiple layers:
38+
39+
1. **Trace backward** from symptom through the call chain to the original trigger. Use LSP `incomingCalls` or add `new Error().stack` instrumentation before the failing operation. Fix at the source — never fix just where the error appears.
40+
2. **Then add validation at every layer** the data passes through:
41+
42+
| Layer | Purpose | Example |
43+
|-------|---------|---------|
44+
| Entry point | Reject invalid input at API boundary | Validate non-empty, exists, correct type |
45+
| Business logic | Ensure data makes sense for this operation | Validate required fields for specific context |
46+
| Environment guards | Prevent dangerous operations in specific contexts | Refuse destructive ops outside temp dirs in tests |
47+
| Debug instrumentation | Capture context for forensics | Log directory, cwd, stack trace before risky ops |
48+
49+
**Single validation = "we fixed the bug". All four layers = "we made the bug impossible."**
50+
51+
#### Condition-Based Waiting (Test Flakiness)
52+
53+
**Replace arbitrary `sleep`/`setTimeout` with polling for the actual condition.**
54+
55+
```
56+
# ❌ Guessing at timing (flaky)
57+
await sleep(500)
58+
result = get_result()
59+
60+
# ✅ Wait for the condition (reliable)
61+
result = await wait_for(lambda: get_result() is not None, timeout=5.0)
62+
```
63+
64+
**When to use:** Tests with arbitrary delays, flaky tests, waiting for async operations.
65+
66+
**When NOT to use:** Testing actual timing behavior (debounce, throttle) — document WHY the timeout is needed.
67+
68+
**Rules:** Poll every 10ms (not 1ms — wastes CPU). Always include timeout with clear error message. Call getter inside loop for fresh data (no stale cache).
69+
3370
### Git Operations
3471

3572
**Read git state freely. NEVER execute write commands without EXPLICIT user permission.**

pilot/rules/standards-frontend.md

Lines changed: 15 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -56,18 +56,27 @@ paths:
5656

5757
**Commit to a clear aesthetic. Every visual choice must be intentional.**
5858

59-
1. **Purpose** — What does this communicate? 2. **Audience** — Developers? Consumers? 3. **Tone** — Minimalist, editorial, playful, etc. 4. **Differentiator** — One memorable element.
59+
**Before coding, answer these four questions:**
6060

61-
- **Typography:** Display font for personality, body font for readability. Max 2 fonts. Avoid defaults (Inter, Roboto, Arial).
62-
- **Color:** Clear hierarchy: Primary (CTAs) → Accent → Neutral → Semantic. Dark mode: design separately, not just invert.
61+
1. **Purpose** — What does this interface communicate? Who uses it?
62+
2. **Audience** — Developers? Consumers? Internal tooling?
63+
3. **Tone** — Minimalist, editorial, playful, industrial, luxury, retro-futuristic, etc.
64+
4. **Differentiator** — What's the one thing someone will remember?
65+
66+
Then execute that direction with precision across every detail below.
67+
68+
- **Typography:** Display font for personality, body font for readability. Max 2 fonts. Avoid defaults (Inter, Roboto, Arial, system fonts). Distinctive, characterful choices elevate the entire interface.
69+
- **Color:** Clear hierarchy: Primary (CTAs) → Accent → Neutral → Semantic. Dominant colors with sharp accents outperform timid, evenly-distributed palettes. Dark mode: design separately, not just invert.
6370
- **Spacing:** Generous whitespace. Cramped = low quality.
64-
- **Motion:** Every animation has purpose. Max 500ms. Always respect `prefers-reduced-motion`.
65-
- **Avoid AI aesthetic:** Purple gradients on white, symmetric 3-column grids, rounded cards with shadow, gradient text everywhere.
71+
- **Spatial Composition:** Break out of predictable grid layouts. Use asymmetry, overlapping elements, diagonal flow, or grid-breaking accents to create visual interest. Controlled density and unexpected placement make interfaces feel designed rather than templated.
72+
- **Visual Depth:** Create atmosphere beyond solid backgrounds. Use gradient meshes, subtle noise textures, layered transparencies, dramatic shadows, decorative borders, or grain overlays — matched to the overall aesthetic. Flat and empty ≠ minimal; depth creates polish.
73+
- **Motion:** Every animation has purpose. Max 500ms. Always respect `prefers-reduced-motion`. Prefer one well-orchestrated page load with staggered reveals (`animation-delay`) over scattered micro-interactions. Use scroll-triggered animations and surprising hover states for high-impact moments. CSS-only for HTML; Motion library for React when available.
74+
- **Avoid AI aesthetic:** Purple gradients on white, symmetric 3-column grids, rounded cards with shadow, gradient text everywhere, overused font families (Inter, Space Grotesk), cookie-cutter component patterns.
6675

6776
## Checklist
6877

6978
- [ ] Components: single responsibility, typed props, local state
7079
- [ ] CSS: project methodology, design tokens, no `!important`
7180
- [ ] Accessible: keyboard, labels, contrast 4.5:1, alt text
7281
- [ ] Responsive: mobile-first, fluid, touch targets ≥ 44px
73-
- [ ] Design: intentional direction, no AI anti-patterns
82+
- [ ] Design: intentional direction, visual depth, composition, no AI anti-patterns

pilot/rules/testing.md

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -66,9 +66,11 @@ Use `playwright-cli` to verify what the user sees. See `playwright-cli.md` for c
6666
### Anti-Patterns
6767

6868
- **Dependent tests** — each test must work independently
69-
- **Testing implementation, not behavior**test outputs, not internals
70-
- **Incomplete mock data** — must match real API structure
69+
- **Testing implementation, not behavior**assert outputs and state changes, not that specific mocks were called. `assert result == expected` not `mock.assert_called_with(...)`. If the implementation changes but behavior stays the same, tests should still pass.
70+
- **Incomplete mocks hiding structural assumptions**mocks must mirror the complete real API structure, not just the fields you think you need. Partial mocks hide coupling to downstream fields and break when the real API returns additional or different data.
7171
- **Unnecessary mocks** — only for external deps
72+
- **Test-only methods in production** — never add methods, properties, or flags to production classes purely for test access. If you need internal state for testing, refactor the design so the behavior is observable through public interfaces.
73+
- **Mocking without understanding** — before mocking a dependency, understand what it actually does. A mock that doesn't reflect real behavior is a lie — tests pass against the lie, then fail against reality.
7274

7375
### Completion Checklist
7476

0 commit comments

Comments
 (0)