**`pilot/agents/spec-reviewer-compliance.md`** — 4 additions, 0 deletions

```diff
@@ -21,6 +21,10 @@ The orchestrator provides:
 - `test_framework_constraints` (optional): What the test framework can/cannot test
 - `plan_risks`: Risks and mitigations from the plan
+
+## ⛔ Adversarial Posture
+
+Plan tasks may be marked done without fully meeting their Definition of Done criteria. Do not trust self-reported completion status — verify every DoD criterion independently against the actual code. The implementation may be incomplete or optimistic.
 
 ## Verification Workflow (FOLLOW THIS ORDER EXACTLY)
 
 1. **Read the plan file completely** - This is your source of truth
```
**`pilot/agents/spec-reviewer-goal.md`** — 4 additions, 0 deletions

```diff
@@ -39,6 +39,10 @@ Detect the project language from:
 - If changed files include `.sh`, `.bash`, `.py` hook scripts → apply Shell/Python patterns
 - Mixed projects: apply patterns per-file
+
+#### ⛔ Adversarial Posture
+
+For code changes: artifacts may exist but be stubs, disconnected, or only partially wired together. Verify that components are substantive and actually connected — not just present. For Markdown-only changes: verify content is substantive and placed correctly — wiring checks are not applicable.
 
 ### Step 1: Goal-Backward Truth Derivation
 
 Read the plan file completely. Then follow this preference order:
```
**`pilot/agents/spec-reviewer-quality.md`** — 4 additions, 0 deletions

```diff
@@ -11,6 +11,10 @@ permissionMode: plan
 
 You review implementation code for quality, security, testing, performance, and error handling. Your job is to find real issues that would cause problems in production: bugs, security vulnerabilities, poor error handling, missing tests, and performance problems.
 
+## ⛔ Adversarial Posture
+
+Code that passes tests can still have quality, security, or performance issues. Do not trust passing tests as proof of quality — verify independently that error handling, edge cases, and security boundaries are sound. The implementation may look correct while hiding real problems.
+
 ## ⛔ MANDATORY FIRST STEP: Read ALL Rules
 
 **Before reviewing ANY code, you MUST read all rule files. This is NON-NEGOTIABLE.**
```
**`pilot/commands/learn.md`** — 10 additions, 8 deletions

````diff
@@ -1,5 +1,5 @@
 ---
-description: Extract reusable knowledge into skills - online learning system
+description: Use after significant debugging, workarounds, or multi-step workflows worth standardizing for future sessions
 model: sonnet
 ---
@@ -71,9 +71,9 @@ Skills must be:
 name: descriptive-kebab-case-name
 description: |
   [CRITICAL: This determines when the skill triggers. Include:]
-  - What the skill does
+  - When to use it (trigger conditions, scenarios — NOT what the skill does)
   - Specific trigger conditions (exact error messages, symptoms)
-  - When to use it (contexts, scenarios)
+  - Contexts where it applies
 author: Claude Code
 version: 1.0.0
 ---
@@ -106,11 +106,13 @@ version: 1.0.0
 ```
 
 <details>
-<summary>Writing Effective Descriptions</summary>
+<summary>Writing Effective Descriptions (⚠️ The Description Trap)</summary>
 
-The description field is CRITICAL for skill discovery:
+The description field is CRITICAL for skill discovery. **Describe WHEN to use the skill, NEVER summarize HOW it works.**
 
-**Good:**
+**Why this matters:** If the description summarizes the workflow/process, Claude follows the short description as a shortcut instead of reading the full SKILL.md. The skill's detailed steps, examples, and nuances get ignored.
+
+**Good** (trigger conditions):
 
 ```yaml
 description: |
@@ -119,10 +121,10 @@ description: |
   not in packages, (3) symlinked dependencies cause failures.
 ```
 
-**Bad:**
+**Bad** (workflow summary — Claude will follow this instead of reading the skill):
 
 ```yaml
-description: Helps with npm problems in monorepos.
+description: Extract and organize npm monorepo fixes by analyzing symlinks and paths.
````
````diff
 6. **Run actual program** - Use the plan's Runtime Environment section to start the service/program. Show real output with sample data. Check port availability first: `lsof -i :<port>`. **If using `playwright-cli`, use session isolation:** `playwright-cli -s="${PILOT_SESSION_ID:-default}" open <url>` (see `playwright-cli.md`). Always close the session after verification.
 7. **Check diagnostics** - Must be zero errors
 8. **Validate Definition of Done** - Check all criteria from plan
-9. **Per-task commit (worktree mode only)** - If `Worktree: Yes` in the plan, commit task changes immediately:
+9. **Self-Review (Fresh Eyes)** - Before committing, re-examine your changes as if seeing them for the first time:
+   - **Completeness:** Did I miss any requirements from the task objective?
+   - **Quality:** Are names clear? Is the code readable without comments explaining what it does?
+   - **Discipline:** Did I add anything not in the task scope? (YAGNI — remove it)
+   - **Testing:** Do tests verify behavior (inputs → outputs), not implementation details (mock calls)?
+10. **Per-task commit (worktree mode only)** - If `Worktree: Yes` in the plan, commit task changes immediately:
 ```bash
 git add <task-specific-files>  # Stage only files related to this task
 git commit -m "{type}(spec): {task-name}"
 ```
 Use `feat(spec):` for new features, `fix(spec):` for bug fixes, `test(spec):` for test-only tasks, `refactor(spec):` for refactoring. Skip this step when `Worktree: No` (normal git rules apply).
-10. **Mark task as `completed`** - `TaskUpdate(taskId="<id>", status="completed")`
-11. **UPDATE PLAN FILE IMMEDIATELY** (see Step 2.4)
+11. **Mark task as `completed`** - `TaskUpdate(taskId="<id>", status="completed")`
+12. **UPDATE PLAN FILE IMMEDIATELY** (see Step 2.4)
````
````diff
 **Only do this step when exploration revealed multiple possible directions or the scope is ambiguous.** Skip for straightforward tasks where the scope is clear from the user's request (e.g., "fix this bug", "add a logout button", "refactor the auth module").
+
+**Triggers for this step:**
+- Task is exploratory or open-ended ("analyze X and see what we can improve", "find gaps in our system")
+- Exploration uncovered significantly more opportunities than the user likely expected
+- Multiple valid approaches exist and user preference matters for scoping
+
+**Process:**
+
+1. **Send notification** so the user knows their input is needed:
+
+   ```bash
+   ~/.pilot/bin/pilot notify plan_approval "Findings Ready" "<plan_name> — review discovered items and select what to include" --plan-path "<plan_path>" 2>/dev/null || true
+   ```
+
+2. **List ALL discovered gaps/opportunities** with a brief assessment for each (1-2 sentences: what it is, why it matters or doesn't, effort level).
+
+3. **Use AskUserQuestion with `multiSelect: true`** to let the user pick which items to include:
+
+   ```
+   Question: "Which of these items should we include in the plan?"
+   Header: "Scope"
+   multiSelect: true
+   Options: [each discovered item as an option with description]
+   ```
+
+4. **Only proceed to Step 1.4 with the user's selected scope.** Items not selected go into the plan's "Out of Scope" or "Deferred Ideas" section.
 
 ### Step 1.4: Design Decisions
 
 **Present findings and gather all design decisions (Question Batch 2).**
````
```diff
 When receiving code review feedback — from users, spec-reviewer agents, or external tools like CodeRabbit — apply these guidelines.
+
+### Response Sequence
+
+1. **Read** — Complete feedback without reacting
+2. **Understand** — Restate the requirement in your own words (or ask)
+3. **Verify** — Check against codebase reality
+4. **Evaluate** — Technically sound for THIS codebase?
+5. **Respond** — Technical acknowledgment or reasoned pushback
+6. **Implement** — One item at a time, test each
+
+If any item is unclear: **STOP** — do not implement anything yet. Ask for clarification on all unclear items first. Partial understanding = wrong implementation.
+
+### Source-Specific Handling
+
+| Source | Approach |
+|--------|----------|
+| **User feedback** | Trusted — implement after understanding. Still ask if scope unclear. Skip to action or technical acknowledgment. |
+| **External reviewers** | Verify first: (1) technically correct for THIS codebase? (2) breaks existing functionality? (3) reason for current implementation? (4) conflicts with user's prior decisions? If conflicts → stop and discuss with user first. |
+| **Spec-reviewer agents** | `must_fix` and `should_fix` → fix immediately. `suggestion` → implement if quick. No discussion needed. |
+
+### YAGNI Check
+
+When a reviewer suggests adding or "properly implementing" a feature:
+
+1. Search the codebase for actual usage (`vexor`, `Grep`, or LSP `findReferences`)
+2. If unused → push back: "This isn't called anywhere. Remove it (YAGNI)?"
+3. If used → implement properly
+
+### Implementation Order (Multi-Item Feedback)
+
+1. Clarify anything unclear **first**
+2. Blocking issues (breaks, security)
+3. Simple fixes (typos, imports, naming)
+4. Complex fixes (refactoring, logic changes)
+5. Test each fix individually, verify no regressions
+
+### Forbidden Responses
+
+| Never Say | Instead |
+|-----------|---------|
+| "You're absolutely right!" | State the technical requirement |
+| "Great point!" / "Excellent feedback!" | Just start working — actions > words |
+| "Let me implement that now" (before verification) | Verify against codebase first |
+| "Thanks for catching that!" | "Fixed. [Brief description of what changed]" |
+
+### When to Push Back
+
+Push back with technical reasoning when: the suggestion breaks existing functionality, the reviewer lacks full context, it violates YAGNI, it is technically incorrect for this stack, or it conflicts with the user's architectural decisions.
+
+If you pushed back and were wrong: state the correction factually and move on. No apologies or over-explaining.
```
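The YAGNI check above can be sketched as a small reference scan. This is a minimal illustration, assuming a Python codebase; `find_callers` and the file layout are hypothetical, and the actual workflow would use `vexor`, `Grep`, or LSP `findReferences` rather than this helper:

```python
from pathlib import Path

def find_callers(symbol: str, root: str) -> list[tuple[str, int]]:
    """Return (file, line) pairs that mention `symbol`, excluding its definition."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            # skip the definition line itself; count every other mention
            if symbol in line and not line.lstrip().startswith(f"def {symbol}"):
                hits.append((str(path), lineno))
    return hits
```

An empty result is the cue to push back with the YAGNI question rather than implement the suggestion.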
**`pilot/rules/development-practices.md`** — 37 additions, 0 deletions

```diff
@@ -30,6 +30,43 @@
 **Meta-Debugging:** Treat your own code as foreign. Your mental model is a guess — the code's behavior is truth.
 
+#### Defense-in-Depth & Root-Cause Tracing
+
+**After fixing a bug, make it structurally impossible — not just patched.**
+
+When a bug is caused by invalid data flowing through multiple layers:
+
+1. **Trace backward** from symptom through the call chain to the original trigger. Use LSP `incomingCalls` or add `new Error().stack` instrumentation before the failing operation. Fix at the source — never fix just where the error appears.
+2. **Then add validation at every layer** the data passes through:
+
+| Layer | Purpose | Example |
+|-------|---------|---------|
+| Entry point | Reject invalid input at API boundary | Validate non-empty, exists, correct type |
+| Business logic | Ensure data makes sense for this operation | Validate required fields for specific context |
+| Environment guards | Prevent dangerous operations in specific contexts | Refuse destructive ops outside temp dirs in tests |
+| Debug instrumentation | Capture context for forensics | Log directory, cwd, stack trace before risky ops |
+
+**Single validation = "we fixed the bug". All four layers = "we made the bug impossible."**
```
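Rendered as code, the four validation layers might look like the sketch below. This is a hedged illustration: `delete_workspace`, its rules, and the temp-dir guard are hypothetical, not an API from this repository:

```python
import os
import tempfile

def delete_workspace(path, logger=print):
    # Layer 1 - entry point: reject invalid input at the boundary
    if not isinstance(path, str) or not path.strip():
        raise ValueError("path must be a non-empty string")
    # Layer 2 - business logic: the target must exist and be a directory
    if not os.path.isdir(path):
        raise FileNotFoundError(f"not a directory: {path}")
    # Layer 3 - environment guard: refuse destructive ops outside the temp dir
    if not os.path.realpath(path).startswith(os.path.realpath(tempfile.gettempdir())):
        raise RuntimeError(f"refusing to delete outside temp dir: {path}")
    # Layer 4 - debug instrumentation: capture context before the risky op
    logger(f"deleting {path} (cwd={os.getcwd()})")
    os.rmdir(path)
```

Each layer fails loudly on its own, so even if one check is bypassed, the next one still stops the invalid operation.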
````diff
+#### Condition-Based Waiting (Test Flakiness)
+
+**Replace arbitrary `sleep`/`setTimeout` with polling for the actual condition.**
+
+```
+# ❌ Guessing at timing (flaky)
+await sleep(500)
+result = get_result()
+
+# ✅ Wait for the condition (reliable)
+result = await wait_for(lambda: get_result() is not None, timeout=5.0)
+```
+
+**When to use:** Tests with arbitrary delays, flaky tests, waiting for async operations.
+
+**When NOT to use:** Testing actual timing behavior (debounce, throttle) — document WHY the timeout is needed.
+
+**Rules:** Poll every 10ms (not 1ms — that wastes CPU). Always include a timeout with a clear error message. Call the getter inside the loop for fresh data (no stale cache).
 
 ### Git Operations
 
 **Read git state freely. NEVER execute write commands without EXPLICIT user permission.**
````
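The `wait_for` helper used in the Condition-Based Waiting pseudocode above can be written concretely. A synchronous Python sketch following the stated rules (10ms poll, explicit timeout with a clear message, fresh getter call each loop); an async variant would await an async condition instead of sleeping:

```python
import time

def wait_for(condition, timeout=5.0, interval=0.01, message="condition not met"):
    """Poll `condition` until it returns a truthy value; raise on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        value = condition()  # fresh read every iteration, no cached result
        if value:
            return value
        time.sleep(interval)  # 10ms default: responsive without burning CPU
    raise TimeoutError(f"timed out after {timeout}s: {message}")
```

`time.monotonic()` is used for the deadline so wall-clock adjustments cannot shorten or extend the wait.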
**`pilot/rules/standards-frontend.md`** — 15 additions, 6 deletions

```diff
@@ -56,18 +56,27 @@ paths:
 **Commit to a clear aesthetic. Every visual choice must be intentional.**
 
-1. **Purpose** — What does this communicate? 2. **Audience** — Developers? Consumers? 3. **Tone** — Minimalist, editorial, playful, etc. 4. **Differentiator** — One memorable element.
-- **Typography:** Display font for personality, body font for readability. Max 2 fonts. Avoid defaults (Inter, Roboto, Arial).
-- **Color:** Clear hierarchy: Primary (CTAs) → Accent → Neutral → Semantic. Dark mode: design separately, not just invert.
-- **Motion:** Every animation has purpose. Max 500ms. Always respect `prefers-reduced-motion`.
-- **Avoid AI aesthetic:** Purple gradients on white, symmetric 3-column grids, rounded cards with shadow, gradient text everywhere.
+**Before coding, answer these four questions:**
+
+1. **Purpose** — What does this interface communicate? Who uses it?
+2. **Audience** — Developers? Consumers?
+3. **Tone** — Minimalist, editorial, playful, industrial, luxury, retro-futuristic, etc.
+4. **Differentiator** — What's the one thing someone will remember?
+
+Then execute that direction with precision across every detail below.
+
+- **Typography:** Display font for personality, body font for readability. Max 2 fonts. Avoid defaults (Inter, Roboto, Arial, system fonts). Distinctive, characterful choices elevate the entire interface.
+- **Color:** Clear hierarchy: Primary (CTAs) → Accent → Neutral → Semantic. Dominant colors with sharp accents outperform timid, evenly-distributed palettes. Dark mode: design separately, not just invert.
+- **Spatial Composition:** Break out of predictable grid layouts. Use asymmetry, overlapping elements, diagonal flow, or grid-breaking accents to create visual interest. Controlled density and unexpected placement make interfaces feel designed rather than templated.
+- **Visual Depth:** Create atmosphere beyond solid backgrounds. Use gradient meshes, subtle noise textures, layered transparencies, dramatic shadows, decorative borders, or grain overlays — matched to the overall aesthetic. Flat and empty ≠ minimal; depth creates polish.
+- **Motion:** Every animation has purpose. Max 500ms. Always respect `prefers-reduced-motion`. Prefer one well-orchestrated page load with staggered reveals (`animation-delay`) over scattered micro-interactions. Use scroll-triggered animations and surprising hover states for high-impact moments. CSS-only for HTML; Motion library for React when available.
+- **Avoid AI aesthetic:** Purple gradients on white, symmetric 3-column grids, rounded cards with shadow, gradient text everywhere, overused font families (Inter, Space Grotesk), cookie-cutter component patterns.
 
 ## Checklist
 
 - [ ] Components: single responsibility, typed props, local state
 - [ ] CSS: project methodology, design tokens, no `!important`
 - [ ] Accessible: keyboard, labels, contrast 4.5:1, alt text
```
**`pilot/rules/testing.md`** — 4 additions, 2 deletions

```diff
@@ -66,9 +66,11 @@
 ### Anti-Patterns
 
 - **Dependent tests** — each test must work independently
-- **Testing implementation, not behavior** — test outputs, not internals
-- **Incomplete mock data** — must match real API structure
+- **Testing implementation, not behavior** — assert outputs and state changes, not that specific mocks were called. `assert result == expected`, not `mock.assert_called_with(...)`. If the implementation changes but behavior stays the same, tests should still pass.
+- **Incomplete mocks hiding structural assumptions** — mocks must mirror the complete real API structure, not just the fields you think you need. Partial mocks hide coupling to downstream fields and break when the real API returns additional or different data.
 - **Unnecessary mocks** — only for external deps
+- **Test-only methods in production** — never add methods, properties, or flags to production classes purely for test access. If you need internal state for testing, refactor the design so the behavior is observable through public interfaces.
+- **Mocking without understanding** — before mocking a dependency, understand what it actually does. A mock that doesn't reflect real behavior is a lie — tests pass against the lie, then fail against reality.
```
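The first anti-pattern can be made concrete with a short contrast. A sketch with a hypothetical `discounted_total` function under test; the point is what each style of assertion actually protects against:

```python
from unittest.mock import Mock

def discounted_total(get_price, quantity):
    """Hypothetical unit under test: 10% off for orders of 10 or more."""
    total = get_price() * quantity
    return total * 0.9 if quantity >= 10 else total

# ❌ Implementation-coupled: verifies the mock was called, not the math.
# This assertion still passes even if the discount logic is completely wrong.
price_source = Mock(return_value=4.0)
discounted_total(price_source, 10)
price_source.assert_called_once_with()

# ✅ Behavior-focused: inputs → outputs. Survives refactors, catches wrong math.
assert discounted_total(lambda: 4.0, 10) == 36.0
assert discounted_total(lambda: 4.0, 2) == 8.0
```

If the pricing logic later moves into a helper, the behavior assertions keep passing while the mock-call assertion would need rewriting, which is exactly the coupling the rule warns about.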