refactor(skills): sharpen 7 task names for semantic precision

nick-inkeep · claude · nick-inkeep · commit b7c3f3de169c · 2026-03-23T18:49:05.000-07:00
Rename task subjects to more precisely describe the key actions:

- spec #2: "Scaffold — create artifacts and build world model" → "Scaffold — create artifacts, investigate system and dependencies"
- spec #5: "Freeze — scope freeze" → "Freeze — adversarial review, resolution status, completeness gate"
- review #1: "Resolve PR and assess starting state" → "Assess starting state — detect PR, fetch feedback, check local changes"
- qa #2: "Derive test plan" → "Derive test plan and apply formalization gate"
- debug #1: "Phase 1 — triage" → "Phase 1 — classify bug and load playbook"
- debug #3: "Phase 3 — investigate" → "Phase 3 — hypothesis-driven root cause investigation"
- explore #2: "Investigate" → "Execute active lenses — map, trace, or inspect"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/plugins/eng/skills/debug/SKILL.md b/plugins/eng/skills/debug/SKILL.md
@@ -131,9 +131,9 @@ Before starting any work, create these tasks using `TaskCreate`. This makes the
 
 Create these tasks in order:
 
-1. **Debug: Phase 1 — triage** — Parse complete error output (every word). Classify bug into one of 9 categories. Load relevant triage playbook. Identify files to investigate from stack trace.
+1. **Debug: Phase 1 — classify bug and load playbook** — Parse complete error output (every word). Classify bug into one of 9 categories. Load relevant triage playbook. Identify files to investigate from stack trace.
 2. **Debug: Phase 2 — reproduce and comprehend** — Reproduce failure reliably. Map relevant code area (30-50 lines context, follow imports, read 2-3 siblings). Check system state. Check git history. Build mental model: expected vs actual behavior. State premises with file:line citations.
-3. **Debug: Phase 3 — investigate** — Present all plausible hypotheses in one batch ranked by confidence. Test each via hypothesis-test-refine cycle (predict before running). Record verdict per hypothesis. Switch strategy after 3 rejections. Escalate after 5 hypotheses.
+3. **Debug: Phase 3 — hypothesis-driven root cause investigation** — Present all plausible hypotheses in one batch ranked by confidence. Test each via hypothesis-test-refine cycle (predict before running). Record verdict per hypothesis. Switch strategy after 3 rejections. Escalate after 5 hypotheses.
 4. **Debug: Phase 4 — classify root cause** — Classify: dev environment/config issue vs code bug vs both. This determines the resolution path.
 5. **Debug: Phase 5 — report and recommend** — Deliver structured findings: root cause summary (file:function:logic + evidence chain), recommended fix strategy, similar patterns, hardening recommendations. Clean up diagnostic artifacts. NO FIX CODE.
 
@@ -143,9 +143,9 @@ Use `addBlockedBy` to enforce ordering. As each phase begins, mark its task `in_
 
 | Task | Done when |
 |---|---|
-| Triage | Bug category identified, playbook loaded, relevant files identified from error signal |
+| Classify + load playbook | Bug category identified, playbook loaded, relevant files identified from error signal |
 | Reproduce | Failure reproduced on demand (or documented why it can't be), expected vs actual behavior gap articulated, premises stated with file:line |
-| Investigate | Specific root cause identified with evidence from at least one diagnostic action |
+| Hypothesis-driven investigation | Specific root cause identified with evidence from at least one diagnostic action |
 | Classify | Root cause classified as env/config, code bug, or both |
 | Report | Structured findings delivered with file:line specificity, diagnostic artifacts documented, no fix code written |
 
diff --git a/plugins/eng/skills/explore/SKILL.md b/plugins/eng/skills/explore/SKILL.md
@@ -67,7 +67,7 @@ When the purpose is not stated, infer from context using this table.
 Before starting, create tasks to track progress through the phases:
 
 1. **Explore: Scope and load knowledge** — determine target, select lenses, check existing repo knowledge (skills, architecture docs, surface catalogs)
-2. **Explore: Investigate** — execute active phases (map surfaces, search and trace, inspect patterns — based on selected lenses)
+2. **Explore: Execute active lenses — map, trace, or inspect** — execute active phases (map surfaces, search and trace, inspect patterns — based on selected lenses)
 3. **Explore: Synthesize** — produce brief in appropriate format (pattern, trace, world model, or combined) with confidence provenance and gap discipline
 
 Mark each task `in_progress` when starting and `completed` when finished.
diff --git a/plugins/eng/skills/qa/SKILL.md b/plugins/eng/skills/qa/SKILL.md
@@ -39,7 +39,7 @@ Before starting any work, create these tasks using `TaskCreate`. This makes the
 Create these tasks in order:
 
 1. **QA: Detect tools and gather context** — Probe for available testing tools (browser/Playwright, desktop/Peekaboo, shell). Document tool gaps. Gather feature context from SPEC.md, PR, or feature description. Build mental model of what was built and what surfaces were touched.
-2. **QA: Derive test plan** — Identify concrete scenarios requiring manual verification. Apply formalization gate to each: if automatable with easy-medium effort, write the formal test instead. Categorize remaining scenarios. Create qa-progress.json with all scenarios in "planned" status (when tmp/ship/ exists) or persist checklist to PR body (standalone mode).
+2. **QA: Derive test plan and apply formalization gate** — Identify concrete scenarios requiring manual verification. Apply formalization gate to each: if automatable with easy-medium effort, write the formal test instead. Categorize remaining scenarios. Create qa-progress.json with all scenarios in "planned" status (when tmp/ship/ exists) or persist checklist to PR body (standalone mode).
 3. **QA: Execute test scenarios** — Work through each scenario using strongest available tool (browser > API > shell > inference). Test happy path first, then break it, then stress it. Record video evidence for browser scenarios. Fix bugs discovered during testing.
 4. **QA: Record results** — Update qa-progress.json for every scenario: set status (validated/failed/blocked/skipped), verifiedVia fidelity level, notes, and evidence URLs.
 5. **QA: Report** — Communicate results to invoker. Total scenarios tested vs passed vs failed vs skipped. Bugs found and fixed. Gaps that could NOT be tested. Judgment call on readiness.
@@ -51,7 +51,7 @@ Use `addBlockedBy` to enforce ordering. As each step begins, mark its task `in_p
 | Task | Done when |
 |---|---|
 | Detect tools + context | Tool inventory documented, feature context understood, mental model built |
-| Derive test plan | All scenarios identified, formalization gate applied, qa-progress.json created with all scenarios in `planned` status |
+| Derive test plan + formalization gate | All scenarios identified, formalization gate applied (automatable scenarios converted to formal tests), qa-progress.json created with all scenarios in `planned` status |
 | Execute | All planned scenarios executed (or marked blocked/skipped with reason), bugs found are fixed or documented |
 | Record results | Every scenario in qa-progress.json has non-`planned` status, verifiedVia populated, notes populated for non-clean-pass scenarios |
 | Report | Results communicated, gaps documented, readiness judgment stated |
diff --git a/plugins/eng/skills/review/SKILL.md b/plugins/eng/skills/review/SKILL.md
@@ -29,7 +29,7 @@ Before starting any work, create these tasks using `TaskCreate`. This makes the
 
 Create these tasks in order:
 
-1. **Review: Resolve PR and assess starting state** — Verify gh auth. Detect PR number. Check for unpushed local changes. Fetch existing review feedback. Update PR body if implementation changed.
+1. **Review: Assess starting state — detect PR, fetch feedback, check local changes** — Verify gh auth. Detect PR number. Check for unpushed local changes. Fetch existing review feedback. Update PR body if implementation changed.
 2. **Review: Stage 1 — review feedback loop** — Poll for reviewer feedback (6-min intervals). For each thread: investigate proportionally, evaluate across all dimensions, decide with evidence, resolve thread. Implement accepted changes, run tests, push. Re-poll after every push.
 3. **Review: Stage 2 — CI/CD resolution** — Monitor pipeline. Classify each failure with evidence (PR-caused, pre-existing, flaky, infrastructure, cancelled). Fix PR-caused failures. Re-trigger flaky/infra/cancelled and wait for result.
 4. **Review: Final verification** — Verify: all threads resolved, CI/CD green or documented, no pending runs. If security/auth/multi-subsystem changes, trigger second-pass review.
@@ -40,7 +40,7 @@ Use `addBlockedBy` to enforce ordering. As each stage begins, mark its task `in_
 
 | Task | Done when |
 |---|---|
-| Resolve PR | Starting state determined, PR body current, existing feedback fetched |
+| Assess starting state | Starting state determined, PR body current, existing feedback fetched |
 | Stage 1 | Every thread resolved with evidence-backed reply, latest changes pushed, re-polled after last push with no new comments |
 | Stage 2 | Pipeline green OR all failures documented as pre-existing/unrelated with `--compare-main` evidence. No cancelled or pending runs. |
 | Final verification | All exit checklist items pass. Second-pass triggered if applicable. |
diff --git a/plugins/eng/skills/spec/SKILL.md b/plugins/eng/skills/spec/SKILL.md
@@ -75,10 +75,10 @@ Before starting any work, create these tasks using `TaskCreate`. This makes the
 Create these tasks in order:
 
 1. **Spec: Intake — problem framing and stress-test** — Capture seed (what, why, who). Draft problem statement in SCR format. Run all 5 stress-test probes. Produce initial Open Questions list.
-2. **Spec: Scaffold — create artifacts and build world model** — Create SPEC.md, evidence/, meta/_changelog.md. Build product + internal surface-area maps. Dispatch /explore for blast radius. Investigate 3P dependencies. Produce scope hypothesis.
+2. **Spec: Scaffold — create artifacts, investigate system and dependencies** — Create SPEC.md, evidence/, meta/_changelog.md. Build product + internal surface-area maps. Dispatch /explore for blast radius. Investigate 3P dependencies. Produce scope hypothesis.
 3. **Spec: Backlog — extract and prioritize open questions** — Systematically extract OQs via walkthrough, tensions, and negative-space probes. Classify every item. Present priority triage to user for confirmation.
 4. **Spec: Iterate — investigate, decide, cascade** — Run the core loop: investigate P0 items, present decision batches, cascade through SPEC.md, completeness re-sweep every 2-3 iterations.
-5. **Spec: Freeze — scope freeze** — Run adversarial pre-freeze review. Assign resolution status to all decisions. Run resolution completeness gate. Classify Future Work by maturity tier.
+5. **Spec: Freeze — adversarial review, resolution status, completeness gate** — Run adversarial pre-freeze review. Assign resolution status to all decisions. Run resolution completeness gate. Classify Future Work by maturity tier.
 6. **Spec: Verify and finalize — technical accuracy and quality bar** — Refresh codebase. Extract load-bearing technical assertions. Dispatch parallel verification subagents. Present findings (Tier 1 design-affecting, Tier 2 factual corrections). Apply corrections. Run quality bar checklist.
 
 Use `addBlockedBy` to enforce ordering. As each step begins, mark its task `in_progress`. When the step completes, mark it `completed`.
@@ -88,10 +88,10 @@ Use `addBlockedBy` to enforce ordering. As each step begins, mark its task `in_p
 | Task | Done when |
 |---|---|
 | Intake | SCR problem statement drafted, all 5 stress-test probes run, initial Open Questions list exists |
-| Scaffold | SPEC.md exists on disk, evidence/ directory created, product + internal surface maps drafted, scope hypothesis presented to user |
+| Scaffold (artifacts + investigation) | SPEC.md exists on disk, evidence/ directory created, product + internal surface maps drafted, scope hypothesis presented to user |
 | Backlog | All items extracted (not filtered), classified with P0/P2 tags, user has confirmed priority assignments |
 | Iterate | All P0 open questions resolved, scope stabilized through iterative loop, no pending decision batches |
-| Freeze | All decisions have resolution status (LOCKED/DIRECTED/DELEGATED), all In Scope items pass completeness gate, Future Work classified by maturity tier |
+| Freeze (review + status + gate) | All decisions have resolution status (LOCKED/DIRECTED/DELEGATED), all In Scope items pass completeness gate, Future Work classified by maturity tier |
 | Verify and finalize | Codebase refreshed, all load-bearing assertions verified (CONFIRMED/CONTRADICTED/STALE/UNVERIFIABLE), Tier 1 issues resolved via iterative loop, Tier 2 corrections applied, quality bar checklist passes, SPEC.md finalized |
 
 **On re-entry:** Check `TaskList` first. If tasks exist, read SPEC.md and `meta/_changelog.md` to determine current state. Resume from the first non-completed task. If no tasks exist, create them and mark completed phases based on SPEC.md content (has SCR? → Intake done. Has surface maps? → Scaffold done. Etc.)