refactor: update documentation and skill metadata for improved clarity and workflow alignment

Akagi201 · Akagi201 · commit 018d814cfe7f · 2026-03-07T23:31:04.000+08:00
diff --git a/docs/design.md b/docs/design.md
@@ -101,7 +101,7 @@ Behavior guarantees:
 
 ### 6.1 pb-init
 
-Audits the repository and produces a **minimal** `AGENTS.md` containing only information that agents cannot discover from the codebase itself. Applies a strict three-part filter: each entry must be (1) not inferrable from code, (2) operationally decisive, and (3) not guessable from industry conventions. The ideal AGENTS.md is empty — every entry represents a codebase smell that should eventually be fixed at the root cause. Re-runs audit existing entries and flag any that are now discoverable.
+Audits the repository and updates a **managed snapshot block** inside `AGENTS.md`. The generated block captures current project context, key file locations, active specs, and an `Architecture Decision Snapshot` that later agents inherit. Re-runs replace only the managed block and preserve all user-authored content outside it.
 
 ### 6.2 pb-plan
 
@@ -122,6 +122,7 @@ Implements tasks sequentially with strict context hygiene and an outside-in doub
 3. Minimal context handoff between subagents.
 4. File-scoped rollback guidance for failed task attempts.
 5. Per-task verification criteria, scenario coverage mapping, and explicit completion status tracking in `tasks.md`.
+6. Managed `AGENTS.md` snapshot updates instead of whole-file rewrites.
 
 ## 8. Testing and Verification
 
@@ -131,6 +132,7 @@ Current automated coverage validates:
 2. Platform path/render behavior across all supported platforms.
 3. End-to-end structure generation for `--ai all`.
 4. Template loading and safety regressions (e.g., malformed wrappers, destructive command checks).
+5. Prompt/skill parity checks for workflow-critical instructions and architecture constraints.
 
 Primary verification commands:
 
@@ -142,5 +144,5 @@ uv run ruff check .
 ## 9. Known Constraints and Follow-ups
 
 1. Platform-specific runtime semantics can evolve; adapter paths/formats should be periodically re-validated against official tool docs.
-2. Prompt/skill content parity is maintained by template discipline, not code generation.
+2. Prompt/skill content parity is maintained by template discipline, not code generation; parity is guarded by regression tests for workflow-critical instructions.
 3. Additional platforms should be added only through new adapter classes and test expansion, not conditional sprawl in shared install logic.
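
Note on the managed snapshot block described in section 6.1: the diff states the behavior (re-runs replace only the managed block and preserve user-authored content outside it) but does not show how the block is delimited. The sketch below is one possible way to get that behavior, not the project's implementation. The marker strings `BEGIN` and `END` and the helper `update_managed_block` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical delimiters: the real markers used by pb-init are not shown in this diff.
BEGIN = "<!-- BEGIN PB-INIT MANAGED BLOCK -->"
END = "<!-- END PB-INIT MANAGED BLOCK -->"

# Non-greedy match across lines so only the first managed block is captured.
_MANAGED = re.compile(re.escape(BEGIN) + r".*?" + re.escape(END), re.DOTALL)


def update_managed_block(agents_md: str, snapshot: str) -> str:
    """Replace the managed block if present, otherwise append it.

    Text outside the markers is returned unchanged, except that trailing
    newlines are normalized when the block is appended for the first time.
    """
    block = f"{BEGIN}\n{snapshot.strip()}\n{END}"
    if _MANAGED.search(agents_md):
        # A function replacement keeps backslashes in the snapshot literal.
        return _MANAGED.sub(lambda _match: block, agents_md, count=1)
    prefix = agents_md.rstrip("\n")
    return f"{prefix}\n\n{block}\n" if prefix else f"{block}\n"
```

Calling the helper twice with different snapshots leaves one block in the file, and anything the user wrote above or below the markers stays where it was.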
diff --git a/src/pb_spec/platforms/base.py b/src/pb_spec/platforms/base.py
@@ -6,8 +6,8 @@
 # Skill metadata: name -> description
 SKILL_METADATA: dict[str, str] = {
     "pb-init": (
-        "Use to audit the repo and produce a minimal AGENTS.md containing only "
-        "undiscoverable gotchas, hard constraints, and non-obvious conventions."
+        "Use to audit the repo and update a managed AGENTS.md snapshot with "
+        "project context, architecture decisions, and non-obvious conventions."
     ),
     "pb-plan": (
         "Use when converting a requirement into a design proposal and executable tasks before coding."
@@ -17,7 +17,7 @@
         "and tasks.md."
     ),
     "pb-build": (
-        "Use when tasks.md is ready and you need sequential TDD implementation with recovery loops."
+        "Use when tasks.md is ready and you need sequential BDD+TDD implementation with recovery loops."
     ),
 }
 
diff --git a/src/pb_spec/templates/prompts/pb-build.prompt.md b/src/pb_spec/templates/prompts/pb-build.prompt.md
@@ -40,7 +40,7 @@ Never guess `<spec-dir>` from memory. Always resolve from actual directory names
 
 ## Step 2: Parse Unfinished Tasks
 
-Scan for all unchecked items (`- [ ]`). Build an ordered list preserving Phase → Task number order.
+Determine unfinished tasks from each `### Task X.Y:` block in `tasks.md`, then inspect the status and checkbox lines inside that block. Do not treat every `- [ ]` step as a separate task. Build an ordered list of task blocks preserving Phase → Task number order.
 
 **Use Task IDs for state tracking.** Each task has a unique ID in the format `Task X.Y` (e.g., `Task 1.1`, `Task 2.3`). When locating tasks, match on the `### Task X.Y:` heading pattern, not just bare checkboxes.
 
@@ -49,7 +49,7 @@ Scan for all unchecked items (`- [ ]`). Build an ordered list preserving Phase 
 - If `tasks.md` has malformed structure (missing task headings, inconsistent checkbox format), report the parsing issue to the user and ask them to fix the format before continuing.
 - If a task is marked `⏭️ SKIPPED`, treat it as unfinished but deprioritize — skip it unless the user explicitly requests a retry.
 
-For execution reliability, represent the queue as explicit task units: `Task ID`, `Task Name`, `Status`, `Scenario Coverage`, `Loop Type`, `BDD Verification`, `Verification`.
+For execution reliability, represent the queue as explicit task-block units: `Task ID`, `Task Name`, `Status`, `Scenario Coverage`, `Loop Type`, `BDD Verification`, and `Verification`.
 
 If all tasks are checked (`- [x]`), report:
 
@@ -282,7 +282,7 @@ You are implementing **Task {{TASK_NUMBER}}: {{TASK_NAME}}**.
 
 ### Your Job
 
-Execute in strict order:
+Execute in strict order. Report concise decisions and evidence for each step:
 
 Before coding, define a compact task contract from the provided task block:
 
diff --git a/src/pb_spec/templates/skills/pb-build/SKILL.md b/src/pb_spec/templates/skills/pb-build/SKILL.md
@@ -44,7 +44,7 @@ Never guess `<spec-dir>` from memory. Always resolve from actual directory names
 
 ### Step 2: Parse Unfinished Tasks
 
-Scan `tasks.md` for all unchecked task items (`- [ ]`). Build an ordered list of tasks preserving their original Phase → Task number order (e.g., Task 1.1, Task 1.2, Task 2.1, …).
+Determine unfinished tasks from each `### Task X.Y:` block in `tasks.md`, then inspect the status and checkbox lines inside that block. Do not treat every `- [ ]` step as a separate task. Build an ordered list of task blocks preserving their original Phase → Task number order (e.g., Task 1.1, Task 1.2, Task 2.1, …).
 
 **Use Task IDs for state tracking.** Each task has a unique ID in the format `Task X.Y` (e.g., `Task 1.1`, `Task 2.3`). When locating tasks, match on the `### Task X.Y:` heading pattern, not just bare checkboxes.
 
@@ -53,7 +53,7 @@ Scan `tasks.md` for all unchecked task items (`- [ ]`). Build an ordered list of
 - If `tasks.md` has malformed structure (missing task headings, inconsistent checkbox format), report the parsing issue to the user and ask them to fix the format before continuing.
 - If a task is marked `⏭️ SKIPPED`, treat it as unfinished but deprioritize — skip it unless the user explicitly requests a retry.
 
-For execution reliability, represent the queue as explicit task units: `Task ID`, `Task Name`, `Status`, `Scenario Coverage`, `Loop Type`, `BDD Verification`, `Verification`.
+For execution reliability, represent the queue as explicit task-block units: `Task ID`, `Task Name`, `Status`, `Scenario Coverage`, `Loop Type`, `BDD Verification`, and `Verification`.
 
 If all tasks are already checked (`- [x]`), report:
 
diff --git a/src/pb_spec/templates/skills/pb-build/references/implementer_prompt.md b/src/pb_spec/templates/skills/pb-build/references/implementer_prompt.md
@@ -24,7 +24,7 @@ You are implementing **Task {{TASK_NUMBER}}: {{TASK_NAME}}**.
 
 ## Your Job
 
-Execute the following steps in strict order. **You must output your reasoning for each step.** Do not skip or reorder any step.
+Execute the following steps in strict order. Report concise decisions and evidence for each step. Do not skip or reorder any step.
 
 Before coding, define a compact task contract from the provided task block:
 
diff --git a/tests/test_platforms.py b/tests/test_platforms.py
@@ -5,6 +5,7 @@
 import pytest
 
 from pb_spec.platforms import get_platform, resolve_targets
+from pb_spec.platforms.base import SKILL_METADATA
 from pb_spec.platforms.claude import ClaudePlatform
 from pb_spec.platforms.codex import CodexPlatform
 from pb_spec.platforms.copilot import CopilotPlatform
@@ -19,6 +20,11 @@ def test_skill_names_returns_four_skills():
     assert platform.skill_names == ["pb-init", "pb-plan", "pb-refine", "pb-build"]
 
 
+def test_skill_metadata_descriptions_match_current_workflow():
+    assert "managed AGENTS.md snapshot" in SKILL_METADATA["pb-init"]
+    assert "BDD+TDD" in SKILL_METADATA["pb-build"]
+
+
 # --- get_skill_path ---
 
 
diff --git a/tests/test_templates.py b/tests/test_templates.py
@@ -364,6 +364,21 @@ def test_pb_build_templates_escalate_after_three_failures():
         assert "retry budget" in content
 
 
+def test_pb_build_templates_parse_task_blocks_instead_of_raw_checkboxes():
+    """pb-build should treat Task X.Y blocks as the execution unit, not each checkbox line."""
+    for content in (load_skill_content("pb-build"), load_prompt("pb-build")):
+        assert "Determine unfinished tasks from each `### Task X.Y:` block" in content
+        assert "Do not treat every `- [ ]` step as a separate task." in content
+
+
+def test_pb_build_implementer_templates_require_concise_evidence_not_reasoning_dump():
+    """Implementer templates should ask for concise evidence, not full reasoning traces."""
+    build_refs = load_references("pb-build")
+    for content in (build_refs["implementer_prompt.md"], load_prompt("pb-build")):
+        assert "output your reasoning for each step" not in content
+        assert "Report concise decisions and evidence for each step" in content
+
+
 def test_pb_build_implementer_templates_require_runtime_evidence():
     """Implementer guidance should require runtime log/probe evidence when applicable."""
     build_refs = load_references("pb-build")
@@ -378,3 +393,13 @@ def test_pb_refine_templates_accept_build_block_packets():
     for content in (load_skill_content("pb-refine"), load_prompt("pb-refine")):
         assert "Build-block packets" in content
         assert "🛑 Build Blocked" in content
+
+
+def test_project_design_doc_matches_current_snapshot_workflow():
+    """docs/design.md should describe the same managed-snapshot workflow implemented by templates."""
+    design = (Path(__file__).resolve().parents[1] / "docs" / "design.md").read_text(encoding="utf-8")
+
+    assert "managed snapshot block" in design
+    assert "Architecture Decision Snapshot" in design
+    assert "BDD outer loop" in design
+    assert "parity is guarded by regression tests" in design
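
Note on the Step 2 change in the pb-build templates: the new wording makes the `### Task X.Y:` block the execution unit, not each `- [ ]` line. A minimal sketch of that parsing rule follows, under some assumptions. The exact layout of `tasks.md` is not shown in this diff, so the `TaskBlock` type, the `parse_unfinished_tasks` helper, and the rule that a block ends at the next heading of level 1 to 3 are all hypothetical. A block is treated as unfinished when it still holds an unchecked step or carries a `SKIPPED` marker.

```python
import re
from dataclasses import dataclass

TASK_HEADING = re.compile(r"^### Task (\d+)\.(\d+):\s*(.*)$")
OTHER_HEADING = re.compile(r"^#{1,3} ")  # assumed to close the current task block
UNCHECKED = re.compile(r"^\s*- \[ \]")


@dataclass
class TaskBlock:
    task_id: str
    name: str
    unchecked_steps: int = 0
    skipped: bool = False


def parse_unfinished_tasks(tasks_md: str) -> list[TaskBlock]:
    """Return unfinished task blocks in Phase -> Task number order."""
    blocks = []  # list of ((phase, number), TaskBlock)
    current = None
    for line in tasks_md.splitlines():
        heading = TASK_HEADING.match(line)
        if heading:
            phase, number, name = heading.groups()
            current = TaskBlock(f"Task {phase}.{number}", name.strip())
            blocks.append(((int(phase), int(number)), current))
        elif OTHER_HEADING.match(line):
            current = None  # e.g. a phase heading: checkboxes here belong to no task
        elif current is not None:
            if UNCHECKED.match(line):
                current.unchecked_steps += 1  # counted per block, not queued per line
            if "SKIPPED" in line:
                current.skipped = True
    blocks.sort(key=lambda item: item[0])
    return [block for _, block in blocks if block.unchecked_steps or block.skipped]
```

The caller can then deprioritize entries whose `skipped` flag is set, which matches the template rule that skipped tasks count as unfinished but are not retried unless the user asks.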