Update checklist.md

localden · localden · commit a6be9bea3165 · 2025-10-05T22:55:34.000-07:00
diff --git a/templates/commands/checklist.md b/templates/commands/checklist.md
@@ -5,23 +5,24 @@ scripts:
   ps: scripts/powershell/check-prerequisites.ps1 -Json
 ---
 
-## Checklist Purpose
+## Checklist Purpose: "Unit Tests for English"
 
-**CRITICAL CLARIFICATION**: Checklists generated by this command are for **requirements validation**, NOT:
-- ❌ Verifying code execution or functionality
-- ❌ Testing whether code matches the specification
-- ❌ Checking implementation correctness
-- ❌ Code review or quality assurance
+**CRITICAL CONCEPT**: Checklists are **UNIT TESTS FOR REQUIREMENTS WRITING** - they validate the quality, clarity, and completeness of requirements in a given domain.
 
-**What checklists ARE for**:
-- ✅ Ensuring requirements are clearly captured and complete
-- ✅ Identifying ambiguities in specifications or plans
-- ✅ Verifying proper scenario coverage across the spec and plan
-- ✅ Confirming acceptance criteria are well-defined and measurable
-- ✅ Detecting gaps, conflicts, or missing edge cases in requirements
-- ✅ Validating that the problem domain is properly understood before implementation
+**NOT for verification/testing**:
+- ❌ NOT "Verify the button clicks correctly"
+- ❌ NOT "Test error handling works"
+- ❌ NOT "Confirm the API returns 200"
+- ❌ NOT checking if code/implementation matches the spec
 
-Think of checklists as a **pre-implementation review** to ensure the spec and plan are solid, not a post-implementation verification tool.
+**FOR requirements quality validation**:
+- ✅ "Are visual hierarchy requirements defined for all card types?" (completeness)
+- ✅ "Is 'prominent display' quantified with specific sizing/positioning?" (clarity)
+- ✅ "Are hover state requirements consistent across all interactive elements?" (consistency)
+- ✅ "Are accessibility requirements defined for keyboard navigation?" (coverage)
+- ✅ "Does the spec define what happens when logo image fails to load?" (edge cases)
+
+**Metaphor**: If your spec is code written in English, the checklist is its unit test suite. You're testing whether the requirements are well-written, complete, unambiguous, and ready for implementation - NOT whether the implementation works.
 
 ## User Input
 
@@ -85,69 +86,122 @@ You **MUST** consider the user input before proceeding (if not empty).
    - Use progressive disclosure: add follow-on retrieval only if gaps detected
    - If source docs are large, generate interim summary items instead of embedding raw text
 
-5. **Generate checklist**:
+5. **Generate checklist** - Create "Unit Tests for Requirements":
    - Create `FEATURE_DIR/checklists/` directory if it doesn't exist
    - Generate unique checklist filename:
-     - Use short, descriptive name based on checklist type
-     - Format: `[type].md` (e.g., `ux.md`, `test.md`, `security.md`, `deploy.md`)
-     - If file exists, append to existing file (e.g., use the same UX checklist)
+     - Use short, descriptive name based on domain (e.g., `ux.md`, `api.md`, `security.md`)
+     - Format: `[domain].md`
+     - If file exists, append to existing file
    - Number items sequentially starting from CHK001
    - Each `/checklist` run creates a NEW file (never overwrites existing checklists)
 
-   **Category Structure** - Group items ONLY using this controlled set:
-   - Primary Flows
-   - Alternate Flows
-   - Exception / Error Flows
-   - Recovery & Resilience
-   - Non-Functional Domains (sub‑grouped or prefixed: Performance, Reliability, Security & Privacy, Accessibility, Observability, Scalability, Data Lifecycle)
-   - Traceability & Coverage
-   - Ambiguities & Conflicts
-   - Assumptions & Dependencies
+   **CORE PRINCIPLE - Test the Requirements, Not the Implementation**:
+   Every checklist item MUST evaluate the REQUIREMENTS THEMSELVES for:
+   - **Completeness**: Are all necessary requirements present?
+   - **Clarity**: Are requirements unambiguous and specific?
+   - **Consistency**: Do requirements align with each other?
+   - **Measurability**: Can requirements be objectively verified?
+   - **Coverage**: Are all scenarios/edge cases addressed?
+   
+   **Category Structure** - Group items by requirement quality dimensions:
+   - **Requirement Completeness** (Are all necessary requirements documented?)
+   - **Requirement Clarity** (Are requirements specific and unambiguous?)
+   - **Requirement Consistency** (Do requirements align without conflicts?)
+   - **Acceptance Criteria Quality** (Are success criteria measurable?)
+   - **Scenario Coverage** (Are all flows/cases addressed?)
+   - **Edge Case Coverage** (Are boundary conditions defined?)
+   - **Non-Functional Requirements** (Performance, Security, Accessibility, etc. - are they specified?)
+   - **Dependencies & Assumptions** (Are they documented and validated?)
+   - **Ambiguities & Conflicts** (What needs clarification?)
+   
+   **HOW TO WRITE CHECKLIST ITEMS - "Unit Tests for English"**:
+   
+   ❌ **WRONG** (Testing implementation):
+   - "Verify landing page displays 3 episode cards"
+   - "Test hover states work on desktop"
+   - "Confirm logo click navigates home"
+   
+   ✅ **CORRECT** (Testing requirements quality):
+   - "Are the exact number and layout of featured episodes specified?" [Completeness]
+   - "Is 'prominent display' quantified with specific sizing/positioning?" [Clarity]
+   - "Are hover state requirements consistent across all interactive elements?" [Consistency]
+   - "Are keyboard navigation requirements defined for all interactive UI?" [Coverage]
+   - "Is the fallback behavior specified when logo image fails to load?" [Edge Cases]
+   - "Are loading states defined for asynchronous episode data?" [Completeness]
+   - "Does the spec define visual hierarchy for competing UI elements?" [Clarity]
+   
+   **ITEM STRUCTURE**:
+   Each item should follow this pattern:
+   - Question format asking about requirement quality
+   - Focus on what's WRITTEN (or not written) in the spec/plan
+   - Include quality dimension in brackets [Completeness/Clarity/Consistency/etc.]
+   - Reference spec section `[Spec §X.Y]` when checking existing requirements
+   - Use `[Gap]` marker when checking for missing requirements
+   
+   **EXAMPLES BY QUALITY DIMENSION**:
+   
+   Completeness:
+   - "Are error handling requirements defined for all API failure modes? [Gap]"
+   - "Are accessibility requirements specified for all interactive elements? [Completeness]"
+   - "Are mobile breakpoint requirements defined for responsive layouts? [Gap]"
+   
+   Clarity:
+   - "Is 'fast loading' quantified with specific timing thresholds? [Clarity, Spec §NFR-2]"
+   - "Are 'related episodes' selection criteria explicitly defined? [Clarity, Spec §FR-5]"
+   - "Is 'prominent' defined with measurable visual properties? [Ambiguity, Spec §FR-4]"
    
-   Do NOT invent ad-hoc categories; merge sparse categories (<2 items) into the closest higher-signal category.
+   Consistency:
+   - "Do navigation requirements align across all pages? [Consistency, Spec §FR-10]"
+   - "Are card component requirements consistent between landing and detail pages? [Consistency]"
+   
+   Coverage:
+   - "Are requirements defined for zero-state scenarios (no episodes)? [Coverage, Edge Case]"
+   - "Are concurrent user interaction scenarios addressed? [Coverage, Gap]"
+   - "Are requirements specified for partial data loading failures? [Coverage, Exception Flow]"
+   
+   Measurability:
+   - "Are visual hierarchy requirements measurable/testable? [Acceptance Criteria, Spec §FR-1]"
+   - "Can 'balanced visual weight' be objectively verified? [Measurability, Spec §FR-2]"
 
-   **Scenario Classification & Coverage** (Requirements Validation Focus):
-   - Classify scenarios into: Primary, Alternate, Exception/Error, Recovery/Resilience, Non-Functional
-   - At least one item per present scenario class; if intentionally absent add: `Confirm intentional absence of <Scenario Class> scenarios`
-   - Include resilience/rollback coverage when state mutation or migrations occur (partial write, degraded mode, backward compatibility, rollback preconditions)
-   - If a major scenario lacks acceptance criteria, add an item to define measurable criteria
-   - **Focus on requirements validation**: Are scenarios clearly defined? Are acceptance criteria measurable? Are edge cases identified in the spec?
+   **Scenario Classification & Coverage** (Requirements Quality Focus):
+   - Check if requirements exist for: Primary, Alternate, Exception/Error, Recovery, Non-Functional scenarios
+   - For each scenario class, ask: "Are [scenario type] requirements complete, clear, and consistent?"
+   - If scenario class missing: "Are [scenario type] requirements intentionally excluded or missing? [Gap]"
+   - Include resilience/rollback when state mutation occurs: "Are rollback requirements defined for migration failures? [Gap]"
 
    **Traceability Requirements**:
    - MINIMUM: ≥80% of items MUST include at least one traceability reference
-   - Each item should include ≥1 of: scenario class tag, spec ref `[Spec §X.Y]`, acceptance criterion `[AC-##]`, or marker `(Assumption)/(Dependency)/(Ambiguity)/(Conflict)`
-   - If no ID system exists, create an item: `Establish requirement & acceptance criteria ID scheme before proceeding`
+   - Each item should reference a spec section `[Spec §X.Y]` or use a marker: `[Gap]`, `[Ambiguity]`, `[Conflict]`, `[Assumption]`
+   - If no ID system exists: "Is a requirement & acceptance criteria ID scheme established? [Traceability]"
 
-   **Surface & Resolve Issues** (Pre-Implementation Validation):
-   - Cluster and create one resolution item per cluster for:
-     - Ambiguities (vague terms in spec: "fast", "robust", "secure" - these need quantification)
-     - Conflicts (contradictory statements in requirements)
-     - Assumptions (unvalidated premises in the spec or plan)
-     - Dependencies (external systems, feature flags, migrations, upstream APIs - are they documented?)
-   - Items should focus on "Is this requirement clear enough to implement?" not "Does the code work?"
+   **Surface & Resolve Issues** (Requirements Quality Problems):
+   Ask questions about the requirements themselves:
+   - Ambiguities: "Is the term 'fast' quantified with specific metrics? [Ambiguity, Spec §NFR-1]"
+   - Conflicts: "Do navigation requirements conflict between §FR-10 and §FR-10a? [Conflict]"
+   - Assumptions: "Is the assumption of 'always available podcast API' validated? [Assumption]"
+   - Dependencies: "Are external podcast API requirements documented? [Dependency, Gap]"
+   - Missing definitions: "Is 'visual hierarchy' defined with measurable criteria? [Gap]"
 
    **Content Consolidation**:
-   - Soft cap: If raw candidate items > 40, prioritize by risk/impact and add: `Consolidate remaining low-impact scenarios (see source docs) after priority review`
-   - Merge near-duplicates when: same scenario class + same spec section + overlapping acceptance intent
-   - If >5 low-impact edge cases, cluster into a single aggregated item
-   - Do not repeat identical spec or acceptance refs in >3 items unless covering distinct scenario classes
-   - Treat context budget as finite: do not restate already-tagged requirements verbatim across multiple items
-
-   **🚫 PROHIBITED CONTENT** (Requirements Focus ONLY):
-   - Focus on requirements & scenario coverage quality, NOT implementation
-   - NEVER include: specific tests ("unit test", "integration test"), code symbols, frameworks, algorithmic prescriptions, deployment steps, test plan details, implementation strategy
-   - Rephrase any such user input into requirement clarity or coverage validation
-   - Optional brief rationale ONLY if it clarifies requirement intent or risk
-
-   **✅ HOW TO PHRASE CHECKLIST ITEMS** (Requirements Validation):
-   - Good: "Verify error handling scenarios are defined for network failures"
-   - Bad: "Test error handling for network failures"
-   - Good: "Confirm acceptance criteria are measurable for performance requirements"
-   - Bad: "Run performance tests to verify requirements"
-   - Good: "Identify edge cases for concurrent user access in spec"
-   - Bad: "Implement thread-safe concurrent access"
-   - Good: "Clarify ambiguous term 'fast response' with specific timing requirements"
-   - Bad: "Verify response time is under 100ms"
+   - Soft cap: If raw candidate items > 40, prioritize by risk/impact
+   - Merge near-duplicates checking the same requirement aspect
+   - If >5 low-impact edge cases, create one item: "Are edge cases X, Y, Z addressed in requirements? [Coverage]"
+
+   **🚫 ABSOLUTELY PROHIBITED** - These make it an implementation test, not a requirements test:
+   - ❌ Any item starting with "Verify", "Test", "Confirm", or "Check" followed by implementation behavior
+   - ❌ References to code execution, user actions, system behavior
+   - ❌ "Displays correctly", "works properly", "functions as expected"
+   - ❌ "Click", "navigate", "render", "load", "execute"
+   - ❌ Test cases, test plans, QA procedures
+   - ❌ Implementation details (frameworks, APIs, algorithms)
+   
+   **✅ REQUIRED PATTERNS** - These test requirements quality:
+   - ✅ "Are [requirement type] defined/specified/documented for [scenario]?"
+   - ✅ "Is [vague term] quantified/clarified with specific criteria?"
+   - ✅ "Are requirements consistent between [section A] and [section B]?"
+   - ✅ "Can [requirement] be objectively measured/verified?"
+   - ✅ "Are [edge cases/scenarios] addressed in requirements?"
+   - ✅ "Does the spec define [missing aspect]?"
 
 6. **Structure Reference**: Generate the checklist following the canonical template in `templates/checklist-template.md` for title, meta section, category headings, and ID formatting. If template is unavailable, use: H1 title, purpose/created meta lines, `##` category sections containing `- [ ] CHK### <requirement item>` lines with globally incrementing IDs starting at CHK001.
 
@@ -165,39 +219,71 @@ You **MUST** consider the user input before proceeding (if not empty).
 
 To avoid clutter, use descriptive types and clean up obsolete checklists when done.
 
-## Example Checklist Types
+## Example Checklist Types & Sample Items
+
+**UX Requirements Quality:** `ux.md`
+
+Sample items (testing the requirements, NOT the implementation):
+- "Are visual hierarchy requirements defined with measurable criteria? [Clarity, Spec §FR-1]"
+- "Are the number and positioning of UI elements explicitly specified? [Completeness, Spec §FR-1]"
+- "Are interaction state requirements (hover, focus, active) consistently defined? [Consistency]"
+- "Are accessibility requirements specified for all interactive elements? [Coverage, Gap]"
+- "Is fallback behavior defined when images fail to load? [Edge Case, Gap]"
+- "Can 'prominent display' be objectively measured? [Measurability, Spec §FR-4]"
 
-**Specification Review:** `spec-review.md`
+**API Requirements Quality:** `api.md`
 
-- Requirement completeness and clarity
-- User scenarios and edge cases coverage
-- Acceptance criteria definition
-- Domain-specific considerations
+Sample items:
+- "Are error response formats specified for all failure scenarios? [Completeness]"
+- "Are rate limiting requirements quantified with specific thresholds? [Clarity]"
+- "Are authentication requirements consistent across all endpoints? [Consistency]"
+- "Are retry/timeout requirements defined for external dependencies? [Coverage, Gap]"
+- "Is versioning strategy documented in requirements? [Gap]"
 
-**Requirements Quality:** `requirements.md`
+**Performance Requirements Quality:** `performance.md`
 
-- Testable and measurable outcomes
-- Stakeholder alignment verification
-- Assumptions and constraints documentation
-- Success metrics definition
+Sample items:
+- "Are performance requirements quantified with specific metrics? [Clarity]"
+- "Are performance targets defined for all critical user journeys? [Coverage]"
+- "Are performance requirements under different load conditions specified? [Completeness]"
+- "Can performance requirements be objectively measured? [Measurability]"
+- "Are degradation requirements defined for high-load scenarios? [Edge Case, Gap]"
 
-**UX/Accessibility Scenarios:** `ux.md` or `a11y.md`
+**Security Requirements Quality:** `security.md`
 
-- User journey completeness
-- Accessibility requirement coverage
-- Responsive design considerations
-- Internationalization needs
+Sample items:
+- "Are authentication requirements specified for all protected resources? [Coverage]"
+- "Are data protection requirements defined for sensitive information? [Completeness]"
+- "Is the threat model documented, and are requirements aligned to it? [Traceability]"
+- "Are security requirements consistent with compliance obligations? [Consistency]"
+- "Are security failure/breach response requirements defined? [Gap, Exception Flow]"
 
-**Security Requirements:** `security.md`
+## Anti-Examples: What NOT To Do
 
-- Threat model coverage
-- Authentication/authorization requirements
-- Data protection requirements
-- Compliance and regulatory needs
+**❌ WRONG - These test implementation, not requirements:**
 
-**API/Integration Scenarios:** `api.md`
+```markdown
+- [ ] CHK001 - Verify landing page displays 3 episode cards [Spec §FR-001]
+- [ ] CHK002 - Test hover states work correctly on desktop [Spec §FR-003]
+- [ ] CHK003 - Confirm logo click navigates to home page [Spec §FR-010]
+- [ ] CHK004 - Check that related episodes section shows 3-5 items [Spec §FR-005]
+```
+
+**✅ CORRECT - These test requirements quality:**
+
+```markdown
+- [ ] CHK001 - Are the number and layout of featured episodes explicitly specified? [Completeness, Spec §FR-001]
+- [ ] CHK002 - Are hover state requirements consistently defined for all interactive elements? [Consistency, Spec §FR-003]
+- [ ] CHK003 - Are navigation requirements clear for all clickable brand elements? [Clarity, Spec §FR-010]
+- [ ] CHK004 - Are the selection criteria for related episodes documented? [Gap, Spec §FR-005]
+- [ ] CHK005 - Are loading state requirements defined for asynchronous episode data? [Gap]
+- [ ] CHK006 - Can "visual hierarchy" requirements be objectively measured? [Measurability, Spec §FR-001]
+```
 
-- Contract completeness
-- Error handling scenarios
-- Backward compatibility considerations
-- Integration touchpoint coverage
+**Key Differences:**
+- Wrong: Tests if the system works correctly
+- Correct: Tests if the requirements are written correctly
+- Wrong: Verification of behavior
+- Correct: Validation of requirement quality
+- Wrong: "Does it do X?"
+- Correct: "Is X clearly specified?"
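
The rules this change adds (question-style items, no leading "Verify"/"Test"/"Confirm"/"Check", and at least 80% of items carrying a traceability reference) are mechanical enough to spot-check. The following is a minimal sketch of such a check, not part of the commit: the function names `lint_item`, `has_trace`, and `traceability_ratio` are hypothetical, and the leading-verb test is only a heuristic for the "implementation behavior" prohibition.

```python
import re

# Hypothetical helpers; the patterns paraphrase the PROHIBITED / REQUIRED
# lists and the Traceability Requirements in the template above.
ITEM_RE = re.compile(r"^- \[[ xX]\] (CHK\d{3}) - (.+)$")
PROHIBITED_START = re.compile(r"^(Verify|Test|Confirm|Check)\b", re.IGNORECASE)
MARKERS = {"Gap", "Ambiguity", "Conflict", "Assumption"}


def lint_item(line: str) -> list[str]:
    """Return a list of problems found in one checklist line (empty if none)."""
    m = ITEM_RE.match(line.strip())
    if not m:
        return ["not a checklist item"]
    text = m.group(2)
    problems = []
    # Heuristic: a leading imperative verb usually signals an implementation test.
    if PROHIBITED_START.match(text):
        problems.append("starts with an implementation-test verb")
    if "?" not in text:
        problems.append("not phrased as a question")
    return problems


def has_trace(line: str) -> bool:
    """True if the item carries a [Spec §...] reference or a known marker."""
    m = ITEM_RE.match(line.strip())
    if not m:
        return False
    tags = [
        tag.strip()
        for group in re.findall(r"\[([^\]]+)\]", m.group(2))
        for tag in group.split(",")
    ]
    return any(t.startswith("Spec §") or t in MARKERS for t in tags)


def traceability_ratio(lines: list[str]) -> float:
    """Fraction of checklist items that include a traceability reference."""
    items = [l for l in lines if ITEM_RE.match(l.strip())]
    if not items:
        return 0.0
    return sum(has_trace(l) for l in items) / len(items)
```

Run against the anti-example and corrected items from the diff, `lint_item` should flag the "Verify ..." line and pass the "Are ... specified?" line, while `traceability_ratio` can be compared against the 0.8 minimum.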