Commit 48881f8

doc: test design refinements (bmad-code-org#1382)
1 parent efbe839 commit 48881f8

File tree

4 files changed: +671 -351 lines changed

src/bmm/workflows/testarch/test-design/checklist.md

Lines changed: 147 additions & 57 deletions
@@ -80,23 +80,29 @@
 - [ ] Owners assigned where applicable
 - [ ] No duplicate coverage (same behavior at multiple levels)

-### Execution Order
+### Execution Strategy

-- [ ] Smoke tests defined (<5 min target)
-- [ ] P0 tests listed (<10 min target)
-- [ ] P1 tests listed (<30 min target)
-- [ ] P2/P3 tests listed (<60 min target)
-- [ ] Order optimizes for fast feedback
+**CRITICAL: Keep execution strategy simple, avoid redundancy**
+
+- [ ] **Simple structure**: PR / Nightly / Weekly (NOT complex smoke/P0/P1/P2 tiers)
+- [ ] **PR execution**: All functional tests unless significant infrastructure overhead
+- [ ] **Nightly/Weekly**: Only performance, chaos, long-running, manual tests
+- [ ] **No redundancy**: Don't re-list all tests (already in coverage plan)
+- [ ] **Philosophy stated**: "Run everything in PRs if <15 min, defer only if expensive/long"
+- [ ] **Playwright parallelization noted**: 100s of tests in 10-15 min

 ### Resource Estimates

-- [ ] P0 hours calculated (count × 2 hours)
-- [ ] P1 hours calculated (count × 1 hour)
-- [ ] P2 hours calculated (count × 0.5 hours)
-- [ ] P3 hours calculated (count × 0.25 hours)
-- [ ] Total hours summed
-- [ ] Days estimate provided (hours / 8)
-- [ ] Estimates include setup time
+**CRITICAL: Use intervals/ranges, NOT exact numbers**
+
+- [ ] P0 effort provided as interval range (e.g., "~25-40 hours" NOT "36 hours")
+- [ ] P1 effort provided as interval range (e.g., "~20-35 hours" NOT "27 hours")
+- [ ] P2 effort provided as interval range (e.g., "~10-30 hours" NOT "15.5 hours")
+- [ ] P3 effort provided as interval range (e.g., "~2-5 hours" NOT "2.5 hours")
+- [ ] Total effort provided as interval range (e.g., "~55-110 hours" NOT "81 hours")
+- [ ] Timeline provided as week range (e.g., "~1.5-3 weeks" NOT "11 days")
+- [ ] Estimates include setup time and account for complexity variations
+- [ ] **No false precision**: Avoid exact calculations like "18 tests × 2 hours = 36 hours"

 ### Quality Gate Criteria

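The "Playwright parallelization noted" and "<15 min" items above assume tests run fully parallel in CI. A minimal sketch of the kind of configuration that implies follows; the worker count and CI detection are illustrative assumptions, not values from this commit:

```ts
// playwright.config.ts: illustrative sketch only; tune workers/retries to your infrastructure
import { defineConfig } from '@playwright/test';

export default defineConfig({
  fullyParallel: true,                      // run tests within each file in parallel
  workers: process.env.CI ? 8 : undefined,  // assumed CI runner capacity
  retries: process.env.CI ? 1 : 0,          // absorb occasional infra flakiness in CI
  reporter: [['list'], ['html', { open: 'never' }]],
});
```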
@@ -126,11 +132,16 @@

 ### Priority Assignment Accuracy

-- [ ] P0: Truly blocks core functionality
-- [ ] P0: High-risk (score ≥6)
-- [ ] P0: No workaround exists
-- [ ] P1: Important but not blocking
-- [ ] P2/P3: Nice-to-have or edge cases
+**CRITICAL: Priority classification is separate from execution timing**
+
+- [ ] **Priority sections (P0/P1/P2/P3) do NOT include execution context** (e.g., no "Run on every commit" in headers)
+- [ ] **Priority sections have only "Criteria" and "Purpose"** (no "Execution:" field)
+- [ ] **Execution Strategy section** is separate and handles timing based on infrastructure overhead
+- [ ] P0: Truly blocks core functionality + High-risk (≥6) + No workaround
+- [ ] P1: Important features + Medium-risk (3-4) + Common workflows
+- [ ] P2: Secondary features + Low-risk (1-2) + Edge cases
+- [ ] P3: Nice-to-have + Exploratory + Benchmarks
+- [ ] **Note at top of Test Coverage Plan**: Clarifies P0/P1/P2/P3 = priority/risk, NOT execution timing

 ### Test Level Selection

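Because P0-P3 now expresses priority and risk rather than an execution tier, one common way to keep that metadata without creating separate pipelines is to tag tests and filter only when needed. A hedged sketch; the tag names, route, and selectors are invented for illustration:

```ts
// Priority is triage/reporting metadata; every tagged test still runs on each PR.
import { test, expect } from '@playwright/test';

test('checkout completes with a saved card @p0 @risk:R-003', async ({ page }) => {
  await page.goto('/checkout');                                 // hypothetical route
  await page.getByRole('button', { name: 'Pay now' }).click();  // hypothetical selector
  await expect(page.getByText('Order confirmed')).toBeVisible();
});

// Optional, ad-hoc filtering (not a permanent execution tier):
//   npx playwright test --grep "@p0"
```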
@@ -176,58 +187,90 @@
 - [ ] 🚨 BLOCKERS - Team Must Decide (Sprint 0 critical path items)
 - [ ] ⚠️ HIGH PRIORITY - Team Should Validate (recommendations for approval)
 - [ ] 📋 INFO ONLY - Solutions Provided (no decisions needed)
-- [ ] **Risk Assessment** section
+- [ ] **Risk Assessment** section - **ACTIONABLE**
 - [ ] Total risks identified count
 - [ ] High-priority risks table (score ≥6) with all columns: Risk ID, Category, Description, Probability, Impact, Score, Mitigation, Owner, Timeline
 - [ ] Medium and low-priority risks tables
 - [ ] Risk category legend included
-- [ ] **Testability Concerns** section (if system has architectural constraints)
-- [ ] Blockers to fast feedback table
-- [ ] Explanation of why standard CI/CD may not apply (if applicable)
-- [ ] Tiered testing strategy table (if forced by architecture)
-- [ ] Architectural improvements needed (or acknowledgment system supports testing well)
+- [ ] **Testability Concerns and Architectural Gaps** section - **ACTIONABLE**
+- [ ] **Sub-section: 🚨 ACTIONABLE CONCERNS** at TOP
+- [ ] Blockers to Fast Feedback table (WHAT architecture must provide)
+- [ ] Architectural Improvements Needed (WHAT must be changed)
+- [ ] Each concern has: Owner, Timeline, Impact
+- [ ] **Sub-section: Testability Assessment Summary** at BOTTOM (FYI)
+- [ ] What Works Well (passing items)
+- [ ] Accepted Trade-offs (no action required)
+- [ ] This section only included if worth mentioning; otherwise omitted
 - [ ] **Risk Mitigation Plans** for all high-priority risks (≥6)
 - [ ] Each plan has: Strategy (numbered steps), Owner, Timeline, Status, Verification
+- [ ] **Only Backend/DevOps/Arch/Security mitigations** (production code changes)
+- [ ] QA-owned mitigations belong in QA doc instead
 - [ ] **Assumptions and Dependencies** section
+- [ ] **Architectural assumptions only** (SLO targets, replication lag, system design)
 - [ ] Assumptions list (numbered)
 - [ ] Dependencies list with required dates
 - [ ] Risks to plan with impact and contingency
+- [ ] QA execution assumptions belong in QA doc instead
 - [ ] **NO test implementation code** (long examples belong in QA doc)
+- [ ] **NO test scripts** (no Playwright test(...) blocks, no assertions, no test setup code)
+- [ ] **NO NFR test examples** (NFR sections describe WHAT to test, not HOW to test)
 - [ ] **NO test scenario checklists** (belong in QA doc)
-- [ ] **Cross-references to QA doc** where appropriate
+- [ ] **NO bloat or repetition** (consolidate repeated notes, avoid over-explanation)
+- [ ] **Cross-references to QA doc** where appropriate (instead of duplication)
+- [ ] **RECIPE SECTIONS NOT IN ARCHITECTURE DOC:**
+- [ ] NO "Test Levels Strategy" section (unit/integration/E2E split belongs in QA doc only)
+- [ ] NO "NFR Testing Approach" section with detailed test procedures (belongs in QA doc only)
+- [ ] NO "Test Environment Requirements" section (belongs in QA doc only)
+- [ ] NO "Recommendations for Sprint 0" section with test framework setup (belongs in QA doc only)
+- [ ] NO "Quality Gate Criteria" section (pass rates, coverage targets belong in QA doc only)
+- [ ] NO "Tool Selection" section (Playwright, k6, etc. belongs in QA doc only)

 ### test-design-qa.md

-- [ ] **Purpose statement** at top (execution recipe for QA team)
-- [ ] **Quick Reference for QA** section
-- [ ] Before You Start checklist
-- [ ] Test Execution Order
-- [ ] Need Help? guidance
-- [ ] **System Architecture Summary** (brief overview of services and data flow)
-- [ ] **Test Environment Requirements** in early section (section 1-3, NOT buried at end)
-- [ ] Table with Local/Dev/Staging environments
-- [ ] Key principles listed (shared DB, randomization, parallel-safe, self-cleaning, shift-left)
-- [ ] Code example provided
-- [ ] **Testability Assessment** with prerequisites checklist
-- [ ] References Architecture doc blockers (not duplication)
-- [ ] **Test Levels Strategy** with unit/integration/E2E split
-- [ ] System type identified
-- [ ] Recommended split percentages with rationale
-- [ ] Test count summary (P0/P1/P2/P3 totals)
+**NEW STRUCTURE (streamlined from 375 to ~287 lines):**
+
+- [ ] **Purpose statement** at top (test execution recipe)
+- [ ] **Executive Summary** with risk summary and coverage summary
+- [ ] **Dependencies & Test Blockers** section in POSITION 2 (right after Executive Summary)
+- [ ] Backend/Architecture dependencies listed (what QA needs from other teams)
+- [ ] QA infrastructure setup listed (factories, fixtures, environments)
+- [ ] Code example with playwright-utils if config.tea_use_playwright_utils is true
+- [ ] Test from '@seontechnologies/playwright-utils/api-request/fixtures'
+- [ ] Expect from '@playwright/test' (playwright-utils does not re-export expect)
+- [ ] Code examples include assertions (no unused imports)
+- [ ] **Risk Assessment** section (brief, references Architecture doc)
+- [ ] High-priority risks table
+- [ ] Medium/low-priority risks table
+- [ ] Each risk shows "QA Test Coverage" column (how QA validates)
 - [ ] **Test Coverage Plan** with P0/P1/P2/P3 sections
-- [ ] Each priority has: Execution details, Purpose, Criteria, Test Count
-- [ ] Detailed test scenarios WITH CHECKBOXES
-- [ ] Coverage table with columns: Requirement | Test Level | Risk Link | Test Count | Owner | Notes
-- [ ] **Sprint 0 Setup Requirements**
-- [ ] Architecture/Backend blockers listed with cross-references to Architecture doc
-- [ ] QA Test Infrastructure section (factories, fixtures)
-- [ ] Test Environments section (Local, CI/CD, Staging, Production)
-- [ ] Sprint 0 NFR Gates checklist
-- [ ] Sprint 1 Items clearly separated
-- [ ] **NFR Readiness Summary** (reference to Architecture doc, not duplication)
-- [ ] Table with NFR categories, status, evidence, blocker, next action
-- [ ] **Cross-references to Architecture doc** (not duplication)
-- [ ] **NO architectural theory** (just reference Architecture doc)
+- [ ] Priority sections have ONLY "Criteria" (no execution context)
+- [ ] Note at top: "P0/P1/P2/P3 = priority, NOT execution timing"
+- [ ] Test tables with columns: Test ID | Requirement | Test Level | Risk Link | Notes
+- [ ] **Execution Strategy** section (organized by TOOL TYPE)
+- [ ] Every PR: Playwright tests (~10-15 min)
+- [ ] Nightly: k6 performance tests (~30-60 min)
+- [ ] Weekly: Chaos & long-running (~hours)
+- [ ] Philosophy: "Run everything in PRs unless expensive/long-running"
+- [ ] **QA Effort Estimate** section (QA effort ONLY)
+- [ ] Interval-based estimates (e.g., "~1-2 weeks" NOT "36 hours")
+- [ ] NO DevOps, Backend, Data Eng, Finance effort
+- [ ] NO Sprint breakdowns (too prescriptive)
+- [ ] **Appendix A: Code Examples & Tagging**
+- [ ] **Appendix B: Knowledge Base References**
+
+**REMOVED SECTIONS (bloat):**
+- [ ] ❌ NO Quick Reference section (bloat)
+- [ ] ❌ NO System Architecture Summary (bloat)
+- [ ] ❌ NO Test Environment Requirements as separate section (integrated into Dependencies)
+- [ ] ❌ NO Testability Assessment section (bloat - covered in Dependencies)
+- [ ] ❌ NO Test Levels Strategy section (bloat - obvious from test scenarios)
+- [ ] ❌ NO NFR Readiness Summary (bloat)
+- [ ] ❌ NO Quality Gate Criteria section (teams decide for themselves)
+- [ ] ❌ NO Follow-on Workflows section (bloat - BMAD commands self-explanatory)
+- [ ] ❌ NO Approval section (unnecessary formality)
+- [ ] ❌ NO Infrastructure/DevOps/Finance effort tables (out of scope)
+- [ ] ❌ NO Sprint 0/1/2/3 breakdown tables (too prescriptive)
+- [ ] ❌ NO Next Steps section (bloat)

 ### Cross-Document Consistency

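For the playwright-utils items above (test from '@seontechnologies/playwright-utils/api-request/fixtures', expect from '@playwright/test', assertions present, no unused imports), the intended shape is roughly the following. This is a hedged sketch, not the checklist's own example: the endpoint is a placeholder and the project-specific fixtures that playwright-utils adds are omitted, but the import split and the assertion rule are shown.

```ts
// Sketch of the import split required when config.tea_use_playwright_utils is true.
import { test } from '@seontechnologies/playwright-utils/api-request/fixtures';
import { expect } from '@playwright/test'; // playwright-utils does not re-export expect

test('P0: health endpoint responds', async ({ request }) => {
  // `request` is Playwright's built-in APIRequestContext fixture; the
  // playwright-utils-specific fixtures are omitted and the URL is a placeholder.
  const response = await request.get('/api/health');
  expect(response.ok()).toBeTruthy(); // every example must assert something
});
```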
@@ -238,6 +281,40 @@
 - [ ] Dates and authors match across documents
 - [ ] ADR and PRD references consistent

+### Document Quality (Anti-Bloat Check)
+
+**CRITICAL: Check for bloat and repetition across BOTH documents**
+
+- [ ] **No repeated notes 10+ times** (e.g., "Timing is pessimistic until R-005 fixed" on every section)
+- [ ] **Repeated information consolidated** (write once at top, reference briefly if needed)
+- [ ] **No excessive detail** that doesn't add value (obvious concepts, redundant examples)
+- [ ] **Focus on unique/critical info** (only document what's different from standard practice)
+- [ ] **Architecture doc**: Concerns-focused, NOT implementation-focused
+- [ ] **QA doc**: Implementation-focused, NOT theory-focused
+- [ ] **Clear separation**: Architecture = WHAT and WHY, QA = HOW
+- [ ] **Professional tone**: No AI slop markers
+- [ ] Avoid excessive ✅/❌ emojis (use sparingly, only when adding clarity)
+- [ ] Avoid "absolutely", "excellent", "fantastic", overly enthusiastic language
+- [ ] Write professionally and directly
+- [ ] **Architecture doc length**: Target ~150-200 lines max (focus on actionable concerns only)
+- [ ] **QA doc length**: Keep concise, remove bloat sections
+
+### Architecture Doc Structure (Actionable-First Principle)
+
+**CRITICAL: Validate structure follows actionable-first, FYI-last principle**
+
+- [ ] **Actionable sections at TOP:**
+- [ ] Quick Guide (🚨 BLOCKERS first, then ⚠️ HIGH PRIORITY, then 📋 INFO ONLY last)
+- [ ] Risk Assessment (high-priority risks ≥6 at top)
+- [ ] Testability Concerns (concerns/blockers at top, passing items at bottom)
+- [ ] Risk Mitigation Plans (for high-priority risks ≥6)
+- [ ] **FYI sections at BOTTOM:**
+- [ ] Testability Assessment Summary (what works well - only if worth mentioning)
+- [ ] Assumptions and Dependencies
+- [ ] **ASRs categorized correctly:**
+- [ ] Actionable ASRs included in 🚨 or ⚠️ sections
+- [ ] FYI ASRs included in 📋 section or omitted if obvious
+
 ## Completion Criteria

 **All must be true:**
@@ -295,17 +372,30 @@ If workflow fails:

 - **Solution**: Use test pyramid - E2E for critical paths only

-**Issue**: Resource estimates too high
+**Issue**: Resource estimates too high or too precise
+
+- **Solution**:
+  - Invest in fixtures/factories to reduce per-test setup time
+  - Use interval ranges (e.g., "~55-110 hours") instead of exact numbers (e.g., "81 hours")
+  - Widen intervals if high uncertainty exists
+
+**Issue**: Execution order section too complex or redundant

-- **Solution**: Invest in fixtures/factories to reduce per-test setup time
+- **Solution**:
+  - Default: Run everything in PRs (<15 min with Playwright parallelization)
+  - Only defer to nightly/weekly if expensive (k6, chaos, 4+ hour tests)
+  - Don't create smoke/P0/P1/P2/P3 tier structure
+  - Don't re-list all tests (already in coverage plan)

 ### Best Practices

 - Base risk assessment on evidence, not assumptions
 - High-priority risks (≥6) require immediate mitigation
 - P0 tests should cover <10% of total scenarios
 - Avoid testing same behavior at multiple levels
-- Include smoke tests (P0 subset) for fast feedback
+- **Use interval-based estimates** (e.g., "~25-40 hours") instead of exact numbers to avoid false precision and provide flexibility
+- **Keep execution strategy simple**: Default to "run everything in PRs" (<15 min with Playwright), only defer if expensive/long-running
+- **Avoid execution order redundancy**: Don't create complex tier structures or re-list tests

 ---
