|
| 1 | +# Faster PR Workflows Without Sacrificing Quality |
| 2 | + |
| 3 | +> Generated by swarm planning session on 2026-02-15 |
| 4 | +
|
| 5 | +## Summary |
| 6 | + |
| 7 | +Improve the speed of landing PRs for the maintainer (wwwillchen/wwwillchen-bot) by optimizing the CI pipeline, automating the fix-and-retry loop, and improving developer feedback during wait times. The current workflow is already highly automated — this plan targets the remaining friction: CI wall-clock time, human-in-the-loop steps, and the "waiting black hole" between push and merge. |
| 8 | + |
| 9 | +## Problem Statement |
| 10 | + |
| 11 | +The current PR landing flow involves a multi-cycle loop: push → wait for CI (~10-15 min) → wait for AI review (~10-30 min) → address comments → re-push → wait again. Each iteration adds latency, and for 3-4 PRs per day, this compounds into 1-2 hours of waiting plus context-switching costs. The quality gates (multi-reviewer AI review, comprehensive E2E testing, lint/type checks) are working well — the problem is **throughput**, not quality. |
| 12 | + |
| 13 | +## Scope |
| 14 | + |
| 15 | +### In Scope (MVP) |
| 16 | + |
| 17 | +- CI pipeline optimizations for self-hosted macOS ARM64 runners (caching, job consolidation) |
| 18 | +- Auto-trigger fix loop for privileged author PRs (remove label requirement) |
| 19 | +- Tiered test selection based on changed files |
| 20 | +- PR status watcher for terminal notifications |
| 21 | +- Unified `/pr` skill entry point |
| 22 | +- Move pr-review-responder to self-hosted runners (enable E2E fix capability) |
| 23 | + |
| 24 | +### Out of Scope (Follow-up) |
| 25 | + |
| 26 | +- Auto-merge changes (keep manual "merge-when-ready" label) |
| 27 | +- External contributor experience improvements |
| 28 | +- Bash script alternative to `/fast-push` (optimize existing skill instead) |
| 29 | +- Dry-run/preview mode for skills |
| 30 | +- Incremental Electron builds |
| 31 | +- Notification channels beyond terminal (desktop, Slack) |
| 32 | +- Multi-PR-in-flight status dashboard |
| 33 | + |
| 34 | +## User Stories |
| 35 | + |
| 36 | +- As a maintainer, I want CI to complete faster on my self-hosted runners, so I spend less time waiting between push and feedback. |
| 37 | +- As a maintainer, I want CI failures to be automatically diagnosed and fixed without me triggering a skill, so the fix-push-verify loop runs hands-free. |
| 38 | +- As a maintainer, I want to be notified in my terminal when CI completes (pass or fail), so I can confidently context-switch to other work during CI runs. |
| 39 | +- As a maintainer, I want a single `/pr` command that detects what my PR needs and does the right thing, so I don't need to remember which of 10+ skills to invoke. |
| 40 | +- As a maintainer, I want CI to skip E2E tests for low-risk changes (docs, config, test-only), so trivial PRs land faster without wasting runner time. |
| 41 | +- As a maintainer, I want the automated fix loop to be able to fix E2E failures (not just lint/type/unit), so fewer PRs get stuck at `needs-human:review-issue`. |
| 42 | + |
| 43 | +## UX Design |
| 44 | + |
| 45 | +### User Flow (Optimized) |
| 46 | + |
| 47 | +1. Developer writes code (with Claude Code assistance) |
| 48 | +2. Runs `/pr` — unified entry point detects uncommitted changes, runs lint/format/type-check/unit-tests locally, commits, pushes, creates/updates PR |
| 49 | +3. `/pr` kicks off a background PR watcher that polls CI status |
| 50 | +4. Developer context-switches to other work |
| 51 | +5. Terminal notification: "CI passed / CI failed on PR #123" |
| 52 | +6. If CI failed: auto-fix loop triggers automatically (no label needed), developer sees notification when fix is pushed and re-CI starts |
| 53 | +7. If all checks pass: developer adds "merge-when-ready" label → auto-merge |
| 54 | +8. If auto-fix loop fails after 4 iterations: `needs-human:review-issue` label applied, developer investigates |
| 55 | + |
| 56 | +### Key States |
| 57 | + |
| 58 | +- **Pushing**: `/pr` runs local validation and pushes. Terminal shows progress via task tracking. |
| 59 | +- **Waiting**: Background watcher polls. Developer sees no output unless something happens. |
| 60 | +- **CI Passed**: Terminal notification (bell + message). Developer returns to add merge label. |
| 61 | +- **CI Failed**: Terminal notification. Auto-fix loop kicks in automatically. Developer notified of each iteration. |
| 62 | +- **Fix Loop Exhausted**: `needs-human:review-issue` label. Terminal notification. Developer investigates. |
| 63 | +- **Merged**: After "merge-when-ready" label + CI pass, auto-merge fires. PR closed. |
| 64 | + |
| 65 | +### Interaction Details |
| 66 | + |
| 67 | +- `/pr` is the primary entry point. It inspects current state (uncommitted changes? open PR? failing checks? review comments?) and routes to the appropriate action. |
| 68 | +- Sub-commands available for power users: `/pr push`, `/pr fix`, `/pr watch`, `/pr rebase` |
| 69 | +- Background watcher uses `gh pr checks --watch` or polling `gh pr checks` every 30 seconds |
| 70 | +- Terminal notifications use bell character + colored status line |
| 71 | + |
| 72 | +### Accessibility |
| 73 | + |
| 74 | +- All feedback is text-based (terminal output), compatible with screen readers |
| 75 | +- Status notifications include text labels, not just colors |
| 76 | +- No custom UI components — leverages GitHub's accessible web interface for detailed status |
| 77 | + |
| 78 | +## Technical Design |
| 79 | + |
| 80 | +### Architecture |
| 81 | + |
| 82 | +All changes are to the CI/CD pipeline, GitHub Actions workflows, and Claude Code skills. No application code changes. |
| 83 | + |
| 84 | +### Components Affected |
| 85 | + |
| 86 | +| Component | Change | Complexity | |
| 87 | +| ------------------------------------------- | ------------------------------------------------------------------------------ | ---------- | |
| 88 | +| `.github/workflows/ci.yml` | Add persistent caching, merge build+E2E for self-hosted, tiered test selection | Medium | |
| 89 | +| `.github/workflows/pr-review-responder.yml` | Remove label requirement for privileged authors, move to self-hosted runners | Small | |
| 90 | +| `scripts/ci-cleanup-macos.sh` | Preserve caches (Playwright, pnpm store, nextjs-template, scaffold deps) | Small | |
| 91 | +| `playwright.config.ts` | Add test tagging/grouping for tiered selection | Medium | |
| 92 | +| `.claude/skills/pr/SKILL.md` (new) | Unified `/pr` entry point that routes to sub-skills | Medium | |
| 93 | +| `.claude/skills/pr-watch/SKILL.md` (new) | Background CI status watcher with terminal notifications | Small | |
| 94 | +| `.claude/skills/fast-push/SKILL.md` | Optimize for speed — reduce unnecessary steps | Small | |
| 95 | + |
| 96 | +### Data Model Changes |
| 97 | + |
| 98 | +None. All improvements are to pipeline/tooling configuration. |
| 99 | + |
| 100 | +### API Changes |
| 101 | + |
| 102 | +None. Internal tooling only. |
| 103 | + |
| 104 | +## Implementation Plan |
| 105 | + |
| 106 | +### Phase 0: Instrument (prerequisite) |
| 107 | + |
| 108 | +- [ ] Add CI job timing annotations (start/end timestamps per job) to measure baseline |
| 109 | +- [ ] Track PR cycle time (first push to merge) — could be a simple GitHub Action that logs timestamps |
| 110 | +- [ ] Measure auto-fix loop iteration frequency (how often does `cc:request:2`, `cc:request:3`, `cc:request:4` get reached?) |
| 111 | +- [ ] Measure CI first-push pass rate for privileged author PRs |
| 112 | +- [ ] Document current `scripts/ci-cleanup-macos.sh` behavior — confirm what's already cached vs. cleaned |
| 113 | + |
| 114 | +### Phase 1: CI Pipeline Quick Wins |
| 115 | + |
| 116 | +- [ ] **Optimize self-hosted runner caching**: Preserve Playwright browser binaries, pnpm store, nextjs-template clone + deps, scaffold deps between runs. Update `scripts/ci-cleanup-macos.sh` to keep these cached (keyed on lockfile/dependency hashes). Note: `node_modules/` already persists (cleanup script doesn't remove it), so `npm ci` is already partially cached — focus caching effort on Playwright, pnpm store, and template deps. |
| 117 | +- [ ] **Skip redundant installs**: Check if Chromium is already installed before running `npx playwright install chromium`. Check if nextjs-template exists and is up-to-date before cloning. |
| 118 | +- [ ] **Merge build + E2E into single job for self-hosted runners**: Eliminate artifact tar/upload/download overhead and redundant setup steps (second checkout, second npm ci, second pnpm/scaffold/nextjs-template setup). Keep separate jobs for external contributors (they need cross-OS artifacts). Estimated savings: 3-5 min per CI run. |
| 119 | +- [ ] **Validate caching gains**: Compare CI timing before/after to confirm actual savings. Run a single diagnostic PR with `date` commands at each step boundary to establish baseline. |
| 120 | + |
| 121 | +### Phase 2: Auto-Fix Loop Enhancement |
| 122 | + |
| 123 | +- [ ] **Auto-trigger fix loop for privileged authors**: Modify `pr-review-responder.yml` to trigger for all privileged-author PRs, not just those with `cc:request` label. Auto-apply `cc:request` on PR creation for privileged authors, or remove the label check entirely. |
| 124 | +- [ ] **Move pr-review-responder to self-hosted macOS runners**: Currently runs on `ubuntu-latest` which cannot execute Electron Playwright tests. Moving to self-hosted enables the fix loop to actually fix E2E failures, not just lint/type/unit failures. |
| 125 | +- [ ] **Add skip-review for trivial fixes**: When the fix commit only changes lint/formatting or updates snapshots, automatically skip re-triggering the Claude PR Review. Use a heuristic: if diff is under N lines and only touches files mentioned in resolved review comments. |
| 126 | +- [ ] **Add notification comments**: When auto-fix triggers, post a brief PR comment ("CI failed on lint. Auto-fix triggered.") so the developer has context without reading Action logs. |
| 127 | + |
| 128 | +### Phase 3: Tiered Test Selection |
| 129 | + |
| 130 | +- [ ] **Implement change classification in `check-changes` job**: Classify changed files into tiers: |
| 131 | + - **Tier 0** (docs/config/.claude/rules only): Skip CI entirely (already exists) |
| 132 | + - **Tier 1** (test files only): Run lint + type-check + only the changed test files |
| 133 | + - **Tier 2** (source code, no schema/config changes): Run lint + type-check + unit tests + targeted E2E |
| 134 | + - **Tier 3** (broad changes, deps, config): Full CI suite |
| 135 | +- [ ] **Tag E2E tests by feature area**: Add metadata to Playwright tests indicating which source files/features they cover. Start with coarse groupings (chat, database, deployment, settings, etc.). |
| 136 | +- [ ] **Add `full-ci` escape label**: Allow forcing full CI via a label for cases where the tier classification seems wrong. Post a CI comment explaining which tier was selected and why. |
| 137 | +- [ ] **Maintain nightly full-suite coverage**: The existing deflake-e2e scheduled run provides a safety net for regressions missed by tiered selection. |
| 138 | + |
| 139 | +### Phase 4: Developer Experience |
| 140 | + |
| 141 | +- [ ] **Create unified `/pr` skill**: Single entry point that inspects state and routes: |
| 142 | + - Uncommitted changes → push flow (lint, commit, push, create/update PR) |
| 143 | + - Open PR with failing checks → fix flow (diagnose + fix CI failures) |
| 144 | + - Open PR with unresolved review comments → comment fix flow |
| 145 | + - Open PR with all checks passing → suggest adding "merge-when-ready" label |
| 146 | + - Open PR needing rebase → rebase flow |
| 147 | + - Fold `/fast-push` behavior into `/pr` as the default for deterministic operations (use haiku), with opus reserved for reasoning-heavy tasks (conflict resolution, complex review responses). Deprecate `/fast-push` as a separate entry point. |
| 148 | +- [ ] **Create `/pr-watch` skill**: Background watcher that polls `gh pr checks` and notifies on completion. Should surface specific failure type (lint, type-check, unit test, E2E) so developer knows whether to intervene or let auto-fix handle it. Could be as simple as `gh pr checks --watch` with a terminal bell on exit. |
| 149 | +- [ ] **Add pre-push snapshot warning**: If UI source files changed but E2E snapshot files didn't, warn that snapshots likely need updating before push. Lightweight heuristic to prevent the most common E2E failure cause (~30% of E2E failures). |
| 150 | + |
| 151 | +## Testing Strategy |
| 152 | + |
| 153 | +- [ ] **CI timing**: Compare wall-clock CI duration before/after each phase. Target: 30-50% reduction for typical PRs. |
| 154 | +- [ ] **Cache correctness**: Verify builds with cached dependencies produce identical outputs to clean builds. Run both in parallel for a verification period. |
| 155 | +- [ ] **Tier classification accuracy**: Run full CI alongside tiered CI for 2 weeks to measure false-negative rate (changes that should have triggered full CI but didn't). |
| 156 | +- [ ] **Auto-fix loop effectiveness**: Track how often the fix loop resolves issues automatically vs. escalating to human. Target: fix loop handles >80% of CI failures without human intervention. |
| 157 | +- [ ] **Quality regression**: Monitor for bugs/reverts within 24 hours of merge. Must not increase from current baseline. |
| 158 | + |
| 159 | +## Risks & Mitigations |
| 160 | + |
| 161 | +| Risk | Likelihood | Impact | Mitigation | |
| 162 | +| --------------------------------------------------------------- | ---------- | ------ | ----------------------------------------------------------------------------------------------------------- | |
| 163 | +| Stale caches cause false CI passes on self-hosted runners | Medium | High | Hash-based cache keys, periodic full cache clear, nightly clean build | |
| 164 | +| Tiered test selection misses regressions | Medium | High | Conservative tier classification, `full-ci` escape label, nightly full suite, always full CI on `main` push | |
| 165 | +| Auto-fix loop on self-hosted runners consumes capacity | Low | Medium | 2+ runners available; add concurrency limits if needed | |
| 166 | +| Merged build+E2E job can't retry just E2E on flakes | Medium | Medium | Monitor flake rate; revert to split jobs if flake-retry savings > artifact-transfer savings | |
| 167 | +| Unified `/pr` skill makes wrong routing decision | Low | Low | Show what action will be taken before executing; sub-commands for manual override | |
| 168 | +| Moving pr-review-responder to self-hosted increases runner load | Low | Low | 2+ runners available; responder runs are short-lived | |
| 169 | + |
| 170 | +## Open Questions |
| 171 | + |
| 172 | +1. **What is the actual CI time breakdown per job?** Phase 0 instrumentation will answer this. The answer determines whether Phase 1 (caching) or Phase 3 (test selection) delivers more value. |
| 173 | +2. **How often does the auto-fix loop iterate beyond 1 cycle?** If most PRs converge in 1 iteration, loop optimization matters less. If 3-4 iterations are common, optimizing the loop is critical path. |
| 174 | +3. **What is the E2E flake rate?** Determines whether merging build+E2E jobs (losing independent E2E retry) is a net win or net loss. |
| 175 | +4. **How should E2E tests be tagged/grouped?** Coarse groupings (by feature area) are easier to maintain but less precise. Fine-grained file-level dependency tracking is more accurate but harder to keep updated. Start coarse and refine. |
| 176 | + |
| 177 | +## Decision Log |
| 178 | + |
| 179 | +| Decision | Options Considered | Chosen | Reasoning | |
| 180 | +| ------------------------ | ------------------------------------------------------- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | |
| 181 | +| Auto-merge approach | Default auto-merge, auto-merge on cc:done, keep current | **Keep current** | User prefers the deliberate "merge-when-ready" checkpoint. Focus speed improvements elsewhere. | |
| 182 | +| Target persona | Maintainer only, both maintainer + contributors | **Maintainer only** | Focused scope delivers faster. External contributor improvements deferred. | |
| 183 | +| Fast push implementation | Bash script, optimize existing skill, both paths | **Optimize existing skill** | Avoids maintaining two implementations. User prefers single path. | |
| 184 | +| Runner capacity | Constrained (1 runner), unconstrained (2+) | **2+ runners available** | Runner capacity is not a constraint. Can add more concurrent workflows freely. | |
| 185 | +| Review blocking merge | Non-blocking (post-merge review), blocking | **Keep blocking** | Team consensus: post-merge review comments have low compliance. Keeping review as blocking preserves quality gate. Speed up the review instead of removing it. | |
| 186 | +| Bash fast-push script | Build it, don't build it | **Don't build** | UX team raised valid concern about fragmentation. Optimize `/fast-push` skill instead. | |
| 187 | + |
| 188 | +--- |
| 189 | + |
| 190 | +_Generated by dyad:swarm-to-plan_ |
0 commit comments