docs: add faster PR workflows planning document (#2739)

wwwillchen · web-flow · commit a0bc7c731b0c · 2026-02-16T22:12:23.000-08:00
## Summary Add a comprehensive planning document outlining an initiative to improve the speed of landing PRs for maintainers by optimizing the CI pipeline, automating the fix-and-retry loop, and improving developer feedback during wait times. The plan covers: - CI pipeline optimizations for self-hosted macOS ARM64 runners (caching, job consolidation) - Auto-trigger fix loop for privileged author PRs (remove label requirement) - Tiered test selection based on changed files - PR status watcher for terminal notifications - Unified `/pr` skill entry point - Move pr-review-responder to self-hosted runners The plan is structured in four implementation phases: instrumentation, CI quick wins, auto-fix loop enhancements, and developer experience improvements. It includes risk assessments, testing strategies, and decision logs. ## Test plan - Verify the document is correctly formatted and renders as markdown - Review the content for clarity and alignment with project goals - Confirm all phases, user stories, and technical components are clearly described 🤖 Generated with [Claude Code](https://claude.com/claude-code)  --- <a href="https://app.devin.ai/review/dyad-sh/dyad/pull/2739" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a>
diff --git a/.github/workflows/flakiness-upload.yml b/.github/workflows/flakiness-upload.yml
@@ -51,4 +51,3 @@ jobs:
           FLAKINESS_ACCESS_TOKEN: ${{ secrets.FLAKINESS_ACCESS_TOKEN }}
         run: |
           find . -name report.json -exec npx -y flakiness upload {} +
-
diff --git a/plans/faster-pr-workflows.md b/plans/faster-pr-workflows.md
@@ -0,0 +1,190 @@
+# Faster PR Workflows Without Sacrificing Quality
+
+> Generated by swarm planning session on 2026-02-15
+
+## Summary
+
+Improve the speed of landing PRs for the maintainer (wwwillchen/wwwillchen-bot) by optimizing the CI pipeline, automating the fix-and-retry loop, and improving developer feedback during wait times. The current workflow is already highly automated — this plan targets the remaining friction: CI wall-clock time, human-in-the-loop steps, and the "waiting black hole" between push and merge.
+
+## Problem Statement
+
+The current PR landing flow involves a multi-cycle loop: push → wait for CI (~10-15 min) → wait for AI review (~10-30 min) → address comments → re-push → wait again. Each iteration adds latency, and for 3-4 PRs per day, this compounds into 1-2 hours of waiting plus context-switching costs. The quality gates (multi-reviewer AI review, comprehensive E2E testing, lint/type checks) are working well — the problem is **throughput**, not quality.
+
+## Scope
+
+### In Scope (MVP)
+
+- CI pipeline optimizations for self-hosted macOS ARM64 runners (caching, job consolidation)
+- Auto-trigger fix loop for privileged author PRs (remove label requirement)
+- Tiered test selection based on changed files
+- PR status watcher for terminal notifications
+- Unified `/pr` skill entry point
+- Move pr-review-responder to self-hosted runners (enable E2E fix capability)
+
+### Out of Scope (Follow-up)
+
+- Auto-merge changes (keep manual "merge-when-ready" label)
+- External contributor experience improvements
+- Bash script alternative to `/fast-push` (optimize existing skill instead)
+- Dry-run/preview mode for skills
+- Incremental Electron builds
+- Notification channels beyond terminal (desktop, Slack)
+- Multi-PR-in-flight status dashboard
+
+## User Stories
+
+- As a maintainer, I want CI to complete faster on my self-hosted runners, so I spend less time waiting between push and feedback.
+- As a maintainer, I want CI failures to be automatically diagnosed and fixed without me triggering a skill, so the fix-push-verify loop runs hands-free.
+- As a maintainer, I want to be notified in my terminal when CI completes (pass or fail), so I can confidently context-switch to other work during CI runs.
+- As a maintainer, I want a single `/pr` command that detects what my PR needs and does the right thing, so I don't need to remember which of 10+ skills to invoke.
+- As a maintainer, I want CI to skip E2E tests for low-risk changes (docs, config, test-only), so trivial PRs land faster without wasting runner time.
+- As a maintainer, I want the automated fix loop to be able to fix E2E failures (not just lint/type/unit), so fewer PRs get stuck at `needs-human:review-issue`.
+
+## UX Design
+
+### User Flow (Optimized)
+
+1. Developer writes code (with Claude Code assistance)
+2. Runs `/pr` — unified entry point detects uncommitted changes, runs lint/format/type-check/unit-tests locally, commits, pushes, creates/updates PR
+3. `/pr` kicks off a background PR watcher that polls CI status
+4. Developer context-switches to other work
+5. Terminal notification: "CI passed / CI failed on PR #123"
+6. If CI failed: auto-fix loop triggers automatically (no label needed), developer sees notification when fix is pushed and re-CI starts
+7. If all checks pass: developer adds "merge-when-ready" label → auto-merge
+8. If auto-fix loop fails after 4 iterations: `needs-human:review-issue` label applied, developer investigates
+
+### Key States
+
+- **Pushing**: `/pr` runs local validation and pushes. Terminal shows progress via task tracking.
+- **Waiting**: Background watcher polls. Developer sees no output unless something happens.
+- **CI Passed**: Terminal notification (bell + message). Developer returns to add merge label.
+- **CI Failed**: Terminal notification. Auto-fix loop kicks in automatically. Developer notified of each iteration.
+- **Fix Loop Exhausted**: `needs-human:review-issue` label. Terminal notification. Developer investigates.
+- **Merged**: After "merge-when-ready" label + CI pass, auto-merge fires. PR closed.
+
+### Interaction Details
+
+- `/pr` is the primary entry point. It inspects current state (uncommitted changes? open PR? failing checks? review comments?) and routes to the appropriate action.
+- Sub-commands available for power users: `/pr push`, `/pr fix`, `/pr watch`, `/pr rebase`
+- Background watcher uses `gh pr checks --watch` or polling `gh pr checks` every 30 seconds
+- Terminal notifications use bell character + colored status line
+
+### Accessibility
+
+- All feedback is text-based (terminal output), compatible with screen readers
+- Status notifications include text labels, not just colors
+- No custom UI components — leverages GitHub's accessible web interface for detailed status
+
+## Technical Design
+
+### Architecture
+
+All changes are to the CI/CD pipeline, GitHub Actions workflows, and Claude Code skills. No application code changes.
+
+### Components Affected
+
+| Component                                   | Change                                                                         | Complexity |
+| ------------------------------------------- | ------------------------------------------------------------------------------ | ---------- |
+| `.github/workflows/ci.yml`                  | Add persistent caching, merge build+E2E for self-hosted, tiered test selection | Medium     |
+| `.github/workflows/pr-review-responder.yml` | Remove label requirement for privileged authors, move to self-hosted runners   | Small      |
+| `scripts/ci-cleanup-macos.sh`               | Preserve caches (Playwright, pnpm store, nextjs-template, scaffold deps)       | Small      |
+| `playwright.config.ts`                      | Add test tagging/grouping for tiered selection                                 | Medium     |
+| `.claude/skills/pr/SKILL.md` (new)          | Unified `/pr` entry point that routes to sub-skills                            | Medium     |
+| `.claude/skills/pr-watch/SKILL.md` (new)    | Background CI status watcher with terminal notifications                       | Small      |
+| `.claude/skills/fast-push/SKILL.md`         | Optimize for speed — reduce unnecessary steps                                  | Small      |
+
+### Data Model Changes
+
+None. All improvements are to pipeline/tooling configuration.
+
+### API Changes
+
+None. Internal tooling only.
+
+## Implementation Plan
+
+### Phase 0: Instrument (prerequisite)
+
+- [ ] Add CI job timing annotations (start/end timestamps per job) to measure baseline
+- [ ] Track PR cycle time (first push to merge) — could be a simple GitHub Action that logs timestamps
+- [ ] Measure auto-fix loop iteration frequency (how often does `cc:request:2`, `cc:request:3`, `cc:request:4` get reached?)
+- [ ] Measure CI first-push pass rate for privileged author PRs
+- [ ] Document current `scripts/ci-cleanup-macos.sh` behavior — confirm what's already cached vs. cleaned
+
+### Phase 1: CI Pipeline Quick Wins
+
+- [ ] **Optimize self-hosted runner caching**: Preserve Playwright browser binaries, pnpm store, nextjs-template clone + deps, scaffold deps between runs. Update `scripts/ci-cleanup-macos.sh` to keep these cached (keyed on lockfile/dependency hashes). Note: `node_modules/` already persists (cleanup script doesn't remove it), so `npm ci` is already partially cached — focus caching effort on Playwright, pnpm store, and template deps.
+- [ ] **Skip redundant installs**: Check if Chromium is already installed before running `npx playwright install chromium`. Check if nextjs-template exists and is up-to-date before cloning.
+- [ ] **Merge build + E2E into single job for self-hosted runners**: Eliminate artifact tar/upload/download overhead and redundant setup steps (second checkout, second npm ci, second pnpm/scaffold/nextjs-template setup). Keep separate jobs for external contributors (they need cross-OS artifacts). Estimated savings: 3-5 min per CI run.
+- [ ] **Validate caching gains**: Compare CI timing before/after to confirm actual savings. Run a single diagnostic PR with `date` commands at each step boundary to establish baseline.
+
+### Phase 2: Auto-Fix Loop Enhancement
+
+- [ ] **Auto-trigger fix loop for privileged authors**: Modify `pr-review-responder.yml` to trigger for all privileged-author PRs, not just those with `cc:request` label. Auto-apply `cc:request` on PR creation for privileged authors, or remove the label check entirely.
+- [ ] **Move pr-review-responder to self-hosted macOS runners**: Currently runs on `ubuntu-latest` which cannot execute Electron Playwright tests. Moving to self-hosted enables the fix loop to actually fix E2E failures, not just lint/type/unit failures.
+- [ ] **Add skip-review for trivial fixes**: When the fix commit only changes lint/formatting or updates snapshots, automatically skip re-triggering the Claude PR Review. Use a heuristic: if diff is under N lines and only touches files mentioned in resolved review comments.
+- [ ] **Add notification comments**: When auto-fix triggers, post a brief PR comment ("CI failed on lint. Auto-fix triggered.") so the developer has context without reading Action logs.
+
+### Phase 3: Tiered Test Selection
+
+- [ ] **Implement change classification in `check-changes` job**: Classify changed files into tiers:
+  - **Tier 0** (docs/config/.claude/rules only): Skip CI entirely (already exists)
+  - **Tier 1** (test files only): Run lint + type-check + only the changed test files
+  - **Tier 2** (source code, no schema/config changes): Run lint + type-check + unit tests + targeted E2E
+  - **Tier 3** (broad changes, deps, config): Full CI suite
+- [ ] **Tag E2E tests by feature area**: Add metadata to Playwright tests indicating which source files/features they cover. Start with coarse groupings (chat, database, deployment, settings, etc.).
+- [ ] **Add `full-ci` escape label**: Allow forcing full CI via a label for cases where the tier classification seems wrong. Post a CI comment explaining which tier was selected and why.
+- [ ] **Maintain nightly full-suite coverage**: The existing deflake-e2e scheduled run provides a safety net for regressions missed by tiered selection.
+
+### Phase 4: Developer Experience
+
+- [ ] **Create unified `/pr` skill**: Single entry point that inspects state and routes:
+  - Uncommitted changes → push flow (lint, commit, push, create/update PR)
+  - Open PR with failing checks → fix flow (diagnose + fix CI failures)
+  - Open PR with unresolved review comments → comment fix flow
+  - Open PR with all checks passing → suggest adding "merge-when-ready" label
+  - Open PR needing rebase → rebase flow
+  - Fold `/fast-push` behavior into `/pr` as the default for deterministic operations (use haiku), with opus reserved for reasoning-heavy tasks (conflict resolution, complex review responses). Deprecate `/fast-push` as a separate entry point.
+- [ ] **Create `/pr-watch` skill**: Background watcher that polls `gh pr checks` and notifies on completion. Should surface specific failure type (lint, type-check, unit test, E2E) so developer knows whether to intervene or let auto-fix handle it. Could be as simple as `gh pr checks --watch` with a terminal bell on exit.
+- [ ] **Add pre-push snapshot warning**: If UI source files changed but E2E snapshot files didn't, warn that snapshots likely need updating before push. Lightweight heuristic to prevent the most common E2E failure cause (~30% of E2E failures).
+
+## Testing Strategy
+
+- [ ] **CI timing**: Compare wall-clock CI duration before/after each phase. Target: 30-50% reduction for typical PRs.
+- [ ] **Cache correctness**: Verify builds with cached dependencies produce identical outputs to clean builds. Run both in parallel for a verification period.
+- [ ] **Tier classification accuracy**: Run full CI alongside tiered CI for 2 weeks to measure false-negative rate (changes that should have triggered full CI but didn't).
+- [ ] **Auto-fix loop effectiveness**: Track how often the fix loop resolves issues automatically vs. escalating to human. Target: fix loop handles >80% of CI failures without human intervention.
+- [ ] **Quality regression**: Monitor for bugs/reverts within 24 hours of merge. Must not increase from current baseline.
+
+## Risks & Mitigations
+
+| Risk                                                            | Likelihood | Impact | Mitigation                                                                                                  |
+| --------------------------------------------------------------- | ---------- | ------ | ----------------------------------------------------------------------------------------------------------- |
+| Stale caches cause false CI passes on self-hosted runners       | Medium     | High   | Hash-based cache keys, periodic full cache clear, nightly clean build                                       |
+| Tiered test selection misses regressions                        | Medium     | High   | Conservative tier classification, `full-ci` escape label, nightly full suite, always full CI on `main` push |
+| Auto-fix loop on self-hosted runners consumes capacity          | Low        | Medium | 2+ runners available; add concurrency limits if needed                                                      |
+| Merged build+E2E job can't retry just E2E on flakes             | Medium     | Medium | Monitor flake rate; revert to split jobs if flake-retry savings > artifact-transfer savings                 |
+| Unified `/pr` skill makes wrong routing decision                | Low        | Low    | Show what action will be taken before executing; sub-commands for manual override                           |
+| Moving pr-review-responder to self-hosted increases runner load | Low        | Low    | 2+ runners available; responder runs are short-lived                                                        |
+
+## Open Questions
+
+1. **What is the actual CI time breakdown per job?** Phase 0 instrumentation will answer this. The answer determines whether Phase 1 (caching) or Phase 3 (test selection) delivers more value.
+2. **How often does the auto-fix loop iterate beyond 1 cycle?** If most PRs converge in 1 iteration, loop optimization matters less. If 3-4 iterations are common, optimizing the loop is critical path.
+3. **What is the E2E flake rate?** Determines whether merging build+E2E jobs (losing independent E2E retry) is a net win or net loss.
+4. **How should E2E tests be tagged/grouped?** Coarse groupings (by feature area) are easier to maintain but less precise. Fine-grained file-level dependency tracking is more accurate but harder to keep updated. Start coarse and refine.
+
+## Decision Log
+
+| Decision                 | Options Considered                                      | Chosen                      | Reasoning                                                                                                                                                      |
+| ------------------------ | ------------------------------------------------------- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Auto-merge approach      | Default auto-merge, auto-merge on cc:done, keep current | **Keep current**            | User prefers the deliberate "merge-when-ready" checkpoint. Focus speed improvements elsewhere.                                                                 |
+| Target persona           | Maintainer only, both maintainer + contributors         | **Maintainer only**         | Focused scope delivers faster. External contributor improvements deferred.                                                                                     |
+| Fast push implementation | Bash script, optimize existing skill, both paths        | **Optimize existing skill** | Avoids maintaining two implementations. User prefers single path.                                                                                              |
+| Runner capacity          | Constrained (1 runner), unconstrained (2+)              | **2+ runners available**    | Runner capacity is not a constraint. Can add more concurrent workflows freely.                                                                                 |
+| Review blocking merge    | Non-blocking (post-merge review), blocking              | **Keep blocking**           | Team consensus: post-merge review comments have low compliance. Keeping review as blocking preserves quality gate. Speed up the review instead of removing it. |
+| Bash fast-push script    | Build it, don't build it                                | **Don't build**             | UX team raised valid concern about fragmentation. Optimize `/fast-push` skill instead.                                                                         |
+
+---
+
+_Generated by dyad:swarm-to-plan_