Skip to content

Commit a0bc7c7

Browse files
authored
docs: add faster PR workflows planning document (#2739)
## Summary Add a comprehensive planning document outlining an initiative to improve the speed of landing PRs for maintainers by optimizing the CI pipeline, automating the fix-and-retry loop, and improving developer feedback during wait times. The plan covers: - CI pipeline optimizations for self-hosted macOS ARM64 runners (caching, job consolidation) - Auto-trigger fix loop for privileged author PRs (remove label requirement) - Tiered test selection based on changed files - PR status watcher for terminal notifications - Unified `/pr` skill entry point - Move pr-review-responder to self-hosted runners The plan is structured in four implementation phases: instrumentation, CI quick wins, auto-fix loop enhancements, and developer experience improvements. It includes risk assessments, testing strategies, and decision logs. ## Test plan - Verify the document is correctly formatted and renders as markdown - Review the content for clarity and alignment with project goals - Confirm all phases, user stories, and technical components are clearly described 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- devin-review-badge-begin --> --- <a href="https://app.devin.ai/review/dyad-sh/dyad/pull/2739" target="_blank"> <picture> <source media="(prefers-color-scheme: dark)" srcset="https://static.devin.ai/assets/gh-open-in-devin-review-dark.svg?v=1"> <img src="https://static.devin.ai/assets/gh-open-in-devin-review-light.svg?v=1" alt="Open with Devin"> </picture> </a> <!-- devin-review-badge-end -->
1 parent 465cc90 commit a0bc7c7

File tree

2 files changed

+190
-1
lines changed

2 files changed

+190
-1
lines changed

.github/workflows/flakiness-upload.yml

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,4 +51,3 @@ jobs:
5151
FLAKINESS_ACCESS_TOKEN: ${{ secrets.FLAKINESS_ACCESS_TOKEN }}
5252
run: |
5353
find . -name report.json -exec npx -y flakiness upload {} +
54-

plans/faster-pr-workflows.md

Lines changed: 190 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,190 @@
1+
# Faster PR Workflows Without Sacrificing Quality
2+
3+
> Generated by swarm planning session on 2026-02-15
4+
5+
## Summary
6+
7+
Improve the speed of landing PRs for the maintainer (wwwillchen/wwwillchen-bot) by optimizing the CI pipeline, automating the fix-and-retry loop, and improving developer feedback during wait times. The current workflow is already highly automated — this plan targets the remaining friction: CI wall-clock time, human-in-the-loop steps, and the "waiting black hole" between push and merge.
8+
9+
## Problem Statement
10+
11+
The current PR landing flow involves a multi-cycle loop: push → wait for CI (~10-15 min) → wait for AI review (~10-30 min) → address comments → re-push → wait again. Each iteration adds latency, and for 3-4 PRs per day, this compounds into 1-2 hours of waiting plus context-switching costs. The quality gates (multi-reviewer AI review, comprehensive E2E testing, lint/type checks) are working well — the problem is **throughput**, not quality.
12+
13+
## Scope
14+
15+
### In Scope (MVP)
16+
17+
- CI pipeline optimizations for self-hosted macOS ARM64 runners (caching, job consolidation)
18+
- Auto-trigger fix loop for privileged author PRs (remove label requirement)
19+
- Tiered test selection based on changed files
20+
- PR status watcher for terminal notifications
21+
- Unified `/pr` skill entry point
22+
- Move pr-review-responder to self-hosted runners (enable E2E fix capability)
23+
24+
### Out of Scope (Follow-up)
25+
26+
- Auto-merge changes (keep manual "merge-when-ready" label)
27+
- External contributor experience improvements
28+
- Bash script alternative to `/fast-push` (optimize existing skill instead)
29+
- Dry-run/preview mode for skills
30+
- Incremental Electron builds
31+
- Notification channels beyond terminal (desktop, Slack)
32+
- Multi-PR-in-flight status dashboard
33+
34+
## User Stories
35+
36+
- As a maintainer, I want CI to complete faster on my self-hosted runners, so I spend less time waiting between push and feedback.
37+
- As a maintainer, I want CI failures to be automatically diagnosed and fixed without me triggering a skill, so the fix-push-verify loop runs hands-free.
38+
- As a maintainer, I want to be notified in my terminal when CI completes (pass or fail), so I can confidently context-switch to other work during CI runs.
39+
- As a maintainer, I want a single `/pr` command that detects what my PR needs and does the right thing, so I don't need to remember which of 10+ skills to invoke.
40+
- As a maintainer, I want CI to skip E2E tests for low-risk changes (docs, config, test-only), so trivial PRs land faster without wasting runner time.
41+
- As a maintainer, I want the automated fix loop to be able to fix E2E failures (not just lint/type/unit), so fewer PRs get stuck at `needs-human:review-issue`.
42+
43+
## UX Design
44+
45+
### User Flow (Optimized)
46+
47+
1. Developer writes code (with Claude Code assistance)
48+
2. Runs `/pr` — unified entry point detects uncommitted changes, runs lint/format/type-check/unit-tests locally, commits, pushes, creates/updates PR
49+
3. `/pr` kicks off a background PR watcher that polls CI status
50+
4. Developer context-switches to other work
51+
5. Terminal notification: "CI passed / CI failed on PR #123"
52+
6. If CI failed: auto-fix loop triggers automatically (no label needed), developer sees notification when fix is pushed and re-CI starts
53+
7. If all checks pass: developer adds "merge-when-ready" label → auto-merge
54+
8. If auto-fix loop fails after 4 iterations: `needs-human:review-issue` label applied, developer investigates
55+
56+
### Key States
57+
58+
- **Pushing**: `/pr` runs local validation and pushes. Terminal shows progress via task tracking.
59+
- **Waiting**: Background watcher polls. Developer sees no output unless something happens.
60+
- **CI Passed**: Terminal notification (bell + message). Developer returns to add merge label.
61+
- **CI Failed**: Terminal notification. Auto-fix loop kicks in automatically. Developer notified of each iteration.
62+
- **Fix Loop Exhausted**: `needs-human:review-issue` label. Terminal notification. Developer investigates.
63+
- **Merged**: After "merge-when-ready" label + CI pass, auto-merge fires. PR closed.
64+
65+
### Interaction Details
66+
67+
- `/pr` is the primary entry point. It inspects current state (uncommitted changes? open PR? failing checks? review comments?) and routes to the appropriate action.
68+
- Sub-commands available for power users: `/pr push`, `/pr fix`, `/pr watch`, `/pr rebase`
69+
- Background watcher uses `gh pr checks --watch` or polling `gh pr checks` every 30 seconds
70+
- Terminal notifications use bell character + colored status line
71+
72+
### Accessibility
73+
74+
- All feedback is text-based (terminal output), compatible with screen readers
75+
- Status notifications include text labels, not just colors
76+
- No custom UI components — leverages GitHub's accessible web interface for detailed status
77+
78+
## Technical Design
79+
80+
### Architecture
81+
82+
All changes are to the CI/CD pipeline, GitHub Actions workflows, and Claude Code skills. No application code changes.
83+
84+
### Components Affected
85+
86+
| Component | Change | Complexity |
87+
| ------------------------------------------- | ------------------------------------------------------------------------------ | ---------- |
88+
| `.github/workflows/ci.yml` | Add persistent caching, merge build+E2E for self-hosted, tiered test selection | Medium |
89+
| `.github/workflows/pr-review-responder.yml` | Remove label requirement for privileged authors, move to self-hosted runners | Small |
90+
| `scripts/ci-cleanup-macos.sh` | Preserve caches (Playwright, pnpm store, nextjs-template, scaffold deps) | Small |
91+
| `playwright.config.ts` | Add test tagging/grouping for tiered selection | Medium |
92+
| `.claude/skills/pr/SKILL.md` (new) | Unified `/pr` entry point that routes to sub-skills | Medium |
93+
| `.claude/skills/pr-watch/SKILL.md` (new) | Background CI status watcher with terminal notifications | Small |
94+
| `.claude/skills/fast-push/SKILL.md` | Optimize for speed — reduce unnecessary steps | Small |
95+
96+
### Data Model Changes
97+
98+
None. All improvements are to pipeline/tooling configuration.
99+
100+
### API Changes
101+
102+
None. Internal tooling only.
103+
104+
## Implementation Plan
105+
106+
### Phase 0: Instrument (prerequisite)
107+
108+
- [ ] Add CI job timing annotations (start/end timestamps per job) to measure baseline
109+
- [ ] Track PR cycle time (first push to merge) — could be a simple GitHub Action that logs timestamps
110+
- [ ] Measure auto-fix loop iteration frequency (how often does `cc:request:2`, `cc:request:3`, `cc:request:4` get reached?)
111+
- [ ] Measure CI first-push pass rate for privileged author PRs
112+
- [ ] Document current `scripts/ci-cleanup-macos.sh` behavior — confirm what's already cached vs. cleaned
113+
114+
### Phase 1: CI Pipeline Quick Wins
115+
116+
- [ ] **Optimize self-hosted runner caching**: Preserve Playwright browser binaries, pnpm store, nextjs-template clone + deps, scaffold deps between runs. Update `scripts/ci-cleanup-macos.sh` to keep these cached (keyed on lockfile/dependency hashes). Note: `node_modules/` already persists (cleanup script doesn't remove it), so `npm ci` is already partially cached — focus caching effort on Playwright, pnpm store, and template deps.
117+
- [ ] **Skip redundant installs**: Check if Chromium is already installed before running `npx playwright install chromium`. Check if nextjs-template exists and is up-to-date before cloning.
118+
- [ ] **Merge build + E2E into single job for self-hosted runners**: Eliminate artifact tar/upload/download overhead and redundant setup steps (second checkout, second npm ci, second pnpm/scaffold/nextjs-template setup). Keep separate jobs for external contributors (they need cross-OS artifacts). Estimated savings: 3-5 min per CI run.
119+
- [ ] **Validate caching gains**: Compare CI timing before/after to confirm actual savings. Run a single diagnostic PR with `date` commands at each step boundary to establish baseline.
120+
121+
### Phase 2: Auto-Fix Loop Enhancement
122+
123+
- [ ] **Auto-trigger fix loop for privileged authors**: Modify `pr-review-responder.yml` to trigger for all privileged-author PRs, not just those with `cc:request` label. Auto-apply `cc:request` on PR creation for privileged authors, or remove the label check entirely.
124+
- [ ] **Move pr-review-responder to self-hosted macOS runners**: Currently runs on `ubuntu-latest` which cannot execute Electron Playwright tests. Moving to self-hosted enables the fix loop to actually fix E2E failures, not just lint/type/unit failures.
125+
- [ ] **Add skip-review for trivial fixes**: When the fix commit only changes lint/formatting or updates snapshots, automatically skip re-triggering the Claude PR Review. Use a heuristic: if diff is under N lines and only touches files mentioned in resolved review comments.
126+
- [ ] **Add notification comments**: When auto-fix triggers, post a brief PR comment ("CI failed on lint. Auto-fix triggered.") so the developer has context without reading Action logs.
127+
128+
### Phase 3: Tiered Test Selection
129+
130+
- [ ] **Implement change classification in `check-changes` job**: Classify changed files into tiers:
131+
- **Tier 0** (docs/config/.claude/rules only): Skip CI entirely (already exists)
132+
- **Tier 1** (test files only): Run lint + type-check + only the changed test files
133+
- **Tier 2** (source code, no schema/config changes): Run lint + type-check + unit tests + targeted E2E
134+
- **Tier 3** (broad changes, deps, config): Full CI suite
135+
- [ ] **Tag E2E tests by feature area**: Add metadata to Playwright tests indicating which source files/features they cover. Start with coarse groupings (chat, database, deployment, settings, etc.).
136+
- [ ] **Add `full-ci` escape label**: Allow forcing full CI via a label for cases where the tier classification seems wrong. Post a CI comment explaining which tier was selected and why.
137+
- [ ] **Maintain nightly full-suite coverage**: The existing deflake-e2e scheduled run provides a safety net for regressions missed by tiered selection.
138+
139+
### Phase 4: Developer Experience
140+
141+
- [ ] **Create unified `/pr` skill**: Single entry point that inspects state and routes:
142+
- Uncommitted changes → push flow (lint, commit, push, create/update PR)
143+
- Open PR with failing checks → fix flow (diagnose + fix CI failures)
144+
- Open PR with unresolved review comments → comment fix flow
145+
- Open PR with all checks passing → suggest adding "merge-when-ready" label
146+
- Open PR needing rebase → rebase flow
147+
- Fold `/fast-push` behavior into `/pr` as the default for deterministic operations (use haiku), with opus reserved for reasoning-heavy tasks (conflict resolution, complex review responses). Deprecate `/fast-push` as a separate entry point.
148+
- [ ] **Create `/pr-watch` skill**: Background watcher that polls `gh pr checks` and notifies on completion. Should surface specific failure type (lint, type-check, unit test, E2E) so developer knows whether to intervene or let auto-fix handle it. Could be as simple as `gh pr checks --watch` with a terminal bell on exit.
149+
- [ ] **Add pre-push snapshot warning**: If UI source files changed but E2E snapshot files didn't, warn that snapshots likely need updating before push. Lightweight heuristic to prevent the most common E2E failure cause (~30% of E2E failures).
150+
151+
## Testing Strategy
152+
153+
- [ ] **CI timing**: Compare wall-clock CI duration before/after each phase. Target: 30-50% reduction for typical PRs.
154+
- [ ] **Cache correctness**: Verify builds with cached dependencies produce identical outputs to clean builds. Run both in parallel for a verification period.
155+
- [ ] **Tier classification accuracy**: Run full CI alongside tiered CI for 2 weeks to measure false-negative rate (changes that should have triggered full CI but didn't).
156+
- [ ] **Auto-fix loop effectiveness**: Track how often the fix loop resolves issues automatically vs. escalating to human. Target: fix loop handles >80% of CI failures without human intervention.
157+
- [ ] **Quality regression**: Monitor for bugs/reverts within 24 hours of merge. Must not increase from current baseline.
158+
159+
## Risks & Mitigations
160+
161+
| Risk | Likelihood | Impact | Mitigation |
162+
| --------------------------------------------------------------- | ---------- | ------ | ----------------------------------------------------------------------------------------------------------- |
163+
| Stale caches cause false CI passes on self-hosted runners | Medium | High | Hash-based cache keys, periodic full cache clear, nightly clean build |
164+
| Tiered test selection misses regressions | Medium | High | Conservative tier classification, `full-ci` escape label, nightly full suite, always full CI on `main` push |
165+
| Auto-fix loop on self-hosted runners consumes capacity | Low | Medium | 2+ runners available; add concurrency limits if needed |
166+
| Merged build+E2E job can't retry just E2E on flakes | Medium | Medium | Monitor flake rate; revert to split jobs if flake-retry savings > artifact-transfer savings |
167+
| Unified `/pr` skill makes wrong routing decision | Low | Low | Show what action will be taken before executing; sub-commands for manual override |
168+
| Moving pr-review-responder to self-hosted increases runner load | Low | Low | 2+ runners available; responder runs are short-lived |
169+
170+
## Open Questions
171+
172+
1. **What is the actual CI time breakdown per job?** Phase 0 instrumentation will answer this. The answer determines whether Phase 1 (caching) or Phase 3 (test selection) delivers more value.
173+
2. **How often does the auto-fix loop iterate beyond 1 cycle?** If most PRs converge in 1 iteration, loop optimization matters less. If 3-4 iterations are common, optimizing the loop is critical path.
174+
3. **What is the E2E flake rate?** Determines whether merging build+E2E jobs (losing independent E2E retry) is a net win or net loss.
175+
4. **How should E2E tests be tagged/grouped?** Coarse groupings (by feature area) are easier to maintain but less precise. Fine-grained file-level dependency tracking is more accurate but harder to keep updated. Start coarse and refine.
176+
177+
## Decision Log
178+
179+
| Decision | Options Considered | Chosen | Reasoning |
180+
| ------------------------ | ------------------------------------------------------- | --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
181+
| Auto-merge approach | Default auto-merge, auto-merge on cc:done, keep current | **Keep current** | User prefers the deliberate "merge-when-ready" checkpoint. Focus speed improvements elsewhere. |
182+
| Target persona | Maintainer only, both maintainer + contributors | **Maintainer only** | Focused scope delivers faster. External contributor improvements deferred. |
183+
| Fast push implementation | Bash script, optimize existing skill, both paths | **Optimize existing skill** | Avoids maintaining two implementations. User prefers single path. |
184+
| Runner capacity | Constrained (1 runner), unconstrained (2+) | **2+ runners available** | Runner capacity is not a constraint. Can add more concurrent workflows freely. |
185+
| Review blocking merge | Non-blocking (post-merge review), blocking | **Keep blocking** | Team consensus: post-merge review comments have low compliance. Keeping review as blocking preserves quality gate. Speed up the review instead of removing it. |
186+
| Bash fast-push script | Build it, don't build it | **Don't build** | UX team raised valid concern about fragmentation. Optimize `/fast-push` skill instead. |
187+
188+
---
189+
190+
_Generated by dyad:swarm-to-plan_

0 commit comments

Comments
 (0)