# Check CI Failures

Analyze failing tests from PR CI runs with parallel subagent log analysis.

## Usage

```
/ci-failures [pr-number]
```

If no PR number is provided, it is detected from the current branch.

## Instructions

1. Get the PR number from the argument or the current branch:

   ```bash
   gh pr view --json number,headRefName --jq '"\(.number) \(.headRefName)"'
   ```

2. **CRITICAL: Always fetch fresh run IDs** - never trust cached IDs from conversation summaries:

   ```bash
   gh api "repos/vercel/next.js/actions/runs?branch={branch}&per_page=10" \
     --jq '.workflow_runs[] | select(.name == "build-and-test") | "\(.id) attempts:\(.run_attempt) status:\(.status) conclusion:\(.conclusion)"'
   ```

3. **Prioritize the MOST RECENT run, even if it is in progress:**
   - If the latest run is `in_progress` or `queued`, check it FIRST - it has the most relevant failures
   - Individual jobs complete before the overall run does - analyze them as they finish
   - Only fall back to older completed runs if the current run has no completed jobs yet
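The selection rule above can be sketched in shell. The sample run lines are hypothetical stand-ins for the step-2 output, which the API returns newest-first:

```shell
# Hypothetical sample of the step-2 output, newest run first
# (the GitHub API returns runs in reverse-chronological order):
runs='987654 attempts:1 status:in_progress conclusion:
876543 attempts:2 status:completed conclusion:failure
765432 attempts:1 status:completed conclusion:success'

# Take the newest run and note whether it is still in progress,
# so the caller knows to analyze jobs as they finish.
newest=$(printf '%s\n' "$runs" | head -n 1)
run_id=$(printf '%s\n' "$newest" | awk '{print $1}')
case "$newest" in
  *status:in_progress*|*status:queued*) mode="job-by-job (run still in progress)" ;;
  *)                                    mode="all completed jobs" ;;
esac
echo "analyze run $run_id: $mode"
```

The point of the sketch is that an `in_progress` run is never skipped; it only changes *how* the run is analyzed, not *whether*.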

4. Get all failed jobs from the run (this works for in-progress runs too):

   ```bash
   gh api "repos/vercel/next.js/actions/runs/{run_id}/jobs?per_page=100" \
     --jq '.jobs[] | select(.conclusion == "failure") | "\(.id) \(.name)"'
   ```

   **Note:** For runs with >100 jobs, paginate:

   ```bash
   gh api "repos/vercel/next.js/actions/runs/{run_id}/jobs?per_page=100&page=2"
   ```

5. Spawn parallel haiku subagents to analyze logs (limit to 3-4 to avoid rate limits):
   - **CRITICAL: Use the API endpoint for logs, NOT `gh run view`**
   - `gh run view --job --log` FAILS when the run is in progress
   - **Do NOT group by job name** (e.g., "test dev", "turbopack") - group by failure pattern instead
   - The agent prompt should extract structured data using:
     ```bash
     # Extract assertion failures with context:
     gh api "repos/vercel/next.js/actions/jobs/{job_id}/logs" 2>&1 | \
       grep -B3 -A10 "expect.*\(toBe\|toContain\|toEqual\|toStartWith\|toMatch\)" | head -100
     # Also check for test file paths:
     gh api "repos/vercel/next.js/actions/jobs/{job_id}/logs" 2>&1 | \
       grep -E "^\s+at Object\.|FAIL\s+test/" | head -20
     ```
   - **Agent prompt template** (copy-paste for each agent):
     ```
     Analyze CI logs for these jobs: {job_ids}
     For each failing test, extract:
     1. TEST FILE: (full path, e.g., test/production/required-server-files-ssr-404/test/index.test.ts)
     2. TEST NAME: (the specific test case name)
     3. EXPECTED: (exact expected value from assertion)
     4. RECEIVED: (exact received value from assertion)
     5. CATEGORY: (assertion|timeout|routing|source-map|build|cli-output)
     6. ROOT CAUSE: (one sentence hypothesis)
     Return structured findings grouped by TEST FILE, not by job.
     ```

6. **Deduplicate by test file** before summarizing:
   - Group all failures by TEST FILE path, not by CI job name
   - If multiple jobs fail the same test file, count them but report the file once
   - Identify systemic issues (the same test failing across many jobs)
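The grouping in step 6 can be sketched with `awk`. The findings lines below are hypothetical examples of what the agents might return, as tab-separated `job<TAB>test file` pairs:

```shell
# Hypothetical agent findings: one "job<TAB>test file" pair per line
# (printf reuses its format string for each pair of arguments).
findings=$(printf '%s\t%s\n' \
  test-dev-1  test/production/required-server-files-ssr-404/test/index.test.ts \
  test-dev-2  test/production/required-server-files-ssr-404/test/index.test.ts \
  turbopack-1 test/integration/server-side-dev-errors/test/index.test.js)

# Count jobs per test file (field 2) and report each file once,
# most-failed first, so systemic failures float to the top.
dedup=$(printf '%s\n' "$findings" |
  awk -F'\t' '{count[$2]++} END {for (f in count) printf "%d\t%s\n", count[f], f}' |
  sort -rn)
printf '%s\n' "$dedup"
```

A file that fails under many jobs sorts to the top, which is exactly the "systemic issue" signal step 6 asks for.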

7. Create a summary table **grouped by test file**:

   | Test File | Issue (Expected vs Received) | Jobs | Priority |
   |-----------|------------------------------|------|----------|
   | `test/production/required-server-files-ssr-404/...` | `"second"` vs `"[slug]"` (routing) | 3 | HIGH |
   | `test/integration/server-side-dev-errors/...` | source map paths wrong | 5 | HIGH |
   | `test/e2e/app-dir/disable-logging-route/...` | "Compiling" appearing when disabled | 2 | MEDIUM |

8. Recommend fixes:
   - **HIGH priority**: Show the specific expected vs actual values, including the test file path
   - **MEDIUM priority**: Identify the root cause pattern
   - **LOW priority**: Mark as likely flaky/transient

## Failure Categories

- **Infrastructure/Transient**: Network errors, 503s, timeouts unrelated to the code
- **Assertion Failures**: Wrong output, path mismatches, snapshot differences
- **Build Failures**: Compilation errors, missing dependencies
- **Timeout**: Tests hanging; usually indicates async issues or missing server responses
- **Port Binding**: EADDRINUSE errors, parallel test conflicts
- **Routing/SSR**: Dynamic params not resolved, wrong status codes, JSON parse errors
- **Source Maps**: `webpack-internal://` paths, wrong line numbers, missing code frames
- **CLI Output**: Missing warnings, wrong log order, "Ready" printed before errors
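A minimal sketch of mapping a raw log line onto these categories; the case patterns are illustrative only, not an exhaustive classifier:

```shell
# Classify a single log line into one of the categories above.
# Real logs need richer rules; order matters (first match wins).
categorize() {
  case "$1" in
    *EADDRINUSE*)                        echo "port-binding" ;;
    *webpack-internal://*)               echo "source-maps" ;;
    *TimeoutError*|*"Exceeded timeout"*) echo "timeout" ;;
    *"Expected:"*|*"Received:"*)         echo "assertion" ;;
    *503*|*ECONNRESET*)                  echo "infrastructure" ;;
    *)                                   echo "unknown" ;;
  esac
}

categorize 'Error: listen EADDRINUSE: address already in use :::3000'
categorize 'Expected: "second"  Received: "[slug]"'
categorize 'thrown: "Exceeded timeout of 120000 ms for a test."'
```

Putting infrastructure patterns last (and assertion patterns before them) biases ambiguous lines toward code-level categories, which is the safer default when triaging a PR.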

## Failure Extraction Patterns

Use these grep patterns to identify specific failure types:

```bash
# Assertion failures (most common)
grep -B3 -A10 "expect.*\(toBe\|toContain\|toEqual\|toStartWith\)" | head -100

# Routing issues (dynamic params, status codes)
grep -E "Expected.*Received|\[slug\]|x-matched-path|Expected: [0-9]+" | head -50

# Source map issues
grep -E "webpack-internal://|at .* \(webpack" | head -30

# CLI output issues (missing warnings)
grep -E "Ready in|deprecated|Both middleware|Compiling" | head -30

# Timeout issues
grep -E "TIMEOUT|TimeoutError|exceeded|Exceeded timeout" | head -20

# Test file paths (to identify which test is failing)
grep -E "FAIL test/|at Object\.<anonymous> \(" | head -20
```

## Common Gotchas

### In-Progress Runs

- `gh run view {run_id} --job {job_id} --log` **FAILS** when the run is in progress
- `gh api "repos/.../actions/jobs/{job_id}/logs"` **WORKS** for any completed job
- Always use the API endpoint for reliability

### Pagination

- The GitHub API paginates at 100 jobs per page
- Next.js CI has ~120+ jobs - always check page 2:

  ```bash
  gh api ".../jobs?per_page=100&page=1" --jq '[.jobs[] | select(.conclusion == "failure")] | length'
  gh api ".../jobs?per_page=100&page=2" --jq '[.jobs[] | select(.conclusion == "failure")] | length'
  ```

### Multiple Attempts

- CI runs can have multiple attempts (retries)
- Check the attempt count via the `.run_attempt` field
- Query a specific attempt: `.../runs/{id}/attempts/{n}/jobs`
- A 404 on the attempt endpoint means that attempt doesn't exist

## Quick Reference

```bash
# Get failed jobs (works for in-progress runs)
gh api "repos/vercel/next.js/actions/runs/{run_id}/jobs?per_page=100" \
  --jq '.jobs[] | select(.conclusion == "failure") | "\(.id) \(.name)"'

# Get logs for a specific job (works for in-progress runs)
gh api "repos/vercel/next.js/actions/jobs/{job_id}/logs" 2>&1 | head -500

# Search logs for errors
gh api "repos/vercel/next.js/actions/jobs/{job_id}/logs" 2>&1 | \
  grep -E "FAIL|Error|error:|✕|Expected|Received" | head -50
```