chore: CI debugging & PR fixing tooling for Claude #19906

spalladino · 2026-01-23T18:58:45Z

Summary

Reorganizes Claude Code's CI/PR tooling around two primary workflows:

1. Autonomous workflows (/fix-pr, /rebase-pr) - "Fix this PR, don't bother me"
2. Interactive debugging (/debug-e2e) - "Debug this flaky test with me"

The key architectural change is that all log reading happens in subagents, never inline. This keeps conversation context clean since CI logs can be 50k+ lines.

How to Use These Tools

`/fix-pr <PR number>`

Use case: CI is failing on your PR and you want Claude to autonomously fix it.

/fix-pr 19882

Claude will:

1. Identify what failed (build, lint, format, tests)
2. Checkout and rebase your PR on the base branch
3. Apply fixes based on failure type
4. Run quality checks
5. Amend and push

Best for: Format errors, lint issues, obvious build errors, simple test fixes.
For complex e2e failures, it will suggest using /debug-e2e instead.
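
The routing described above (fix simple failures autonomously, hand complex e2e failures to /debug-e2e) could be sketched roughly as follows. This is an illustrative sketch only, not the skill's actual logic: the `FailureKind` categories, the `Failure` record, and `plan_action` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    # Hypothetical categories, loosely mirroring "build, lint, format, tests".
    FORMAT = "format"
    LINT = "lint"
    BUILD = "build"
    TEST = "test"
    E2E = "e2e"


@dataclass(frozen=True)
class Failure:
    kind: FailureKind
    summary: str  # condensed description from a subagent, never the raw log


def plan_action(failure: Failure) -> str:
    """Return a human-readable next step for one CI failure."""
    if failure.kind is FailureKind.E2E:
        # Assumption: e2e failures are the ones deferred to the interactive flow.
        return f"suggest /debug-e2e: {failure.summary}"
    return f"fix autonomously ({failure.kind.value}): {failure.summary}"


def plan_all(failures: list[Failure]) -> list[str]:
    """Plan a next step for every failure, preserving input order."""
    return [plan_action(f) for f in failures]
```

Keeping only a short `summary` on each record reflects the PR's constraint that raw log content stays out of the main conversation.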

`/rebase-pr <PR number>`

Use case: Your PR just needs a rebase and you want Claude to handle conflicts.

/rebase-pr 19882

Claude will:

1. Checkout the PR
2. Rebase on the base branch
3. Resolve conflicts
4. Run bootstrap if needed
5. Verify build
6. Push

`/debug-e2e <hash|PR|URL|test name>`

Use case: A flaky e2e test needs investigation, or you want to debug a failure interactively.

/debug-e2e 343c52b17688d2cd      # CI log hash
/debug-e2e 19783                  # PR number
/debug-e2e epochs_l1_reorgs       # Test name for general investigation

This starts an interactive debugging session where Claude:

1. Identifies failures via subagent (keeps logs out of main context)
2. Discusses findings with you
3. Spawns log analysis subagents to test hypotheses
4. Compares failing vs successful runs
5. Searches codebase for root cause
6. Suggests fixes

The key difference from /fix-pr: this is collaborative ping-pong debugging, not autonomous fixing.
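
Since /debug-e2e accepts a hash, PR number, URL, or test name, something has to decide which one it was given. A minimal sketch of that decision is shown below. The function name `classify_target` and the rule that a CI log hash is exactly 16 lowercase hex characters are assumptions inferred from the example above, not something the skill is confirmed to do.

```python
import re


def classify_target(arg: str) -> str:
    """Guess what kind of /debug-e2e argument was supplied.

    Returns one of "url", "hash", "pr", or "test_name".
    """
    arg = arg.strip()
    if arg.startswith(("http://", "https://")):
        return "url"
    # Assumption: CI log hashes look like 343c52b17688d2cd (16 hex chars).
    if re.fullmatch(r"[0-9a-f]{16}", arg):
        return "hash"
    if re.fullmatch(r"\d+", arg):
        return "pr"
    # Anything else is treated as a test name for general investigation.
    return "test_name"
```

The hash check runs before the PR check so that an all-digit 16-character hash is not mistaken for a PR number.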

Architecture

┌────────────────────────────────────────────────────────┐
│               User-Invocable Skills                    │
├─────────────┬─────────────┬────────────────────────────┤
│  /fix-pr    │ /rebase-pr  │       /debug-e2e           │
│ (autonomous)│ (autonomous)│   (interactive session)    │
└──────┬──────┴──────┬──────┴──────────┬─────────────────┘
       │             │                  │
       ▼             │                  ▼
┌─────────────────┐  │     ┌────────────────────────────┐
│ identify-ci-    │──┘     │    analyze-logs            │
│ failures        │        │    (subagent)              │
│ (subagent)      │        │                            │
└─────────────────┘        │ Deep-reads logs, returns   │
       │                   │ condensed summaries        │
       ▼                   └────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│    Logs downloaded once to /tmp/<hash>.log             │
│    Reused via local paths by all agents                │
└────────────────────────────────────────────────────────┘
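
The "downloaded once, reused via local paths" idea in the diagram's bottom box can be illustrated with a small cache helper. This is a sketch under stated assumptions: `cached_log_path` and the injected `fetch` callable are hypothetical, and the real subagents download logs through the repository's own tooling rather than this function.

```python
from pathlib import Path
from typing import Callable


def cached_log_path(
    log_hash: str,
    fetch: Callable[[str], str],
    cache_dir: Path = Path("/tmp"),
) -> Path:
    """Return the local path of a CI log, downloading it only if missing.

    `fetch` is a hypothetical stand-in for whatever actually retrieves the
    log text for a given hash.
    """
    path = cache_dir / f"{log_hash}.log"
    if not path.exists():
        # First request: download once and store under <cache_dir>/<hash>.log.
        path.write_text(fetch(log_hash))
    # Later requests reuse the local file, so agents only pass paths around.
    return path
```

Passing the path rather than the log contents between agents is what lets a 50k-line log stay out of the orchestrating conversation.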

New Components

Subagents (`yarn-project/.claude/agents/`)

identify-ci-failures: Finds all failures from PR/URL/hash, downloads logs to /tmp/, returns structured list
analyze-logs: Deep-reads specific logs, understands Aztec log format, returns condensed summaries
retrospective: Session analysis for self-improvement (unrelated to CI, but added in this PR)

Skills (`yarn-project/.claude/skills/`)

debug-e2e: Interactive debugging orchestrator
fix-pr: Autonomous PR fixer
rebase-pr: Simple rebase workflow

Scripts

extract-test-logs.sh: Extracts logs for a specific test from log file

CLAUDE.md Updates

Added "Native Tools Over Bash" section
Updated build command to yarn build
Updated test commands to use yarn workspace pattern
Added CI Log Access section
Simplified quality checklist

Removed

.claude/commands/fix-pr.md (replaced by skill)
.claude/commands/rebase-pr.md (replaced by skill)
.claude/skills/ci-log-reader/SKILL.md (absorbed into subagents)
.claude/agents/e2e-test-debugger.md (replaced by debug-e2e skill)

Test plan

Run /fix-pr on a PR with format errors
Run /rebase-pr on a PR that needs rebasing
Run /debug-e2e on a failing e2e test hash

🤖 Generated with Claude Code

Reorganizes Claude Code's CI/PR tooling around two primary workflows:

**1. Autonomous Workflows** - "Fix/rebase this PR, don't bother me"

- `/fix-pr <PR>` - Identifies CI failures, rebases, fixes issues, pushes
- `/rebase-pr <PR>` - Simple rebase with conflict resolution and build verification

**2. Interactive Debugging** - "Debug this flaky test with me"

- `/debug-e2e <hash|PR|URL>` - Ping-pong debugging session for e2e failures

The key insight is that CI logs can be 50k+ lines and pollute context. All log reading now happens in Task subagents, never inline:

```
┌────────────────────────────────────────────────────────┐
│               User-Invocable Skills                    │
├─────────────┬─────────────┬────────────────────────────┤
│  /fix-pr    │ /rebase-pr  │       /debug-e2e           │
│ (autonomous)│ (autonomous)│   (interactive session)    │
└──────┬──────┴──────┬──────┴──────────┬─────────────────┘
       │             │                  │
       ▼             │                  ▼
┌─────────────────┐  │     ┌────────────────────────────┐
│ identify-ci-    │──┘     │    analyze-logs            │
│ failures        │        │    (subagent)              │
│ (subagent)      │        │                            │
└─────────────────┘        │ Deep-reads logs, returns   │
       │                   │ condensed summaries        │
       ▼                   └────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│    Logs downloaded once to /tmp/<hash>.log             │
│    Reused via local paths by all agents                │
└────────────────────────────────────────────────────────┘
```

Subagents

- **identify-ci-failures** - Given PR/URL/hash, finds all failures, downloads logs to /tmp/, returns structured failure list with local paths
- **analyze-logs** - Deep-reads specific logs, extracts relevant sections, returns summaries (never raw content). Understands Aztec log format, test markers, multi-actor patterns
- **retrospective** - Session analysis for self-improvement (captures corrections, failed approaches, learned patterns)

Skills

- **debug-e2e** - Interactive debugging orchestrator. Spawns subagents for log analysis, works with user to form/test hypotheses, suggests fixes
- **fix-pr** - Autonomous PR fixer. Identifies failures via subagent, rebases, applies fixes based on failure type, pushes
- **rebase-pr** - Simple rebase workflow with conflict resolution

Scripts

- **extract-test-logs.sh** - Extracts logs for a specific test from a log file (used by analyze-logs to isolate relevant sections)

CLAUDE.md Updates

- Added "Native Tools Over Bash" section (prefer Glob/Read/Grep/Edit over bash)
- Updated build command to `yarn build` (not tsgo)
- Updated test commands to use `yarn workspace` (don't cd into packages)
- Added CI Log Access section with `./ci.sh dlog` instructions
- Updated quality checklist with correct commands

Removed

- `.claude/commands/fix-pr.md` (replaced by skill)
- `.claude/commands/rebase-pr.md` (replaced by skill)
- `.claude/skills/ci-log-reader/SKILL.md` (absorbed into subagents)
- `.claude/agents/e2e-test-debugger.md` (replaced by debug-e2e skill)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ludamad · 2026-01-23T20:12:07Z

yarn-project/.claude/agents/retrospective.md

+
+  <example>
+  user: "Let's do a retrospective on this session"
+  assistant: "I'll analyze our conversation to extract learnings."


Does this one make sense as an agent? An agent can't run at the end of a session can it? I thought they always have somewhat fresh context. I would think this is a skill, something like reporting-completion.md or else

I wanted it to be able to run in a separate context window and have it re-read the conversation transcript. Unfortunately I didn't get to test this one before running out of credits.

ludamad · 2026-01-23T20:12:44Z

yarn-project/.claude/skills/fix-pr/SKILL.md

+
+## Usage
+
+```


/fix-pr is no longer usable, right?

fwiw i've been using this by just telling claude inline to use the 'fix-pr command in yarn-project', so I wouldn't miss it

It is. Commands and skills have been merged, and you can call a skill as if it were a command.

Ah yeah, the Claude changelog is mandatory daily reading, isn't it.
"Added automatic discovery of skills from nested .claude/skills directories when working with files in subdirectories": that addresses the underlying reason why I was curious about things being automatically picked up. Neat.

For the curious, 2 weeks ago! https://github.com/anthropics/claude-code/blame/main/CHANGELOG.md
The development schedules for the presentation layer of AI are nuts

ludamad · 2026-01-23T20:14:06Z

yarn-project/CLAUDE.md

+## Native Tools Over Bash
+
+Prefer native tools over bash equivalents—they don't require permissions and provide better output:
+


Not sure I've found they make better output

I've found that claude messes up less with its native tools, but if this is contentious I'm fine to remove it. I've also added native tools to the permissions, so they should be less annoying now.

ludamad

LGTM, it's an imperfect science but seems like a good set of changes

spalladino · 2026-01-23T20:15:39Z

it's an imperfect science

Just think how fun it would be to have a set of end to end tests to test the claude skills we define

ludamad · 2026-01-23T20:18:15Z

Oh believe me when pitched 'AI for CI' I already had that full thought experiment and how fun being a prompt maintainer would/wouldn't be

AztecBot · 2026-01-23T20:45:49Z

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/1aeb93997d8a8df3): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/valid_epoch_pruned_slash.test.ts (397s) (code: 1) group:e2e-p2p-epoch-flakes (Santiago Palladino: chore: CI debugging & PR fixing tooling for Claude (#19906))

spalladino changed the title ~~chore: consolidate CI debugging & PR fixing tooling for Claude~~ chore: CI debugging & PR fixing tooling for Claude Jan 23, 2026

spalladino force-pushed the palla/claude-ci-tooling-consolidation branch from ae48857 to 431fe8f Compare January 23, 2026 19:22

ludamad reviewed Jan 23, 2026

ludamad approved these changes Jan 23, 2026

spalladino added this pull request to the merge queue Jan 23, 2026

Merged via the queue into next with commit d44f1da Jan 23, 2026
16 checks passed

spalladino deleted the palla/claude-ci-tooling-consolidation branch January 23, 2026 20:55
