@spalladino
Contributor

Summary

Reorganizes Claude Code's CI/PR tooling around two primary workflows:

  1. Autonomous workflows (/fix-pr, /rebase-pr) - "Fix this PR, don't bother me"
  2. Interactive debugging (/debug-e2e) - "Debug this flaky test with me"

The key architectural change is that all log reading happens in subagents, never inline. This keeps conversation context clean since CI logs can be 50k+ lines.

How to Use These Tools

/fix-pr <PR number>

Use case: CI is failing on your PR and you want Claude to autonomously fix it.

/fix-pr 19882

Claude will:

  1. Identify what failed (build, lint, format, tests)
  2. Check out your PR and rebase it on the base branch
  3. Apply fixes based on failure type
  4. Run quality checks
  5. Amend and push

Best for: Format errors, lint issues, obvious build errors, simple test fixes.
For complex e2e failures, it will suggest using /debug-e2e instead.

/rebase-pr <PR number>

Use case: Your PR just needs a rebase and you want Claude to handle conflicts.

/rebase-pr 19882

Claude will:

  1. Check out the PR
  2. Rebase on the base branch
  3. Resolve conflicts
  4. Run bootstrap if needed
  5. Verify build
  6. Push
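
The mechanical part of these steps can be sketched in bash. This is only an illustration under stated assumptions, not the contents of the rebase-pr skill: the `run` and `rebase_pr` helpers are hypothetical, the base branch is passed in by hand, the remote is assumed to be `origin`, and conflict resolution and the optional bootstrap (steps 3 and 4) are left out because they need judgment.

```shell
# Hypothetical helper: prints each command instead of running it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical sketch of the mechanical rebase steps.
# ASSUMPTIONS: remote is "origin"; the current directory is where `yarn build` works.
rebase_pr() {
  local pr="$1" base="$2"
  run gh pr checkout "$pr"          # step 1: check out the PR branch
  run git fetch origin "$base"      # make sure the base branch is up to date
  run git rebase "origin/$base"     # step 2: rebase (stops here on conflicts)
  run yarn build                    # step 5: verify the build
  run git push --force-with-lease   # step 6: push the rewritten branch
}
```

Running `DRY_RUN=1 rebase_pr 19882 next` prints the commands without touching the repository, which is a cheap way to review the plan first.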

/debug-e2e <hash|PR|URL|test name>

Use case: Flaky e2e test that needs investigation, or you want to debug interactively.

/debug-e2e 343c52b17688d2cd      # CI log hash
/debug-e2e 19783                  # PR number
/debug-e2e epochs_l1_reorgs       # Test name for general investigation

This starts an interactive debugging session where Claude:

  1. Identifies failures via subagent (keeps logs out of main context)
  2. Discusses findings with you
  3. Spawns log analysis subagents to test hypotheses
  4. Compares failing vs successful runs
  5. Searches codebase for root cause
  6. Suggests fixes

The key difference from /fix-pr: this is collaborative ping-pong debugging, not autonomous fixing.
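
As a rough illustration of how the argument forms shown above could be told apart, here is a minimal bash sketch. The `classify_target` function is hypothetical, and the rule that a CI log hash is exactly 16 lowercase hex characters is an assumption inferred only from the example hash; the actual skill may decide differently.

```shell
# Hypothetical sketch: classify a /debug-e2e argument.
# ASSUMPTION: a CI log hash is 16 lowercase hex characters (inferred from the example).
classify_target() {
  local arg="$1"
  if [[ "$arg" =~ ^https?:// ]]; then
    echo "url"
  elif [[ "$arg" =~ ^[0-9a-f]{16}$ ]]; then
    echo "hash"        # checked before "pr" so an all-digit hash is not misread
  elif [[ "$arg" =~ ^[0-9]+$ ]]; then
    echo "pr"
  else
    echo "test-name"   # anything else is treated as a test name
  fi
}
```

For example, `classify_target 19783` prints `pr` and `classify_target epochs_l1_reorgs` prints `test-name`.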

Architecture

┌────────────────────────────────────────────────────────┐
│               User-Invocable Skills                    │
├─────────────┬─────────────┬────────────────────────────┤
│  /fix-pr    │ /rebase-pr  │       /debug-e2e           │
│ (autonomous)│ (autonomous)│   (interactive session)    │
└──────┬──────┴──────┬──────┴──────────┬─────────────────┘
       │             │                  │
       ▼             │                  ▼
┌─────────────────┐  │     ┌────────────────────────────┐
│ identify-ci-    │──┘     │    analyze-logs            │
│ failures        │        │    (subagent)              │
│ (subagent)      │        │                            │
└─────────────────┘        │ Deep-reads logs, returns   │
       │                   │ condensed summaries        │
       ▼                   └────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│    Logs downloaded once to /tmp/<hash>.log             │
│    Reused via local paths by all agents                │
└────────────────────────────────────────────────────────┘
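
The "downloaded once, reused via local paths" idea in the bottom box can be sketched as a small cache function. This is a sketch under assumptions, not the subagent's actual logic: `cached_log_path`, `FETCH_LOG_CMD`, and `LOG_DIR` are hypothetical names, and the fetch command is assumed to print the log for a given hash to stdout (the PR mentions `./ci.sh dlog`, but its exact arguments are not shown here).

```shell
# ASSUMPTION: this command, given a hash, prints the CI log to stdout.
FETCH_LOG_CMD="${FETCH_LOG_CMD:-./ci.sh dlog}"
LOG_DIR="${LOG_DIR:-/tmp}"

# Hypothetical sketch: print the local path of the log for a hash,
# downloading it only if no non-empty local copy exists yet.
cached_log_path() {
  local hash="$1"
  local path="${LOG_DIR}/${hash}.log"
  if [[ ! -s "$path" ]]; then
    # Download to a temp file first so a failed fetch never leaves a partial log.
    local tmp
    tmp="$(mktemp "${path}.XXXXXX")"
    if $FETCH_LOG_CMD "$hash" > "$tmp"; then
      mv "$tmp" "$path"
    else
      rm -f "$tmp"
      return 1
    fi
  fi
  printf '%s\n' "$path"
}
```

Every agent that calls `cached_log_path <hash>` gets the same path back, so only the first call pays for the download and only the path, not the log content, has to travel between agents.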

New Components

Subagents (yarn-project/.claude/agents/)

  • identify-ci-failures: Finds all failures from PR/URL/hash, downloads logs to /tmp/, returns structured list
  • analyze-logs: Deep-reads specific logs, understands Aztec log format, returns condensed summaries
  • retrospective: Session analysis for self-improvement (unrelated to CI, but added in this PR)

Skills (yarn-project/.claude/skills/)

  • debug-e2e: Interactive debugging orchestrator
  • fix-pr: Autonomous PR fixer
  • rebase-pr: Simple rebase workflow

Scripts

  • extract-test-logs.sh: Extracts logs for a specific test from a log file
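
To give a feel for what isolating one test's output from a large log might involve, here is a minimal sketch. It is not the script added in this PR: `extract_test_logs` is a hypothetical function, and the `=== RUN <test name>` marker line is an invented stand-in, since the real Aztec log markers are not shown in this conversation.

```shell
# Hypothetical sketch, NOT the extract-test-logs.sh from this PR.
# ASSUMPTION: each test's output starts with a line "=== RUN <test name>".
extract_test_logs() {
  local log_file="$1" test_name="$2"
  awk -v name="$test_name" '
    # At every marker line, turn printing on only if it names the wanted test.
    /^=== RUN / { printing = ($0 == ("=== RUN " name)) }
    printing
  ' "$log_file"
}
```

Called as `extract_test_logs /tmp/<hash>.log <test name>`, it prints the lines from the matching marker up to the next marker, so a subagent reads only that slice instead of the full log.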

CLAUDE.md Updates

  • Added "Native Tools Over Bash" section
  • Updated build command to yarn build
  • Updated test commands to use yarn workspace pattern
  • Added CI Log Access section
  • Simplified quality checklist

Removed

  • .claude/commands/fix-pr.md (replaced by skill)
  • .claude/commands/rebase-pr.md (replaced by skill)
  • .claude/skills/ci-log-reader/SKILL.md (absorbed into subagents)
  • .claude/agents/e2e-test-debugger.md (replaced by debug-e2e skill)

Test plan

  • Run /fix-pr on a PR with format errors
  • Run /rebase-pr on a PR that needs rebasing
  • Run /debug-e2e on a failing e2e test hash

🤖 Generated with Claude Code

@spalladino spalladino changed the title from "chore: consolidate CI debugging & PR fixing tooling for Claude" to "chore: CI debugging & PR fixing tooling for Claude" Jan 23, 2026
Reorganizes Claude Code's CI/PR tooling around two primary workflows:

**1. Autonomous Workflows** - "Fix/rebase this PR, don't bother me"
- `/fix-pr <PR>` - Identifies CI failures, rebases, fixes issues, pushes
- `/rebase-pr <PR>` - Simple rebase with conflict resolution and build verification

**2. Interactive Debugging** - "Debug this flaky test with me"
- `/debug-e2e <hash|PR|URL>` - Ping-pong debugging session for e2e failures

The key insight is that CI logs can be 50k+ lines and pollute context. All log
reading now happens in Task subagents, never inline:

```
┌────────────────────────────────────────────────────────┐
│               User-Invocable Skills                    │
├─────────────┬─────────────┬────────────────────────────┤
│  /fix-pr    │ /rebase-pr  │       /debug-e2e           │
│ (autonomous)│ (autonomous)│   (interactive session)    │
└──────┬──────┴──────┬──────┴──────────┬─────────────────┘
       │             │                  │
       ▼             │                  ▼
┌─────────────────┐  │     ┌────────────────────────────┐
│ identify-ci-    │──┘     │    analyze-logs            │
│ failures        │        │    (subagent)              │
│ (subagent)      │        │                            │
└─────────────────┘        │ Deep-reads logs, returns   │
       │                   │ condensed summaries        │
       ▼                   └────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│    Logs downloaded once to /tmp/<hash>.log             │
│    Reused via local paths by all agents                │
└────────────────────────────────────────────────────────┘
```

Subagents (yarn-project/.claude/agents/):

- **identify-ci-failures** - Given PR/URL/hash, finds all failures, downloads logs
  to /tmp/, returns structured failure list with local paths
- **analyze-logs** - Deep-reads specific logs, extracts relevant sections, returns
  summaries (never raw content). Understands Aztec log format, test markers,
  multi-actor patterns
- **retrospective** - Session analysis for self-improvement (captures corrections,
  failed approaches, learned patterns)

Skills (yarn-project/.claude/skills/):

- **debug-e2e** - Interactive debugging orchestrator. Spawns subagents for log
  analysis, works with user to form/test hypotheses, suggests fixes
- **fix-pr** - Autonomous PR fixer. Identifies failures via subagent, rebases,
  applies fixes based on failure type, pushes
- **rebase-pr** - Simple rebase workflow with conflict resolution

Scripts:

- **extract-test-logs.sh** - Extracts logs for a specific test from a log file
  (used by analyze-logs to isolate relevant sections)

CLAUDE.md updates:

- Added "Native Tools Over Bash" section (prefer Glob/Read/Grep/Edit over bash)
- Updated build command to `yarn build` (not tsgo)
- Updated test commands to use `yarn workspace` (don't cd into packages)
- Added CI Log Access section with `./ci.sh dlog` instructions
- Updated quality checklist with correct commands

Removed:

- `.claude/commands/fix-pr.md` (replaced by skill)
- `.claude/commands/rebase-pr.md` (replaced by skill)
- `.claude/skills/ci-log-reader/SKILL.md` (absorbed into subagents)
- `.claude/agents/e2e-test-debugger.md` (replaced by debug-e2e skill)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@spalladino spalladino force-pushed the palla/claude-ci-tooling-consolidation branch from ae48857 to 431fe8f January 23, 2026 19:22
<example>
user: "Let's do a retrospective on this session"
assistant: "I'll analyze our conversation to extract learnings."
Collaborator

Does this one make sense as an agent? An agent can't run at the end of a session, can it? I thought they always have somewhat fresh context. I would think this is a skill, something like reporting-completion.md or similar.

Contributor Author

I wanted it to be able to run in a separate context window and have it re-read the conversation transcript. Unfortunately I didn't get to test this one before running out of credits.


## Usage

```
Collaborator

/fix-pr is no longer usable, right?

Collaborator

fwiw i've been using this by just telling claude inline to use the 'fix-pr command in yarn-project', so I wouldn't miss it

Contributor Author

It is. Commands and skills have been merged, and you can call a skill as if it were a command.

Collaborator

@ludamad ludamad Jan 23, 2026

Ah yeah, the Claude changelog is mandatory daily reading, isn't it.
"Added automatic discovery of skills from nested .claude/skills directories when working with files in subdirectories" addresses the underlying reason why I was curious about things being automatically picked up. Neat.

For the curious, 2 weeks ago! https://github.com/anthropics/claude-code/blame/main/CHANGELOG.md
The development schedules for the presentation layer of AI are nuts

## Native Tools Over Bash

Prefer native tools over bash equivalents—they don't require permissions and provide better output:

Collaborator

Not sure I've found they make better output

Contributor Author

I've found that claude messes up less with its native tools, but if this is contentious I'm fine to remove it. I've also added native tools to the permissions, so they should be less annoying now.

Collaborator

@ludamad ludamad left a comment

LGTM, it's an imperfect science but seems like a good set of changes

@spalladino
Contributor Author

> it's an imperfect science

Just think how fun it would be to have a set of end to end tests to test the claude skills we define

@spalladino spalladino added this pull request to the merge queue Jan 23, 2026
@ludamad
Collaborator

ludamad commented Jan 23, 2026

Oh believe me when pitched 'AI for CI' I already had that full thought experiment and how fun being a prompt maintainer would/wouldn't be

@AztecBot
Collaborator

AztecBot commented Jan 23, 2026

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

FLAKED (http://ci.aztec-labs.com/1aeb93997d8a8df3): yarn-project/end-to-end/scripts/run_test.sh simple src/e2e_p2p/valid_epoch_pruned_slash.test.ts (397s) (code: 1) group:e2e-p2p-epoch-flakes (Santiago Palladino: chore: CI debugging & PR fixing tooling for Claude (#19906))

Merged via the queue into next with commit d44f1da Jan 23, 2026
16 checks passed
@spalladino spalladino deleted the palla/claude-ci-tooling-consolidation branch January 23, 2026 20:55