chore: CI debugging & PR fixing tooling for Claude #19906
Reorganizes Claude Code's CI/PR tooling around two primary workflows:
**1. Autonomous Workflows** - "Fix/rebase this PR, don't bother me"
- `/fix-pr <PR>` - Identifies CI failures, rebases, fixes issues, pushes
- `/rebase-pr <PR>` - Simple rebase with conflict resolution and build verification
**2. Interactive Debugging** - "Debug this flaky test with me"
- `/debug-e2e <hash|PR|URL>` - Ping-pong debugging session for e2e failures
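As a rough mental model, the autonomous `/fix-pr` flow is a fixed pipeline: identify failures, rebase, fix what can be fixed, push. The Python below is only an illustration. The step callables, the `Failure` fields and the `AUTO_FIXABLE` categories are hypothetical stand-ins, not the skill's interface (the skill itself is a Markdown prompt, not code).

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical categories the autonomous flow is willing to fix on its own.
AUTO_FIXABLE = {"format", "lint", "build", "unit-test"}


@dataclass
class Failure:
    job: str        # CI job name (hypothetical field)
    category: str   # e.g. "format", "lint", "e2e"
    log_path: str   # local copy of the log under /tmp/


def fix_pr(pr: int,
           identify: Callable[[int], List[Failure]],
           rebase: Callable[[int], None],
           apply_fix: Callable[[Failure], None],
           push: Callable[[int], None]) -> List[str]:
    """Identify -> rebase -> fix -> push; returns jobs left for /debug-e2e."""
    failures = identify(pr)          # a subagent reads the logs; only paths come back
    rebase(pr)
    escalated = []
    for failure in failures:
        if failure.category in AUTO_FIXABLE:
            apply_fix(failure)
        else:
            escalated.append(failure.job)  # e.g. complex e2e: suggest /debug-e2e
    push(pr)
    return escalated
```

Escalating instead of attempting an e2e fix mirrors the PR's guidance that complex e2e failures belong in `/debug-e2e`.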
The key insight is that CI logs can be 50k+ lines and pollute context. All log
reading now happens in Task subagents, never inline:
```
┌────────────────────────────────────────────────────────┐
│ User-Invocable Skills                                  │
├─────────────┬─────────────┬────────────────────────────┤
│ /fix-pr     │ /rebase-pr  │ /debug-e2e                 │
│ (autonomous)│ (autonomous)│ (interactive session)      │
└──────┬──────┴──────┬──────┴──────────┬─────────────────┘
       │             │                 │
       ▼             │                 ▼
┌─────────────────┐  │  ┌────────────────────────────┐
│ identify-ci-    │──┘  │ analyze-logs               │
│ failures        │     │ (subagent)                 │
│ (subagent)      │     │                            │
└─────────────────┘     │ Deep-reads logs, returns   │
       │                │ condensed summaries        │
       ▼                └────────────────────────────┘
┌────────────────────────────────────────────────────────┐
│ Logs downloaded once to /tmp/<hash>.log                │
│ Reused via local paths by all agents                   │
└────────────────────────────────────────────────────────┘
```
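The "downloaded once, reused via local paths" box amounts to a small cache keyed by the log hash. Here is a minimal sketch. It assumes logs are identified by a hash and stored as `/tmp/<hash>.log`, as in the diagram. `fetch` is a hypothetical callable standing in for whatever actually retrieves a log (for instance something built on `./ci.sh dlog`), and `cached_log_path` is an illustrative name.

```python
import os
from typing import Callable

LOG_DIR = "/tmp"  # matches the /tmp/<hash>.log convention in the diagram


def cached_log_path(log_hash: str, fetch: Callable[[str], str],
                    log_dir: str = LOG_DIR) -> str:
    """Return a local path for a CI log, downloading it only on first use."""
    path = os.path.join(log_dir, f"{log_hash}.log")
    if not os.path.exists(path):
        content = fetch(log_hash)  # hypothetical fetcher, called at most once per hash
        tmp = path + ".part"
        with open(tmp, "w") as fh:
            fh.write(content)
        os.replace(tmp, path)      # rename into place so readers never see a partial log
    return path
```

Because every agent calls the same function with the same hash, later agents get the existing path back without another download.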
- **identify-ci-failures** - Given PR/URL/hash, finds all failures, downloads logs
to /tmp/, returns structured failure list with local paths
- **analyze-logs** - Deep-reads specific logs, extracts relevant sections, returns
summaries (never raw content). Understands Aztec log format, test markers, multi-
actor patterns
- **retrospective** - Session analysis for self-improvement (captures corrections,
failed approaches, learned patterns)
- **debug-e2e** - Interactive debugging orchestrator. Spawns subagents for log
analysis, works with user to form/test hypotheses, suggests fixes
- **fix-pr** - Autonomous PR fixer. Identifies failures via subagent, rebases,
applies fixes based on failure type, pushes
- **rebase-pr** - Simple rebase workflow with conflict resolution
- **extract-test-logs.sh** - Extracts logs for a specific test from a log file
(used by analyze-logs to isolate relevant sections)
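The PR describes `extract-test-logs.sh` only as isolating the log lines for one test. Its handling of Aztec log markers isn't shown here, so the sketch below swaps in a simpler technique: substring matching with surrounding context, in the style of `grep -C`. `extract_test_logs` and its parameters are hypothetical.

```python
from typing import List


def extract_test_logs(lines: List[str], test_name: str, context: int = 2) -> List[str]:
    """Keep lines mentioning test_name plus `context` lines on each side (like grep -C)."""
    keep = set()
    for i, line in enumerate(lines):
        if test_name in line:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            keep.update(range(lo, hi))
    # Preserve original order; overlapping windows are merged by the set.
    return [lines[i] for i in sorted(keep)]
```

A caller would pass something like `open(path).read().splitlines()` for a log already downloaded to `/tmp/`.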
- Added "Native Tools Over Bash" section (prefer Glob/Read/Grep/Edit over bash)
- Updated build command to `yarn build` (not tsgo)
- Updated test commands to use `yarn workspace` (don't cd into packages)
- Added CI Log Access section with `./ci.sh dlog` instructions
- Updated quality checklist with correct commands
- `.claude/commands/fix-pr.md` (replaced by skill)
- `.claude/commands/rebase-pr.md` (replaced by skill)
- `.claude/skills/ci-log-reader/SKILL.md` (absorbed into subagents)
- `.claude/agents/e2e-test-debugger.md` (replaced by debug-e2e skill)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
`<example>`
`user: "Let's do a retrospective on this session"`
`assistant: "I'll analyze our conversation to extract learnings."`
Does this one make sense as an agent? An agent can't run at the end of a session, can it? I thought agents always start with a somewhat fresh context. I would think this is a skill, something like reporting-completion.md or similar.
I wanted it to be able to run in a separate context window and have it re-read the conversation transcript. Unfortunately I didn't get to test this one before running out of credits.
`## Usage`
` ``` `
/fix-pr is no longer usable, right?
FWIW I've been using this by just telling Claude inline to use the 'fix-pr' command in yarn-project, so I wouldn't miss it.
It still is. Commands and skills have been merged, and you can call a skill as if it were a command.
Ah yeah, the Claude changelog is mandatory daily reading, isn't it.

> Added automatic discovery of skills from nested .claude/skills directories when working with files in subdirectories

That addresses the underlying reason I was curious about things being picked up automatically. Neat.
For the curious, 2 weeks ago! https://github.com/anthropics/claude-code/blame/main/CHANGELOG.md
The development schedules for the presentation layer of AI are nuts
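To make that changelog entry concrete, one plausible reading is that when Claude works on a file, it also looks for `.claude/skills` directories between that file and the repository root, which is how skills under `yarn-project/.claude/skills/` would be picked up. The sketch below illustrates that idea only. It is not Claude Code's implementation, and `skill_dirs_for` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def skill_dirs_for(file_path: Path, repo_root: Path) -> List[Path]:
    """Collect .claude/skills dirs from the file's directory up to repo_root, nearest first."""
    found = []
    current = file_path.resolve().parent
    root = repo_root.resolve()
    while True:
        candidate = current / ".claude" / "skills"
        if candidate.is_dir():
            found.append(candidate)
        # Stop at the repo root, or at the filesystem root if the file is outside it.
        if current == root or current.parent == current:
            break
        current = current.parent
    return found
```

Returning the nearest directory first would let a nested skill take precedence over a repo-wide one with the same name, if a tool chose to resolve conflicts that way.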
`## Native Tools Over Bash`
`Prefer native tools over bash equivalents—they don't require permissions and provide better output:`
Not sure I've found they make better output
I've found that Claude messes up less with its native tools, but if this is contentious I'm fine to remove it. I've also added native tools to the permissions, so they should be less annoying now.
ludamad left a comment:
LGTM, it's an imperfect science but seems like a good set of changes
Just think how fun it would be to have a set of end-to-end tests for the Claude skills we define.

Oh, believe me, when pitched 'AI for CI' I already went through that full thought experiment, including how fun being a prompt maintainer would or wouldn't be.
Flakey Tests🤖 says: This CI run detected 1 test that failed but was tolerated due to a `.test_patterns.yml` entry.
## Summary

Reorganizes Claude Code's CI/PR tooling around two primary workflows:

1. **Autonomous workflows** (`/fix-pr`, `/rebase-pr`) - "Fix this PR, don't bother me"
2. **Interactive debugging** (`/debug-e2e`) - "Debug this flaky test with me"

The key architectural change is that all log reading happens in subagents, never inline. This keeps the conversation context clean, since CI logs can be 50k+ lines.
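Keeping the context clean means a subagent hands back a bounded summary rather than the log itself. A minimal sketch, assuming a plain keyword regex is good enough to flag interesting lines. The real analyze-logs subagent is a Claude subagent that understands the Aztec log format, and `FAILURE_RE` and `summarize_log` are hypothetical names.

```python
import re
from typing import Dict, List

# Hypothetical failure keywords; real Aztec logs may need different patterns.
FAILURE_RE = re.compile(r"\b(?:errors?|fail(?:ed|ures?|s)?|timeout)\b", re.IGNORECASE)


def summarize_log(lines: List[str], max_hits: int = 5, max_len: int = 200) -> Dict[str, object]:
    """Return a small summary instead of the raw log so the caller's context stays bounded."""
    hits = [(i + 1, line[:max_len])  # (1-based line number, truncated text)
            for i, line in enumerate(lines) if FAILURE_RE.search(line)]
    return {
        "total_lines": len(lines),
        "failure_lines": len(hits),
        "first_hits": hits[:max_hits],
    }
```

However long the log, the result holds two counts plus at most `max_hits` lines of at most `max_len` characters each.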
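## How to Use These Tools

### `/fix-pr <PR number>`

**Use case:** CI is failing on your PR and you want Claude to fix it autonomously.

Claude identifies the CI failures (via a subagent), rebases, applies fixes based on the failure type, and pushes.

**Best for:** format errors, lint issues, obvious build errors, and simple test fixes.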
For complex e2e failures, it will suggest using `/debug-e2e` instead.

### `/rebase-pr <PR number>`

**Use case:** Your PR just needs a rebase and you want Claude to handle conflicts.

Claude rebases the PR, resolves conflicts, and verifies that the build still passes.
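### `/debug-e2e <hash|PR|URL|test name>`

**Use case:** A flaky e2e test needs investigation, or you want to debug interactively.

This starts an interactive debugging session in which Claude spawns subagents to analyze the logs, works with you to form and test hypotheses, and suggests fixes.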
The key difference from `/fix-pr`: this is collaborative ping-pong debugging, not autonomous fixing.

## Architecture
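### New Components

Subagents live in `yarn-project/.claude/agents/`, skills in `yarn-project/.claude/skills/`, and a helper script, `extract-test-logs.sh`, is added. Each component is described in the component list earlier in this PR.

### CLAUDE.md Updates

Notably, the build command is now `yarn build` and test commands follow the `yarn workspace` pattern. The full list of CLAUDE.md changes appears earlier in this PR.

### Removed

- `.claude/commands/fix-pr.md` (replaced by skill)
- `.claude/commands/rebase-pr.md` (replaced by skill)
- `.claude/skills/ci-log-reader/SKILL.md` (absorbed into subagents)
- `.claude/agents/e2e-test-debugger.md` (replaced by debug-e2e skill)

## Test plan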
- `/fix-pr` on a PR with format errors
- `/rebase-pr` on a PR that needs rebasing
- `/debug-e2e` on a failing e2e test hash

🤖 Generated with Claude Code