
Commit 20548ba

feat: add Stage 2 automated fix workflow for fuzzer crashes
- Add fuzzer-fix-automation.yml workflow
  - Triggers on issues labeled 'fuzzer'
  - Downloads crash artifacts and reproduces crashes
  - Analyzes root cause using Claude Opus 4
  - Attempts to create fixes for straightforward bugs
  - Writes regression tests that fail before fix, pass after
  - Verifies fix with fuzzer, tests, clippy, fmt
  - Posts detailed analysis as issue comment
- Update documentation for two-stage approach
  - Stage 1: Crash detection and issue creation (existing PR #5292)
  - Stage 2: Automated fix attempt (new workflow)
  - Add workflow diagrams showing both stages
  - Document Claude's capabilities and limitations
  - Add examples of successful fixes vs. analysis-only
  - Include cost/performance metrics for both stages
  - Best practices for reviewing automated fixes

This creates a complete fuzzer crash handling pipeline:
1. Detect crash → Create issue with analysis
2. Attempt fix → Post regression tests and verification
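The commit message says Stage 2 verifies a fix with the fuzzer, tests, clippy, and fmt. The Rust sketch below shows one way such a check sequence could be driven with `std::process::Command`. It is an illustration under stated assumptions, not the workflow's implementation: the target name, crash-file path, and test name are hypothetical inputs, and the exact flags the real workflow uses are not shown on this page.

```rust
use std::process::Command;

/// One verification step: a label for reporting plus the `cargo` arguments.
#[derive(Debug)]
struct Step {
    label: &'static str,
    args: Vec<String>,
}

/// Converts borrowed argument strings into owned `String`s.
fn owned(parts: &[&str]) -> Vec<String> {
    parts.iter().map(|s| s.to_string()).collect()
}

/// Builds a verification sequence like the one listed for Stage 2.
/// `target`, `crash_file`, and `test_name` are hypothetical inputs.
fn verification_steps(target: &str, crash_file: &str, test_name: &str) -> Vec<Step> {
    vec![
        // Run only the new regression test (`cargo test` filters by name).
        Step { label: "regression test", args: owned(&["test", test_name]) },
        // Replay the saved crash input once; it should no longer panic.
        // (cargo-fuzz normally requires a nightly toolchain.)
        Step { label: "replay crash input", args: owned(&["fuzz", "run", target, crash_file]) },
        // Lint all targets, including tests.
        Step { label: "clippy", args: owned(&["clippy", "--all-targets"]) },
        // Apply rustfmt formatting across the workspace.
        Step { label: "fmt", args: owned(&["fmt", "--all"]) },
    ]
}

/// Runs each step with `cargo`, stopping at the first failure.
fn run_all(steps: &[Step]) -> Result<(), String> {
    for step in steps {
        let status = Command::new("cargo")
            .args(&step.args)
            .status()
            .map_err(|e| format!("{}: could not start cargo: {e}", step.label))?;
        if !status.success() {
            return Err(format!("{} failed ({status})", step.label));
        }
    }
    Ok(())
}
```

A caller would pass the result of `verification_steps` to `run_all` and report the failing label, if any, in the issue comment. Running the regression test first means a failure is reported against the most specific check.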
1 parent 98cda38 commit 20548ba

2 files changed, +492 -24 lines changed

.github/workflows/README-fuzzer-analysis.md

Lines changed: 166 additions & 24 deletions
@@ -1,10 +1,13 @@
 # Automated Fuzzer Crash Analysis with Claude Code
 
-This directory contains workflows for automated fuzzer crash detection, analysis, and issue creation using the Claude Code bot.
+This directory contains workflows for automated fuzzer crash detection, analysis, issue creation, and fix automation using the Claude Code bot.
 
 ## Overview
 
-The fuzzing infrastructure automatically detects crashes and uses Claude to analyze them and create/update GitHub issues with duplicate detection.
+The fuzzing infrastructure has **two stages**:
+
+1. **Stage 1 (fuzz.yml)**: Detects crashes and uses Claude to analyze and create/update GitHub issues with smart duplicate detection
+2. **Stage 2 (fuzzer-fix-automation.yml)**: When a fuzzer issue is created, automatically attempts to fix it, create regression tests, and post findings
 
 ## How It Works
 
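Per the overview, Stage 2 starts when a fuzzer issue is created, and Example 2 notes that a duplicate crash, which only updates a tracking comment, does not start it. The Rust sketch below models that gating rule. It is a simplified model, not the workflow's trigger configuration: the `Event` enum is hypothetical, and treating a label added later as a trigger is an assumption based on the commit message's "Triggers on issues labeled 'fuzzer'".

```rust
/// Label Stage 1 applies to fuzzer issues.
const FUZZER_LABEL: &str = "fuzzer";

/// Hypothetical, simplified view of the GitHub events relevant to Stage 2.
enum Event<'a> {
    /// A new issue was opened with these labels (e.g. `bug` and `fuzzer`).
    IssueOpened { labels: &'a [&'a str] },
    /// A label was added to an existing issue (assumed to also trigger Stage 2).
    IssueLabeled { added: &'a str },
    /// Stage 1 updated the tracking comment on an existing issue (a duplicate).
    TrackingCommentUpdated,
}

/// Returns true if this event should start a fix attempt.
fn triggers_fix_attempt(event: &Event<'_>) -> bool {
    match event {
        Event::IssueOpened { labels } => labels.contains(&FUZZER_LABEL),
        Event::IssueLabeled { added } => *added == FUZZER_LABEL,
        // Duplicates only update a comment, so no new fix attempt starts.
        Event::TrackingCommentUpdated => false,
    }
}
```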
@@ -161,17 +164,78 @@ Claude has access to:
 - ✅ Source code analysis at crash locations
 - ✅ Cargo fuzz for crash reproduction (optional)
 
-Claude uses:
+**Stage 1 (Issue Creation)** uses:
 - **Model**: Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`)
 - **Cost**: ~$0.03-0.05 per crash analysis
 - **Max turns**: 25 (for complex analysis)
 
+**Stage 2 (Fix Automation)** uses:
+- **Model**: Claude Opus 4 (`claude-opus-4-20250514`) - more capable for code generation
+- **Cost**: ~$0.15-0.25 per fix attempt
+- **Max turns**: 40 (allows for iterative fixing and testing)
+
+### 3. Automated Fix Attempt (fuzzer-fix-automation.yml)
+
+When a fuzzer issue is created (labeled with `fuzzer`), Claude automatically:
+
+1. **Extracts Crash Details** - Parses the issue body for:
+- Target name
+- Crash file name
+- Artifact download URL
+- Stack trace and error message
+
+2. **Downloads and Reproduces** - Attempts to:
+- Download the crash artifact
+- Reproduce the crash locally with the fuzzer
+- Verify the panic/error occurs
+
+3. **Analyzes Root Cause** - Deep analysis of:
+- Source code at crash location
+- Stack trace to understand call path
+- Debug output to see problematic input
+- Determines the underlying bug
+
+4. **Assesses Fixability** - Decides if this is fixable automatically:
+- **CAN FIX**: Missing bounds check, validation, edge case handling, simple panics
+- **CANNOT FIX**: Architectural issues, complex logic, requires domain knowledge
+
+5. **Creates Fix (if straightforward)**:
+- Modifies source code with minimal changes
+- Adds validation or bounds checks
+- Handles the edge case properly
+- Follows project code style guidelines
+
+6. **Writes Regression Tests**:
+- Creates test using the actual fuzzer input that triggered the crash
+- Test fails before the fix, passes after
+- Placed in appropriate test module
+- Named clearly (e.g., `test_fuzzer_crash_issue_123`)
+
+7. **Verifies the Fix**:
+- Runs regression test
+- Runs fuzzer with crash file (should not panic)
+- Runs related tests
+- Checks with clippy
+- Formats code
+
+8. **Posts Findings** - Comments on the issue with:
+- Root cause analysis
+- Fix description (if created)
+- Regression test details
+- Verification results
+- OR explanation of why it can't be fixed automatically
+
 ## Workflow Structure
 
 ```
+┌─────────────────────────────────────────┐
+│ STAGE 1: Detection & Issue Creation │
+│ (fuzz.yml) │
+└─────────────────────────────────────────┘
+
 ┌─────────────────────────────────────────┐
 │ io_fuzz / ops_fuzz │
-│ - Run fuzzing target
+│ - Run fuzzing target (2 hours)
 │ - Check for crashes │
 │ - Archive artifacts + logs │
 │ - Output: crashes_found, first_crash │
@@ -184,6 +248,29 @@ Claude uses:
 │ - Download fuzzer logs │
 │ - Run Claude with analysis prompt │
 │ - Claude creates/updates issues │
+│ • Smart duplicate detection │
+│ • Occurrence tracking │
+│ • Detailed crash analysis │
+└──────────────┬──────────────────────────┘
+
+│ Issue created with 'fuzzer' label
+
+
+┌─────────────────────────────────────────┐
+│ STAGE 2: Automated Fix Attempt │
+│ (fuzzer-fix-automation.yml) │
+└─────────────────────────────────────────┘
+
+┌─────────────────────────────────────────┐
+│ attempt-fix │
+│ - Triggered by issue with 'fuzzer' label│
+│ - Download crash artifact │
+│ - Reproduce the crash │
+│ - Analyze root cause │
+│ - Create fix if straightforward │
+│ - Write regression tests │
+│ - Verify fix works │
+│ - Post analysis comment │
 └─────────────────────────────────────────┘
 ```
 
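Step 1 of the fix attempt parses the issue body for the target name, crash file name, and artifact URL before the `attempt-fix` job can download and reproduce the crash. A minimal Rust sketch of that extraction follows, assuming the Stage 1 issue lists each field on its own `Key: value` line, optionally bolded or bulleted. The field labels are hypothetical, since the issue template is not shown here, and stack-trace extraction is omitted.

```rust
/// Crash details Stage 2 needs before it can download and replay a crash.
#[derive(Debug, Default, PartialEq)]
struct CrashDetails {
    target: Option<String>,
    crash_file: Option<String>,
    artifact_url: Option<String>,
}

/// Extracts `Key: value` fields from a fuzzer issue body.
/// The labels "Target", "Crash file", and "Artifact URL" are hypothetical;
/// the real Stage 1 issue layout may differ. Leading list markers ("- ")
/// and Markdown bold markers ("**") are stripped before matching.
fn parse_issue_body(body: &str) -> CrashDetails {
    let mut details = CrashDetails::default();
    for raw in body.lines() {
        let line = raw.trim().trim_start_matches("- ").replace("**", "");
        // Split on the first ':' only, so URLs like "https://..." stay intact.
        let Some((key, value)) = line.split_once(':') else {
            continue;
        };
        let value = value.trim();
        if value.is_empty() {
            continue;
        }
        match key.trim().to_ascii_lowercase().as_str() {
            "target" => details.target = Some(value.to_string()),
            "crash file" => details.crash_file = Some(value.to_string()),
            "artifact url" => details.artifact_url = Some(value.to_string()),
            _ => {} // ignore other fields (stack trace, notes, ...)
        }
    }
    details
}
```

Returning `Option`s keeps a missing field explicit, so a caller can skip reproduction and report what was missing rather than guess a target.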
@@ -211,6 +298,12 @@ Claude uses:
 - `id-token: write` - OIDC token for authentication
 - `pull-requests: read` - Read PR context if needed
 
+**Fix automation job (fuzzer-fix-automation.yml):**
+- `contents: write` - Modify source files to create fixes
+- `pull-requests: write` - Create PRs if requested
+- `issues: write` - Comment on issues with findings
+- `id-token: write` - OIDC token for authentication
+
 ## Monitoring
 
 ### View Fuzzing Runs
@@ -305,40 +398,75 @@ Check:
 
 ## Examples
 
-### Example 1: New Crash Detected
+### Example 1: New Crash Detected and Auto-Fixed
 
-1. Fuzzer detects crash in `file_io` target
+**Stage 1 - Issue Creation:**
+1. Fuzzer detects crash in `file_io` target: "index out of bounds"
 2. `io_fuzz` job archives crash files and logs
 3. `report-io-fuzz-failures` job triggered
-4. Claude analyzes fuzzer log, identifies panic in `vortex_io::read_header`
+4. Claude analyzes fuzzer log, identifies panic in `vortex_io::read_header` at line 45
 5. Claude searches existing issues, finds no match
-6. Claude creates new issue with detailed analysis
-7. Issue includes stack trace, root cause, reproduction steps
-
-### Example 2: Duplicate Crash
-
+6. Claude creates issue #789 with detailed analysis, labels it `bug,fuzzer`
+
+**Stage 2 - Fix Automation:**
+7. `fuzzer-fix-automation` workflow triggers on issue #789
+8. Claude extracts crash details from issue body
+9. Claude downloads crash artifact
+10. Claude reproduces the crash locally
+11. Claude analyzes source code at `vortex_io::read_header:45`
+12. Root cause: Missing bounds check before indexing into buffer
+13. Claude creates fix: Adds validation `if index >= buffer.len() { return Err(...) }`
+14. Claude writes regression test: `test_fuzzer_crash_issue_789()`
+15. Claude verifies: test passes, fuzzer doesn't crash, clippy passes
+16. Claude comments on issue #789 with full analysis and fix details
+17. Human reviews and merges the fix
+
+### Example 2: Duplicate Crash (No Fix Attempted)
+
+**Stage 1 - Duplicate Detection:**
 1. Fuzzer detects crash (same as issue #123)
 2. Claude analyzes, recognizes same crash location and error pattern
 3. Claude finds existing issue #123
 4. Claude updates tracking comment: "Crash seen 5 time(s)"
 5. No new issue created, keeping issue list clean
 
-### Example 3: Similar but Different
+**Stage 2 - No trigger:**
+6. Fix automation doesn't trigger (no new issue created)
+7. Human can manually trigger on issue #123 if desired
+
+### Example 3: Complex Crash (Analysis Only)
 
-1. Fuzzer detects crash in same function as issue #456
-2. Claude analyzes, sees same location but different error pattern
-3. Claude determines it's SIMILAR (medium confidence)
-4. Claude adds comment to issue #456 explaining the similarity
-5. Human reviews and decides if it's truly the same or needs new issue
+**Stage 1 - Issue Creation:**
+1. Fuzzer detects crash in `array_ops` target
+2. Claude creates issue #790 with analysis
+
+**Stage 2 - Cannot Auto-Fix:**
+3. `fuzzer-fix-automation` triggers on issue #790
+4. Claude analyzes the crash
+5. Determines it's an architectural issue requiring refactoring
+6. Claude comments: "This requires human intervention" with detailed analysis
+7. Provides suggestions for how to approach the fix
+8. Human developer takes over from Claude's analysis
 
 ## Best Practices
 
+### For Stage 1 (Issue Creation)
+
 1. **Review Claude's Classifications** - Especially "similar" cases
 2. **Close True Duplicates** - If Claude missed one, close and reference the original
 3. **Add Labels** - Tag issues with severity (`P0`, `P1`, etc.)
 4. **Track Frequency** - High occurrence counts indicate priority bugs
-5. **Minimize Test Cases** - Use `cargo fuzz tmin` to create minimal reproducers
-6. **Update Corpus** - Add interesting crashes to corpus after fixing
+
+### For Stage 2 (Fix Automation)
+
+1. **Review All Fixes** - Claude's fixes are suggestions, always review before merging
+2. **Test Thoroughly** - Run the regression test and broader test suite
+3. **Check Edge Cases** - Verify Claude considered all edge cases, not just the crash
+4. **Assess Test Quality** - Ensure regression tests actually catch the bug
+5. **Consider Broader Impact** - Check if the same issue exists elsewhere in the codebase
+6. **Minimize Test Cases** - Use `cargo fuzz tmin` to create minimal reproducers
+7. **Update Corpus** - Add interesting crashes to corpus after fixing
+8. **Close Issues** - Once merged, close the issue and reference the PR
 
 ## Limitations
 
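Example 1 describes the typical automated fix (a bounds check that returns an error instead of panicking) paired with a regression test built from the crashing input. The Rust sketch below illustrates that pattern. `read_header`, `Header`, and `HeaderError` here are hypothetical stand-ins, not the real `vortex_io::read_header`, and the one-byte crash input is invented; in practice the test might load the saved crash file, for example with `include_bytes!`.

```rust
/// Hypothetical header: two fixed bytes at the start of a buffer.
#[derive(Debug, PartialEq)]
struct Header {
    version: u8,
    flags: u8,
}

/// Hypothetical error returned instead of panicking on short input.
#[derive(Debug, PartialEq)]
enum HeaderError {
    /// The byte at `index` was requested but the buffer is only `len` bytes long.
    Truncated { index: usize, len: usize },
}

/// Reads one byte, checking the index first (the shape of the fix in Example 1).
/// Without the check, `buffer[index]` panics with "index out of bounds".
fn byte_at(buffer: &[u8], index: usize) -> Result<u8, HeaderError> {
    if index >= buffer.len() {
        return Err(HeaderError::Truncated { index, len: buffer.len() });
    }
    Ok(buffer[index])
}

/// Parses the hypothetical two-byte header, propagating truncation errors.
fn read_header(buffer: &[u8]) -> Result<Header, HeaderError> {
    let version = byte_at(buffer, 0)?;
    let flags = byte_at(buffer, 1)?;
    Ok(Header { version, flags })
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Regression test named after the issue, as in Example 1.
    /// The input is an invented one-byte buffer that would have panicked
    /// on `buffer[1]` before the bounds check was added.
    #[test]
    fn test_fuzzer_crash_issue_789() {
        let crash_input: &[u8] = &[0x01];
        assert_eq!(
            read_header(crash_input),
            Err(HeaderError::Truncated { index: 1, len: 1 })
        );
    }
}
```

Returning an error keeps the change small and local, which matches the "CAN FIX" category (missing bounds check). Against the pre-fix code the test fails by panicking; after the fix it passes.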
@@ -356,23 +484,37 @@ In these cases:
 
 ## Cost and Performance
 
+### Stage 1 (Issue Creation)
 - **Analysis Time**: 1-3 minutes per crash
 - **Cost**: ~$0.03-0.05 per crash (using Sonnet 4.5)
 - **Accuracy**: High for duplicate detection (based on source code analysis)
 - **False Positives**: Low (conservative by default)
 
-## Future Enhancements
+### Stage 2 (Fix Automation)
+- **Analysis Time**: 5-15 minutes per crash (includes reproduction, analysis, fixing, testing)
+- **Cost**: ~$0.15-0.25 per fix attempt (using Opus 4)
+- **Success Rate**: Depends on crash complexity
+- Simple bugs (bounds checks, validation): ~70-80% fix rate
+- Medium complexity: ~30-50% fix rate
+- Complex bugs: Analysis only, human intervention needed
+- **False Fixes**: Very low (Claude is conservative about committing changes)
 
-Potential improvements:
+## Future Enhancements
 
+### Stage 1 Enhancements
 - [ ] Automatic crash minimization before reporting
-- [ ] Severity classification (security vs stability)
-- [ ] Automatic PR creation for simple fixes
 - [ ] Integration with coverage reports
 - [ ] Historical crash trend analysis
 - [ ] Cross-target duplicate detection
 - [ ] Automatic corpus optimization
 
+### Stage 2 Enhancements
+- [ ] Automatic PR creation (currently just posts fix)
+- [ ] Severity classification (security vs stability)
+- [ ] Suggest fixes to similar code patterns across codebase
+- [ ] Batch fix multiple similar crashes
+- [ ] Learn from accepted/rejected fixes to improve future attempts
+
 ## Support
 
 For issues with the fuzzing infrastructure:
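Combining the per-run figures from the Cost and Performance section in the diff above: a crash that opens a new issue passes through both stages, while a duplicate only incurs Stage 1 cost because Stage 2 is not triggered (Example 2). The Rust sketch below does that arithmetic in whole cents using the approximate ranges quoted there (~$0.03-0.05 and ~$0.15-0.25). It is an estimate under those figures only; it ignores manually triggered fix attempts, and the batch sizes in the usage note are hypothetical.

```rust
/// Approximate per-run cost ranges in US cents (low, high), from the README:
/// Stage 1 ~$0.03-0.05, Stage 2 ~$0.15-0.25.
const STAGE1_CENTS: (u32, u32) = (3, 5);
const STAGE2_CENTS: (u32, u32) = (15, 25);

/// Estimated (low, high) cost in cents for a batch of crashes.
/// Every crash gets a Stage 1 analysis; only crashes that open a new issue
/// also get a Stage 2 fix attempt.
fn estimate_cost_cents(new_issues: u32, duplicates: u32) -> (u32, u32) {
    let stage1_runs = new_issues + duplicates;
    let low = stage1_runs * STAGE1_CENTS.0 + new_issues * STAGE2_CENTS.0;
    let high = stage1_runs * STAGE1_CENTS.1 + new_issues * STAGE2_CENTS.1;
    (low, high)
}
```

For one new crash this gives 18 to 30 cents ($0.18 to $0.30) end to end. For a batch of 2 new crashes and 3 duplicates, `estimate_cost_cents(2, 3)` returns `(45, 75)`.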
