Commit c1c05f0

docs: Document checklist breaks Sonnet autonomy (8/22 steps)
Related to #1703, #1755
1 parent 464221d commit c1c05f0

1 file changed: +36 -0 lines changed

docs/DISCOVERIES.md

Lines changed: 36 additions & 0 deletions
@@ -512,3 +512,39 @@ This file should be referenced by:
- **CLAUDE.md**: "Before solving complex problems, check @docs/DISCOVERIES.md"
- **AGENTS.md**: "Review @docs/DISCOVERIES.md to avoid known pitfalls"
- **New developers**: "Read DISCOVERIES.md to understand institutional knowledge"

## Checklist CLAUDE.md Breaks Sonnet 4.5 Autonomy (2025-11-30)

### Issue

Follow-up testing to the #1703 Opus experiments showed that the checklist approach degrades Sonnet 4.5: the model terminates the workflow prematurely.

### Testing

Ran Sonnet 4.5 on the REST API Client task (HIGH complexity) with two configurations:

1. Original CLAUDE.md (baseline)
2. Checklist CLAUDE.md (Approach 2 from #1703)

### Results

- **Original Sonnet**: 104 min, $24, 109 turns, **22/22 steps**
- **Checklist Sonnet**: 36 min, $8, 35 turns, **8/22 steps**

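The raw totals make the checklist run look cheaper, so it can help to normalize by completed steps. A minimal sketch in Python, using only the figures reported above; the `Run` class and its field names are illustrative and not part of any tooling in this repo, and dividing evenly by step count assumes steps are roughly comparable in size:

```python
from dataclasses import dataclass

TOTAL_STEPS = 22  # workflow length reported in the results above


@dataclass(frozen=True)
class Run:
    """One test run; the class and field names are hypothetical."""
    name: str
    minutes: int
    cost_usd: int
    turns: int
    steps_done: int

    @property
    def completion(self) -> float:
        # Fraction of the workflow that was finished.
        return self.steps_done / TOTAL_STEPS

    @property
    def cost_per_step(self) -> float:
        return self.cost_usd / self.steps_done

    @property
    def minutes_per_step(self) -> float:
        return self.minutes / self.steps_done


original = Run("Original Sonnet", minutes=104, cost_usd=24, turns=109, steps_done=22)
checklist = Run("Checklist Sonnet", minutes=36, cost_usd=8, turns=35, steps_done=8)

for run in (original, checklist):
    print(
        f"{run.name}: {run.completion:.0%} complete, "
        f"${run.cost_per_step:.2f}/step, {run.minutes_per_step:.1f} min/step"
    )
```

Under that assumption the per-step cost and time come out nearly the same for both runs (about $1.09 vs $1.00 and 4.7 vs 4.5 minutes), which suggests the lower totals reflect the early stop rather than greater efficiency.
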
### Root Cause

The checklist's validation gates (STOP checkpoints, pre-flight validation) trigger Sonnet to pause and ask for permission: "Would ye like me to continue?" This violates the autonomy guidelines and stops the workflow early.

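One way to spot this failure mode in transcripts is to check whether a run's final assistant message is a permission request. The following is a rough sketch only: the phrase list is an assumption seeded from the single quote above and would need extending from real transcripts, and `asks_permission` is a hypothetical helper, not an existing function in this project.

```python
import re

# Hypothetical phrase list; only the first pattern is grounded in the
# quoted message above, the rest are plausible variants.
PERMISSION_PATTERNS = [
    r"\bwould (you|ye) like me to (continue|proceed)\b",
    r"\bshall i (continue|proceed)\b",
    r"\bshould i (continue|proceed)\b",
    r"\bdo you want me to (continue|proceed)\b",
]
_PERMISSION_RE = re.compile("|".join(PERMISSION_PATTERNS), re.IGNORECASE)


def asks_permission(message: str) -> bool:
    """Return True if the message looks like a request to keep going."""
    return _PERMISSION_RE.search(message) is not None


print(asks_permission("Would ye like me to continue?"))
print(asks_permission("Step 9 complete. Moving on to step 10."))
```

A run whose last message matches while steps remain incomplete is a candidate for the premature-stop behavior described here; a keyword match alone is not proof, so flagged runs still need a manual look.
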
### Key Learning

**Model-specific behavior**: Interventions designed to force Opus to complete the workflow have the OPPOSITE effect on Sonnet - they cause it to stop instead of continuing.

### Solution

**DO NOT implement the checklist approach in production** - it breaks the model that already completes the workflow without intervention. Use Sonnet 4.5 with the original CLAUDE.md for all use cases.

### Prevention

- **Test interventions across ALL target models** - what helps one can break another
- **Validation gates are harmful for autonomous models** - Sonnet needs zero checkpoints
- **No universal CLAUDE.md solution** - different models need different approaches
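
The first point could be enforced with a small regression check that compares completed steps per model before and after a CLAUDE.md change. This is a sketch under stated assumptions: the `regressions` function and the `"sonnet-4.5"` key are illustrative names, and the only data shown is the Sonnet 4.5 result recorded in this entry.

```python
def regressions(
    baseline: dict[str, int], candidate: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return models whose completed-step count dropped under the candidate config.

    Keys are model labels; values are completed workflow steps.
    """
    # Refuse to pass a change that was not tested on every baseline model.
    missing = baseline.keys() - candidate.keys()
    if missing:
        raise ValueError(f"candidate config not tested on: {sorted(missing)}")
    return {
        model: (baseline[model], candidate[model])
        for model in baseline
        if candidate[model] < baseline[model]
    }


# Completed steps from the Sonnet 4.5 runs documented above.
baseline_steps = {"sonnet-4.5": 22}
checklist_steps = {"sonnet-4.5": 8}

print(regressions(baseline_steps, checklist_steps))
```

Raising on untested models is a deliberate choice here: a change that was only validated on one model would fail the check instead of silently passing.
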
