Summary
A simple scaffolding task (issue-1-scaffolding on the l2l project) consumed 9.6M effective input tokens across 3 sessions and 180 API turns. The primary driver is the number of API turns: each turn re-reads ~50K tokens of cached system prompt.
Problem / Context
Token breakdown from the second agent test:
- cache_read: 9,160,663 tokens (95.2%)
- cache_create: 461,016 tokens (4.8%)
- direct_input: 204 tokens (0.0%)
- Total: 9,621,883 effective input tokens
- Output: 30,371 tokens
180 API turns across 3 sessions:
- Session 1 (planning): 43 turns — ENTIRELY WASTED (planned already-completed work)
- Session 2 (verification): 87 turns — ~57 turns (~65%) wasted on sequential single-tool operations
- Session 3 (pr-review): 50 turns — moderate waste
Average context per turn: ~53K tokens (mostly Claude Code system prompt + tool definitions).
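
The breakdown and the per-turn average can be cross-checked with a few lines of shell. This is only a sanity check on the figures quoted above; the variable names and the `pct` helper are illustrative and not part of the project.

```shell
#!/usr/bin/env bash
# Figures copied from the token breakdown above.
cache_read=9160663
cache_create=461016
direct_input=204
turns=180

total=$((cache_read + cache_create + direct_input))

# Percentage with one decimal place; awk is used because bash
# arithmetic is integer-only. (pct is a hypothetical helper.)
pct() {
  awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.1f", 100 * part / whole }'
}

# Integer division: average effective input tokens per API turn.
avg_per_turn=$((total / turns))

echo "total:        $total"
echo "cache_read:   $(pct "$cache_read" "$total")%"
echo "cache_create: $(pct "$cache_create" "$total")%"
echo "avg per turn: $avg_per_turn"
```

The total matches the reported 9,621,883 tokens, and the average comes out at roughly 53K tokens per turn, which is the figure used for the savings estimates below.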
Analysis
Session 1 waste (43 turns saved entirely):
- Planning phase produced a full implementation plan for work already done on `feat/config-module`
- Pre-check `git diff main..HEAD -- src/ tests/` would have revealed existing code
- Savings: 43 turns × 53K = ~2.3M tokens
Session 2 waste (~57 turns saveable):
- Turns 6-9: 4 sequential Glob calls → could be 1 parallel turn
- Turns 10-13: 4 sequential test Bash calls → could be 1 batched turn
- Turns 16-21: 6 sequential git commands → could be 2 turns
- Turns 29-37: 8 sequential `git show` calls → could be 1-2 turns
- Turns 48-65: 9 sequential Edit calls for spec checkboxes → could batch
- Savings: ~57 turns × 53K = ~3M tokens
Session 3 waste (~20 turns saveable):
- Re-verifies work already verified in session 2
- Failed reads of non-existent files (AGENTS.md)
Proposed Solution
P0: Pre-implementation detection in feature-loop.sh
    # Before planning, check if implementation exists
    CODE_DIFF=$(git diff main..HEAD -- src/ tests/ | wc -l)
    if [ "$CODE_DIFF" -gt 0 ]; then
        echo "Implementation exists, skipping planning phase"
        # Skip directly to verification
    fi
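
One way to make this check reusable is to separate the git query from the phase decision, so the decision can be tested without a repository. The sketch below is an assumption about how this could be structured: `count_diff_lines` and `next_phase` are hypothetical names, and feature-loop.sh may organize its phases differently.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not taken from the actual feature-loop.sh.

# Count changed lines under the given paths between a base ref and HEAD.
# Prints 0 when git produces no output, which includes the case where
# git itself fails (for example, the base ref does not exist).
count_diff_lines() {
  local base="$1"
  shift
  git diff "${base}..HEAD" -- "$@" 2>/dev/null | wc -l | tr -d '[:space:]'
}

# Map a diff line count to the phase the loop should start with.
next_phase() {
  if [ "${1:-0}" -gt 0 ]; then
    echo "verify"
  else
    echo "plan"
  fi
}
```

Inside the loop this might be called as `phase=$(next_phase "$(count_diff_lines main src/ tests/)")`. Note that a git failure is treated the same as "no implementation found", so the loop falls back to planning rather than skipping it by mistake.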
P1: Batch operations in prompts
Update verification prompt (PROMPT_verify.md) to instruct:
- "Run all verification commands in a SINGLE Bash call"
- "Use parallel tool calls (multiple Glob/Read in one turn)"
- "Batch spec checkbox updates into a single Edit"
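
To illustrate what "a SINGLE Bash call" could look like, the sketch below runs a list of commands in one invocation and prints one summary line per command. `run_all` is a hypothetical helper, and the actual verification commands are project-specific and not listed in this issue.

```shell
#!/usr/bin/env bash
# Run every argument as its own command string, print PASS/FAIL for each,
# and return non-zero if any command failed. Command output is sent to
# stderr so that stdout contains only the summary lines.
run_all() {
  local failed=0 cmd
  for cmd in "$@"; do
    if bash -c "$cmd" 1>&2; then
      echo "PASS: $cmd"
    else
      echo "FAIL: $cmd"
      failed=1
    fi
  done
  return "$failed"
}
```

A call such as `run_all "true" "false" "true"` replaces three separate turns with one. Running every command even after a failure is a deliberate choice here: the agent sees all results in a single turn instead of fixing and re-running one command at a time.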
P2: Merge verification + PR into single session
Eliminate session 3 entirely by having the verification session also create the PR.
P3: Prompt compression
- Remove AGENTS.md references (saves failed reads)
- Trim verbose prompt sections
- Remove unnecessary skill invocations
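
Failed reads such as the AGENTS.md case could also be caught before a session starts by checking that every file a prompt refers to actually exists. The following is a minimal sketch under that assumption; `missing_files` is a hypothetical helper, and how the list of referenced files is obtained is left open.

```shell
#!/usr/bin/env bash
# Print each path from the argument list that does not exist on disk,
# one per line. Prints nothing when all paths exist.
missing_files() {
  local f
  for f in "$@"; do
    [ -e "$f" ] || echo "$f"
  done
}
```

For example, `missing_files AGENTS.md PROMPT_verify.md` would print only the names that are absent, which can then be removed from the prompt.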
Projected Impact
| Optimization | Turns saved | Tokens saved |
|---|---|---|
| Skip planning when code exists | 43 | ~2.3M |
| Batch verification operations | 57 | ~3.0M |
| Merge verify+PR sessions | 20 | ~1.1M |
| Total | 120 | ~6.4M |
Current: 180 turns, 9.6M tokens
Optimized: ~60 turns, ~3.2M tokens (67% reduction)
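
The projection follows directly from the table rows. The sketch below recomputes it from the rounded figures in the table, with token counts expressed in units of 0.1M so that plain integer arithmetic suffices; the `rows` encoding and the `fmt_m` helper are illustrative only.

```shell
#!/usr/bin/env bash
# Rows copied from the Projected Impact table as
# "turns_saved:tokens_saved", with tokens in units of 0.1M.
rows="43:23 57:30 20:11"
current_turns=180
current_tokens=96   # 9.6M in units of 0.1M

saved_turns=0
saved_tokens=0
for row in $rows; do
  saved_turns=$((saved_turns + ${row%%:*}))
  saved_tokens=$((saved_tokens + ${row##*:}))
done

remaining_turns=$((current_turns - saved_turns))
remaining_tokens=$((current_tokens - saved_tokens))

# Rounded percentage reduction in tokens.
reduction_pct=$(awk -v s="$saved_tokens" -v c="$current_tokens" \
  'BEGIN { printf "%.0f", 100 * s / c }')

# Format a value in 0.1M units as e.g. "3.2M" (hypothetical helper).
fmt_m() { echo "$(( $1 / 10 )).$(( $1 % 10 ))M"; }

echo "turns:  $current_turns -> $remaining_turns"
echo "tokens: $(fmt_m "$current_tokens") -> $(fmt_m "$remaining_tokens") (${reduction_pct}% reduction)"
```

Because the inputs are the rounded table values, the result reproduces the ~60 turns, ~3.2M tokens, and 67% reduction stated above; these are estimates that assume every saved turn costs the average ~53K tokens.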
Acceptance Criteria