- **Target workflow:** `smoke-claude`
- **Source report:** #1606
- **Estimated cost per run:** ~$0.24 (range: $0.18–$0.30)
- **Total tokens per run:** ~203K
- **Cache read rate:** 76.2% (excellent)
- **Cache write rate:** 22.9% ← dominant cost at 73% of spend
- **LLM turns:** ~6 (4 × Sonnet + 2 × Haiku routing)
- **Run frequency:** 21 runs over the last 24 hours (every PR + schedule)
## Current Configuration

| Setting | Value |
|---|---|
| Tools loaded | 5 (cache-memory, github, playwright, edit, bash) |
| Tools actually used | 4 (github, playwright, bash, safe-outputs) |
| Network groups | defaults, github, playwright |
| Shared imports | `shared/mcp-pagination.md` (3,225 bytes, ~810 tokens) |
| Pre-agent steps | No |
| Prompt size | `smoke-claude.md`: 3,531 bytes + `mcp-pagination.md`: 3,225 bytes |
| `max-turns` | 15 (actual runs use ~6) |
- `edit:` is loaded but never called — the file-creation test uses `bash echo`, not the edit tool.
- `cache-memory:` is loaded but not needed — cache memory is designed for multi-session persistent memory in long-running agents. A 6-turn smoke test does not benefit from it.
## Recommendations
### 1. Remove the `edit:` tool — not used by the smoke test

The smoke test creates a file using `bash echo`; the `edit:` tool is never called.

Estimated savings: ~600 tokens/run from cache writes (~1.3%)

Change in `.github/workflows/smoke-claude.md`:

```diff
 tools:
   cache-memory: true
   github:
     toolsets: [repos, pull_requests]
   playwright:
     allowed_domains:
       - github.com
-  edit:
   bash:
     - "*"
```

Cost savings: ~$0.00225/run × 630 runs/month ≈ $1.42/month
### 2. Remove `cache-memory: true` — irrelevant for a short smoke test

Cache memory persists learned facts across agent sessions. A 6-turn smoke test that runs then exits has no use for cross-session memory. Removing it eliminates:

- The `cache_memory_prompt.md` framework injection from the system prompt (~2,000–3,000 tokens)
- The cache-memory tool schema (~600 tokens)

Estimated savings: ~2,500 tokens/run from cache writes (~5.4%)

Change in `.github/workflows/smoke-claude.md`:

```diff
 tools:
-  cache-memory: true
   github:
     toolsets: [repos, pull_requests]
```

Cost savings: ~$0.009/run × 630 runs/month ≈ $5.91/month
### 3. Remove `imports: shared/mcp-pagination.md` (or inline a one-liner)

`mcp-pagination.md` is 3,225 bytes (~810 tokens) of detailed pagination guidance: retry loops, common `perPage` values, error-message examples, and so on. That guidance targets workflows that fetch large result sets (75K-token PR diffs, full code searches). The smoke test only calls `list_pull_requests` to get 2 recent merged PRs — no pagination risk.

Option A (recommended): remove the import entirely and add a single inline note to the prompt.

Change in `.github/workflows/smoke-claude.md`:

```diff
-imports:
-  - shared/mcp-pagination.md
```

And in the prompt body:

```diff
+> Use `perPage: 2` when listing PRs.
+
 ## Test Requirements
```

Option B (conservative): keep the import and accept the overhead.

Estimated savings (Option A): ~810 tokens/run (~1.8%)

Cost savings: ~$0.003/run × 630 runs/month ≈ $1.91/month
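Recommendations 1–3 all use the same savings model: every token removed from the cached system prompt is charged at the Sonnet cache-write rate on every run. A minimal sketch of that arithmetic, using the rates and run counts from this report:

```python
CACHE_WRITE_PER_MTOK = 3.75  # USD per million tokens (Sonnet cache-write rate)
RUNS_PER_MONTH = 630

def monthly_savings(tokens_removed: int) -> float:
    """Dollars saved per month by shrinking the cached system prompt."""
    return tokens_removed * CACHE_WRITE_PER_MTOK / 1_000_000 * RUNS_PER_MONTH

# 600 (edit schema) + 2,500 (cache-memory) + 810 (pagination import)
# comes to roughly $1.42 + $5.91 + $1.91 ≈ $9.24/month before read savings.
```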
### 4. Reduce `max-turns` from 15 to 8

Actual runs consistently use 6 turns. A ceiling of 8 gives a 33% safety buffer while preventing cost runaway if the agent loops unexpectedly.

At 15 turns instead of 8, a runaway session wastes 7 extra Sonnet turns × ~39K cache reads × $0.30/M ≈ $0.082 extra per runaway run.

Change in `.github/workflows/smoke-claude.md`:

```diff
 engine:
   id: claude
-  max-turns: 15
+  max-turns: 8
```

No baseline savings, but meaningful cost-runaway protection.
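The runaway estimate is the same per-token arithmetic applied at the cache-read rate; a sketch, where the ~39K default is this report's per-turn cache-read figure:

```python
CACHE_READ_PER_MTOK = 0.30  # USD per million cached-read tokens (Sonnet)

def runaway_waste(extra_turns: int, reads_per_turn: int = 39_000) -> float:
    """Extra spend if a looping agent burns turns re-reading cached context."""
    return extra_turns * reads_per_turn * CACHE_READ_PER_MTOK / 1_000_000
```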
## Cache Analysis (Anthropic-Specific)
Aggregated across 5 representative runs (31 total requests):
| Turn (typical) | Model | Input | Output | Cache Read | Cache Write | Net New |
|---|---|---|---|---|---|---|
| 1 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 2 | Sonnet | ~6 | ~1,300 | 0 | ~46,400 | ~47,700 |
| 3 | Haiku | ~400 | ~50 | 0 | 0 | ~450 |
| 4 | Sonnet | ~6 | ~1,300 | ~46,400 | small | ~1,306 |
| 5 | Sonnet | ~6 | ~2,000 | ~46,400 | small | ~2,006 |
| 6 | Sonnet | ~6 | ~935 | ~46,400 | small | ~941 |
(Turns estimated from report aggregates: 21 Sonnet, 10 Haiku across 5 runs; direct Sonnet input was only 31 tokens total — essentially all context comes from cache reads)
Cache write amortization: Turn 2 writes the full system prompt + tool schemas (~46K tokens). Turns 4–6 each read the same ~46K block (3× reuse per session). This is healthy reuse — write cost is justified.
Cache write cost vs benefit:
- Write: 46,400 tokens × $3.75/M = $0.174/run
- Reads: 154,472 tokens × $0.30/M = $0.046/run
- Without caching, Turns 4–6 would each pay full Sonnet input price ($3/M) for ~46K tokens → $0.416 extra. Caching saves ~$0.37/run, so the $0.174 write cost is well justified.
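That cost/benefit arithmetic can be checked in a few lines, using the Sonnet rates quoted in this report:

```python
MTOK = 1_000_000
WRITE, READ, INPUT = 3.75, 0.30, 3.00  # USD per Mtok: cache write, cache read, uncached input

write_cost = 46_400 * WRITE / MTOK      # Turn 2 writes the ~46K prompt + schema block
read_cost  = 154_472 * READ / MTOK      # later turns re-read that block from cache
uncached   = 3 * 46_400 * INPUT / MTOK  # Turns 4-6 at full input price instead
saved      = uncached - read_cost       # net per-run benefit of caching
```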
Haiku zero-cache observation: 10 Haiku calls across 5 runs all have cache_read_tokens = 0. These are small framework routing calls (~400 tokens input, 532ms avg). Haiku is too cheap and the calls too small for caching to be worthwhile here — no action needed.
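Per-run cache-write totals like the ones above come from the run's `token-usage.jsonl` artifact. A minimal aggregation sketch, assuming each line is one JSON object using Anthropic's usage field names (the artifact's actual schema may differ; adjust the key if so):

```python
import json

def total_cache_writes(path: str) -> int:
    """Sum cache-write tokens across all requests in a token-usage.jsonl file.

    Assumes one JSON object per line with Anthropic's
    `cache_creation_input_tokens` usage field; missing fields count as 0.
    """
    total = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                total += json.loads(line).get("cache_creation_input_tokens", 0)
    return total
```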
## Expected Impact
| Metric | Current | Projected | Savings |
|---|---|---|---|
| Cache write tokens/run | ~46,400 | ~42,490 | ~3,910 (−8.4%) |
| Cache read tokens/run | ~154,472 | ~141,600 | ~12,872 (−8.3%) |
| Cost/run | ~$0.240 | ~$0.221 | ~$0.019 (−7.9%) |
| Monthly cost (630 runs) | ~$151 | ~$139 | ~$12/month |
| LLM turns | 6 | 6 | 0 |
| Max runaway turns | 15 | 8 | −7 |
Projections assume proportional reduction in cache write and read tokens from smaller system prompt.
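The projected per-run and monthly savings follow directly from the table's token deltas; a sketch using this report's cache-write and cache-read rates:

```python
MTOK = 1_000_000
WRITE, READ = 3.75, 0.30  # USD per Mtok (Sonnet cache write / cache read)
RUNS_PER_MONTH = 630

write_delta = 3_910   # tokens cut: 600 (edit) + 2,500 (cache-memory) + 810 (import)
read_delta  = 12_872  # proportional reduction across the ~3.3 cached re-reads per run

per_run_savings = (write_delta * WRITE + read_delta * READ) / MTOK
monthly_savings = per_run_savings * RUNS_PER_MONTH
```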
## Implementation Checklist
- [ ] Remove `edit:` from the `tools:` section
- [ ] Remove `cache-memory: true` from the `tools:` section
- [ ] Remove `imports: - shared/mcp-pagination.md` and add the inline note `> Use perPage: 2 when listing PRs.`
- [ ] Change `max-turns: 15` to `max-turns: 8`
- [ ] Run `gh aw compile .github/workflows/smoke-claude.md`
- [ ] Run `npx tsx scripts/ci/postprocess-smoke-workflows.ts`
- [ ] Check `token-usage.jsonl` from `agent-artifacts` to confirm the cache write reduction

Generated by Daily Claude Token Optimization Advisor