Audit your setup. Protect your sessions. Measure what matters.
Find the ghost tokens. Score your context quality. Survive compaction.
v1.x found the ghost tokens. v2.0 protects your sessions from the inside.
| Capability | What It Does |
|---|---|
| Smart Compaction | Checkpoints decisions, errors, and agent state before auto-compact fires. Restores what the summary dropped. Your "why" survives. |
| Context Quality Scoring | Six-signal analysis (stale reads, bloated results, duplicates, compaction depth, decision density, agent efficiency). Tells you when to /compact, not just how full you are. Shows live in your status bar. |
| Session Continuity | Automatic checkpoints on session end, /clear, and crashes. New sessions pick up where you left off via keyword-matched context injection. |
All three work together: the quality score triggers compaction advice, smart compact captures state before compaction fires, and continuity restores it in your next session. Zero external dependencies. Plain markdown checkpoints. Setup takes two commands:
```bash
python3 $MEASURE_PY setup-smart-compact   # checkpoint + restore hooks
python3 $MEASURE_PY setup-quality-bar     # live quality score in your status bar
```

In Claude Code:

```
/plugin marketplace add alexgreensh/token-optimizer
/plugin install token-optimizer@alexgreensh-token-optimizer
```
Auto-updates when you restart Claude Code. Requires Claude Code 1.0.33+.
```bash
curl -fsSL https://raw.githubusercontent.com/alexgreensh/token-optimizer/main/install.sh | bash
```

Updates: cd ~/.claude/token-optimizer && git pull.
```bash
git clone https://github.com/alexgreensh/token-optimizer.git ~/.claude/token-optimizer
ln -sf ~/.claude/token-optimizer/skills/token-optimizer ~/.claude/skills/token-optimizer
```

If you installed via the script and want to switch to the plugin (for auto-updates):
```bash
# Remove the skill (handles both symlink and directory)
if [ -L ~/.claude/skills/token-optimizer ]; then
  rm -f ~/.claude/skills/token-optimizer
else
  rm -rf ~/.claude/skills/token-optimizer
fi
rm -rf ~/.claude/token-optimizer  # remove the clone (optional)
```

Then install the plugin using the commands above.
Then start Claude Code and run:
/token-optimizer
The optimizer can track your usage over time: which skills you use, how context fills up, model costs. This powers the Trends and Health tabs in your dashboard.
```bash
python3 $MEASURE_PY setup-hook --dry-run  # preview the change
python3 $MEASURE_PY setup-hook            # install it
```

This adds a SessionEnd hook that silently collects usage stats after each session (~2 seconds, all data local). The dashboard auto-refreshes with your latest data.
Already ran /token-optimizer and skipped this step? Just run the command above. Remove anytime: python3 $MEASURE_PY setup-hook --uninstall
Every message you send to Claude Code re-sends everything: system prompt, tool definitions, MCP servers, skills, commands, CLAUDE.md, MEMORY.md, and system reminders. The API is stateless. No memory between messages. The full stack, replayed every time. These are the ghost tokens: invisible overhead that eats your context window before you type a word.
Prompt caching makes this cheap (90% cost reduction on cached tokens). But cheap doesn't mean small. Those tokens still fill your context window, count toward rate limits, and degrade output quality past 50-70% fill.
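A back-of-envelope sketch of that distinction. The 90% discount and the 40K overhead figure below are illustrative assumptions for the sake of the arithmetic, not measurements of your setup:

```python
# Prompt caching cuts price, not context occupancy.
CACHE_DISCOUNT = 0.10   # cached input tokens cost ~10% of full price
WINDOW = 200_000        # context window size

def effective_cost(overhead_tokens, full_price_per_token=1.0):
    """Cost of re-sending cached overhead each message (arbitrary units)."""
    return overhead_tokens * full_price_per_token * CACHE_DISCOUNT

def window_share(overhead_tokens):
    """Fraction of the context window the overhead still occupies."""
    return overhead_tokens / WINDOW

overhead = 40_000
print(f"cost per message: {effective_cost(overhead):,.0f} units (90% off)")
print(f"window consumed:  {window_share(overhead):.0%} (no discount at all)")
```

The bill shrinks by 90%; the window occupancy doesn't shrink at all.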
The more you've customized Claude Code, the worse it gets.
Your 200K context window gets eaten from multiple directions:
Fixed overhead (everyone pays, can't change): System prompt (~3K tokens) plus built-in tool definitions (12-17K tokens). About 8-10% of your window, gone before anything else loads. Common misconception: the "system prompt" is often reported as ~3K tokens. But built-in tools load alongside it every message. The real irreducible floor is ~15K, not ~3K. Posts quoting the base prompt alone understate overhead by 5x.
Autocompact buffer: When autocompact is on (the default), Claude Code reserves headroom for compaction. In practice, roughly 30-35K tokens (~16% of your window) sit empty. Run /context on a fresh session to see the exact number.
MCP tools: The biggest variable. Anthropic's own engineering team measured 134K tokens consumed by tool definitions before optimization. Tool Search (activates automatically when MCP tools exceed ~10% of context) reduced this by 85%, but MCP servers still add up: each deferred tool costs ~15 tokens, plus server instructions.
Your config stack (what this tool optimizes): CLAUDE.md that's grown organically. MEMORY.md that duplicates half of it. 50+ skills you installed and forgot. Commands you never use. @imports pulling in files you didn't realize. .claude/rules/ adding up quietly. No permissions.deny rules to exclude files from context.
A real power user's baseline overhead: ~43,000 tokens (22% of the 200K window). Add the autocompact buffer and ~38% is unavailable before you type a single word.
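The arithmetic behind that figure, as a sketch (the buffer size varies between ~30-35K; run /context for your exact numbers):

```python
WINDOW = 200_000
baseline_overhead = 43_000   # config stack, per the power-user example
autocompact_buffer = 33_000  # roughly 30-35K reserved when autocompact is on

unavailable = baseline_overhead + autocompact_buffer
print(f"{unavailable / WINDOW:.0%} of the window is gone before your first message")
```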
Every subagent you spawn gets its own 200K window and loads the same full stack. Five parallel agents means five copies of that overhead, each starting ~30% full before doing any work.
One command. Six parallel agents audit your entire setup. You get a prioritized list of exactly what's eating your context and how to fix it.
```
> /token-optimizer

[Token Optimizer] Backing up config...
Dispatching 6 audit agents...

YOUR SETUP
  Per-message overhead: ~43,000 tokens
  Context used: 38% before your first message

QUICK WINS
  Slim CLAUDE.md + MEMORY.md:        -5,200 tokens/msg
  Archive unused skills + commands:  -4,800 tokens/msg
  Prune MCP + add file exclusion:    -5,000 tokens/msg
  Total: ~15,000 tokens/msg recovered

Ready to implement? Everything backed up first.
```
Everything gets backed up before any change. You see diffs. You approve each fix. Nothing irreversible.
| Area | What It Catches |
|---|---|
| CLAUDE.md | Content that should be skills or reference files. Duplication with MEMORY.md. @imports pulling in more than you realize. Poor cache structure. |
| MEMORY.md | Overlap with CLAUDE.md. Verbose entries. Content past the 200-line auto-load cap. |
| Skills | Unused skills still loading frontmatter (~100 tokens each). Duplicates. Archived skills in the wrong directory still loading. |
| MCP Servers | Broken/unused servers. Duplicate tools across servers and plugins. Missing Tool Search. |
| Commands | Rarely-used commands inflating the menu (~50 tokens each). |
| Rules & Advanced | .claude/rules/ overhead. Missing permissions.deny rules. No hooks. No monitoring. |
Not everything needs to load every message. The optimizer moves content to where it costs the least:
| Where | Token Cost | What Goes Here |
|---|---|---|
| CLAUDE.md | Every message (~800 token target) | Identity, critical rules, key paths |
| Skills & references | ~100 tokens in menu, full content only when invoked | Workflows, configs, detailed standards |
| Project files | Zero until explicitly read | Guides, templates, documentation |
A bloated CLAUDE.md doesn't need deleting. Coding standards move to a reference file. A deployment workflow becomes a skill. Same functionality, fraction of the per-message cost.
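A rough sketch of why the move pays off. The token counts and usage rates here are illustrative assumptions, not measurements:

```python
# Moving a 1,200-token deployment workflow out of CLAUDE.md into a skill.
SESSIONS = 100           # messages across which the cost is paid
workflow_tokens = 1_200  # full text, re-sent every message from CLAUDE.md
skill_menu_tokens = 100  # frontmatter line in the skills menu
invocations = 5          # messages where the skill is actually used

in_claude_md = workflow_tokens * SESSIONS
as_skill = skill_menu_tokens * SESSIONS + workflow_tokens * invocations
print(f"in CLAUDE.md: {in_claude_md:,} tokens")
print(f"as a skill:   {as_skill:,} tokens ({1 - as_skill / in_claude_md:.0%} less)")
```

The full content still loads when invoked; you only stop paying for it on the messages that don't need it.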
Results depend on your setup. Heavier setups save more.
Config cleanup (what the tool directly changes):
| Starting Point | Typical Recovery |
|---|---|
| Power user (50+ skills, 3+ MCP servers, bloated config) | 5-15% of context window |
| No Tool Search (disabled or not triggered) | 134K → ~8.7K tokens (85% reduction in MCP overhead) |
| Lighter setup (few skills, 1 MCP server) | 3-8% |
Advanced option: Disabling autocompact and managing /compact manually recovers an additional ~16% of your window. The optimizer explains the tradeoff and helps you decide.
Behavioral savings (free, compound across every session):
| Habit | Why It Matters |
|---|---|
| /compact at 50-70% instead of waiting for auto-compact | Better output quality, fewer hallucinations |
| Haiku for data-gathering agents | 5x cheaper than Opus for file reads and counting |
| /clear between unrelated topics | Fresh context, no stale information dragging quality down |
| Batch requests into one message | Each message re-sends your full config stack |
| Plan mode for complex tasks | Prevents expensive re-work from wrong initial direction |
After the audit, you get an interactive HTML dashboard that breaks down exactly where your tokens go and what you can do about it.
Every component is clickable. Expand any item to see why it matters, what the trade-offs are, and what changes. Toggle the fixes you want, and copy a ready-to-paste optimization prompt.
The dashboard auto-regenerates after every session (via the SessionEnd hook). It shows Trends and Health tabs with your latest usage data. The full audit dashboard (with optimization recommendations) requires running /token-optimizer.
Bookmarkable URL (macOS):
```bash
python3 $MEASURE_PY setup-daemon
# => http://localhost:24842/
```

This installs a tiny background server (~2MB memory) that starts when you log in. Bookmark the URL and check your dashboard anytime. It always shows the latest data because the SessionEnd hook regenerates the file after every session.
Only accessible from your machine (localhost). Uninstall: python3 $MEASURE_PY setup-daemon --uninstall
Other options:
```bash
# Open the file directly (no server needed)
open ~/.claude/_backups/token-optimizer/dashboard.html

# Serve over HTTP (headless/remote machines)
python3 $MEASURE_PY dashboard --serve

# Remote access via SSH tunnel
ssh -L 8080:localhost:8080 your-server
# Then open http://localhost:8080/dashboard.html locally
```

To regenerate manually: python3 $MEASURE_PY dashboard.
Full audit dashboard (after running /token-optimizer):
```bash
python3 $MEASURE_PY dashboard --coord-path PATH --serve
```

| Phase | What Happens |
|---|---|
| Initialize | Backs up your config, takes a "before" snapshot |
| Audit | 6 parallel agents scan everything (sonnet for judgment, haiku for counting) |
| Analyze | Synthesis agent (opus) prioritizes fixes by impact |
| Implement | You choose what to fix. Diffs and approval before every change |
| Verify | Re-measures everything, shows before/after with exact savings |
Right model for each job. Session folder pattern keeps agent output from flooding your context.
Prompt caching cuts cost by 90%. But it doesn't shrink your context window.
- You hit compaction sooner. Compaction is lossy. Every cycle throws away context.
- Rate limits burn faster. Cache reads still count toward your subscription quota.
- Quality degrades. Performance drops as context fills, especially past 70%.
- Agents multiply it. Every subagent loads its own copy of your full config stack. Dispatch 5 agents and that overhead loads 5 times, each in a fresh 200K window. Agent teams use ~7x more tokens in plan mode than standard sessions. Reducing per-agent overhead from 43K to 28K saves 75K tokens across those 5 agents.
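The savings math from that last point, spelled out with the figures used above:

```python
AGENTS = 5
before, after = 43_000, 28_000  # per-agent config overhead, before/after trimming

saved = (before - after) * AGENTS
print(f"saved across {AGENTS} agents: {saved:,} tokens")
```

Because every agent replays the full stack, any per-agent reduction multiplies by the number of agents you dispatch.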
Standalone script. No dependencies. Python 3.8+.
The path depends on how you installed. Set it once:
```bash
# Auto-detect (works for both plugin and script/manual installs):
MEASURE_PY=""
for f in ~/.claude/skills/token-optimizer/scripts/measure.py \
         ~/.claude/plugins/cache/*/token-optimizer/*/skills/token-optimizer/scripts/measure.py; do
  [ -f "$f" ] && MEASURE_PY="$f" && break
done
[ -z "$MEASURE_PY" ] && { echo "measure.py not found. Is Token Optimizer installed?"; exit 1; }
```

```bash
python3 $MEASURE_PY report

# Save snapshots for before/after comparison
python3 $MEASURE_PY snapshot before
# ... make changes ...
python3 $MEASURE_PY snapshot after
python3 $MEASURE_PY compare
```

The optimizer doesn't just audit your config once. It tracks how you actually use Claude Code over time, so you can spot patterns, catch waste, and make informed decisions about what to keep and what to archive.
Two commands power this: trends for usage patterns and health for session hygiene. Both work from the CLI and appear as interactive tabs in the persistent dashboard (auto-refreshed after every session) and in the full audit dashboard.
See Enable Session Tracking above for quick setup.
Add a SessionEnd hook and usage data collects itself. The setup command auto-detects measure.py's path regardless of install method:
```bash
python3 $MEASURE_PY setup-hook --dry-run  # preview the change
python3 $MEASURE_PY setup-hook            # install it
```

Or add manually to ~/.claude/settings.json (adjust the path to match your install):
```json
{
  "hooks": {
    "SessionEnd": [{
      "hooks": [{
        "type": "command",
        "command": "python3 /path/to/measure.py collect --quiet && python3 /path/to/measure.py dashboard --quiet"
      }]
    }]
  }
}
```

On every session end, the hook collects the JSONL log into a local SQLite database (~/.claude/_backups/token-optimizer/trends.db), then regenerates the persistent dashboard. No external services. No API calls. Your data stays on your machine.
You can also collect manually or backfill older sessions:
```bash
# Collect the last 90 days of sessions (default)
python3 $MEASURE_PY collect

# Backfill a longer history
python3 $MEASURE_PY collect --days 180
```

Collection is idempotent. Running it twice on the same sessions won't double-count anything.
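One plausible way to get that idempotency is a unique key on the session ID, as in this sketch. The schema here is an assumption for illustration, not measure.py's actual schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE sessions (
    session_id TEXT PRIMARY KEY,
    tokens     INTEGER
)""")

def collect(session_id, tokens):
    # INSERT OR IGNORE: already-seen sessions are silently skipped
    db.execute("INSERT OR IGNORE INTO sessions VALUES (?, ?)", (session_id, tokens))

for _ in range(2):  # run the same collection twice
    collect("abc123", 50_000)

count, total = db.execute("SELECT COUNT(*), SUM(tokens) FROM sessions").fetchone()
print(count, total)  # 1 50000 -- no double counting
```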
```bash
python3 $MEASURE_PY trends
python3 $MEASURE_PY trends --days 7
python3 $MEASURE_PY trends --json
```

Scans your session history and shows:
Skills usage: Which skills you actually invoke vs. which sit idle loading frontmatter every session. This is the most actionable insight. If you have 59 skills installed but only use 8 in the last 30 days, that's 51 skills costing ~100 tokens each, every session, for nothing.
Model mix: Your opus/sonnet/haiku split across all sessions. If you see 90% opus, you're probably overspending on data-gathering agents that would work fine on haiku.
Daily breakdown: Per-day session count, token volume, and which skills were used. In the dashboard, each day expands to show individual sessions with duration, message count, cache hit rate, and skills used.
```
USAGE TRENDS (last 30 days)
Sessions: 70 | Avg duration: 340 min

SKILLS
Used (8 of 59 installed):
  morning .................. 28 sessions
  evening-auto ............. 25 sessions
  recall ................... 12 sessions
Never used (last 30 days):
  api-docs, condition-based-waiting, ...
  (51 skills, ~5,100 tokens overhead)

MODEL MIX
sonnet ████████████████████░░░░░ 63%  3.4M tokens
opus   ████████████░░░░░░░░░░░░░ 22%  1.2M tokens
haiku  ███████░░░░░░░░░░░░░░░░░░ 15%  800K tokens
```
In the dashboard, every skill listed in trends is clickable. Click a skill name and it expands to show:
- Description: What the skill does (from SKILL.md frontmatter)
- Frontmatter tokens: How much it costs per session just sitting in the menu
- File structure: What files the skill contains (SKILL.md, references/, scripts/, etc.)
Never-used skills link directly to the Quick Wins tab so you can archive them in one step.
```bash
python3 $MEASURE_PY health
```

Detects running Claude Code processes and flags problems:
- Stale sessions (24h+): Still running but probably forgotten. Long sessions accumulate context bloat.
- Zombie sessions (48h+): Almost certainly orphaned. Safe to kill.
- Outdated versions: Running an older Claude Code version than what's installed. Restart to get fixes.
- Automated processes: Lists any launchd/cron jobs running Claude.
```
SESSION HEALTH CHECK
Installed version: 2.1.63

RUNNING SESSIONS (2)
  PID 521    (2d 8h ago)  v2.1.62  OUTDATED  ZOMBIE
  PID 91719  (1d 2h ago)  v2.1.63  STALE

RECOMMENDATIONS
- 1 session running older version. Restart to get latest fixes.
- 2 sessions running 24+ hours. Check if still needed.
```
Trends and health appear as dedicated tabs in both the persistent dashboard (measure.py dashboard) and the full audit dashboard (measure.py dashboard --coord-path PATH). The Trends tab includes:
- Date range selector (7/14/30 days + calendar date picker)
- Interactive daily breakdown table (click a day to expand individual sessions)
- Skills usage bars with clickable detail panels
- Model mix visualization with cost-saving context
The persistent dashboard defaults to the Trends tab and hides empty audit sections. The full audit dashboard shows all tabs. The right panel collapses on analytics tabs since they're informational, giving the data more room.
The audit tells you what's wrong with your current setup. Coach Mode tells you how to build things right from the start.
> /token-coach
One question: "What's your goal today?"
- Building something new: Architecture guidance for skills, MCP servers, CLAUDE.md structure
- Existing setup feels slow: Pattern detection with named anti-patterns and fix priorities
- Designing a multi-agent system: Agent type selection, model routing, coordination patterns, cost math
- Quick health check: Token Health Score (0-100) and top 3 actions
Coach Mode is conversational, not a wall of text. It leads with your actual numbers, names the patterns it detects ("You've got the 50-Skill Trap going on"), asks follow-up questions, and generates a prioritized action plan with estimated token savings.
Every setup gets a 0-100 score based on detected patterns:
```bash
python3 $MEASURE_PY coach
```

```
Token Health Score: 78/100
Startup overhead: 18,200 tokens (9.1% of 200K)
Usable context: ~148,800 tokens

Issues detected:
[!!] Heavy CLAUDE.md: 1,450 tokens (target: <800)
[!]  Verbose Skill Descriptions: 5 skills over 200 chars

Good practices:
[OK] Reasonable Skill Count: 23 skills (2,300 tokens)
[OK] SessionEnd Hook Installed: Usage tracking active
```

The score and pattern analysis also appear as a dedicated Coach tab in the dashboard.
The coaching knowledge base covers config optimization and agentic architecture:
8 named anti-patterns: The 50-Skill Trap, The Opus Addiction, The CLAUDE.md Novel, The Import Avalanche, The MCP Sprawl, The Stale Memory, The Singleton Session, The Unscoped Rules. Each with symptoms, fix, and estimated savings.
Multi-agent design patterns: Subagent cost model (each inherits your full config stack), coordination folder pattern, model routing table (Haiku for data-gathering, Sonnet for analysis, Opus for reasoning), built-in agent type selection (Explore vs Plan vs General-purpose), skill assignment costs (static, not progressive in subagents).
Hard numbers: Baseline overhead breakdown, MCP tool costs (GitHub 26K eager vs 525 deferred), context quality degradation bands, environment variable reference, community benchmarks.
```bash
python3 $MEASURE_PY coach --json            # full JSON output
python3 $MEASURE_PY coach --focus skills    # focus on skill patterns
python3 $MEASURE_PY coach --focus agentic   # focus on multi-agent patterns
```

v1.x audits your setup. v2.0 protects your sessions.
Auto-compaction fires when context gets tight, but it's lossy. It drops the "why" behind decisions, error sequences, and agent state. Smart Compaction adds structured checkpoints before compaction and restores what was lost afterward.
```bash
# Install the hook system (PreCompact + SessionStart + Stop + SessionEnd)
python3 $MEASURE_PY setup-smart-compact --dry-run    # preview
python3 $MEASURE_PY setup-smart-compact              # install
python3 $MEASURE_PY setup-smart-compact --status     # check
python3 $MEASURE_PY setup-smart-compact --uninstall  # remove
```

What gets captured: decisions and reasoning, modified files (beyond Claude's 5-file rehydration), error-fix sequences, open questions, agent dispatch state, and the continuation point. All stored as plain markdown in ~/.claude/token-optimizer/checkpoints/.
Generate project-specific compaction instructions:
```bash
python3 $MEASURE_PY compact-instructions
# Add the output to your project's .claude/settings.json compactInstructions field
```

Every tool measures how full your context is. This one measures how useful the content is.
```bash
python3 $MEASURE_PY quality current
```

```
Context Quality Report
========================================
Content quality: 74/100 (Good)
Messages analyzed: 156
Decisions captured: 8

Issues found:
  23 stale file reads    (14,000 tokens est.)  files edited since reading
   3 bloated results     ( 8,000 tokens est.)  tool outputs never referenced again
   4 duplicate reminders ( 2,000 tokens est.)  repeated system-reminder injections

Signal-to-noise:
  Decision density: 0.34 (34% substantive)
  Agent efficiency: 82%

Recommendation:
  /compact would free ~24,000 tokens of low-value content
  Smart Compact checkpoint would preserve 8 decisions
```
Six weighted signals: stale reads (25%), bloated results (25%), duplicates (15%), compaction depth (15%), decision density (10%), agent efficiency (10%). Score ranges from 0-100. Quality data appears in the dashboard Health tab as an interactive gauge.
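As an illustration, here is how six normalized signals could combine under those weights. The per-signal scoring (how each raw count becomes a 0-100 value) is internal to measure.py; the inputs below are assumptions chosen to mirror the sample report:

```python
# Weighted combination of six 0-100 signals (higher = healthier).
WEIGHTS = {
    "stale_reads": 0.25, "bloated_results": 0.25, "duplicates": 0.15,
    "compaction_depth": 0.15, "decision_density": 0.10, "agent_efficiency": 0.10,
}

def quality_score(signals):
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must total 100%
    return round(sum(WEIGHTS[k] * signals[k] for k in WEIGHTS))

print(quality_score({
    "stale_reads": 60, "bloated_results": 75, "duplicates": 80,
    "compaction_depth": 90, "decision_density": 70, "agent_efficiency": 82,
}))  # 74
```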
See your context quality score in the terminal status bar, updated every ~2 minutes:
```
Opus 4.6 | my-project ████████░░ 43% | Context Quality 74%
```
Colors tell the story: green (85%+, clean), dim (70-84%, fine), yellow (50-69%, compact soon), red (<50%, clear and restart).
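The bands above as a lookup, for reference (thresholds taken from this README):

```python
def quality_color(score):
    if score >= 85:
        return "green"   # clean
    if score >= 70:
        return "dim"     # fine
    if score >= 50:
        return "yellow"  # compact soon
    return "red"         # clear and restart

assert quality_color(74) == "dim"
```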
```bash
python3 $MEASURE_PY setup-quality-bar --dry-run    # preview
python3 $MEASURE_PY setup-quality-bar              # install
python3 $MEASURE_PY setup-quality-bar --status     # check
python3 $MEASURE_PY setup-quality-bar --uninstall  # remove
```

This installs two things: a status line script that displays the score, and a UserPromptSubmit hook that recalculates it every 2 minutes. If you already have a custom status line, it shows integration instructions instead of replacing yours.
When quality drops below 70%, Claude also gets a warning and will proactively suggest /compact.
When sessions end (normally, via /clear, or crash), state is checkpointed automatically. New sessions in the same project can pick up where you left off:
- Same session (post-compact): Full context recovery injected automatically
- New session (related work): Checkpoint injected if first message has >30% keyword overlap
- New session (unrelated): One-line pointer to available checkpoint
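A plausible sketch of the keyword-overlap test behind the >30% threshold. The real matching logic lives in measure.py and may differ; the tokenization here is an assumption:

```python
def keyword_overlap(message, checkpoint_keywords):
    """Fraction of checkpoint keywords that appear in the first message."""
    words = {w.lower().strip(".,") for w in message.split() if len(w) > 3}
    if not checkpoint_keywords:
        return 0.0
    hits = words & {k.lower() for k in checkpoint_keywords}
    return len(hits) / len(checkpoint_keywords)

ckpt = ["dashboard", "trends", "sqlite", "hooks"]
msg = "Continue wiring the trends tab into the dashboard hooks"
print(keyword_overlap(msg, ckpt))  # 0.75 -> above the 0.3 threshold, restore
```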
All thresholds configurable via environment variables:
| Variable | Default | What It Controls |
|---|---|---|
| TOKEN_OPTIMIZER_CHECKPOINT_TTL | 300 (5 min) | Max age for post-compact restore |
| TOKEN_OPTIMIZER_CHECKPOINT_FILES | 10 | Max checkpoint files kept |
| TOKEN_OPTIMIZER_CHECKPOINT_RETENTION_DAYS | 7 | Days before old checkpoints are cleaned |
| TOKEN_OPTIMIZER_RELEVANCE_THRESHOLD | 0.3 | Keyword overlap for new-session restore |
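For illustration, reading such thresholds with the table's defaults might look like this. The variable names match the table; the parsing logic is an assumption, not measure.py's actual code:

```python
import os

def env_float(name, default):
    """Read a numeric threshold from the environment, falling back safely."""
    try:
        return float(os.environ.get(name, default))
    except ValueError:
        return float(default)  # unparseable value -> keep the default

ttl = env_float("TOKEN_OPTIMIZER_CHECKPOINT_TTL", 300)
threshold = env_float("TOKEN_OPTIMIZER_RELEVANCE_THRESHOLD", 0.3)
print(ttl, threshold)  # 300.0 0.3 unless overridden in your environment
```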
| Tool | What It Does | Limitation |
|---|---|---|
| Manual audit | Flexible | Takes hours. No measurement. Easy to miss things. |
| ccusage | Monitors spending | Shows cost, not context waste or how to fix it. |
| token-optimizer-mcp | Caches MCP calls | One dimension only. |
| This | Audits, diagnoses, fixes, measures | Requires Claude Code. |
```
skills/token-optimizer/
  SKILL.md                        Orchestrator (phases 0-5 + v2.0 actions)
  assets/
    dashboard.html                Interactive dashboard (optimization + analytics + quality gauge)
    dashboard-overview.png        Dashboard screenshot
    logo.svg                      Animated ASCII logo
    hero-terminal.svg             Terminal demo
    before-after.svg              Token breakdown comparison
    how-it-works.svg              5-phase flow diagram
    user-profiles.svg             Context usage by setup type
  references/
    agent-prompts.md              8 agent prompt templates
    implementation-playbook.md    Fix implementation details (4A-4N)
    optimization-checklist.md     32 optimization techniques
    token-flow-architecture.md    How Claude Code loads tokens
  examples/
    claude-md-optimized.md        Optimized CLAUDE.md template
    permissions-deny-template.json  permissions.deny starter
    hooks-starter.json            Hook configuration (v2.0: smart compact + analytics)
  scripts/
    measure.py                    Measurement, quality, smart compact, trends, health & collection
    statusline.js                 Status line script (shows context quality live)
skills/token-coach/
  SKILL.md                        Coaching orchestrator (quality-aware)
  references/
    coaching-scripts.md           Conversation flows + quality-driven coaching
    coach-patterns.md             Anti-patterns and fix patterns
    agentic-systems.md            Multi-agent architecture coaching
    quick-reference.md            Hard numbers and baselines
  examples/
    coaching-session-*.md         Few-shot coaching examples
install.sh                        One-command installer
```
AGPL-3.0. See LICENSE.
Created by Alex Greenshpun.
