Find the ghost tokens. Survive compaction. Track the quality decay.
Opus 4.6 drops from 93% to 76% accuracy across a 1M context window. Compaction loses 60-70% of your conversation. Ghost tokens burn through your plan limits on every single message. Token Optimizer tracks the degradation, cuts the waste, checkpoints your decisions before compaction fires, and tells you what to fix.
```bash
# Plugin (recommended, auto-updates)
/plugin marketplace add alexgreensh/token-optimizer
/plugin install token-optimizer@alexgreensh-token-optimizer
```

Or use the script installer:

```bash
curl -fsSL https://raw.githubusercontent.com/alexgreensh/token-optimizer/main/install.sh | bash
```

Then in Claude Code: `/token-optimizer`
Every Claude Code session starts with invisible overhead: system prompt, tool definitions, skills, MCP servers, CLAUDE.md, MEMORY.md. A typical power user burns 50-70K tokens before typing a word.
At 200K context, that's 25-35% gone. At 1M, it's "only" 5-7%, but the problems compound:
- Quality degrades as context fills. MRCR drops from 93% to 76% across 256K to 1M. Your AI gets measurably dumber with every message.
- You hit rate limits faster. Ghost tokens count toward your plan's usage caps on every message, cached or not. 50K overhead × 100 messages = 5M tokens burned on nothing.
- Compaction is catastrophic. 60-70% of your conversation gone per compaction. After 2-3 compactions: 88-95% cumulative loss. And each compaction means re-sending all that overhead again.
- Higher effort = faster burn. More thinking tokens per response means you hit compaction sooner, which means more total tokens consumed across the session.
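The rate-limit arithmetic above is easy to reproduce. A minimal sketch (the 50K overhead and 100-message figures are the illustrative ones from the text, not measurements of your setup):

```python
def session_burn(overhead_tokens: int, messages: int) -> int:
    """Total ghost tokens re-sent over a session. The API is stateless,
    so every message carries the full overhead again."""
    return overhead_tokens * messages

def context_share(overhead_tokens: int, window: int) -> float:
    """Fraction of the context window consumed before you type a word."""
    return overhead_tokens / window

# 50K overhead x 100 messages -> 5M tokens burned on nothing.
print(f"{session_burn(50_000, 100):,} tokens of pure overhead")
print(f"{context_share(50_000, 200_000):.0%} of a 200K window")
print(f"{context_share(50_000, 1_000_000):.0%} of a 1M window")
```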
Token Optimizer tracks all of this. Quality score, degradation bands, compaction loss, drift detection. Zero context tokens consumed (runs as external Python).
"But doesn't removing tokens hurt the model?" No. Token Optimizer removes structural waste (duplicate configs, unused skill frontmatter, bloated files), not useful context. It also actively measures quality: the 7-signal quality score tells you if your session is degrading, and Smart Compaction checkpoints your decisions before auto-compact fires. Most users see quality scores improve after optimization because the model has more room for real work.
| Command | What You Get |
|---|---|
| `quick` | "Am I in trouble?" 10-second answer: context health, degradation risk, biggest token offenders, which model to use. |
| `doctor` | "Is everything installed correctly?" Score out of 10. Broken hooks, missing components, exact fix commands. |
| `drift` | "Has my setup grown?" Side-by-side comparison vs your last snapshot. Catches config creep before it costs you. |
| `quality` | "How healthy is this session?" 7-signal analysis of your live conversation. Stale reads, wasted tokens, compaction damage. |
| `report` | "Where are my tokens going?" Full per-component breakdown. Every skill, every MCP server, every config file. |
| `/token-optimizer` | "Fix it for me." Interactive audit with 6 parallel agents. Guided fixes with diffs and backups. |
| Signal | Weight | What It Means For You |
|---|---|---|
| Context fill | 20% | How close are you to the degradation cliff? Based on published MRCR benchmarks. |
| Stale reads | 20% | Files you read earlier have changed. Your AI is working with outdated info. |
| Bloated results | 20% | Tool outputs that were never used. Wasting context on noise. |
| Compaction depth | 15% | Each compaction loses 60-70% of your conversation. After 2: 88% gone. |
| Duplicates | 10% | The same system reminders injected over and over. Pure waste. |
| Decision density | 8% | Are you having a real conversation or is it mostly overhead? |
| Agent efficiency | 7% | Are your subagents pulling their weight or just burning tokens? |
Degradation bands in the status bar:
- Green (<50% fill): peak quality zone
- Yellow (50-70%): degradation starting
- Orange (70-80%): quality dropping
- Red (80%+): severe, consider /clear
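In spirit, the score is a weighted blend of the seven signals above. A hypothetical sketch using the published weights (measure.py's real scoring may differ in detail; each signal here is normalized to a 0-1 health value, 1 = healthy):

```python
# Weights from the table above; they sum to 1.0.
WEIGHTS = {
    "context_fill":     0.20,
    "stale_reads":      0.20,
    "bloated_results":  0.20,
    "compaction_depth": 0.15,
    "duplicates":       0.10,
    "decision_density": 0.08,
    "agent_efficiency": 0.07,
}

def quality_score(signals):
    """Weighted 0-100 score from per-signal health values in [0, 1]."""
    return 100 * sum(WEIGHTS[k] * signals[k] for k in WEIGHTS)

def fill_band(fill):
    """Map context fill (0-1) to the status-bar degradation bands."""
    if fill < 0.50:
        return "green"    # peak quality zone
    if fill < 0.70:
        return "yellow"   # degradation starting
    if fill < 0.80:
        return "orange"   # quality dropping
    return "red"          # severe, consider /clear
```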
This is a real session. 708 messages, 2 compactions, 88% of the original context gone. Without the quality score, you'd have no idea.
Auto-compaction is lossy. Smart Compaction checkpoints decisions, errors, and agent state before it fires, then restores what the summary dropped.
```bash
python3 $MEASURE_PY setup-smart-compact  # checkpoint + restore hooks
python3 $MEASURE_PY setup-quality-bar    # live quality score in status bar
```

| Capability | Token Optimizer | /context (built-in) | context-mode |
|---|---|---|---|
| Startup overhead audit | Deep (per-component) | Summary (v2.1.74+) | No |
| Quality degradation tracking | MRCR-based bands | Basic capacity % | No |
| Guided remediation | Yes, with token estimates | Basic suggestions | No |
| Runtime output containment | No | No | Yes (98% reduction) |
| Smart compaction survival | Checkpoint + restore | No | Session guide |
| Model recommendation | Yes (Sonnet vs Opus by context) | No | No |
| Usage trends + dashboard | SQLite + interactive HTML | No | Session stats |
| Compaction loss tracking | Yes (cumulative % lost) | No | Partial |
| Multi-platform | Claude Code (planned expansion) | Claude Code | 6 platforms |
| Context tokens consumed | 0 (Python script) | ~200 tokens | MCP overhead |
/context shows capacity. Token Optimizer fixes the causes.
context-mode prevents runtime floods. Token Optimizer prevents structural waste.
Every message you send to Claude Code re-sends everything: system prompt, tool definitions, MCP servers, skills, commands, CLAUDE.md, MEMORY.md, and system reminders. The API is stateless. These are the ghost tokens: invisible overhead that eats your context window before you type a word.
Prompt caching makes this cheaper (90% cost reduction on cached tokens). But cheaper doesn't mean free, and it doesn't mean small. Those tokens still fill your context window, still count toward your plan's rate limits on every message, and still degrade output quality. On Claude Max or Pro, ghost tokens eat into the same usage caps you need for actual work.
The more you've customized Claude Code, the worse it gets. And at 1M, the real problem isn't startup overhead; it's the compounding cost: degradation as the window fills, plus rate-limit burn from overhead you never see.
Fixed overhead (everyone pays): System prompt (~3K tokens) plus built-in tool definitions (12-17K tokens). About 8-10% at 200K, or 1.5-2% at 1M.
Autocompact buffer: ~30-35K tokens (~16%) reserved for compaction headroom.
MCP tools: The biggest variable. Anthropic's team measured 134K tokens consumed by tool definitions before optimization. Tool Search reduced this by 85%, but servers still add up.
Your config stack (what this tool optimizes): CLAUDE.md that's grown organically. MEMORY.md that duplicates half of it. 50+ skills you installed and forgot. Commands you never use. @imports. .claude/rules/. No permissions.deny rules.
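Rough arithmetic for the fixed portion, using midpoints of the ranges quoted above (illustrative, not measured on your machine):

```python
# Midpoints of the ranges above: ~3K system prompt, 12-17K built-in
# tool definitions, ~30-35K autocompact buffer. MCP and config stack
# come on top of this and vary per setup.
fixed_overhead = {
    "system_prompt":      3_000,
    "builtin_tools":     14_500,
    "autocompact_buffer": 32_500,
}

total = sum(fixed_overhead.values())
for window in (200_000, 1_000_000):
    print(f"{total:,} tokens = {total / window:.1%} of a {window:,} window")
```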
One command. Six parallel agents audit your entire setup. Prioritized fixes with exact token savings. Everything backed up before any change.
You see diffs. You approve each fix. Nothing irreversible.
| Area | What It Catches |
|---|---|
| CLAUDE.md | Content that should be skills or reference files. Duplication with MEMORY.md. @imports. Poor cache structure. |
| MEMORY.md | Overlap with CLAUDE.md. Verbose entries. Content past the 200-line auto-load cap. |
| Skills | Unused skills loading frontmatter (~100 tokens each). Duplicates. Wrong directory. |
| MCP Servers | Broken/unused servers. Duplicate tools. Missing Tool Search. |
| Commands | Rarely-used commands (~50 tokens each). |
| Rules & Advanced | .claude/rules/ overhead. Missing permissions.deny. No hooks. |
| Where | Token Cost | What Goes Here |
|---|---|---|
| CLAUDE.md | Every message (~800 token target) | Identity, critical rules, key paths |
| Skills & references | ~100 tokens in menu, full when invoked | Workflows, configs, standards |
| Project files | Zero until read | Guides, templates, documentation |
After the audit, you get an interactive HTML dashboard.
Every component is clickable. Expand any item to see why it matters, what the trade-offs are, and what changes. Toggle the fixes you want, and copy a ready-to-paste optimization prompt.
The dashboard auto-regenerates after every session (via the SessionEnd hook).
```bash
python3 $MEASURE_PY setup-daemon       # Bookmarkable URL at http://localhost:24842/
python3 $MEASURE_PY dashboard --serve  # One-time serve over HTTP
```

```bash
python3 $MEASURE_PY setup-hook --dry-run  # preview
python3 $MEASURE_PY setup-hook            # install
```

Adds a SessionEnd hook that collects usage stats after each session (~2 seconds, all data local).
Trends: Which skills do you actually invoke vs just having installed? Which models are you using? How has your overhead changed over time?
Session Health: Catches stale sessions (24h+), zombie sessions (48h+), and outdated configurations before they cause problems.
```bash
python3 $MEASURE_PY trends  # usage patterns over time
python3 $MEASURE_PY health  # session hygiene check
```

```
> /token-coach
```
Tell it your goal. Get back specific, prioritized fixes with exact token savings. Detects 8 named anti-patterns (The Kitchen Sink, The Hoarder, The Monolith...) and recommends multi-agent design patterns that actually save context.
When auto-compact fires, 60-70% of your conversation vanishes. Decisions, error-fix sequences, agent state: gone. Smart Compaction saves all of it as checkpoints before compaction, then restores what the summary dropped.
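The cumulative figures follow directly: if each compaction keeps only 30-40% of the conversation, retention compounds. A quick sketch at 35% retained per compaction (the midpoint of the range above):

```python
def retained(compactions: int, keep_fraction: float = 0.35) -> float:
    """Fraction of the original conversation surviving n lossy compactions."""
    return keep_fraction ** compactions

for n in (1, 2, 3):
    print(f"after {n} compaction(s): {1 - retained(n):.0%} lost")
# After 2 compactions ~88% of the original context is gone,
# which is why checkpointing before compaction fires matters.
```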
```bash
python3 $MEASURE_PY setup-smart-compact  # one-time install
```

A glance at your terminal tells you if you're in trouble. Colors shift from green to red as quality degrades.
```bash
python3 $MEASURE_PY setup-quality-bar  # one-time install
```

Sessions auto-checkpoint on end, /clear, and crashes. Start a new session on the same topic and it injects the relevant context automatically.
Standalone Python script. No dependencies. Python 3.8+. Zero context tokens consumed.
```bash
python3 $MEASURE_PY quick            # Am I in trouble? (start here)
python3 $MEASURE_PY doctor           # Is everything installed right?
python3 $MEASURE_PY drift            # Has my setup grown since last check?
python3 $MEASURE_PY quality current  # How healthy is this session?
python3 $MEASURE_PY report           # Where are my tokens going?
python3 $MEASURE_PY dashboard        # Visual dashboard (HTML)
python3 $MEASURE_PY trends           # What's actually being used?
python3 $MEASURE_PY collect          # Build usage database
```

```
skills/token-optimizer/
  SKILL.md                          Orchestrator (phases 0-5 + v2.0 actions)
  assets/
    dashboard.html                  Interactive dashboard
    logo.svg                        Animated ASCII logo
    hero-terminal.svg               Terminal demo
  references/
    agent-prompts.md                8 agent prompt templates
    implementation-playbook.md      Fix implementation details
    optimization-checklist.md       32 optimization techniques
    token-flow-architecture.md      How Claude Code loads tokens
  examples/
    claude-md-optimized.md          Optimized CLAUDE.md template
    permissions-deny-template.json  permissions.deny starter
    hooks-starter.json              Hook configuration
  scripts/
    measure.py                      Core engine (audit, quality, smart compact, trends, health, quick, doctor, drift)
    statusline.js                   Status line (degradation-aware colors)
skills/token-coach/
  SKILL.md                          Coaching orchestrator
install.sh                          One-command installer
```
AGPL-3.0. See LICENSE.
Created by Alex Greenshpun.
