Pick the right model. Validate the prompt. See the real cost — tokens and turns. Before you spend a premium request.
An open-source layer on top of GitHub Copilot (and any token-priced LLM plan) that answers the three questions every team eventually asks:
- Is this prompt ready to run? — scores completeness 0–100 and suggests follow-ups when it's too vague.
- Which model should run it? — picks the cheapest model that clears the quality bar, with a knob for token-cost vs agent-turn-cost trade-offs.
- What will it actually cost? — projects tokens × turns × $ before the call, including % of your monthly plan allowance.
Ships as two surfaces from one core:
- VS Code chat participant (`@proctor`) — primary UX, built on `vscode.chat` + `vscode.lm`.
- MCP server (`token-proctor-mcp`) — the same core over the Model Context Protocol; works with Copilot CLI, Copilot agent mode, Claude Desktop, Cursor, etc.
100% local. No network calls. We don't proxy prompts anywhere — we call `vscode.lm` (your existing Copilot entitlement) or hand a decision back to the MCP client.
Full design doc: docs/ANALYSIS.md.
You ask Copilot in agent mode: "Refactor this service to use async I/O everywhere." On a 1× premium model (Sonnet 4, GPT-5), that single prompt runs ~20 turns of read-file / apply-edit / run-tests, each one a billed premium request.
| Model | Premium multiplier | Turns | Premium-equivalent requests |
|---|---|---|---|
| Claude Sonnet 4 | 1.0× | 20 | 20 |
| o4-mini | 0.33× | 20 | 6.6 (–67%) |
| gpt-4o-mini | 0× | 20 | 0 (but quality drops on large refactors) |
Token Proctor shows you this before you hit send and recommends the cheapest model that clears the quality bar for the task. On a 300-request monthly seat, three of those refactors on Sonnet already eat 20% of the bucket.
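A minimal sketch of the arithmetic behind that table, using the 300-request seat from the text above (function and variable names are illustrative, not Token Proctor's API):

```ts
// Premium-equivalent requests: each agent turn bills one request,
// scaled by the model's premium multiplier.
function premiumEquivalent(multiplier: number, turns: number): number {
  return multiplier * turns;
}

const seatBudget = 300; // premium requests per seat per month
const sonnetRefactor = premiumEquivalent(1.0, 20);  // 20
const o4MiniRefactor = premiumEquivalent(0.33, 20); // 6.6

console.log(`same job on o4-mini: ${o4MiniRefactor} premium-equivalent requests`);
console.log(`3 Sonnet refactors = ${(3 * sonnetRefactor / seatBudget) * 100}% of the seat`); // 20%
```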
- Turn-aware cost projection. The LLM judge predicts how many agent turns a prompt will need (1 for Q&A, 10–30 for `code_large`/`agentic`), and cost is `(input tokens + output tokens) × turns × model price` (sketched after this list). No more hiding the cost of 20-turn agent loops behind a single-call estimate.
- `optimizeFor` knob — `tokens` (default) minimizes $/M; `turns` minimizes `premium × turns` (right for agent loops); `balanced` splits the difference.
- Plan-aware allowance %. Set `plan.monthlyTokenAllowance` in your policy and the summary shows "this prompt ≈ 4.5% of your squad-plan monthly tokens".
- Exact tokenization on by default via `js-tiktoken` (`o200k_base`).
- Copilot agent hand-off. After confirmation, Token Proctor launches a new Copilot Chat turn with the redacted prompt so Copilot's own agent tools (file edits, terminal) drive the work.
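A hedged sketch of that projection, assuming a single blended $/M price (the real estimator in `src/core/` may price input and output tokens separately; all names here are illustrative):

```ts
// Turn-aware cost projection: (input + output tokens per turn) × turns × price,
// plus the optional plan-allowance percentage.
interface Projection {
  usd: number;
  planPct?: number;
}

function projectCost(
  inputTokens: number,
  outputTokens: number,
  turns: number,
  usdPerMTokens: number,
  monthlyTokenAllowance?: number,
): Projection {
  const totalTokens = (inputTokens + outputTokens) * turns;
  const usd = (totalTokens / 1_000_000) * usdPerMTokens;
  const planPct = monthlyTokenAllowance
    ? (totalTokens / monthlyTokenAllowance) * 100
    : undefined;
  return { usd, planPct };
}

// The quick-start example below: ~210 in / ~180 out × 2 turns on a
// hypothetical $0.60/M blended price against a 10M-token plan.
console.log(projectCost(210, 180, 2, 0.6, 10_000_000));
// => { usd: ~0.0005, planPct: ~0.01 }
```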
- Node.js ≥ 20
- VS Code ≥ 1.95
- GitHub Copilot extension installed and signed in (Business or Enterprise entitlement recommended for premium models)
Install from the VS Code Marketplace (fastest):
```bash
code --install-extension token-proctor.token-proctor
```

Or clone and run from source:

```bash
git clone https://github.com/navintkr/token-proctor.git
cd token-proctor
npm install
npm run compile
```

- Open this folder in VS Code.
- Press F5 → an Extension Development Host window opens.
- In the new window, make sure the workspace is the same `token-proctor` folder (so `.token-proctor.json` loads).
- Open Copilot Chat (`Ctrl+Shift+I`) and type:
```
@proctor add caching to fetchUser so repeated calls within 5s return the same result
```
Sample output:
```
Task: code_small (confidence 90%)
Completeness: ✅ 72/100 (ready)
Recommended model: gpt-4o-mini
Estimate: ~210 in / ~180 out × 2 turns · ~$0.0005 · base quota · model=gpt-4o-mini · plan=squad 0.01%
tokens=exact · judge=on · policy=.token-proctor.json
🧠 llm-judge(gpt-4o-mini): task=code_small conf=0.90 out≈180 turns≈2 — ...
---
Recommended model: gpt-4o-mini
🚀 Accept & hand off to Copilot — switches Copilot's chat model to the recommendation
and lets the Copilot agent drive the change with its own tools (edits, terminal, etc.).
```
Click 🚀 Accept & hand off to Copilot to have Copilot's default agent take over with file/terminal tools. Token Proctor tries to flip the chat model dropdown automatically (a handful of best-effort command ids); if your Copilot build doesn't expose any of them, you'll see a toast asking you to pick the model manually.
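For illustration only, the best-effort switch could look roughly like this; the command ids below are hypothetical placeholders, not a confirmed list from Copilot or Token Proctor:

```ts
import * as vscode from "vscode";

// Try a few candidate command ids; fall back to a toast if none work.
// Both ids here are HYPOTHETICAL examples.
async function trySwitchChatModel(modelId: string): Promise<boolean> {
  const candidates = [
    "workbench.action.chat.changeModel", // hypothetical
    "github.copilot.chat.switchModel",   // hypothetical
  ];
  for (const id of candidates) {
    try {
      await vscode.commands.executeCommand(id, modelId);
      return true;
    } catch {
      // Not exposed by this Copilot build; try the next candidate.
    }
  }
  vscode.window.showWarningMessage(
    `Couldn't switch the chat model automatically; please pick ${modelId} manually.`,
  );
  return false;
}
```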
| Command | What it does |
|---|---|
| `@proctor /route <prompt>` | Default. Classify → validate → route → project cost. |
| `@proctor /validate <prompt>` | Completeness report with weighted dimensions + follow-ups. |
| `@proctor /cost <prompt>` | Per-turn and total tokens, turns, USD, plan burn %. |
| `@proctor /explain <prompt>` | Everything + the full candidate model matrix. |
| `@proctor /confirm <prompt>` | Confirm the last recommendation and forward. |
| `@proctor /cancel` | Abandon a pending confirmation. |
```bash
npm run compile
node ./out/mcp-server.js
```

Register it with your MCP client. Example (`.vscode/mcp.json`):
```json
{
  "servers": {
    "token-proctor": {
      "type": "stdio",
      "command": "node",
      "args": ["${workspaceFolder}/out/mcp-server.js"]
    }
  }
}
```

For Copilot CLI / Claude Desktop:
```json
{
  "mcpServers": {
    "token-proctor": {
      "command": "npx",
      "args": ["-y", "token-proctor-mcp"]
    }
  }
}
```

Tools exposed:
| Tool | Purpose |
|---|---|
| `analyze_prompt` | Full pipeline — redact, classify, validate, route, project cost. |
| `validate_prompt` | Completeness score + follow-up questions. |
| `recommend_model` | Task classification + best model + alternatives. |
| `estimate_cost` | Tokens + turns + USD against an auto-routed or named model. |
| `list_models` | The model catalog (prices, premium multipliers). |
| `redact_text` | Redact secrets using built-in + policy patterns. |
| `get_policy` | Return the loaded `.token-proctor.json` policy and its source path. |
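As a usage sketch, a short Node script can exercise `analyze_prompt` through the official MCP TypeScript SDK. The SDK calls are standard; the argument shape is an assumption, so check the server's actual schemas via `listTools()`:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the local server over stdio; nothing leaves the machine.
const transport = new StdioClientTransport({
  command: "node",
  args: ["./out/mcp-server.js"],
});
const client = new Client({ name: "proctor-demo", version: "0.0.1" });
await client.connect(transport);

// Argument shape assumed; verify with (await client.listTools()).tools.
const result = await client.callTool({
  name: "analyze_prompt",
  arguments: { prompt: "add caching to fetchUser" },
});
console.log(result);
await client.close();
```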
| Mode | Weights | Best for |
|---|---|---|
| `tokens` (default) | prioritize $/M token price | One-shot Q&A, docs, creative |
| `turns` | prioritize low `premium × turns` burn | Agent loops (`code_large`, `agentic`) |
| `balanced` | weighted compromise | Mixed workloads |
Why it matters: Claude Sonnet has a 1× premium multiplier. On an agentic prompt predicted to run 20 turns, that's 20 premium requests. An o4-mini at 0.33× would burn ~6.6, about 67% less of your monthly bucket. `optimizeFor: "turns"` surfaces that trade-off.
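The weighting behind the knob can be pictured like this; the weights and field names are illustrative assumptions, not the shipped router's values:

```ts
// Lower score wins. Illustrative only; not src/core/modelRouter's real logic.
interface Candidate {
  id: string;
  usdPerMTokens: number;     // blended $/M token price
  premiumMultiplier: number; // 0, 0.33, 1.0, ...
}

type OptimizeFor = "tokens" | "turns" | "balanced";

function score(c: Candidate, predictedTurns: number, mode: OptimizeFor): number {
  const tokenCost = c.usdPerMTokens;                     // dominates "tokens"
  const turnBurn = c.premiumMultiplier * predictedTurns; // dominates "turns"
  switch (mode) {
    case "tokens":
      return tokenCost;
    case "turns":
      return turnBurn;
    case "balanced":
      return 0.5 * tokenCost + 0.5 * turnBurn; // hypothetical 50/50 split
  }
}
```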
Drop a `.token-proctor.json` at the workspace root (or `~/.token-proctor/config.json`):
```json
{
  "allowModels": ["gpt-4o-mini", "gpt-4o", "claude-sonnet-4", "o4-mini", "gemini-flash"],
  "denyModels": ["claude-opus"],
  "premiumModelsAllowedFor": ["code_large", "reasoning"],
  "optimizeFor": "balanced",
  "preferCheap": true,
  "completenessThreshold": 60,
  "redact": {
    "builtins": true,
    "patterns": ["CORP-[A-Z0-9]{12}"],
    "blockOnMatch": false
  },
  "audit": {
    "enabled": true,
    "path": ".token-proctor/audit.jsonl"
  },
  "llmJudge": {
    "enabled": true,
    "confidenceThreshold": 0.85
  },
  "plan": {
    "name": "squad",
    "monthlyTokenAllowance": 10000000,
    "overageUsdPerM": 5.0
  }
}
```

Block reference:
- allow/deny/premium-for-task — gate the model pool the router can pick from.
- `redact` — built-in detectors cover AWS access/secret keys, GitHub/Slack/OpenAI/Stripe tokens, JWTs, PEM private keys, and Google API keys. Matches are replaced with `[REDACTED:kind]` before anything leaves the pure-function core; the forwarded prompt never contains raw secrets (a sketch follows this list).
- `audit` — opt-in JSONL log of every decision (task, model, cost, redactions, verdict). Local file; no network.
- `llmJudge` — when rule-based confidence < `confidenceThreshold`, call the cheapest available `vscode.lm` model to classify the task and estimate output tokens and turns.
- `plan` — token-based plan context. When `monthlyTokenAllowance` is set, the summary shows "plan=name X.Y%".
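A minimal sketch of the redaction step; the built-in pattern names and regexes here are illustrative stand-ins, not the extension's actual detectors:

```ts
// Replace matches with [REDACTED:kind] before anything leaves the core.
// Built-in patterns below are illustrative, not the shipped detector set.
const BUILTINS: Record<string, RegExp> = {
  "aws-access-key": /AKIA[0-9A-Z]{16}/g,  // illustrative
  "github-token": /ghp_[A-Za-z0-9]{36}/g, // illustrative
};

function redact(text: string, policyPatterns: string[] = []): string {
  let out = text;
  for (const [kind, re] of Object.entries(BUILTINS)) {
    out = out.replace(re, `[REDACTED:${kind}]`);
  }
  policyPatterns.forEach((p, i) => {
    out = out.replace(new RegExp(p, "g"), `[REDACTED:policy-${i}]`);
  });
  return out;
}

console.log(
  redact("deploy with AKIAABCDEFGHIJKLMNOP and CORP-AB12CD34EF56", ["CORP-[A-Z0-9]{12}"]),
);
// => "deploy with [REDACTED:aws-access-key] and [REDACTED:policy-0]"
```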
`js-tiktoken` is a regular dependency, so token counts are exact (`o200k_base`) on every run and the summary tags `tokens=exact`. If the dependency fails to load for some reason, Token Proctor falls back to a chars/4 + punctuation heuristic and tags `tokens=heuristic`.
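In code, the exact/heuristic split could look like this, assuming `js-tiktoken`'s public `getEncoding` API (the fallback formula is a guess at "chars/4 + punctuation"):

```ts
import { getEncoding } from "js-tiktoken";

// Exact o200k_base count when the encoder loads; rough heuristic otherwise.
function countTokens(text: string): { count: number; mode: "exact" | "heuristic" } {
  try {
    const enc = getEncoding("o200k_base");
    return { count: enc.encode(text).length, mode: "exact" };
  } catch {
    const punctuation = (text.match(/[.,;:!?(){}[\]"']/g) ?? []).length;
    return { count: Math.ceil(text.length / 4) + punctuation, mode: "heuristic" };
  }
}
```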
Prices and premium multipliers are plain data in `src/data/pricing.ts`. Fork it, tune it for your org's real rates, ship it.
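A hypothetical catalog entry, to show what "plain data" means; the field names are assumptions, though the multipliers match the table above:

```ts
// Sketch of a pricing entry; the real schema in src/data/pricing.ts may differ.
export interface ModelPricing {
  id: string;
  inputUsdPerM: number;      // $ per million input tokens
  outputUsdPerM: number;     // $ per million output tokens
  premiumMultiplier: number; // 0 = base quota, 1.0 = full premium request
}

export const CATALOG: ModelPricing[] = [
  { id: "gpt-4o-mini", inputUsdPerM: 0.15, outputUsdPerM: 0.6, premiumMultiplier: 0 },
  { id: "o4-mini", inputUsdPerM: 1.1, outputUsdPerM: 4.4, premiumMultiplier: 0.33 },
  { id: "claude-sonnet-4", inputUsdPerM: 3.0, outputUsdPerM: 15.0, premiumMultiplier: 1.0 },
];
```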
```
┌──────────────────────────────────────────┐
│ VS Code Chat Participant (@proctor)      │ ◄── primary UX
│ src/participant.ts                       │
└────────────┬─────────────────────────────┘
             │
             │      ┌── src/mcp-server.ts ◄── same core over MCP
             ▼      ▼
┌────────────────────────────────────┐
│ Core (src/core/)                   │
│ taskClassifier • promptValidator   │
│ modelRouter • costEstimator        │
│ llmJudge • redactor • policy       │
│ tokens (js-tiktoken) • audit       │
└────────────────────────────────────┘
             ▲
             │
┌────────────┴───────────┐
│ src/data/pricing.ts    │ ← model catalog, override per org
└────────────────────────┘
```
Core modules are pure functions (except policy and audit, which touch the filesystem). No globals, no network. Trivial to unit-test, easy to swap any piece.
Copilot Business/Enterprise (and most token-priced LLM plans) bill per premium request or per token. In practice, most overspend comes from:
- Users defaulting to the most powerful model for trivial edits.
- Vague prompts that require many expensive round-trips to finish.
- Agent mode running 10–30 turns of a 1× premium model on what could have been a 2-turn job on a 0× model.
Token Proctor surfaces all three before the call. It's the cheapest lever an org can pull on LLM spend — and it composes with, rather than replaces, whatever the underlying chat or agent does next.
- v0.1 — classifier, validator, router, cost, chat participant, MCP server.
- v0.2 — `.token-proctor.json` policy (allow/deny/premium-gating) + secret redaction + JSONL audit log.
- v0.3 — LLM judge fallback classifier + `js-tiktoken` for exact counts.
- v0.4 — turns-aware cost projection, plan-aware allowance %, `optimizeFor` knob, Copilot agent hand-off, rename to Token Proctor.
- v0.5 — `vscode.lm.registerTool` so Copilot agent mode can call Proctor directly mid-turn.
- v0.6 — server-side GitHub Copilot Extension for centralized org routing and fleet-wide budget enforcement.
MIT. See LICENSE.

{ "tokenProctor.completenessThreshold": 60, "tokenProctor.autoForward": true, "tokenProctor.requireConfirmation": true, "tokenProctor.handoffToCopilot": true, "tokenProctor.handoffParticipant": "github.copilot", "tokenProctor.preferCheap": false, "tokenProctor.optimizeFor": "tokens", // "tokens" | "turns" | "balanced" "tokenProctor.exactTokenCounts": true, "tokenProctor.llmJudge.enabled": true, "tokenProctor.llmJudge.confidenceThreshold": 0.85 }