Pick the right model. Validate the prompt. See the real cost — tokens and turns. Before you spend a premium request.
An open-source layer on top of GitHub Copilot (and any token-priced LLM plan) that answers the three questions every team eventually asks:
- Is this prompt ready to run? — scores completeness 0–100 and suggests follow-ups when it's too vague.
- Which model should run it? — picks the cheapest model that clears the quality bar, with a knob for token-cost vs agent-turn-cost trade-offs.
- What will it actually cost? — projects tokens × turns × $ before the call, including % of your monthly plan allowance.
Ships as two surfaces from one core:
- VS Code chat participant (`@proctor`) — primary UX, built on `vscode.chat` + `vscode.lm`.
- MCP server (`token-proctor-mcp`) — the same core over the Model Context Protocol; works with Copilot CLI, Copilot agent mode, Claude Desktop, Cursor, etc.
100% local. No network calls. We don't proxy prompts anywhere — we call `vscode.lm` (your existing Copilot entitlement) or hand a decision back to the MCP client.
Full design doc: docs/ANALYSIS.md.
You ask Copilot in agent mode: "Refactor this service to use async I/O everywhere." On a 1× premium model (Sonnet 4, GPT-5), that single prompt runs ~20 turns of read-file / apply-edit / run-tests, each one a billed premium request.
| Model | Premium multiplier | Turns | Premium-equivalent requests |
|---|---|---|---|
| Claude Sonnet 4 | 1.0× | 20 | 20 |
| o4-mini | 0.33× | 20 | 6.6 (–67%) |
| gpt-4o-mini | 0× | 20 | 0 (but quality drops on large refactors) |
Token Proctor shows you this before you hit send and recommends the cheapest model that clears the quality bar for the task. On a 300-request monthly seat, three of those refactors on Sonnet already eat 20% of the bucket.
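A minimal sketch of the arithmetic behind that table, using the 300-request seat from the text above (function and variable names are illustrative, not Token Proctor's API):

```ts
// Premium-equivalent requests: each agent turn bills one request,
// scaled by the model's premium multiplier.
function premiumEquivalent(multiplier: number, turns: number): number {
  return multiplier * turns;
}

const seatBudget = 300; // premium requests per seat per month
const sonnetRefactor = premiumEquivalent(1.0, 20);  // 20
const o4MiniRefactor = premiumEquivalent(0.33, 20); // 6.6

console.log(`same job on o4-mini: ${o4MiniRefactor} premium-equivalent requests`);
console.log(`3 Sonnet refactors = ${(3 * sonnetRefactor / seatBudget) * 100}% of the seat`); // 20%
```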
- Turn-aware cost projection. The LLM judge predicts how many agent turns a prompt will need (1 for Q&A, 10–30 for `code_large`/`agentic`), and cost is `(input tokens + output tokens) × turns × model price` (sketched after this list). No more hiding the cost of 20-turn agent loops behind a single-call estimate.
- `optimizeFor` knob — `tokens` (default) minimizes $/M; `turns` minimizes `premium × turns` (right for agent loops); `balanced` splits the difference.
- Plan-aware allowance %. Set `plan.monthlyTokenAllowance` in your policy and the summary shows "this prompt ≈ 4.5% of your squad-plan monthly tokens".
- Exact tokenization on by default via `js-tiktoken` (`o200k_base`).
- Copilot agent hand-off. After confirmation, Token Proctor launches a new Copilot Chat turn with the redacted prompt so Copilot's own agent tools (file edits, terminal) drive the work.
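A hedged sketch of that projection, assuming a single blended $/M price (the real estimator in `src/core/` may price input and output tokens separately; all names here are illustrative):

```ts
// Turn-aware cost projection: (input + output tokens per turn) × turns × price,
// plus the optional plan-allowance percentage.
interface Projection {
  usd: number;
  planPct?: number;
}

function projectCost(
  inputTokens: number,
  outputTokens: number,
  turns: number,
  usdPerMTokens: number,
  monthlyTokenAllowance?: number,
): Projection {
  const totalTokens = (inputTokens + outputTokens) * turns;
  const usd = (totalTokens / 1_000_000) * usdPerMTokens;
  const planPct = monthlyTokenAllowance
    ? (totalTokens / monthlyTokenAllowance) * 100
    : undefined;
  return { usd, planPct };
}

// The quick-start example below: ~210 in / ~180 out × 2 turns on a
// hypothetical $0.60/M blended price against a 10M-token plan.
console.log(projectCost(210, 180, 2, 0.6, 10_000_000));
// => { usd: ~0.0005, planPct: ~0.01 }
```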
- Node.js ≥ 20
- VS Code ≥ 1.95
- GitHub Copilot extension installed and signed in (Business or Enterprise entitlement recommended for premium models)
Install from the VS Code Marketplace (fastest):
```bash
code --install-extension token-proctor.token-proctor
```

Or clone and run from source:

```bash
git clone https://github.com/navintkr/token-proctor.git
cd token-proctor
npm install
npm run compile
```

- Open this folder in VS Code.
- Press F5 → an Extension Development Host window opens.
- In the new window, make sure the workspace is the same `token-proctor` folder (so `.token-proctor.json` loads).
- Open Copilot Chat (`Ctrl+Shift+I`) and type:
```
@proctor add caching to fetchUser so repeated calls within 5s return the same result
```
Sample output:
```
Task: code_small (confidence 90%)
Completeness: ✅ 72/100 (ready)
Recommended model: gpt-4o-mini
Estimate: ~210 in / ~180 out × 2 turns · ~$0.0005 · base quota · model=gpt-4o-mini · plan=squad 0.01%
tokens=exact · judge=on · policy=.token-proctor.json
🧠 llm-judge(gpt-4o-mini): task=code_small conf=0.90 out≈180 turns≈2 — ...
---
Recommended model: gpt-4o-mini
🚀 Accept & hand off to Copilot — switches Copilot's chat model to the recommendation
and lets the Copilot agent drive the change with its own tools (edits, terminal, etc.).
```
Click 🚀 Accept & hand off to Copilot to have Copilot's default agent take over with file/terminal tools. Token Proctor tries to flip the chat model dropdown automatically (a handful of best-effort command ids); if your Copilot build doesn't expose any of them, you'll see a toast asking you to pick the model manually.
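For illustration only, the best-effort switch could look roughly like this; the command ids below are hypothetical placeholders, not a confirmed list from Copilot or Token Proctor:

```ts
import * as vscode from "vscode";

// Try a few candidate command ids; fall back to a toast if none work.
// Both ids here are HYPOTHETICAL examples.
async function trySwitchChatModel(modelId: string): Promise<boolean> {
  const candidates = [
    "workbench.action.chat.changeModel", // hypothetical
    "github.copilot.chat.switchModel",   // hypothetical
  ];
  for (const id of candidates) {
    try {
      await vscode.commands.executeCommand(id, modelId);
      return true;
    } catch {
      // Not exposed by this Copilot build; try the next candidate.
    }
  }
  vscode.window.showWarningMessage(
    `Couldn't switch the chat model automatically; please pick ${modelId} manually.`,
  );
  return false;
}
```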
| Command | What it does |
|---|---|
| `@proctor /route <prompt>` | Default. Classify → validate → route → project cost. |
| `@proctor /validate <prompt>` | Completeness report with weighted dimensions + follow-ups. |
| `@proctor /cost <prompt>` | Per-turn and total tokens, turns, USD, plan burn %. |
| `@proctor /explain <prompt>` | Everything + the full candidate model matrix. |
| `@proctor /confirm <prompt>` | Confirm the last recommendation and forward. |
| `@proctor /cancel` | Abandon a pending confirmation. |
```bash
npm run compile
node ./out/mcp-server.js
```

Register it with your MCP client. Example (`.vscode/mcp.json`):
```json
{
  "servers": {
    "token-proctor": {
      "type": "stdio",
      "command": "node",
      "args": ["${workspaceFolder}/out/mcp-server.js"]
    }
  }
}
```

For Copilot CLI / Claude Desktop:
```json
{
  "mcpServers": {
    "token-proctor": {
      "command": "npx",
      "args": ["-y", "token-proctor-mcp"]
    }
  }
}
```

Tools exposed:
| Tool | Purpose |
|---|---|
| `analyze_prompt` | Full pipeline — redact, classify, validate, route, project cost. |
| `validate_prompt` | Completeness score + follow-up questions. |
| `recommend_model` | Task classification + best model + alternatives. |
| `estimate_cost` | Tokens + turns + USD against an auto-routed or named model. |
| `list_models` | The model catalog (prices, premium multipliers). |
| `redact_text` | Redact secrets using built-in + policy patterns. |
| `get_policy` | Return the loaded `.token-proctor.json` policy and its source path. |
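As a usage sketch, a short Node script can exercise `analyze_prompt` through the official MCP TypeScript SDK. The SDK calls are standard; the argument shape is an assumption, so check the server's actual schemas via `listTools()`:

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Spawn the local server over stdio; nothing leaves the machine.
const transport = new StdioClientTransport({
  command: "node",
  args: ["./out/mcp-server.js"],
});
const client = new Client({ name: "proctor-demo", version: "0.0.1" });
await client.connect(transport);

// Argument shape assumed; verify with (await client.listTools()).tools.
const result = await client.callTool({
  name: "analyze_prompt",
  arguments: { prompt: "add caching to fetchUser" },
});
console.log(result);
await client.close();
```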
| Mode | Weights | Best for |
|---|---|---|
| `tokens` (default) | prioritize $/M token price | One-shot Q&A, docs, creative |
| `turns` | prioritize low `premium × turns` burn | Agent loops (`code_large`, `agentic`) |
| `balanced` | weighted compromise | Mixed workloads |
Why it matters: Claude Sonnet has a 1× premium multiplier. On an agentic prompt predicted to run 20 turns, that's 20 premium requests. An o4-mini at 0.33× would burn ~6.6, about 67% less of your monthly bucket. `optimizeFor: "turns"` surfaces that trade-off.
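The weighting behind the knob can be pictured like this; the weights and field names are illustrative assumptions, not the shipped router's values:

```ts
// Lower score wins. Illustrative only; not src/core/modelRouter's real logic.
interface Candidate {
  id: string;
  usdPerMTokens: number;     // blended $/M token price
  premiumMultiplier: number; // 0, 0.33, 1.0, ...
}

type OptimizeFor = "tokens" | "turns" | "balanced";

function score(c: Candidate, predictedTurns: number, mode: OptimizeFor): number {
  const tokenCost = c.usdPerMTokens;                     // dominates "tokens"
  const turnBurn = c.premiumMultiplier * predictedTurns; // dominates "turns"
  switch (mode) {
    case "tokens":
      return tokenCost;
    case "turns":
      return turnBurn;
    case "balanced":
      return 0.5 * tokenCost + 0.5 * turnBurn; // hypothetical 50/50 split
  }
}
```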
Drop a `.token-proctor.json` at the workspace root (or `~/.token-proctor/config.json`):
```json
{
  "allowModels": ["gpt-4o-mini", "gpt-4o", "claude-sonnet-4", "o4-mini", "gemini-flash"],
  "denyModels": ["claude-opus"],
  "premiumModelsAllowedFor": ["code_large", "reasoning"],
  "optimizeFor": "balanced",
  "preferCheap": true,
  "completenessThreshold": 60,
  "redact": {
    "builtins": true,
    "patterns": ["CORP-[A-Z0-9]{12}"],
    "blockOnMatch": false
  },
  "audit": {
    "enabled": true,
    "path": ".token-proctor/audit.jsonl"
  },
  "llmJudge": {
    "enabled": true,
    "confidenceThreshold": 0.85
  },
  "plan": {
    "name": "squad",
    "monthlyTokenAllowance": 10000000,
    "overageUsdPerM": 5.0
  }
}
```

Block reference:
- allow/deny/premium-for-task — gate the model pool the router can pick from.
- `redact` — built-in detectors cover AWS access/secret keys, GitHub/Slack/OpenAI/Stripe tokens, JWTs, PEM private keys, and Google API keys. Matches are replaced with `[REDACTED:kind]` before anything leaves the pure-function core; the forwarded prompt never contains raw secrets (a sketch follows this list).
- `audit` — opt-in JSONL log of every decision (task, model, cost, redactions, verdict). Local file; no network.
- `llmJudge` — when rule-based confidence < `confidenceThreshold`, call the cheapest available `vscode.lm` model to classify the task and estimate output tokens and turns.
- `plan` — token-based plan context. When `monthlyTokenAllowance` is set, the summary shows "plan=name X.Y%".
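A minimal sketch of the redaction step; the built-in pattern names and regexes here are illustrative stand-ins, not the extension's actual detectors:

```ts
// Replace matches with [REDACTED:kind] before anything leaves the core.
// Built-in patterns below are illustrative, not the shipped detector set.
const BUILTINS: Record<string, RegExp> = {
  "aws-access-key": /AKIA[0-9A-Z]{16}/g,  // illustrative
  "github-token": /ghp_[A-Za-z0-9]{36}/g, // illustrative
};

function redact(text: string, policyPatterns: string[] = []): string {
  let out = text;
  for (const [kind, re] of Object.entries(BUILTINS)) {
    out = out.replace(re, `[REDACTED:${kind}]`);
  }
  policyPatterns.forEach((p, i) => {
    out = out.replace(new RegExp(p, "g"), `[REDACTED:policy-${i}]`);
  });
  return out;
}

console.log(
  redact("deploy with AKIAABCDEFGHIJKLMNOP and CORP-AB12CD34EF56", ["CORP-[A-Z0-9]{12}"]),
);
// => "deploy with [REDACTED:aws-access-key] and [REDACTED:policy-0]"
```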
`js-tiktoken` is a regular dependency, so token counts are exact (`o200k_base`) on every run and the summary tags `tokens=exact`. If the dependency fails to load for some reason, Token Proctor falls back to a chars/4 + punctuation heuristic and tags `tokens=heuristic`.
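In code, the exact/heuristic split could look like this, assuming `js-tiktoken`'s public `getEncoding` API (the fallback formula is a guess at "chars/4 + punctuation"):

```ts
import { getEncoding } from "js-tiktoken";

// Exact o200k_base count when the encoder loads; rough heuristic otherwise.
function countTokens(text: string): { count: number; mode: "exact" | "heuristic" } {
  try {
    const enc = getEncoding("o200k_base");
    return { count: enc.encode(text).length, mode: "exact" };
  } catch {
    const punctuation = (text.match(/[.,;:!?(){}[\]"']/g) ?? []).length;
    return { count: Math.ceil(text.length / 4) + punctuation, mode: "heuristic" };
  }
}
```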
Prices and premium multipliers are plain data in `src/data/pricing.ts`. Fork it, tune it for your org's real rates, ship it.
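A hypothetical catalog entry, to show what "plain data" means; the field names are assumptions, though the multipliers match the table above:

```ts
// Sketch of a pricing entry; the real schema in src/data/pricing.ts may differ.
export interface ModelPricing {
  id: string;
  inputUsdPerM: number;      // $ per million input tokens
  outputUsdPerM: number;     // $ per million output tokens
  premiumMultiplier: number; // 0 = base quota, 1.0 = full premium request
}

export const CATALOG: ModelPricing[] = [
  { id: "gpt-4o-mini", inputUsdPerM: 0.15, outputUsdPerM: 0.6, premiumMultiplier: 0 },
  { id: "o4-mini", inputUsdPerM: 1.1, outputUsdPerM: 4.4, premiumMultiplier: 0.33 },
  { id: "claude-sonnet-4", inputUsdPerM: 3.0, outputUsdPerM: 15.0, premiumMultiplier: 1.0 },
];
```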
```
┌──────────────────────────────────────────┐
│ VS Code Chat Participant (@proctor)      │ ◄── primary UX
│ src/participant.ts                       │
└────────────┬─────────────────────────────┘
             │
             │      ┌── src/mcp-server.ts ◄── same core over MCP
             ▼      ▼
┌────────────────────────────────────┐
│ Core (src/core/)                   │
│ taskClassifier • promptValidator   │
│ modelRouter • costEstimator        │
│ llmJudge • redactor • policy       │
│ tokens (js-tiktoken) • audit       │
└────────────────────────────────────┘
             ▲
             │
┌────────────┴───────────┐
│ src/data/pricing.ts    │ ← model catalog, override per org
└────────────────────────┘
```
Core modules are pure functions (except policy and audit, which touch the filesystem). No globals, no network. Trivial to unit-test, easy to swap any piece.
Copilot Business/Enterprise (and most token-priced LLM plans) bill per premium request or per token. In practice, most overspend comes from:
- Users defaulting to the most powerful model for trivial edits.
- Vague prompts that require many expensive round-trips to finish.
- Agent mode running 10–30 turns of a 1× premium model on what could have been a 2-turn job on a 0× model.
Token Proctor surfaces all three before the call. It's the cheapest lever an org can pull on LLM spend — and it composes with, rather than replaces, whatever the underlying chat or agent does next.
- v0.1 — classifier, validator, router, cost, chat participant, MCP server.
- v0.2 — `.token-proctor.json` policy (allow/deny/premium-gating) + secret redaction + JSONL audit log.
- v0.3 — LLM judge fallback classifier + `js-tiktoken` for exact counts.
- v0.4 — turns-aware cost projection, plan-aware allowance %, `optimizeFor` knob, Copilot agent hand-off, rename to Token Proctor.
- v0.5 — `vscode.lm.registerTool` so Copilot agent mode can call Proctor directly mid-turn.
- v0.6 — server-side GitHub Copilot Extension for centralized org routing and fleet-wide budget enforcement.
MIT. See LICENSE.

{ "tokenProctor.completenessThreshold": 60, "tokenProctor.autoForward": true, "tokenProctor.requireConfirmation": true, "tokenProctor.handoffToCopilot": true, "tokenProctor.handoffParticipant": "github.copilot", "tokenProctor.preferCheap": false, "tokenProctor.optimizeFor": "tokens", // "tokens" | "turns" | "balanced" "tokenProctor.exactTokenCounts": true, "tokenProctor.llmJudge.enabled": true, "tokenProctor.llmJudge.confidenceThreshold": 0.85 }