Token Usage #1006
Replies: 1 comment
Alright, so I let PAI do some deep analysis on its own structure here. AI improving AI. This all still needs to be tested and validated, to see whether PAI still behaves the same and there is an actual meaningful decrease in token usage. But here's the summary, in case you want to chime in.

## PAI Token Usage Investigation: Root Cause & Fix

### The Problem

PAI sessions were consuming significantly more tokens than expected. Prior analysis had already ruled out PAI's instruction text (~10K tokens) as the cause. The question was: what's actually driving the high usage?

### Root Cause: O(N²) Token Scaling from Tool Calls

Claude Code is stateless: it resends the entire conversation history with every API call. Each tool call is one API roundtrip. This means total input tokens grow quadratically:

`Total_input = N × S + avg_result × N(N−1)/2`

(where N is the number of tool calls and S is the static context resent on every call)
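As a rough illustration of that formula (a sketch only; the base-context and result sizes below are made-up numbers, not PAI's actual accounting):

```python
def total_input_tokens(n_calls, base_context, avg_result):
    """Cumulative input tokens when the full history is resent on every
    API roundtrip: each call pays the static context again, plus every
    tool result accumulated so far.
    Total = N*S + avg_result * N*(N-1)/2
    """
    return n_calls * base_context + avg_result * n_calls * (n_calls - 1) // 2

# Hypothetical numbers: 24 calls, 10K-token static context, 5K-token results
print(total_input_tokens(24, 10_000, 5_000))  # 1620000
```

The quadratic term dominates quickly: with these assumed sizes, the resent tool results account for 1.38M of the 1.62M cumulative input tokens.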
The PAI Algorithm v3.7.0 required 24 mandatory tool calls for a Standard effort run. Of those, 96% were process overhead — not actual work:
Estimated cumulative input tokens per Standard run: ~793K

### What Was NOT the Problem
### Top 3 Cost Drivers (Ranked)
### The Fix: Algorithm v3.8.0

Five token-efficiency rules added to the Algorithm spec:
Additionally: the Standard ISC floor was lowered from 8 to 4 criteria.

### Test Results

Ran a real task (saving a memory entry) on both versions:
### Key Insight

Because of O(N²) scaling, reducing tool calls by 54% saves ~80% of the context-accumulation cost. Each eliminated tool call doesn't just save its own tokens; it reduces the cost of every call that comes after it.
Hi all,
I’ve been test-driving PAI over the past week and noticed that I’m hitting the Claude 5x Max limits quite quickly (after roughly 3 hours of usage).
For context, I’ve been experimenting with model selection strategies within PAI:
My primary focus is coding-heavy workflows.
I also ran some internal analysis using PAI itself to better understand token overhead. From what I can tell, the overhead in the latest version seems minimal. Most of my consumption appears to come from tool usage (for example updating PRDs on disk, etc.).
My questions:
Would love to hear how others are approaching this.
Thanks!