Token Usage #1006
Replies: 1 comment
Alright, so I let PAI do some deep analysis on its own structure here. AI improving AI. This all still needs to be tested and validated, to see whether PAI still behaves the same and there is an actual meaningful decrease in token usage. But here's the summary, in case you want to chime in.

## PAI Token Usage Investigation: Root Cause & Fix

### The Problem

PAI sessions were consuming significantly more tokens than expected. Prior analysis had already ruled out PAI's instruction text (~10K tokens) as the cause. The question was: what's actually driving the high usage?

### Root Cause: O(N²) Token Scaling from Tool Calls

Claude Code is stateless: it resends the entire conversation history with every API call. Each tool call is one API roundtrip. This means total input tokens grow quadratically:

`Total_input = N × S + avg_result × N(N−1)/2`

(where N is the number of tool calls and S is the static context resent on every call)
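As a rough illustration of that formula (a sketch only; the base-context and result sizes below are made-up numbers, not PAI's actual accounting):

```python
def total_input_tokens(n_calls, base_context, avg_result):
    """Cumulative input tokens when the full history is resent on every
    API roundtrip: each call pays the static context again, plus every
    tool result accumulated so far.
    Total = N*S + avg_result * N*(N-1)/2
    """
    return n_calls * base_context + avg_result * n_calls * (n_calls - 1) // 2

# Hypothetical numbers: 24 calls, 10K-token static context, 5K-token results
print(total_input_tokens(24, 10_000, 5_000))  # 1620000
```

The quadratic term dominates quickly: with these assumed sizes, the resent tool results account for 1.38M of the 1.62M cumulative input tokens.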
The PAI Algorithm v3.7.0 required 24 mandatory tool calls for a Standard effort run. Of those, 96% were process overhead — not actual work:
Estimated cumulative input tokens per Standard run: ~793K

### What Was NOT the Problem
### Top 3 Cost Drivers (Ranked)
### The Fix: Algorithm v3.8.0

Five token-efficiency rules added to the Algorithm spec:
Additionally: the Standard ISC floor was lowered from 8 to 4 criteria.

### Test Results

Ran a real task (saving a memory entry) on both versions:
### Key Insight

Because of O(N²) scaling, reducing tool calls by 54% saves ~80% of the context-accumulation cost. Each eliminated tool call doesn't just save its own tokens; it reduces the cost of every call that comes after it.
Hi all,
I’ve been test-driving PAI over the past week and noticed that I’m hitting the Claude 5x Max limits quite quickly (after roughly 3 hours of usage).
For context, I’ve been experimenting with model selection strategies within PAI:
My primary focus is coding-heavy workflows.
I also ran some internal analysis using PAI itself to better understand token overhead. From what I can tell, the overhead in the latest version seems minimal. Most of my consumption appears to come from tool usage (for example updating PRDs on disk, etc.).
My questions:
Would love to hear how others are approaching this.
Thanks!