Description
What specific problem does this solve?
When users switch AI models during a task in Roo Code, the expense tracker and token counter displays become inaccurate. The current implementation recalculates costs based on the active model's pricing configuration without preserving running totals from previous models used in the same conversation. This makes it impossible to accurately track total API costs for tasks that involve multiple model switches.
Who is affected?
Users who switch between different models during tasks (e.g., switching from Claude-4-Sonnet to GPT-5 for different capabilities, or testing OpenRouter vs direct API providers). This particularly affects users who need accurate cost tracking for budgeting or expense reporting.
When does this happen?
- User starts a task with Model A (e.g., GPT-5 via OpenAI)
- Task accumulates token usage and costs based on Model A's pricing
- User switches to Model B mid-task (e.g., Claude-4-Sonnet via Anthropic)
- New API handler is created with Model B's configuration
- The expense tracker resets or shows incorrect totals, losing the costs from Model A
- After multiple switches, the displayed total becomes completely unreliable
What's the current behaviour vs expected behaviour?
- Current: Per-request costs are calculated correctly for the active model, but the cumulative total in the UI doesn't persist across model switches. The tracker essentially "resets" or shows incorrect values when models change.
- Expected: The tracker should maintain an accurate cumulative total across all model switches, with a detailed breakdown showing costs by provider, model, and feature usage (e.g., chat, embeddings, context condensing).
What's the impact?
- Inaccurate cost tracking: Users cannot rely on the displayed costs, making it impossible to track actual API expenses
- Budget uncertainty: Without accurate totals, users can't make informed decisions about model usage or stay within budget limits
- Lost trust: The broken cost tracker undermines confidence in Roo Code's reliability for professional use
Additional context (optional)
Mock UI Screenshots
(Some styling can be fixed during the actual implementation, but that's the basic idea.)
Mock UI — Activity semantics:
- For normal Roo Code chat usage, Activity = the current Mode’s display name (built‑in or custom). Built‑in examples: "🏗️ Architect", "💻 Code", "❓ Ask", "🪲 Debug", "🪃 Orchestrator" (see packages/types/src/mode.ts). For custom modes, use the name from the user’s configuration (custom_modes.yaml / project .roomodes).
- For non‑chat services, Activity = a canonical service keyword with a friendly label. Initial set:
  - CODEBASE_INDEXING → "Codebase Indexing"
  - CONTEXT_CONDENSING → "Context Condensing"
  - CHAT_MEMORY → "Chat Memory"
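The Activity rule above could be resolved in one place. This is a minimal sketch; the names ServiceActivity, SERVICE_ACTIVITY_LABELS, ActivitySource, and resolveActivityLabel are hypothetical and do not exist in the Roo Code codebase.

```typescript
// Hypothetical names for illustration only; not part of Roo Code.
type ServiceActivity = "CODEBASE_INDEXING" | "CONTEXT_CONDENSING" | "CHAT_MEMORY"

// Canonical service keyword -> friendly label (initial set from the proposal).
const SERVICE_ACTIVITY_LABELS: Record<ServiceActivity, string> = {
  CODEBASE_INDEXING: "Codebase Indexing",
  CONTEXT_CONDENSING: "Context Condensing",
  CHAT_MEMORY: "Chat Memory",
}

type ActivitySource =
  | { kind: "chat"; modeName: string } // display name of the active mode, built-in or custom
  | { kind: "service"; code: ServiceActivity }

// Chat usage shows the mode's display name; services show their friendly label.
function resolveActivityLabel(source: ActivitySource): string {
  return source.kind === "chat" ? source.modeName : SERVICE_ACTIVITY_LABELS[source.code]
}
```

Keeping the service keywords in a closed union means a new service has to be added to the label map before it can be recorded, which avoids unlabeled rows in the breakdown.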
Text example breakdown:
| Provider | Model | Activity | Tokens | Cost |
| --- | --- | --- | --- | --- |
| Anthropic | claude-4-sonnet | 🏗️ Architect | ↑18.0k ↓4.5k | $1.25 |
| OpenAI | gpt-5-chat | 💻 Code | ↑52.0k ↓11.0k | $3.85 |
| OpenAI | gpt-5-mini | ❓ Ask | ↑12.0k ↓3.0k | $0.61 |
| OpenAI | gpt-5-chat | 🪲 Debug | ↑17.0k ↓5.2k | $1.10 |
| OpenRouter | gpt-5-chat | 🪃 Orchestrator | ↑9.0k ↓2.5k | $0.78 |
| OpenAI | text-embedding-3-large | Codebase Indexing | ↑420.0k ↓0 | $0.42 |
| xAI | grok-4 | Context Condensing | ↑6.5k ↓1.5k | $0.33 |
| Anthropic | claude-4-sonnet | Chat Memory | ↑9.0k ↓1.0k | $0.50 |
| OpenAI | text-embedding-3-large | Chat Memory | ↑70.0k ↓0 | $0.07 |
Related Code Paths:
- Model switching: ClineProvider.activateProviderProfile (src/core/webview/ClineProvider.ts:~1180)
- Cost calculation: Task.ts streaming loop (~1849)
- Token usage events: TaskTokenUsageUpdated in saveClineMessages (~676)
Related Issues
- #6822: Total Costs/Token Usage across all Tasks
  This proposal provides the per‑task persistent ledger and activity breakdown that #6822 can aggregate across tasks/workspaces.
- #5376: [ENHANCEMENT]: Aggregate subtask costs and improve hierarchy visualization in Orchestrator/Boomerang mode
  The lineage fields (root_task_id, parent_task_id, task_id) and aggregation rules in this proposal enable accurate parent/subtask roll‑ups without file merges.
Roo Code Task Links (Optional)
No response
Request checklist
- I've searched existing Issues and Discussions for duplicates
- This describes a specific problem with clear impact and context
Interested in implementing this?
- Yes, I'd like to help implement this feature
Implementation requirements
- I understand this needs approval before implementation begins
How should this be solved? (REQUIRED if contributing, optional otherwise)
Core Solution: Implement a persistent CostLedger class that maintains cross-model totals independently of the API handler.
The ledger will store historical entries with metadata about provider, model, and feature usage. On each API response, it appends an entry and recomputes cumulative totals for both costs and tokens. This ensures the expense tracker and token counter receive accurate data across model switches.
Key Components:
- CostLedger Class (src/core/task/)
  - appendEntry(entry: CostEntry): Add new cost entry with metadata
  - getCumulativeTotal(): Return aggregated totals
  - getBreakdownByModel(): Group entries for UI display
  - Persistence Strategy: Write-ahead log (WAL) + periodic snapshots
    - cost-ledger-wal.jsonl: Append-only log for crash safety
    - cost-ledger.json: Periodic full snapshots (every 100 entries)
    - Batched writes every 1s or 10 entries to balance performance
    - Automatic recovery on task resume
- Integration Points:
  - In Task.ts streaming loop: After calculating per-request cost, append to ledger
  - In activateProviderProfile: Preserve ledger across model switches
  - In saveClineMessages: Include ledger totals in emitted events
  - In completeSubtask: Merge child ledger into parent ledger
- Subtask Cost Aggregation:
  - Track costs from subtasks spawned in orchestrator mode
  - Merge child task ledgers into parent on completion
  - Mark subtask entries with subtaskId for UI filtering
  - Aggregate totals across entire task hierarchy
- Secondary Cost Tracking:
  - Embeddings: Track costs from vector store operations
  - Memory LLM: Track extraction costs from Chat Memory Phase 4
  - Context Condensing: Track summarization costs
  - Use generic event hook for future extensibility
- UI Wiring:
  - Remove mock data from ChatView.tsx
  - Pass real ledger breakdown to TaskHeader
  - Update costs in real-time as entries are added
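To make the three ledger methods listed above concrete, here is a minimal in-memory sketch. The shapes of CostEntry, LedgerTotals, and BreakdownRow are assumptions for illustration, and persistence is left out entirely.

```typescript
// Sketch only: these type shapes are assumptions, not existing Roo Code types.
interface CostEntry {
  provider: string
  modelId: string
  feature: string // e.g. "chat" or "context_condensing"
  tokensIn: number
  tokensOut: number
  cost: number // USD
}

interface LedgerTotals {
  tokensIn: number
  tokensOut: number
  cost: number
}

interface BreakdownRow extends LedgerTotals {
  provider: string
  modelId: string
}

class CostLedger {
  private readonly entries: CostEntry[] = []
  private totals: LedgerTotals = { tokensIn: 0, tokensOut: 0, cost: 0 }

  // Append one entry and update the running totals in O(1).
  appendEntry(entry: CostEntry): void {
    this.entries.push({ ...entry })
    this.totals = {
      tokensIn: this.totals.tokensIn + entry.tokensIn,
      tokensOut: this.totals.tokensOut + entry.tokensOut,
      cost: this.totals.cost + entry.cost,
    }
  }

  // Return a copy so callers cannot mutate the ledger's state.
  getCumulativeTotal(): LedgerTotals {
    return { ...this.totals }
  }

  // Group entries by provider + model, preserving first-seen order.
  getBreakdownByModel(): BreakdownRow[] {
    const rows = new Map<string, BreakdownRow>()
    for (const e of this.entries) {
      const key = `${e.provider}\u0000${e.modelId}`
      const row = rows.get(key) ?? { provider: e.provider, modelId: e.modelId, tokensIn: 0, tokensOut: 0, cost: 0 }
      row.tokensIn += e.tokensIn
      row.tokensOut += e.tokensOut
      row.cost += e.cost
      rows.set(key, row)
    }
    return Array.from(rows.values())
  }
}
```

Costs are summed as floating-point dollars here for brevity; a real implementation might prefer integer micro-dollars to avoid rounding drift over long tasks.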
User Interaction: No changes required - the tracker automatically updates with correct totals. The existing "View full breakdown" link opens the detailed dialog showing all cost entries grouped by provider and model.
How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)
Given a new task in Roo Code
When the user accumulates token usage with OpenAI GPT-5 ($3.85 total, 63k tokens)
Then the expense tracker displays accurate cumulative cost $3.85
And the token counter displays accurate cumulative tokens 63k
And the internal ledger stores one entry with correct provider, modelId, and cost
Given the task from above
When the user switches to Anthropic Claude-4-Sonnet and accumulates additional usage ($2.60, 48.5k tokens)
Then the expense tracker displays updated cumulative $6.45 (not reset to $2.60)
And the token counter displays updated cumulative 111.5k tokens (not reset to 48.5k)
And the breakdown dialog shows both models with their respective costs
But the per-request calculation still uses Claude's pricing correctly
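The scenario above depends on the running totals being owned by the task rather than by the API handler. A rough sketch of that separation follows; ApiHandlerLike, TaskLike, and the per-million-token price fields are hypothetical stand-ins, not the real Roo Code Task or handler APIs.

```typescript
// Hypothetical stand-ins for illustration; not the real Roo Code classes.
interface ApiHandlerLike {
  provider: string
  modelId: string
  inputPricePerMTok: number // USD per million input tokens (assumed field)
  outputPricePerMTok: number // USD per million output tokens (assumed field)
}

class TaskLike {
  private handler: ApiHandlerLike
  // Totals live on the task, so replacing the handler cannot reset them.
  private totalCost = 0
  private totalTokens = 0

  constructor(handler: ApiHandlerLike) {
    this.handler = handler
  }

  switchModel(next: ApiHandlerLike): void {
    this.handler = next // totals are intentionally left untouched
  }

  // Per-request cost uses the pricing of the handler active for this request.
  recordUsage(tokensIn: number, tokensOut: number): number {
    const cost =
      (tokensIn * this.handler.inputPricePerMTok + tokensOut * this.handler.outputPricePerMTok) / 1e6
    this.totalCost += cost
    this.totalTokens += tokensIn + tokensOut
    return cost
  }

  getTotals(): { cost: number; tokens: number } {
    return { cost: this.totalCost, tokens: this.totalTokens }
  }
}
```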
Given a task with model switches and secondary operations
When context condensing with xAI Grok-4 occurs ($0.33 cost, 8k tokens)
Then the ledger appends entry with provider: 'xAI', modelId: 'grok-4', feature: 'context_condensing'
And the breakdown table shows this as a separate line item
And totals include all costs without double-counting
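One way to let services such as context condensing report costs without depending on the task directly is a small publish/subscribe hook. The sketch below is an assumption about how the "generic event hook" could look; CostEvent, CostListener, and CostEventBus are hypothetical names.

```typescript
// Hypothetical hook for illustration; not an existing Roo Code API.
interface CostEvent {
  provider: string
  modelId: string
  feature: string // e.g. "context_condensing"
  tokensIn: number
  tokensOut: number
  cost: number
}

type CostListener = (event: CostEvent) => void

class CostEventBus {
  private listeners: CostListener[] = []

  // Register a listener and return a function that removes it again.
  subscribe(listener: CostListener): () => void {
    this.listeners.push(listener)
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener)
    }
  }

  // Deliver the event to every current listener, in registration order.
  emit(event: CostEvent): void {
    for (const listener of this.listeners) listener(event)
  }
}
```

A ledger would subscribe once when the task starts, and each service would emit one event per API response, so each cost is recorded exactly once.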
Given a parent task that spawns a subtask in orchestrator mode
When the subtask uses GPT-5-mini and accumulates $0.61 in costs
Then the subtask maintains its own ledger during execution
And when the subtask completes, its ledger merges into the parent's ledger
And the parent's total includes both its own costs and the subtask's costs
And the breakdown shows subtask entries marked with subtaskId
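The merge step in this scenario could look roughly like the sketch below. It assumes each entry carries a stable entryId (in the spirit of the entry_id field described under Persistence & Fork Semantics) so that merging the same child twice does not double-count; LedgerEntry, mergeChildLedger, and totalCost are hypothetical names.

```typescript
// Hypothetical shapes and helpers for illustration only.
interface LedgerEntry {
  entryId: string
  modelId: string
  cost: number
  subtaskId?: string
}

// Merge a completed child's entries into the parent, tagging each with the child's task ID.
// Entries already present in the parent (same entryId) are skipped, so a repeated merge is a no-op.
function mergeChildLedger(parent: LedgerEntry[], child: LedgerEntry[], childTaskId: string): LedgerEntry[] {
  const seen = new Set(parent.map((e) => e.entryId))
  const merged = [...parent]
  for (const e of child) {
    if (seen.has(e.entryId)) continue
    seen.add(e.entryId)
    // Keep an existing subtaskId so nested subtasks stay attributed to their own task.
    merged.push({ ...e, subtaskId: e.subtaskId ?? childTaskId })
  }
  return merged
}

function totalCost(entries: LedgerEntry[]): number {
  return entries.reduce((sum, e) => sum + e.cost, 0)
}
```

Returning a new array instead of mutating the parent makes it easier to apply the merge atomically, which may help with the race conditions listed under Potential Blockers.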
Given an existing task from history without a ledger
When the task is resumed
Then ledger initializes with zero totals
And a one-time warning logs about missing historical data
But task proceeds without errors
Given rapid model switches (e.g., GPT-5 → Claude-4 → GPT-5-mini → Grok-4 → o4-mini)
When costs accumulate across all switches
Then ledger maintains accurate totals for each model
And UI updates smoothly without flicker
And persistence is batched to avoid performance impact
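The batching rule (flush after 10 entries or 1 second, whichever comes first) can be sketched without any file I/O. BatchedWriter and its flush callback are hypothetical; the clock is passed in explicitly so the behavior is easy to test.

```typescript
// Hypothetical helper for illustration; the thresholds come from the proposal.
class BatchedWriter<T> {
  private pending: T[] = []
  private oldestPendingAt: number | null = null
  private readonly flushFn: (batch: T[]) => void
  private readonly maxEntries: number
  private readonly maxAgeMs: number

  constructor(flushFn: (batch: T[]) => void, maxEntries = 10, maxAgeMs = 1000) {
    this.flushFn = flushFn
    this.maxEntries = maxEntries
    this.maxAgeMs = maxAgeMs
  }

  // Queue an item; flush when the batch is full or the oldest queued item is too old.
  add(item: T, now: number = Date.now()): void {
    if (this.oldestPendingAt === null) this.oldestPendingAt = now
    this.pending.push(item)
    if (this.pending.length >= this.maxEntries || now - this.oldestPendingAt >= this.maxAgeMs) {
      this.flush()
    }
  }

  // Hand the queued items to the callback; does nothing when the queue is empty.
  flush(): void {
    if (this.pending.length === 0) return
    const batch = this.pending
    this.pending = []
    this.oldestPendingAt = null
    this.flushFn(batch)
  }
}
```

Because the age check only runs when an item is added, a caller would also need to invoke flush() from a timer and on task shutdown so that a quiet period cannot strand unwritten entries.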
Technical considerations (REQUIRED if contributing, optional otherwise)
- Implementation Approach: Add CostLedger class with entries containing provider, modelId, activity_code, activity_label, tokensIn, tokensOut, cost. Extend Task (and service modules like code indexing and chat memory) to append entries on their API responses. Use write‑ahead log pattern with append‑only JSONL file for crash safety, plus periodic snapshots using existing safeWriteJson utility.
- Performance: Ledger appends are O(1) in-memory with batched persistence. UI uses existing React components with minimal overhead. Estimated 2-3% performance impact.
- Compatibility: Backward-compatible - legacy tasks initialize with empty ledger. No breaking API changes.
- Systems Affected:
  - Task.ts (ledger initialization and entry appending)
  - Event system (extend TaskTokenUsageUpdated payload)
  - UI components (wire real data to existing components)
  - Task persistence (add cost-ledger.json to task directory)
- Potential Blockers:
  - Ledger schema versioning for future fields
  - Ensuring UI performance with large breakdown tables (100+ entries)
  - Chat Memory integration timing (use feature flags)
  - Subtask ledger merging needs careful handling to avoid race conditions
Trade-offs and risks (REQUIRED if contributing, optional otherwise)
- Alternative Approaches:
  - Considered: Single cumulative counter without breakdown - Rejected because users need visibility into per-model costs
  - Considered: Separate ledgers per provider - Rejected because it fragments the total and complicates UI
  - Considered: In-memory only tracking - Rejected due to data loss risk on crashes
  - Chosen: Unified ledger with WAL persistence - Provides durability, crash recovery, and detailed breakdown
- Potential Negative Impacts:
  - Risk: Large breakdown tables (100+ entries) could impact UI performance - Mitigation: Implement virtualized scrolling if needed
  - Risk: Additional I/O for persistence - Mitigation: WAL with batched writes (1s or 10 entries), snapshots every 100 entries
  - Risk: Memory overhead for long-running tasks - Mitigation: Implement entry compaction after 1000 entries
  - Risk: WAL corruption on power loss - Mitigation: Skip corrupt lines during recovery, maintain last valid snapshot
- Breaking Changes: None - the feature is additive with optional fields in existing events
- Edge Cases:
  - Model switches mid-streaming response - Tag with model active at response start
  - Corrupted ledger file - Log warning and reinitialize
  - Missing provider/model info - Default to 'unknown' with logged warning
  - Chat Memory features not yet implemented - Use feature flags to enable tracking when ready
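The "skip corrupt lines during recovery" mitigation and the 'unknown' default can be illustrated with a small parser over the WAL text. This is a sketch under assumptions: WalEntry is a reduced, hypothetical entry shape, and reading the file from disk is left to the caller.

```typescript
// Hypothetical, reduced entry shape for illustration only.
interface WalEntry {
  provider: string
  modelId: string
  cost: number
}

// Validate one parsed JSON value; missing provider/model fall back to "unknown".
function toWalEntry(value: unknown): WalEntry | null {
  if (typeof value !== "object" || value === null) return null
  const v = value as Record<string, unknown>
  if (typeof v.cost !== "number" || !Number.isFinite(v.cost)) return null
  return {
    provider: typeof v.provider === "string" ? v.provider : "unknown",
    modelId: typeof v.modelId === "string" ? v.modelId : "unknown",
    cost: v.cost,
  }
}

// Parse JSONL text, keeping valid entries and counting lines that fail to parse or validate.
function recoverWal(text: string): { entries: WalEntry[]; skipped: number } {
  const entries: WalEntry[] = []
  let skipped = 0
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue
    try {
      const entry = toWalEntry(JSON.parse(line))
      if (entry) entries.push(entry)
      else skipped++
    } catch {
      skipped++ // truncated or corrupt line, e.g. after power loss mid-write
    }
  }
  return { entries, skipped }
}
```

The skipped count gives the caller something concrete to put in the warning log when recovery drops lines.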
Persistence & Fork Semantics
- Storage location:
  - WAL: tasks/<taskId>/cost-ledger-wal.jsonl (append-only)
  - Snapshot: tasks/<taskId>/cost-ledger.json (periodic compaction)
  - Lives alongside ui_messages.json, api_conversation_history.json, and checkpoints/.
- Ledger entry schema additions for robust aggregation:
  - entry_id (string, UUID): Created once when the entry is first written; remains identical after a fork.
  - task_id (string): The task folder ID where the entry currently resides.
  - origin_task_id (string): The task that originally wrote the entry (never changes).
  - root_task_id (string): Root of the task lineage (historyItem.rootTaskId ?? taskId).
  - ts (number): Entry timestamp.
  - Existing fields: provider, model_id, feature, tokens_in, tokens_out, cost.
  - Optional: fork_generation (number) for analytics (0=root, 1=child, ...).
- Aggregation across tasks:
  - Read ledgers for all tasks in the workspace.
  - De-duplicate by entry_id (or origin_task_id + entry_id) to avoid counting pre‑fork entries twice.
  - Sum totals (tokens/cost) over the unique set, grouped as needed (by model, provider, feature, etc.).
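The de-duplication step can be sketched as follows, using the snake_case field names from the schema above. StoredEntry is a reduced, hypothetical shape and aggregateAcrossTasks is an illustrative name; loading the ledgers from disk is assumed to happen elsewhere.

```typescript
// Reduced, hypothetical entry shape based on the schema fields listed above.
interface StoredEntry {
  entry_id: string
  task_id: string
  origin_task_id: string
  provider: string
  model_id: string
  tokens_in: number
  tokens_out: number
  cost: number
}

interface AggregateTotals {
  tokens_in: number
  tokens_out: number
  cost: number
  unique_entries: number
}

// Sum over all task ledgers, counting each entry_id once so that entries
// copied into a fork are not counted a second time.
function aggregateAcrossTasks(ledgers: StoredEntry[][]): AggregateTotals {
  const seen = new Set<string>()
  const totals: AggregateTotals = { tokens_in: 0, tokens_out: 0, cost: 0, unique_entries: 0 }
  for (const ledger of ledgers) {
    for (const e of ledger) {
      if (seen.has(e.entry_id)) continue
      seen.add(e.entry_id)
      totals.tokens_in += e.tokens_in
      totals.tokens_out += e.tokens_out
      totals.cost += e.cost
      totals.unique_entries++
    }
  }
  return totals
}
```

Grouping by model, provider, or feature would follow the same pattern, with the seen-set check applied before an entry is added to its group.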