Persistent Multi-Model Cost Tracking with Breakdown

## What specific problem does this solve?

When users switch AI models during a task in Roo Code, the expense tracker and token counter displays become inaccurate. The current implementation recalculates costs based on the active model's pricing configuration without preserving running totals from previous models used in the same conversation. This makes it impossible to accurately track total API costs for tasks that involve multiple model switches.

**Who is affected?** Users who switch between different models during tasks (e.g., switching from Claude-4-Sonnet to GPT-5 for different capabilities, or testing OpenRouter vs direct API providers). This particularly affects users who need accurate cost tracking for budgeting or expense reporting.

**When does this happen?** 
1. User starts a task with Model A (e.g., GPT-5 via OpenAI)
2. Task accumulates token usage and costs based on Model A's pricing
3. User switches to Model B mid-task (e.g., Claude-4-Sonnet via Anthropic)
4. New API handler is created with Model B's configuration
5. The expense tracker resets or shows incorrect totals, losing the costs from Model A
6. After multiple switches, the displayed total becomes completely unreliable

**What's the current behaviour vs expected behaviour?**
- **Current:** Per-request costs are calculated correctly for the active model, but the cumulative total in the UI doesn't persist across model switches. The tracker essentially "resets" or shows incorrect values when models change.
- **Expected:** The tracker should maintain an accurate cumulative total across all model switches, with a detailed breakdown showing costs by provider, model, and feature usage (e.g., chat, embeddings, context condensing).

**What's the impact?**
- **Inaccurate cost tracking:** Users cannot rely on the displayed costs, making it impossible to track actual API expenses
- **Budget uncertainty:** Without accurate totals, users can't make informed decisions about model usage or stay within budget limits
- **Lost trust:** The broken cost tracker undermines confidence in Roo Code's reliability for professional use

### Additional context (optional)

**Mock UI Screenshots**

<img width="766" height="374" alt="Image" src="https://github.com/user-attachments/assets/b644d62f-3be9-428c-af1c-bebfefd02eda" />

<img width="1302" height="1052" alt="Image" src="https://github.com/user-attachments/assets/7d7213c0-63d1-48fb-a9cf-13e4cc1705b5" />

<img width="828" height="1468" alt="Image" src="https://github.com/user-attachments/assets/6ad5b828-beb4-4337-9e61-94a186c9b7c7" />


(Some styling can be fixed during the actual implementation, but that's the basic idea.)

**Mock UI — Activity semantics:**

- For normal Roo Code chat usage, Activity = the current Mode’s display name (built‑in or custom). Built‑in examples: "🏗️ Architect", "💻 Code", "❓ Ask", "🪲 Debug", "🪃 Orchestrator" (see `packages/types/src/mode.ts`). For custom modes, use the `name` from the user’s configuration (`custom_modes.yaml` / project `.roomodes`).
- For non‑chat services, Activity = a canonical service keyword with a friendly label. Initial set:
  - `CODEBASE_INDEXING` → "Codebase Indexing"
  - `CONTEXT_CONDENSING` → "Context Condensing"
  - `CHAT_MEMORY` → "Chat Memory"
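
A rough sketch of how the Activity value could be resolved from these rules. The `resolveActivity` helper and the `ActivitySource` type are hypothetical names used only for illustration, and reusing the mode's display name as the activity code is an assumption:

```typescript
// Hypothetical helper: names and shapes are illustrative, not existing Roo Code APIs.

// Canonical service keywords for non-chat usage, with their friendly labels.
const SERVICE_LABELS = {
  CODEBASE_INDEXING: "Codebase Indexing",
  CONTEXT_CONDENSING: "Context Condensing",
  CHAT_MEMORY: "Chat Memory",
} as const

type ServiceCode = keyof typeof SERVICE_LABELS

// Chat usage carries the current mode's display name; services carry a keyword.
type ActivitySource =
  | { kind: "chat"; modeName: string }
  | { kind: "service"; code: ServiceCode }

interface Activity {
  activityCode: string
  activityLabel: string
}

function resolveActivity(source: ActivitySource): Activity {
  if (source.kind === "chat") {
    // Assumption: for chat usage the display name doubles as the code.
    return { activityCode: source.modeName, activityLabel: source.modeName }
  }
  return { activityCode: source.code, activityLabel: SERVICE_LABELS[source.code] }
}
```

Keeping the service keywords in a single `as const` map means adding a new service later only requires one new key.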

**Text example breakdown:**
```
Provider        Model                   Activity                   Tokens          Cost
Anthropic       claude-4-sonnet         🏗️ Architect              ↑18.0k ↓4.5k    $1.25
OpenAI          gpt-5-chat              💻 Code                    ↑52.0k ↓11.0k   $3.85
OpenAI          gpt-5-mini              ❓ Ask                     ↑12.0k ↓3.0k    $0.61
OpenAI          gpt-5-chat              🪲 Debug                   ↑17.0k ↓5.2k    $1.10
OpenRouter      gpt-5-chat              🪃 Orchestrator            ↑9.0k  ↓2.5k    $0.78
OpenAI          text-embedding-3-large  Codebase Indexing          ↑420.0k ↓0      $0.42
xAI             grok-4                  Context Condensing         ↑6.5k  ↓1.5k    $0.33
Anthropic       claude-4-sonnet         Chat Memory                ↑9.0k  ↓1.0k    $0.50
OpenAI          text-embedding-3-large  Chat Memory                ↑70.0k ↓0       $0.07
```

**Related Code Paths:**
- Model switching: `ClineProvider.activateProviderProfile` (src/core/webview/ClineProvider.ts:~1180)
- Cost calculation: Task.ts streaming loop (~1849)
- Token usage events: `TaskTokenUsageUpdated` in saveClineMessages (~676)


## Related Issues

- #6822: Total Costs/Token Usage across all Tasks
  This proposal provides the per-task persistent ledger and activity breakdown that issue #6822 can aggregate across tasks/workspaces.
- #5376: Aggregate subtask costs and improve hierarchy visualization in Orchestrator mode
  The lineage fields (`root_task_id`, `parent_task_id`, `task_id`) and aggregation rules in this proposal enable accurate parent/subtask roll-ups without file merges.


### Roo Code Task Links (Optional)

_No response_

### Request checklist

- [x] I've searched existing Issues and Discussions for duplicates
- [x] This describes a specific problem with clear impact and context

### Interested in implementing this?

- [x] Yes, I'd like to help implement this feature

### Implementation requirements

- [x] I understand this needs approval before implementation begins

### How should this be solved? (REQUIRED if contributing, optional otherwise)


**Core Solution: Implement a persistent CostLedger class that maintains cross-model totals independently of the API handler.**

The ledger will store historical entries with metadata about provider, model, and feature usage. On each API response, it appends an entry and recomputes cumulative totals for both costs and tokens. This ensures the expense tracker and token counter receive accurate data across model switches.

**Key Components:**

1. **CostLedger Class (src/core/task/)**
   - `appendEntry(entry: CostEntry)`: Add new cost entry with metadata
   - `getCumulativeTotal()`: Return aggregated totals
   - `getBreakdownByModel()`: Group entries for UI display
   - **Persistence Strategy**: Write-ahead log (WAL) + periodic snapshots
     - `cost-ledger-wal.jsonl`: Append-only log for crash safety
     - `cost-ledger.json`: Periodic full snapshots (every 100 entries)
     - Batched writes every 1s or 10 entries to balance performance
     - Automatic recovery on task resume

2. **Integration Points:**
   - In Task.ts streaming loop: After calculating per-request cost, append to ledger
   - In `activateProviderProfile`: Preserve ledger across model switches
   - In `saveClineMessages`: Include ledger totals in emitted events
   - In `completeSubtask`: Merge child ledger into parent ledger

3. **Subtask Cost Aggregation:**
   - Track costs from subtasks spawned in orchestrator mode
   - Merge child task ledgers into parent on completion
   - Mark subtask entries with subtaskId for UI filtering
   - Aggregate totals across entire task hierarchy

4. **Secondary Cost Tracking:**
   - Embeddings: Track costs from vector store operations
   - Memory LLM: Track extraction costs from Chat Memory Phase 4
   - Context Condensing: Track summarization costs
   - Use generic event hook for future extensibility

5. **UI Wiring:**
   - Remove mock data from ChatView.tsx
   - Pass real ledger breakdown to TaskHeader
   - Update costs in real-time as entries are added
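
A minimal in-memory sketch of the three ledger methods listed under component 1, without persistence. The `CostEntry` and `Totals` shapes are assumptions based on the fields named in this proposal, and the token split for the Claude entry is made up for the example (only its 48.5k total comes from the acceptance criteria):

```typescript
// Illustrative sketch only: shapes follow this proposal, not an existing Roo Code API.
interface CostEntry {
  provider: string
  modelId: string
  feature: string // e.g. "chat" or "context_condensing"
  tokensIn: number
  tokensOut: number
  cost: number // USD
}

interface Totals {
  tokensIn: number
  tokensOut: number
  cost: number
}

class CostLedger {
  private entries: CostEntry[] = []
  private totals: Totals = { tokensIn: 0, tokensOut: 0, cost: 0 }

  // Append one entry and update the running totals in O(1).
  appendEntry(entry: CostEntry): void {
    this.entries.push(entry)
    this.totals.tokensIn += entry.tokensIn
    this.totals.tokensOut += entry.tokensOut
    this.totals.cost += entry.cost
  }

  // Return a copy so callers cannot mutate the ledger's internal state.
  getCumulativeTotal(): Totals {
    return { ...this.totals }
  }

  // Group entries by "provider/model" for the breakdown dialog.
  getBreakdownByModel(): Map<string, Totals> {
    const groups = new Map<string, Totals>()
    for (const e of this.entries) {
      const key = `${e.provider}/${e.modelId}`
      const g = groups.get(key) ?? { tokensIn: 0, tokensOut: 0, cost: 0 }
      g.tokensIn += e.tokensIn
      g.tokensOut += e.tokensOut
      g.cost += e.cost
      groups.set(key, g)
    }
    return groups
  }
}

// The ledger is owned by the task, not the API handler, so a model switch keeps the totals.
const ledger = new CostLedger()
ledger.appendEntry({ provider: "OpenAI", modelId: "gpt-5-chat", feature: "chat", tokensIn: 52000, tokensOut: 11000, cost: 3.85 })
ledger.appendEntry({ provider: "Anthropic", modelId: "claude-4-sonnet", feature: "chat", tokensIn: 40000, tokensOut: 8500, cost: 2.6 })
```

Costs are summed as floating-point dollars here for brevity; a real implementation may prefer integer micro-dollars to avoid rounding drift over many entries.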

**User Interaction:** No changes required - the tracker automatically updates with correct totals. The existing "View full breakdown" link opens the detailed dialog showing all cost entries grouped by provider and model.

## How will we know it works? (Acceptance Criteria - REQUIRED if contributing, optional otherwise)

Given a new task in Roo Code
When the user accumulates token usage with OpenAI GPT-5 ($3.85 total, 63k tokens)
Then the expense tracker displays accurate cumulative cost $3.85
And the token counter displays accurate cumulative tokens 63k
And the internal ledger stores one entry with correct provider, modelId, and cost

Given the task from above
When the user switches to Anthropic Claude-4-Sonnet and accumulates additional usage ($2.60, 48.5k tokens)
Then the expense tracker displays updated cumulative $6.45 (not reset to $2.60)
And the token counter displays updated cumulative 111.5k tokens (not reset to 48.5k)
And the breakdown dialog shows both models with their respective costs
But the per-request calculation still uses Claude's pricing correctly

Given a task with model switches and secondary operations
When context condensing with xAI Grok-4 occurs ($0.33 cost, 8k tokens)
Then the ledger appends an entry with `provider: 'xAI'`, `modelId: 'grok-4'`, `feature: 'context_condensing'`
And the breakdown table shows this as a separate line item
And totals include all costs without double-counting

Given a parent task that spawns a subtask in orchestrator mode
When the subtask uses GPT-5-mini and accumulates $0.61 in costs
Then the subtask maintains its own ledger during execution
And when the subtask completes, its ledger merges into the parent's ledger
And the parent's total includes both its own costs and the subtask's costs
And the breakdown shows subtask entries marked with subtaskId

Given an existing task from history without a ledger
When the task is resumed
Then the ledger initializes with zero totals
And a one-time warning is logged about the missing historical data
But the task proceeds without errors

Given rapid model switches (e.g., GPT-5 → Claude-4 → GPT-5-mini → Grok-4 → o4-mini)
When costs accumulate across all switches
Then the ledger maintains accurate totals for each model
And UI updates smoothly without flicker
And persistence is batched to avoid performance impact
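
The subtask scenario above could be checked with something like the following sketch. `LedgerEntry`, `mergeChildLedger`, and `totalCost` are hypothetical names, and the entry shape is trimmed to the fields the scenario needs:

```typescript
// Hypothetical shapes for illustration; only `subtaskId` tagging comes from the criteria.
interface LedgerEntry {
  provider: string
  modelId: string
  cost: number // USD
  subtaskId?: string
}

// Return the parent's entries followed by the child's entries tagged with the subtask ID.
// Pure function: neither input array is mutated.
function mergeChildLedger(
  parent: readonly LedgerEntry[],
  child: readonly LedgerEntry[],
  subtaskId: string,
): LedgerEntry[] {
  // Keep an existing tag so entries from nested subtasks stay attributed to the deepest task.
  const tagged = child.map((e) => ({ ...e, subtaskId: e.subtaskId ?? subtaskId }))
  return [...parent, ...tagged]
}

// Sum the cost over a set of entries.
function totalCost(entries: readonly LedgerEntry[]): number {
  return entries.reduce((sum, e) => sum + e.cost, 0)
}
```

Returning a new array instead of mutating the parent keeps the merge easy to test and leaves the locking question (see the race-condition blocker) to the caller.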

## Technical considerations (REQUIRED if contributing, optional otherwise)

- **Implementation Approach:** Add a `CostLedger` class whose entries contain `provider`, `modelId`, `activity_code`, `activity_label`, `tokensIn`, `tokensOut`, and `cost`. Extend Task (and service modules such as code indexing and chat memory) to append entries on their API responses. Use a write-ahead log pattern with an append-only JSONL file for crash safety, plus periodic snapshots using the existing `safeWriteJson` utility.

- **Performance:** Ledger appends are O(1) in-memory with batched persistence. UI uses existing React components with minimal overhead. Estimated 2-3% performance impact.

- **Compatibility:** Backward-compatible - legacy tasks initialize with empty ledger. No breaking API changes.

- **Systems Affected:** 
  - Task.ts (ledger initialization and entry appending)
  - Event system (extend TaskTokenUsageUpdated payload)
  - UI components (wire real data to existing components)
  - Task persistence (add cost-ledger.json to task directory)

- **Potential Blockers:** 
  - Ledger schema versioning for future fields
  - Ensuring UI performance with large breakdown tables (100+ entries)
  - Chat Memory integration timing (use feature flags)
  - Subtask ledger merging needs careful handling to avoid race conditions
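
To make the batching rule (flush every 1s or 10 entries) concrete, here is a sketch with the clock and the file write injected so it stays testable. `WalBatcher`, its `flush` callback, and the `tick` method are hypothetical names, and the actual append to `cost-ledger-wal.jsonl` is left to the callback:

```typescript
// Illustrative sketch: `WalBatcher` and its injected `flush` callback are hypothetical.
class WalBatcher<T> {
  private pending: T[] = []
  private firstPendingAt = 0
  private readonly flush: (lines: string[]) => void
  private readonly maxEntries: number
  private readonly maxDelayMs: number

  constructor(flush: (lines: string[]) => void, maxEntries = 10, maxDelayMs = 1000) {
    this.flush = flush // e.g. appends the lines to cost-ledger-wal.jsonl
    this.maxEntries = maxEntries
    this.maxDelayMs = maxDelayMs
  }

  // Queue an entry; flush when the batch is full or its oldest entry is too old.
  push(entry: T, nowMs: number): void {
    if (this.pending.length === 0) this.firstPendingAt = nowMs
    this.pending.push(entry)
    this.tick(nowMs)
  }

  // Meant to be called periodically (for example from a timer) so a quiet task still flushes.
  tick(nowMs: number): void {
    const full = this.pending.length >= this.maxEntries
    const stale = this.pending.length > 0 && nowMs - this.firstPendingAt >= this.maxDelayMs
    if (full || stale) this.flushNow()
  }

  // Serialize each pending entry as one JSON line and hand the batch to the writer.
  flushNow(): void {
    if (this.pending.length === 0) return
    const lines = this.pending.map((e) => JSON.stringify(e))
    this.pending = []
    this.flush(lines)
  }
}
```

Passing `nowMs` in explicitly avoids real timers in tests; the caller would also invoke `flushNow()` when the task is paused or disposed so no entries are left unwritten.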

## Trade-offs and risks (REQUIRED if contributing, optional otherwise)

- **Alternative Approaches:**
  - Considered: Single cumulative counter without breakdown - Rejected because users need visibility into per-model costs
  - Considered: Separate ledgers per provider - Rejected because it fragments the total and complicates UI  
  - Considered: In-memory only tracking - Rejected due to data loss risk on crashes
  - Chosen: Unified ledger with WAL persistence - Provides durability, crash recovery, and detailed breakdown

- **Potential Negative Impacts:**
  - Risk: Large breakdown tables (100+ entries) could impact UI performance - Mitigation: Implement virtualized scrolling if needed
  - Risk: Additional I/O for persistence - Mitigation: WAL with batched writes (1s or 10 entries), snapshots every 100 entries
  - Risk: Memory overhead for long-running tasks - Mitigation: Implement entry compaction after 1000 entries
  - Risk: WAL corruption on power loss - Mitigation: Skip corrupt lines during recovery, maintain last valid snapshot

- **Breaking Changes:** None - the feature is additive with optional fields in existing events

- **Edge Cases:**
  - Model switches mid-streaming response - Tag with model active at response start
  - Corrupted ledger file - Log warning and reinitialize
  - Missing provider/model info - Default to 'unknown' with logged warning
  - Chat Memory features not yet implemented - Use feature flags to enable tracking when ready
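
The WAL corruption mitigation (skip corrupt lines, fall back to the last valid snapshot) might look like the sketch below. `WalEntry`, `isWalEntry`, and `recoverEntries` are hypothetical names, the entry shape is reduced to two fields, and skipping WAL lines already present in the snapshot assumes the WAL is not always truncated when a snapshot is written:

```typescript
// Illustrative recovery sketch; names and the reduced entry shape are hypothetical.
interface WalEntry {
  entry_id: string
  cost: number
}

// Runtime check that a parsed JSON value has the fields this sketch relies on.
function isWalEntry(value: unknown): value is WalEntry {
  if (typeof value !== "object" || value === null) return false
  const v = value as Record<string, unknown>
  return typeof v["entry_id"] === "string" && typeof v["cost"] === "number"
}

// Rebuild the ledger from the last valid snapshot plus the raw WAL text.
// Corrupt or partial lines (for example a write cut off by power loss) are skipped and counted.
function recoverEntries(
  snapshot: readonly WalEntry[],
  walText: string,
): { entries: WalEntry[]; skipped: number } {
  const entries = [...snapshot]
  const seen = new Set(snapshot.map((e) => e.entry_id))
  let skipped = 0
  for (const line of walText.split("\n")) {
    if (line.trim() === "") continue
    try {
      const parsed: unknown = JSON.parse(line)
      if (!isWalEntry(parsed)) {
        skipped++
        continue
      }
      if (seen.has(parsed.entry_id)) continue // already captured by the snapshot
      seen.add(parsed.entry_id)
      entries.push(parsed)
    } catch {
      skipped++ // not valid JSON: treat as a corrupt line
    }
  }
  return { entries, skipped }
}
```

Returning the `skipped` count lets the caller log a single warning per recovery instead of one per bad line.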

## Persistence & Fork Semantics

- **Storage location:**
  - WAL: `tasks/<taskId>/cost-ledger-wal.jsonl` (append-only)
  - Snapshot: `tasks/<taskId>/cost-ledger.json` (periodic compaction)
  - Lives alongside `ui_messages.json`, `api_conversation_history.json`, and `checkpoints/`.

- **Ledger entry schema additions for robust aggregation:**
  - `entry_id` (string, UUID): Created once when the entry is first written; remains identical after a fork.
  - `task_id` (string): The task folder ID where the entry currently resides.
  - `origin_task_id` (string): The task that originally wrote the entry (never changes).
  - `root_task_id` (string): Root of the task lineage (`historyItem.rootTaskId ?? taskId`).
  - `ts` (number): Entry timestamp.
  - Existing fields: `provider`, `model_id`, `feature`, `tokens_in`, `tokens_out`, `cost`.
  - Optional: `fork_generation` (number) for analytics (0=root, 1=child, ...).

- **Aggregation across tasks:**
  - Read ledgers for all tasks in the workspace.
  - De-duplicate by `entry_id` (or `origin_task_id + entry_id`) to avoid counting pre‑fork entries twice.
  - Sum totals (tokens/cost) over the unique set, grouped as needed (by model, provider, feature, etc.).
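
A sketch of the de-duplicating aggregation described above, grouped by provider and model. Field names mirror the schema in this section; `ForkableEntry`, `GroupTotal`, and `aggregateByModel` are hypothetical names, and loading the ledgers from disk is out of scope here:

```typescript
// Illustrative sketch; field names mirror the schema above, type and function names are hypothetical.
interface ForkableEntry {
  entry_id: string
  task_id: string
  origin_task_id: string
  provider: string
  model_id: string
  tokens_in: number
  tokens_out: number
  cost: number // USD
}

interface GroupTotal {
  tokens_in: number
  tokens_out: number
  cost: number
}

// Sum entries from many task ledgers, counting each original entry once.
function aggregateByModel(
  ledgers: ReadonlyArray<ReadonlyArray<ForkableEntry>>,
): Map<string, GroupTotal> {
  const seen = new Set<string>()
  const totals = new Map<string, GroupTotal>()
  for (const ledger of ledgers) {
    for (const e of ledger) {
      // A forked task carries copies of pre-fork entries with the same entry_id and origin_task_id.
      const uniqueKey = `${e.origin_task_id}:${e.entry_id}`
      if (seen.has(uniqueKey)) continue
      seen.add(uniqueKey)
      const groupKey = `${e.provider}/${e.model_id}`
      const g = totals.get(groupKey) ?? { tokens_in: 0, tokens_out: 0, cost: 0 }
      g.tokens_in += e.tokens_in
      g.tokens_out += e.tokens_out
      g.cost += e.cost
      totals.set(groupKey, g)
    }
  }
  return totals
}
```

Because `task_id` changes when an entry is copied into a fork, it is deliberately left out of the de-duplication key.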



Persistent Multi-Model Cost Tracking with Breakdown #7755
