|
| 1 | +--- |
| 2 | +title: "Session Usage and Context Status" |
| 3 | +--- |
| 4 | + |
| 5 | +- Author(s): [@ahmedhesham6](https://github.com/ahmedhesham6) |
| 6 | +- Champion: [@benbrandt](https://github.com/benbrandt) |
| 7 | + |
| 8 | +## Elevator pitch |
| 9 | + |
| 10 | +> What are you proposing to change? |
| 11 | +
|
| 12 | +Add standardized usage and context window tracking to the Agent Client Protocol, enabling agents to report token consumption, cost estimates, and context window utilization in a consistent way across implementations. |
| 13 | + |
| 14 | +## Status quo |
| 15 | + |
| 16 | +> How do things work today and what problems does this cause? Why would we change things? |
| 17 | +
|
| 18 | +Currently, the ACP protocol has no standardized way for agents to communicate: |
| 19 | + |
| 20 | +1. **Token usage** - How many tokens were consumed in a turn or cumulatively |
| 21 | +2. **Context window status** - How much of the model's context window is being used |
| 22 | +3. **Cost information** - Estimated costs for API usage |
| 23 | +4. **Prompt caching metrics** - Cache hits/misses for models that support caching |
| 24 | + |
| 25 | +This creates several problems: |
| 26 | + |
| 27 | +- **No visibility into resource consumption** - Clients can't show users how much of their context budget is being used |
| 28 | +- **No cost transparency** - Users can't track spending or estimate costs before operations |
| 29 | +- **No context management** - Clients can't warn users when approaching context limits or suggest compaction |
| 30 | +- **Inconsistent implementations** - Each agent implements usage tracking differently (if at all) |
| 31 | + |
| 32 | +Industry research shows common patterns across AI coding tools: |
| 33 | + |
| 34 | +- LLM providers return cumulative token counts in API responses |
| 35 | +- IDE extensions display context percentage prominently (e.g., radial progress showing "19%") |
| 36 | +- Clients show absolute numbers on hover/detail (e.g., "31.4K of 200K tokens") |
| 37 | +- Tools warn users at threshold percentages (75%, 90%, 95%) |
| 38 | +- Auto-compaction features trigger when approaching context limits |
| 39 | +- Cost tracking focuses on cumulative session totals rather than per-turn breakdowns |
| 40 | + |
| 41 | +## What we propose to do about it |
| 42 | + |
| 43 | +> What are you proposing to improve the situation? |
| 44 | +
|
| 45 | +We propose separating usage tracking into two distinct concerns: |
| 46 | + |
| 47 | +1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data) |
| 48 | +2. **Context window and cost** - Reported via `session/update` notifications with `sessionUpdate: "usage_update"` (session state) |
| 49 | + |
| 50 | +This separation reflects how users consume this information: |
| 51 | + |
| 52 | +- Token counts are tied to specific turns and useful immediately after a prompt |
| 53 | +- Context window and cost are cumulative session state that agents push proactively when available |
| 54 | + |
| 55 | +Agents send context updates at appropriate times: |
| 56 | + |
| 57 | +- On `session/new` response (if agent can query usage immediately) |
| 58 | +- On `session/load` / `session/resume` (for resumed/forked sessions) |
| 59 | +- After each `session/prompt` response (when usage data becomes available) |
| 60 | +- Anytime context window state changes significantly |
| 61 | + |
| 62 | +This approach provides flexibility for different agent implementations: |
| 63 | + |
| 64 | +- Agents that support getting current usage without a prompt can immediately send updates when creating, resuming, or forking chats |
| 65 | +- Agents that only provide usage when actively prompting can send updates after sending a new prompt |
| 66 | + |
| 67 | +### Token Usage in `PromptResponse` |
| 68 | + |
| 69 | +Add a `usage` field to `PromptResponse` for token consumption tracking: |
| 70 | + |
| 71 | +```json |
| 72 | +{ |
| 73 | + "jsonrpc": "2.0", |
| 74 | + "id": 1, |
| 75 | + "result": { |
| 76 | + "sessionId": "sess_abc123", |
| 77 | + "stopReason": "end_turn", |
| 78 | + "usage": { |
| 79 | + "total_tokens": 53000, |
| 80 | + "input_tokens": 35000, |
| 81 | + "output_tokens": 12000, |
| 82 | + "thought_tokens": 5000, |
| 83 | + "cached_read_tokens": 5000, |
| 84 | + "cached_write_tokens": 1000 |
| 85 | + } |
| 86 | + } |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +#### Usage Fields |
| 91 | + |
| 92 | +- `total_tokens` (number, required) - Sum of all token types across the session |
| 93 | +- `input_tokens` (number, required) - Total input tokens across all turns |
| 94 | +- `output_tokens` (number, required) - Total output tokens across all turns |
| 95 | +- `thought_tokens` (number, optional) - Total thought/reasoning tokens (for o1/o3 models) |
| 96 | +- `cached_read_tokens` (number, optional) - Total cache read tokens |
| 97 | +- `cached_write_tokens` (number, optional) - Total cache write tokens |
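A client might read these fields roughly as follows. This is a minimal sketch, not part of the proposal: the `Usage` dataclass and `parse_usage` helper are hypothetical names, and the only assumptions are the field names and required/optional split listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    # Required fields per the list above.
    total_tokens: int
    input_tokens: int
    output_tokens: int
    # Optional fields; None means the agent did not report them.
    thought_tokens: Optional[int] = None
    cached_read_tokens: Optional[int] = None
    cached_write_tokens: Optional[int] = None


def parse_usage(result: dict) -> Optional[Usage]:
    """Return the usage object from a PromptResponse result, or None if absent."""
    raw = result.get("usage")
    if raw is None:
        # The `usage` field itself is optional on PromptResponse.
        return None
    return Usage(
        total_tokens=raw["total_tokens"],
        input_tokens=raw["input_tokens"],
        output_tokens=raw["output_tokens"],
        thought_tokens=raw.get("thought_tokens"),
        cached_read_tokens=raw.get("cached_read_tokens"),
        cached_write_tokens=raw.get("cached_write_tokens"),
    )
```

Using `None` for unreported optional fields lets a client distinguish "not reported" from a genuine count of zero.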
| 98 | + |
| 99 | +### Context Window and Cost via `session/update` |
| 100 | + |
| 101 | +Agents send context window and cost information via `session/update` notifications with `sessionUpdate: "usage_update"`: |
| 102 | + |
| 103 | +```json |
| 104 | +{ |
| 105 | + "jsonrpc": "2.0", |
| 106 | + "method": "session/update", |
| 107 | + "params": { |
| 108 | + "sessionId": "sess_abc123", |
| 109 | + "update": { |
| 110 | + "sessionUpdate": "usage_update", |
| 111 | + "used": 53000, |
| 112 | + "size": 200000 |
| 113 | + } |
| 114 | + } |
| 115 | +} |
| 116 | +``` |
| 117 | + |
| 118 | +#### Context Window Fields (required) |
| 119 | + |
| 120 | +- `used` (number, required) - Tokens currently in context |
| 121 | +- `size` (number, required) - Total context window size in tokens |
| 122 | + |
| 123 | +Note: Clients can compute `remaining` as `size - used` and `percentage` as `used / size * 100` if needed. |
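The derivation in the note can be sketched as a small helper. The function name `context_stats` is hypothetical, and the guard against a non-positive `size` is an assumption added for safety rather than something the proposal specifies.

```python
from typing import Tuple


def context_stats(used: int, size: int) -> Tuple[int, float]:
    """Derive (remaining, percentage) from the two required fields."""
    if size <= 0:
        # Assumption: a non-positive window size is treated as invalid input.
        raise ValueError("size must be a positive number of tokens")
    remaining = size - used
    percentage = used / size * 100
    return remaining, percentage


# With the example values above: 147000 tokens remaining, about 26.5% used.
remaining, percentage = context_stats(used=53000, size=200000)
```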
| 124 | + |
| 125 | +#### Cost Fields (optional) |
| 126 | + |
| 127 | +- `cost` (object, optional) - Cumulative session cost |
| 128 | + - `amount` (number, required) - Total cumulative cost for session |
| 129 | + - `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR") |
| 130 | + |
| 131 | +Example with optional cost: |
| 132 | + |
| 133 | +```json |
| 134 | +{ |
| 135 | + "jsonrpc": "2.0", |
| 136 | + "method": "session/update", |
| 137 | + "params": { |
| 138 | + "sessionId": "sess_abc123", |
| 139 | + "update": { |
| 140 | + "sessionUpdate": "usage_update", |
| 141 | + "used": 53000, |
| 142 | + "size": 200000, |
| 143 | + "cost": { |
| 144 | + "amount": 0.045, |
| 145 | + "currency": "USD" |
| 146 | + } |
| 147 | + } |
| 148 | + } |
| 149 | +} |
| 150 | +``` |
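On the client side, handling this notification could look like the sketch below. The function `describe_usage_update` and its output format are hypothetical; the sketch assumes only the message shape shown in the examples above and treats `cost` as absent when the agent does not track it.

```python
from typing import Optional


def describe_usage_update(message: dict) -> Optional[str]:
    """Summarize a usage_update notification, or return None for other messages."""
    if message.get("method") != "session/update":
        return None
    update = message.get("params", {}).get("update", {})
    if update.get("sessionUpdate") != "usage_update":
        return None  # some other session update variant

    # `used` and `size` are required on usage_update.
    text = f"{update['used']} of {update['size']} tokens"

    # `cost` is optional; when present, `amount` and `currency` are required.
    cost = update.get("cost")
    if cost is not None:
        text += f", {cost['amount']} {cost['currency']}"
    return text
```

The currency code is passed through unchanged rather than converted, in line with reporting the agent's actual billing currency.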
| 151 | + |
| 152 | +### Design Principles |
| 153 | + |
| 154 | +1. **Separation of concerns** - Token usage is per-turn data, context window and cost are session state |
| 155 | +2. **Agent-pushed notifications** - Agents proactively send context updates when data becomes available, following the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`) |
| 156 | +3. **Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification |
| 157 | +4. **Flexible cost reporting** - Cost is optional since not all agents track it. Support any currency; don't assume USD |
| 158 | +5. **Prompt caching support** - Include cache read/write tokens for models that support it |
| 159 | +6. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility |
| 160 | +7. **Flexible timing** - Agents send updates when they can: immediately for agents with on-demand APIs, or after prompts for agents that only provide usage during active prompting |
| 161 | + |
| 162 | +## Shiny future |
| 163 | + |
| 164 | +> How will things play out once this feature exists? |
| 165 | +
|
| 166 | +**For Users:** |
| 167 | + |
| 168 | +- **Visibility**: Users see real-time context window usage with percentage indicators |
| 169 | +- **Cost awareness**: Users can track spending and check cumulative cost at any time |
| 170 | +- **Better planning**: Users know when to start new sessions or compact context |
| 171 | +- **Transparency**: Clear understanding of resource consumption |
| 172 | + |
| 173 | +**For Client Implementations:** |
| 174 | + |
| 175 | +- **Consistent UI**: All clients can show usage in a standard way (progress bars, percentages, warnings) |
| 176 | +- **Smart warnings**: Clients can warn users at 75%, 90% context usage |
| 177 | +- **Cost controls**: Clients can implement budget limits and alerts |
| 178 | +- **Analytics**: Clients can track usage patterns and optimize |
| 179 | +- **Reactive updates**: Clients receive context updates reactively via notifications, updating UI immediately when agents push new data |
| 180 | +- **No polling needed**: Updates arrive automatically when agents have new information, eliminating the need for clients to poll |
| 181 | + |
| 182 | +**For Agent Implementations:** |
| 183 | + |
| 184 | +- **Standard reporting**: Clear contract for what to report and when |
| 185 | +- **Flexibility**: Optional fields allow agents to report what they can calculate |
| 186 | +- **Model diversity**: Works with any model (GPT, Claude, Llama, etc.) |
| 187 | +- **Caching support**: First-class support for prompt caching |
| 188 | + |
| 189 | +## Implementation details and plan |
| 190 | + |
| 191 | +> Tell me more about your implementation. What is your detailed implementation plan? |
| 192 | +
|
| 193 | +1. **Update schema.json** to add: |
| 194 | + - `Usage` type with token fields |
| 195 | + - `Cost` type with `amount` and `currency` fields |
| 196 | + - `ContextUpdate` type with `used`, `size` (required) and optional `cost` field |
| 197 | + - Add optional `usage` field to `PromptResponse` |
| 198 | + - Add `UsageUpdate` variant to `SessionUpdate` oneOf array (with `sessionUpdate: "usage_update"`) |
| 199 | + |
| 200 | +2. **Update protocol documentation**: |
| 201 | + - Document `usage` field in `/docs/protocol/prompt-turn.mdx` |
| 202 | + - Document `session/update` notification with `sessionUpdate: "usage_update"` variant |
| 203 | + - Add examples showing typical usage patterns and when agents send context updates |
| 204 | + |
| 205 | +## Frequently asked questions |
| 206 | + |
| 207 | +> What questions have arisen over the course of authoring this document or during subsequent discussions? |
| 208 | +
|
| 209 | +### Why separate token usage from context window and cost? |
| 210 | + |
| 211 | +Different users care about different things at different times: |
| 212 | + |
| 213 | +- **Token counts**: Relevant immediately after a turn completes to understand the breakdown |
| 214 | +- **Context window remaining**: Relevant at any time, especially before issuing a large prompt. "Do I need to hand off or compact?" |
| 215 | +- **Cumulative cost**: Session-level state that agents push when available |
| 216 | + |
| 217 | +Separating them allows: |
| 218 | + |
| 219 | +- Cleaner data model where per-turn data stays in turn responses |
| 220 | +- Agents to push context updates proactively when data becomes available |
| 221 | +- Clients to receive updates reactively without needing to poll |
| 222 | + |
| 223 | +### Why is cost in session/update instead of PromptResponse? |
| 224 | + |
| 225 | +Cost is cumulative session state, similar to context window: |
| 226 | + |
| 227 | +- Users want to track total spending, not just per-turn costs |
| 228 | +- Keeps `PromptResponse` focused on per-turn token breakdown |
| 229 | +- Both cost and context window are session-level metrics that belong together |
| 230 | +- Cost is optional since not all agents track it |
| 231 | + |
| 232 | +### How do users know when to hand off or compact the context? |
| 233 | + |
| 234 | +The context update notification provides everything needed: |
| 235 | + |
| 236 | +- `used` and `size` give absolute numbers for precise tracking |
| 237 | +- Clients can compute `remaining` as `size - used` and `percentage` as `used / size * 100` |
| 238 | +- `size` lets clients understand the total budget |
| 239 | + |
| 240 | +**Recommended client behavior:** |
| 241 | + |
| 242 | +| Percentage | Action | |
| 243 | +| ---------- | ---------------------------------------------------------------- | |
| 244 | +| < 75% | Normal operation | |
| 245 | +| 75-90% | Yellow indicator, suggest "Context filling up" | |
| 246 | +| 90-95% | Orange indicator, recommend "Start new session or summarize" | |
| 247 | +| > 95% | Red indicator, warn "Next prompt may fail - handoff recommended" | |
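The table could be mapped to an indicator level as sketched below. The function name and the returned labels are hypothetical. The table leaves the exact boundaries open, so this sketch assumes that 75 and 90 fall into the higher band and that exactly 95 is still orange.

```python
def indicator_for(percentage: float) -> str:
    """Map context usage (0-100) to an indicator level from the table above."""
    if percentage > 95:
        return "red"     # warn: next prompt may fail, handoff recommended
    if percentage >= 90:
        return "orange"  # recommend starting a new session or summarizing
    if percentage >= 75:
        return "yellow"  # suggest that the context is filling up
    return "normal"
```

Clients remain free to choose different thresholds; the values here only mirror the recommendation.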
| 248 | + |
| 249 | +Clients can also: |
| 250 | + |
| 251 | +- Offer "Compact context" or "Summarize conversation" actions |
| 252 | +- Auto-suggest starting a new session |
| 253 | +- Implement automatic handoff when approaching limits |
| 254 | + |
| 255 | +### Why not assume USD for cost? |
| 256 | + |
| 257 | +Agents may bill in different currencies: |
| 258 | + |
| 259 | +- European agents might bill in EUR |
| 260 | +- Asian agents might bill in JPY or CNY |
| 261 | +- Some agents might use credits or points |
| 262 | +- Currency conversion rates change |
| 263 | + |
| 264 | +Better to report actual billing currency and let clients convert if needed. |
| 265 | + |
| 266 | +### What if the agent can't calculate some fields? |
| 267 | + |
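| 268 | +Within `usage`, only `total_tokens`, `input_tokens`, and `output_tokens` are required; within `usage_update`, `used` and `size` are required and `cost` is optional. Agents report what they can calculate, and clients should handle missing optional fields gracefully. |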
| 269 | + |
| 270 | +### How does this work with streaming responses? |
| 271 | + |
| 272 | +- During streaming: Agents may send progressive context updates via `session/update` notifications as usage changes |
| 273 | +- Final response: Include complete token usage in `PromptResponse` |
| 274 | +- Context window and cost: Agents send `session/update` notifications with `sessionUpdate: "usage_update"` when data becomes available (after prompt completion, on session creation/resume, or when context state changes significantly) |
| 275 | + |
| 276 | +### What about models without fixed context windows? |
| 277 | + |
| 278 | +- Report effective context window size |
| 279 | +- For models with dynamic windows, report current limit |
| 280 | +- Update size if it changes |
| 281 | +- Set to `null` if truly unlimited (rare) |
| 282 | + |
| 283 | +### What about rate limits and quotas? |
| 284 | + |
| 285 | +This RFD focuses on token usage and context windows. Rate limits and quotas are a separate concern that could be addressed in a future RFD. However, the cost tracking here helps users understand their usage against quota limits. |
| 286 | + |
| 287 | +### Should cached tokens count toward context window? |
| 288 | + |
| 289 | +Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached). |
| 290 | + |
| 291 | +### Why notification instead of request? |
| 292 | + |
| 293 | +Using `session/update` notifications instead of a `session/status` request provides several benefits: |
| 294 | + |
| 295 | +1. **Consistency**: Follows the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`) |
| 296 | +2. **Agent flexibility**: Agents can send updates when they have data available, whether that's immediately (for agents with on-demand APIs) or after prompts (for agents that only provide usage during active prompting) |
| 297 | +3. **No polling**: Clients receive updates reactively without needing to poll |
| 298 | +4. **Real-time updates**: Updates flow naturally as part of the session lifecycle |
| 299 | + |
| 300 | +### What if the client connects mid-session? |
| 301 | + |
| 302 | +When a client connects to an existing session (via `session/load` or `session/resume`), agents **SHOULD** send a context update notification if they have current usage data available. This ensures the client UI can immediately display accurate context window and cost information. |
| 303 | + |
| 304 | +For agents that only provide usage during active prompting, the client UI may not show usage until after the first prompt is sent, which is acceptable given the agent's capabilities. |
| 305 | + |
| 306 | +### What alternative approaches did you consider, and why did you settle on this one? |
| 307 | + |
| 308 | +**Alternatives considered:** |
| 309 | + |
| 310 | +1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to track independently of turns. |
| 311 | + |
| 312 | +2. **Request/response (`session/status`)** - Requires clients to poll, and some agents don't have APIs to query current status without a prompt. The notification approach is more flexible and consistent with other dynamic session properties. |
| 313 | + |
| 314 | +3. **Client calculates everything** - Rejected because client doesn't know model's tokenizer, exact context window size, or pricing. |
| 315 | + |
| 316 | +4. **Only percentage, no raw tokens** - Rejected because users want absolute numbers, clients can't verify calculations, and it's less transparent. |
| 317 | + |
| 318 | +## Revision history |
| 319 | + |
| 320 | +- 2025-12-07: Initial draft |
| 321 | +- 2025-12-13: Changed from `session/status` request method to `session/update` notification with `sessionUpdate: "context_update"`. Made `cost` optional and removed `remaining` field (clients can compute as `size - used`). This approach provides more flexibility for agents and follows the same pattern as other dynamic session properties. |
| 322 | +- 2025-12-17: Renamed `reasoning_tokens` to `thought_tokens` for consistency with ACP terminology. Removed `percentage` field (clients can compute as `used / size * 100`). |
| 323 | +- 2025-12-19: Renamed `sessionUpdate: "context_update"` to `sessionUpdate: "usage_update"` to better reflect the payload semantics (includes both context window info and cumulative cost). |