docs(rfd): Add session usage and context status RFD (#316)

ahmedhesham6 · benbrandt · web-flow · commit 424d05f6184b · 2026-01-01T20:14:40.000Z
* docs(rfd): Add session usage and context status RFD

Proposes standardized tracking of token usage, cost estimation, and
context window status across ACP implementations.

- Token usage reported in PromptResponse (per-turn data)
- Context window and cost reported in session/status (session state)

* docs(rfd): Update session usage tracking to use session/update notifications

Refines the tracking of context window and cost information by transitioning from `session/status` requests to `session/update` notifications. This change allows agents to proactively push updates, enhancing flexibility and real-time data availability for clients. The `cost` field is now optional, and the `remaining` field has been removed, as clients can compute it from `size` and `used`. Updated documentation to reflect these changes and provide clearer usage patterns.

* docs(rfd): Rename `reasoning_tokens` to `thought_tokens` and update context update fields

* docs(rfd): Update session usage terminology from `context_update` to `usage_update`

* Add to website

---------

Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
diff --git a/docs/docs.json b/docs/docs.json
@@ -112,7 +112,8 @@
               "rfds/meta-propagation",
               "rfds/session-info-update",
               "rfds/agent-telemetry-export",
-              "rfds/proxy-chains"
+              "rfds/proxy-chains",
+              "rfds/session-usage"
             ]
           },
           { "group": "Preview", "pages": [] },
diff --git a/docs/rfds/session-usage.mdx b/docs/rfds/session-usage.mdx
@@ -0,0 +1,323 @@
+---
+title: "Session Usage and Context Status"
+---
+
+- Author(s): [@ahmedhesham6](https://github.com/ahmedhesham6)
+- Champion: [@benbrandt](https://github.com/benbrandt)
+
+## Elevator pitch
+
+> What are you proposing to change?
+
+Add standardized usage and context window tracking to the Agent Client Protocol, enabling agents to report token consumption, cost estimates, and context window utilization in a consistent way across implementations.
+
+## Status quo
+
+> How do things work today and what problems does this cause? Why would we change things?
+
+Currently, ACP has no standardized way for agents to communicate:
+
+1. **Token usage** - How many tokens were consumed in a turn or cumulatively
+2. **Context window status** - How much of the model's context window is being used
+3. **Cost information** - Estimated costs for API usage
+4. **Prompt caching metrics** - Cache hits/misses for models that support caching
+
+This creates several problems:
+
+- **No visibility into resource consumption** - Clients can't show users how much of their context budget is being used
+- **No cost transparency** - Users can't track spending or estimate costs before operations
+- **No context management** - Clients can't warn users when approaching context limits or suggest compaction
+- **Inconsistent implementations** - Each agent implements usage tracking differently (if at all)
+
+Existing AI coding tools and LLM providers share several common patterns:
+
+- LLM providers return cumulative token counts in API responses
+- IDE extensions display context percentage prominently (e.g., radial progress showing "19%")
+- Clients show absolute numbers on hover/detail (e.g., "31.4K of 200K tokens")
+- Tools warn users at threshold percentages (75%, 90%, 95%)
+- Auto-compaction features trigger when approaching context limits
+- Cost tracking focuses on cumulative session totals rather than per-turn breakdowns
+
+## What we propose to do about it
+
+> What are you proposing to improve the situation?
+
+We propose separating usage tracking into two distinct concerns:
+
+1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data)
+2. **Context window and cost** - Reported via `session/update` notifications with `sessionUpdate: "usage_update"` (session state)
+
+This separation reflects how users consume this information:
+
+- Token counts are tied to specific turns and useful immediately after a prompt
+- Context window and cost are cumulative session state that agents push proactively when available
+
+Agents send context updates at appropriate times:
+
+- On `session/new` response (if agent can query usage immediately)
+- On `session/load` / `session/resume` (for resumed/forked sessions)
+- After each `session/prompt` response (when usage data becomes available)
+- Anytime context window state changes significantly
+
+This approach provides flexibility for different agent implementations:
+
+- Agents that support getting current usage without a prompt can immediately send updates when creating, resuming, or forking chats
+- Agents that only provide usage when actively prompting can send updates after sending a new prompt
+
+### Token Usage in `PromptResponse`
+
+Add a `usage` field to `PromptResponse` for token consumption tracking:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 1,
+  "result": {
+    "sessionId": "sess_abc123",
+    "stopReason": "end_turn",
+    "usage": {
+      "total_tokens": 53000,
+      "input_tokens": 35000,
+      "output_tokens": 12000,
+      "thought_tokens": 5000,
+      "cached_read_tokens": 5000,
+      "cached_write_tokens": 1000
+    }
+  }
+}
+```
+
+#### Usage Fields
+
+- `total_tokens` (number, required) - Sum of all token types across session
+- `input_tokens` (number, required) - Total input tokens across all turns
+- `output_tokens` (number, required) - Total output tokens across all turns
+- `thought_tokens` (number, optional) - Total thought/reasoning tokens (e.g., for reasoning models such as o1/o3)
+- `cached_read_tokens` (number, optional) - Total cache read tokens
+- `cached_write_tokens` (number, optional) - Total cache write tokens
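+
+As an illustration, an agent that cannot report thought or cache tokens might return only the required fields. This is a sketch with made-up values, and it assumes `total_tokens` is simply the sum of the input and output counts in that case:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "id": 2,
+  "result": {
+    "sessionId": "sess_abc123",
+    "stopReason": "end_turn",
+    "usage": {
+      "total_tokens": 47000,
+      "input_tokens": 35000,
+      "output_tokens": 12000
+    }
+  }
+}
+```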
+
+### Context Window and Cost via `session/update`
+
+Agents send context window and cost information via `session/update` notifications with `sessionUpdate: "usage_update"`:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "session/update",
+  "params": {
+    "sessionId": "sess_abc123",
+    "update": {
+      "sessionUpdate": "usage_update",
+      "used": 53000,
+      "size": 200000
+    }
+  }
+}
+```
+
+#### Context Window Fields (required)
+
+- `used` (number, required) - Tokens currently in context
+- `size` (number, required) - Total context window size in tokens
+
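+Note: Clients can compute `remaining` as `size - used` and `percentage` as `used / size * 100` if needed. For the example above, `remaining` is 200000 - 53000 = 147000 and `percentage` is 53000 / 200000 * 100 = 26.5.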
+
+#### Cost Fields (optional)
+
+- `cost` (object, optional) - Cumulative session cost
+  - `amount` (number, required) - Total cumulative cost for session
+  - `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR")
+
+Example with optional cost:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "session/update",
+  "params": {
+    "sessionId": "sess_abc123",
+    "update": {
+      "sessionUpdate": "usage_update",
+      "used": 53000,
+      "size": 200000,
+      "cost": {
+        "amount": 0.045,
+        "currency": "USD"
+      }
+    }
+  }
+}
+```
+
+### Design Principles
+
+1. **Separation of concerns** - Token usage is per-turn data, context window and cost are session state
+2. **Agent-pushed notifications** - Agents proactively send context updates when data becomes available, following the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
+3. **Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification
+4. **Flexible cost reporting** - Cost is optional since not all agents track it. Any currency is supported rather than assuming USD
+5. **Prompt caching support** - Include cache read/write tokens for models that support it
+6. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility
+7. **Flexible timing** - Agents send updates when they can: immediately for agents with on-demand APIs, or after prompts for agents that only provide usage during active prompting
+
+## Shiny future
+
+> How will things play out once this feature exists?
+
+**For Users:**
+
+- **Visibility**: Users see real-time context window usage with percentage indicators
+- **Cost awareness**: Users can track spending and check cumulative cost at any time
+- **Better planning**: Users know when to start new sessions or compact context
+- **Transparency**: Clear understanding of resource consumption
+
+**For Client Implementations:**
+
+- **Consistent UI**: All clients can show usage in a standard way (progress bars, percentages, warnings)
+- **Smart warnings**: Clients can warn users at thresholds such as 75% and 90% context usage
+- **Cost controls**: Clients can implement budget limits and alerts
+- **Analytics**: Clients can track usage patterns and optimize
+- **Reactive updates**: Clients receive context updates reactively via notifications, updating UI immediately when agents push new data
+- **No polling needed**: Updates arrive automatically when agents have new information, eliminating the need for clients to poll
+
+**For Agent Implementations:**
+
+- **Standard reporting**: Clear contract for what to report and when
+- **Flexibility**: Optional fields allow agents to report what they can calculate
+- **Model diversity**: Works with any model (GPT, Claude, Llama, etc.)
+- **Caching support**: First-class support for prompt caching
+
+## Implementation details and plan
+
+> Tell me more about your implementation. What is your detailed implementation plan?
+
+1. **Update schema.json** to add:
+   - A `Usage` type with token fields
+   - A `Cost` type with `amount` and `currency` fields
+   - A `UsageUpdate` type with required `used` and `size` fields and an optional `cost` field
+   - An optional `usage` field on `PromptResponse`
+   - A `UsageUpdate` variant in the `SessionUpdate` oneOf array (with `sessionUpdate: "usage_update"`)
+
+2. **Update protocol documentation**:
+   - Document `usage` field in `/docs/protocol/prompt-turn.mdx`
+   - Document `session/update` notification with `sessionUpdate: "usage_update"` variant
+   - Add examples showing typical usage patterns and when agents send context updates
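+
+The schema additions in step 1 could be sketched roughly as the JSON Schema fragment below. It is illustrative only: the `$defs` layout, the use of `integer` with `minimum: 0` for token counts (the field lists say "number"), and the `const` discriminator are assumptions, not the conventions of the actual `schema.json`.
+
+```json
+{
+  "$comment": "Hypothetical sketch; layout and keyword choices are assumptions, not the real schema.json.",
+  "$defs": {
+    "Usage": {
+      "type": "object",
+      "properties": {
+        "total_tokens": { "type": "integer", "minimum": 0 },
+        "input_tokens": { "type": "integer", "minimum": 0 },
+        "output_tokens": { "type": "integer", "minimum": 0 },
+        "thought_tokens": { "type": "integer", "minimum": 0 },
+        "cached_read_tokens": { "type": "integer", "minimum": 0 },
+        "cached_write_tokens": { "type": "integer", "minimum": 0 }
+      },
+      "required": ["total_tokens", "input_tokens", "output_tokens"]
+    },
+    "Cost": {
+      "type": "object",
+      "properties": {
+        "amount": { "type": "number" },
+        "currency": { "type": "string" }
+      },
+      "required": ["amount", "currency"]
+    },
+    "UsageUpdate": {
+      "type": "object",
+      "properties": {
+        "sessionUpdate": { "const": "usage_update" },
+        "used": { "type": "integer", "minimum": 0 },
+        "size": { "type": "integer", "minimum": 0 },
+        "cost": { "$ref": "#/$defs/Cost" }
+      },
+      "required": ["sessionUpdate", "used", "size"]
+    }
+  }
+}
+```
+
+Under this sketch, `PromptResponse` would gain an optional `usage` property that references `#/$defs/Usage`.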
+
+## Frequently asked questions
+
+> What questions have arisen over the course of authoring this document or during subsequent discussions?
+
+### Why separate token usage from context window and cost?
+
+Different users care about different things at different times:
+
+- **Token counts**: Relevant immediately after a turn completes to understand the breakdown
+- **Context window remaining**: Relevant at any time, especially before issuing a large prompt. "Do I need to hand off or compact?"
+- **Cumulative cost**: Session-level state that agents push when available
+
+Separating them allows:
+
+- Cleaner data model where per-turn data stays in turn responses
+- Agents to push context updates proactively when data becomes available
+- Clients to receive updates reactively without needing to poll
+
+### Why is cost in session/update instead of PromptResponse?
+
+Cost is cumulative session state, similar to context window:
+
+- Users want to track total spending, not just per-turn costs
+- Keeps `PromptResponse` focused on per-turn token breakdown
+- Both cost and context window are session-level metrics that belong together
+- Cost is optional since not all agents track it
+
+### How do users know when to hand off or compact the context?
+
+The context update notification provides everything needed:
+
+- `used` and `size` give absolute numbers for precise tracking
+- Clients can compute `remaining` as `size - used` and `percentage` as `used / size * 100`
+- `size` lets clients understand the total budget
+
+**Recommended client behavior:**
+
+| Percentage   | Action                                                           |
+| ------------ | ---------------------------------------------------------------- |
+| < 75%        | Normal operation                                                 |
+| 75% to < 90% | Yellow indicator, suggest "Context filling up"                   |
+| 90% to 95%   | Orange indicator, recommend "Start new session or summarize"     |
+| > 95%        | Red indicator, warn "Next prompt may fail - handoff recommended" |
+
+Clients can also:
+
+- Offer "Compact context" or "Summarize conversation" actions
+- Auto-suggest starting a new session
+- Implement automatic handoff when approaching limits
+
+### Why not assume USD for cost?
+
+Agents may bill in different currencies:
+
+- European agents might bill in EUR
+- Asian agents might bill in JPY or CNY
+- Some agents might use credits or points
+- Currency conversion rates change
+
+Better to report actual billing currency and let clients convert if needed.
+
+### What if the agent can't calculate some fields?
+
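+Only the basic token counts (`total_tokens`, `input_tokens`, `output_tokens`) and, within a `usage_update`, `used` and `size` are required. All other fields are optional: agents report what they can calculate, and clients handle missing fields gracefully.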
+
+### How does this work with streaming responses?
+
+- During streaming: Agents may send progressive context updates via `session/update` notifications as usage changes
+- Final response: Include complete token usage in `PromptResponse`
+- Context window and cost: Agents send `session/update` notifications with `sessionUpdate: "usage_update"` when data becomes available (after prompt completion, on session creation/resume, or when context state changes significantly)
+
+### What about models without fixed context windows?
+
+- Report effective context window size
+- For models with dynamic windows, report current limit
+- Update size if it changes
+- Set `size` to `null` if the window is truly unlimited (rare)
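+
+For the unlimited case, a notification might look like the sketch below. It assumes the schema would allow `null` for `size`, which the field list otherwise describes as a required number. Clients would then skip the `remaining` and `percentage` calculations:
+
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "session/update",
+  "params": {
+    "sessionId": "sess_abc123",
+    "update": {
+      "sessionUpdate": "usage_update",
+      "used": 53000,
+      "size": null
+    }
+  }
+}
+```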
+
+### What about rate limits and quotas?
+
+This RFD focuses on token usage and context windows. Rate limits and quotas are a separate concern that could be addressed in a future RFD. However, the cost tracking here helps users understand their usage against quota limits.
+
+### Should cached tokens count toward context window?
+
+Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached).
+
+### Why notification instead of request?
+
+Using `session/update` notifications instead of a `session/status` request provides several benefits:
+
+1. **Consistency**: Follows the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
+2. **Agent flexibility**: Agents can send updates when they have data available, whether that's immediately (for agents with on-demand APIs) or after prompts (for agents that only provide usage during active prompting)
+3. **No polling**: Clients receive updates reactively without needing to poll
+4. **Real-time updates**: Updates flow naturally as part of the session lifecycle
+
+### What if the client connects mid-session?
+
+When a client connects to an existing session (via `session/load` or `session/resume`), agents **SHOULD** send a context update notification if they have current usage data available. This ensures the client UI can immediately display accurate context window and cost information.
+
+For agents that only provide usage during active prompting, the client UI may not show usage until after the first prompt is sent, which is acceptable given the agent's capabilities.
+
+### What alternative approaches did you consider, and why did you settle on this one?
+
+**Alternatives considered:**
+
+1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to track independently of turns.
+
+2. **Request/response (`session/status`)** - Requires clients to poll, and some agents don't have APIs to query current status without a prompt. The notification approach is more flexible and consistent with other dynamic session properties.
+
+3. **Client calculates everything** - Rejected because the client doesn't know the model's tokenizer, exact context window size, or pricing.
+
+4. **Only percentage, no raw tokens** - Rejected because users want absolute numbers, clients can't verify calculations, and it's less transparent.
+
+## Revision history
+
+- 2025-12-07: Initial draft
+- 2025-12-13: Changed from `session/status` request method to `session/update` notification with `sessionUpdate: "context_update"`. Made `cost` optional and removed `remaining` field (clients can compute as `size - used`). This approach provides more flexibility for agents and follows the same pattern as other dynamic session properties.
+- 2025-12-17: Renamed `reasoning_tokens` to `thought_tokens` for consistency with ACP terminology. Removed `percentage` field (clients can compute as `used / size * 100`).
+- 2025-12-19: Renamed `sessionUpdate: "context_update"` to `sessionUpdate: "usage_update"` to better reflect the payload semantics (includes both context window info and cumulative cost).
diff --git a/docs/updates.mdx b/docs/updates.mdx
@@ -4,6 +4,20 @@ description: Updates and announcements about the Agent Client Protocol
 rss: true
 ---
 
+<Update label="January 1, 2026" tags={["RFD"]}>
+## Session Usage RFD moves to Draft stage
+
+The RFD for adding a new `usage_update` variant on the `session/update` notification and a `usage` field on prompt responses has been moved to Draft stage. Please review the [RFD](./rfds/session-usage) for more information on the current proposal and provide feedback as work on the implementation begins.
+
+</Update>
+
+<Update label="December 31, 2025" tags={["RFD"]}>
+## Proxy Chains RFD moves to Draft stage
+
+The RFD for adding proxy chain functionality in the protocol has been moved to Draft stage. Please review the [RFD](./rfds/proxy-chains) for more information on the current proposal and provide feedback as work on the implementation begins.
+
+</Update>
+
 <Update label="December 11, 2025" tags={["RFD"]}>
 ## Agent Telemetry Export RFD moves to Draft stage