
Commit 424d05f

docs(rfd): Add session usage and context status RFD (#316)
* docs(rfd): Add session usage and context status RFD

  Proposes standardized tracking of token usage, cost estimation, and context window status across ACP implementations.

  - Token usage reported in PromptResponse (per-turn data)
  - Context window and cost reported in session/status (session state)

* docs(rfd): Update session usage tracking to use session/update notifications

  Refines the tracking of context window and cost information by transitioning from `session/status` requests to `session/update` notifications. This change allows agents to proactively push updates, enhancing flexibility and real-time data availability for clients. The `cost` field is now optional, and the `remaining` field has been removed, as clients can compute it from `size` and `used`. Updated documentation to reflect these changes and provide clearer usage patterns.

* docs(rfd): Rename `reasoning_tokens` to `thought_tokens` and update context update fields

* docs(rfd): Update session usage terminology from `context_update` to `usage_update`

* Add to website

---------

Co-authored-by: Ben Brandt <[email protected]>
1 parent a489734 commit 424d05f

File tree

3 files changed

+339
-1
lines changed

docs/docs.json

Lines changed: 2 additions & 1 deletion
@@ -112,7 +112,8 @@
 "rfds/meta-propagation",
 "rfds/session-info-update",
 "rfds/agent-telemetry-export",
-"rfds/proxy-chains"
+"rfds/proxy-chains",
+"rfds/session-usage"
 ]
 },
 { "group": "Preview", "pages": [] },

docs/rfds/session-usage.mdx

Lines changed: 323 additions & 0 deletions
@@ -0,0 +1,323 @@
1+
---
2+
title: "Session Usage and Context Status"
3+
---
4+
5+
- Author(s): [@ahmedhesham6](https://github.com/ahmedhesham6)
6+
- Champion: [@benbrandt](https://github.com/benbrandt)
7+
8+
## Elevator pitch
9+
10+
> What are you proposing to change?
11+
12+
Add standardized usage and context window tracking to the Agent Client Protocol, enabling agents to report token consumption, cost estimates, and context window utilization in a consistent way across implementations.
13+
14+
## Status quo
15+
16+
> How do things work today and what problems does this cause? Why would we change things?
17+
18+
Currently, ACP has no standardized way for agents to communicate:
19+
20+
1. **Token usage** - How many tokens were consumed in a turn or cumulatively
21+
2. **Context window status** - How much of the model's context window is being used
22+
3. **Cost information** - Estimated costs for API usage
23+
4. **Prompt caching metrics** - Cache hits/misses for models that support caching
24+
25+
This creates several problems:
26+
27+
- **No visibility into resource consumption** - Clients can't show users how much of their context budget is being used
28+
- **No cost transparency** - Users can't track spending or estimate costs before operations
29+
- **No context management** - Clients can't warn users when approaching context limits or suggest compaction
30+
- **Inconsistent implementations** - Each agent implements usage tracking differently (if at all)
31+
32+
Industry research shows common patterns across AI coding tools:
33+
34+
- LLM providers return cumulative token counts in API responses
35+
- IDE extensions display context percentage prominently (e.g., radial progress showing "19%")
36+
- Clients show absolute numbers on hover/detail (e.g., "31.4K of 200K tokens")
37+
- Tools warn users at threshold percentages (75%, 90%, 95%)
38+
- Auto-compaction features trigger when approaching context limits
39+
- Cost tracking focuses on cumulative session totals rather than per-turn breakdowns
40+
41+
## What we propose to do about it
42+
43+
> What are you proposing to improve the situation?
44+
45+
We propose separating usage tracking into two distinct concerns:
46+
47+
1. **Token usage** - Reported in `PromptResponse` after each turn (per-turn data)
48+
2. **Context window and cost** - Reported via `session/update` notifications with `sessionUpdate: "usage_update"` (session state)
49+
50+
This separation reflects how users consume this information:
51+
52+
- Token counts are tied to specific turns and useful immediately after a prompt
53+
- Context window and cost are cumulative session state that agents push proactively when available
54+
55+
Agents send context updates at appropriate times:
56+
57+
- On `session/new` response (if agent can query usage immediately)
58+
- On `session/load` / `session/resume` (for resumed/forked sessions)
59+
- After each `session/prompt` response (when usage data becomes available)
60+
- Anytime context window state changes significantly
61+
62+
This approach provides flexibility for different agent implementations:
63+
64+
- Agents that support getting current usage without a prompt can immediately send updates when creating, resuming, or forking chats
65+
- Agents that only provide usage when actively prompting can send updates after sending a new prompt
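On the agent side, pushing an update amounts to emitting one JSON-RPC notification. The following is a minimal sketch of how such a message could be assembled; the helper name `build_usage_update` and its validation rules are assumptions made for illustration and are not part of the proposal.

```python
from typing import Optional


def build_usage_update(
    session_id: str,
    used: int,
    size: int,
    cost_amount: Optional[float] = None,
    currency: Optional[str] = None,
) -> dict:
    """Build a `session/update` notification with a `usage_update` payload.

    Hypothetical helper: only the message shape comes from the RFD.
    """
    if used < 0 or size <= 0:
        raise ValueError("used must be >= 0 and size must be > 0")

    update = {"sessionUpdate": "usage_update", "used": used, "size": size}

    # Cost is optional; when present, both amount and currency are required.
    if cost_amount is not None:
        if currency is None:
            raise ValueError("currency is required when a cost amount is given")
        update["cost"] = {"amount": cost_amount, "currency": currency}

    # Notifications carry no "id" because no response is expected.
    return {
        "jsonrpc": "2.0",
        "method": "session/update",
        "params": {"sessionId": session_id, "update": update},
    }
```

An agent that can query usage on demand might call this right after `session/new`; one that only learns usage while prompting would call it after each turn.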
66+
67+
### Token Usage in `PromptResponse`
68+
69+
Add a `usage` field to `PromptResponse` for token consumption tracking:
70+
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "sessionId": "sess_abc123",
    "stopReason": "end_turn",
    "usage": {
      "total_tokens": 53000,
      "input_tokens": 35000,
      "output_tokens": 12000,
      "thought_tokens": 5000,
      "cached_read_tokens": 5000,
      "cached_write_tokens": 1000
    }
  }
}
```
89+
90+
#### Usage Fields
91+
92+
- `total_tokens` (number, required) - Sum of all token types across session
93+
- `input_tokens` (number, required) - Total input tokens across all turns
94+
- `output_tokens` (number, required) - Total output tokens across all turns
95+
- `thought_tokens` (number, optional) - Total thought/reasoning tokens (for reasoning models such as o1/o3)
96+
- `cached_read_tokens` (number, optional) - Total cache read tokens
97+
- `cached_write_tokens` (number, optional) - Total cache write tokens
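A client could read these fields defensively, since `usage` itself and three of its fields are optional. The sketch below assumes a hypothetical `Usage` dataclass and `parse_usage` helper; only the field names are taken from the list above.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Usage:
    """Token usage as described in the field list (hypothetical container)."""

    total_tokens: int
    input_tokens: int
    output_tokens: int
    thought_tokens: Optional[int] = None
    cached_read_tokens: Optional[int] = None
    cached_write_tokens: Optional[int] = None


def parse_usage(result: Mapping[str, Any]) -> Optional[Usage]:
    """Extract `usage` from a PromptResponse result, or None if absent."""
    raw = result.get("usage")
    if raw is None:
        # Agents that do not track usage simply omit the field.
        return None
    return Usage(
        total_tokens=raw["total_tokens"],
        input_tokens=raw["input_tokens"],
        output_tokens=raw["output_tokens"],
        thought_tokens=raw.get("thought_tokens"),
        cached_read_tokens=raw.get("cached_read_tokens"),
        cached_write_tokens=raw.get("cached_write_tokens"),
    )
```

Missing optional fields stay `None` rather than defaulting to zero, so a UI can tell "not reported" apart from "zero tokens".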
98+
99+
### Context Window and Cost via `session/update`
100+
101+
Agents send context window and cost information via `session/update` notifications with `sessionUpdate: "usage_update"`:
102+
```json
{
  "jsonrpc": "2.0",
  "method": "session/update",
  "params": {
    "sessionId": "sess_abc123",
    "update": {
      "sessionUpdate": "usage_update",
      "used": 53000,
      "size": 200000
    }
  }
}
```
117+
118+
#### Context Window Fields (required)
119+
120+
- `used` (number, required) - Tokens currently in context
121+
- `size` (number, required) - Total context window size in tokens
122+
123+
Note: Clients can compute `remaining` as `size - used` and `percentage` as `used / size * 100` if needed.
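As a minimal sketch, the two derived values could be computed as follows. The function names are illustrative, and clamping `remaining` at zero is an assumption for the case where an agent reports `used` above `size`.

```python
def context_remaining(used: int, size: int) -> int:
    """Tokens left in the window: size - used, clamped at zero (assumption)."""
    return max(size - used, 0)


def context_percentage(used: int, size: int) -> float:
    """Share of the window in use, computed as used / size * 100."""
    if size <= 0:
        raise ValueError("size must be a positive number of tokens")
    return used / size * 100
```

With the example values above (`used` 53000, `size` 200000), `remaining` is 147000 and the percentage is about 26.5.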
124+
125+
#### Cost Fields (optional)
126+
127+
- `cost` (object, optional) - Cumulative session cost
128+
- `amount` (number, required) - Total cumulative cost for session
129+
- `currency` (string, required) - ISO 4217 currency code (e.g., "USD", "EUR")
130+
131+
Example with optional cost:
132+
```json
{
  "jsonrpc": "2.0",
  "method": "session/update",
  "params": {
    "sessionId": "sess_abc123",
    "update": {
      "sessionUpdate": "usage_update",
      "used": 53000,
      "size": 200000,
      "cost": {
        "amount": 0.045,
        "currency": "USD"
      }
    }
  }
}
```
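On the receiving side, a client has to pick `usage_update` out of the other `session/update` variants and tolerate a missing `cost`. The following sketch assumes a hypothetical helper, `extract_usage_update`, operating on an already-decoded JSON-RPC message.

```python
from typing import Any, Mapping, Optional


def extract_usage_update(message: Mapping[str, Any]) -> Optional[dict]:
    """Return used/size/cost from a usage_update notification, else None.

    Hypothetical helper; other session/update variants are ignored.
    """
    if message.get("method") != "session/update":
        return None
    update = message.get("params", {}).get("update", {})
    if update.get("sessionUpdate") != "usage_update":
        return None

    cost = update.get("cost")
    return {
        "used": update["used"],
        "size": update["size"],
        # Keep the reported currency; do not assume USD.
        "cost": None if cost is None else (cost["amount"], cost["currency"]),
    }
```

Returning the currency alongside the amount leaves any conversion or display formatting to the client.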
151+
152+
### Design Principles
153+
154+
1. **Separation of concerns** - Token usage is per-turn data, context window and cost are session state
155+
2. **Agent-pushed notifications** - Agents proactively send context updates when data becomes available, following the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
156+
3. **Agent calculates, client can verify** - Agent knows its model best and provides calculations, but includes raw data for client verification
157+
4. **Flexible cost reporting** - Cost is optional since not all agents track it. Support any currency, don't assume USD
158+
5. **Prompt caching support** - Include cache read/write tokens for models that support it
159+
6. **Optional but recommended** - Usage tracking is optional to maintain backward compatibility
160+
7. **Flexible timing** - Agents send updates when they can: immediately for agents with on-demand APIs, or after prompts for agents that only provide usage during active prompting
161+
162+
## Shiny future
163+
164+
> How will things play out once this feature exists?
165+
166+
**For Users:**
167+
168+
- **Visibility**: Users see real-time context window usage with percentage indicators
169+
- **Cost awareness**: Users can track spending and check cumulative cost at any time
170+
- **Better planning**: Users know when to start new sessions or compact context
171+
- **Transparency**: Clear understanding of resource consumption
172+
173+
**For Client Implementations:**
174+
175+
- **Consistent UI**: All clients can show usage in a standard way (progress bars, percentages, warnings)
176+
- **Smart warnings**: Clients can warn users at 75%, 90% context usage
177+
- **Cost controls**: Clients can implement budget limits and alerts
178+
- **Analytics**: Clients can track usage patterns and optimize
179+
- **Reactive updates**: Clients receive context updates reactively via notifications, updating UI immediately when agents push new data
180+
- **No polling needed**: Updates arrive automatically when agents have new information, eliminating the need for clients to poll
181+
182+
**For Agent Implementations:**
183+
184+
- **Standard reporting**: Clear contract for what to report and when
185+
- **Flexibility**: Optional fields allow agents to report what they can calculate
186+
- **Model diversity**: Works with any model (GPT, Claude, Llama, etc.)
187+
- **Caching support**: First-class support for prompt caching
188+
189+
## Implementation details and plan
190+
191+
> Tell me more about your implementation. What is your detailed implementation plan?
192+
193+
1. **Update schema.json** to add:
194+
- `Usage` type with token fields
195+
- `Cost` type with `amount` and `currency` fields
196+
- `UsageUpdate` type with `used`, `size` (required) and optional `cost` field
197+
- Add optional `usage` field to `PromptResponse`
198+
- Add `UsageUpdate` variant to `SessionUpdate` oneOf array (with `sessionUpdate: "usage_update"`)
199+
200+
2. **Update protocol documentation**:
201+
- Document `usage` field in `/docs/protocol/prompt-turn.mdx`
202+
- Document `session/update` notification with `sessionUpdate: "usage_update"` variant
203+
- Add examples showing typical usage patterns and when agents send context updates
204+
205+
## Frequently asked questions
206+
207+
> What questions have arisen over the course of authoring this document or during subsequent discussions?
208+
209+
### Why separate token usage from context window and cost?
210+
211+
Different users care about different things at different times:
212+
213+
- **Token counts**: Relevant immediately after a turn completes to understand the breakdown
214+
- **Context window remaining**: Relevant at any time, especially before issuing a large prompt. "Do I need to hand off or compact?"
215+
- **Cumulative cost**: Session-level state that agents push when available
216+
217+
Separating them allows:
218+
219+
- Cleaner data model where per-turn data stays in turn responses
220+
- Agents to push context updates proactively when data becomes available
221+
- Clients to receive updates reactively without needing to poll
222+
223+
### Why is cost in session/update instead of PromptResponse?
224+
225+
Cost is cumulative session state, similar to context window:
226+
227+
- Users want to track total spending, not just per-turn costs
228+
- Keeps `PromptResponse` focused on per-turn token breakdown
229+
- Both cost and context window are session-level metrics that belong together
230+
- Cost is optional since not all agents track it
231+
232+
### How do users know when to hand off or compact the context?
233+
234+
The context update notification provides everything needed:
235+
236+
- `used` and `size` give absolute numbers for precise tracking
237+
- Clients can compute `remaining` as `size - used` and `percentage` as `used / size * 100`
238+
- `size` lets clients understand the total budget
239+
240+
**Recommended client behavior:**
241+
| Percentage | Action                                                           |
| ---------- | ---------------------------------------------------------------- |
| < 75%      | Normal operation                                                 |
| 75-90%     | Yellow indicator, suggest "Context filling up"                   |
| 90-95%     | Orange indicator, recommend "Start new session or summarize"     |
| > 95%      | Red indicator, warn "Next prompt may fail - handoff recommended" |
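The table could be turned into a small classifier like the sketch below. The table does not say which band the exact boundary values (75%, 90%, 95%) fall into, so placing each boundary in the higher band is an assumption here, as are the function name and the returned labels.

```python
def context_level(used: int, size: int) -> str:
    """Map context usage to a warning level based on the recommended table.

    Integer arithmetic avoids floating-point surprises at the boundaries.
    Boundary values are assigned to the higher band (assumption).
    """
    if size <= 0:
        raise ValueError("size must be a positive number of tokens")
    scaled = used * 100  # compare used/size against N% as used*100 vs N*size
    if scaled < 75 * size:
        return "normal"
    if scaled < 90 * size:
        return "yellow"
    if scaled < 95 * size:
        return "orange"
    return "red"
```

A client would typically re-evaluate this each time a `usage_update` notification arrives and only surface a message when the level changes.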
248+
249+
Clients can also:
250+
251+
- Offer "Compact context" or "Summarize conversation" actions
252+
- Auto-suggest starting a new session
253+
- Implement automatic handoff when approaching limits
254+
255+
### Why not assume USD for cost?
256+
257+
Agents may bill in different currencies:
258+
259+
- European agents might bill in EUR
260+
- Asian agents might bill in JPY or CNY
261+
- Some agents might use credits or points
262+
- Currency conversion rates change
263+
264+
Better to report actual billing currency and let clients convert if needed.
265+
266+
### What if the agent can't calculate some fields?
267+
268+
All fields except the basic token counts are optional. Agents report what they can calculate. Clients handle missing fields gracefully.
269+
270+
### How does this work with streaming responses?
271+
272+
- During streaming: Agents may send progressive context updates via `session/update` notifications as usage changes
273+
- Final response: Include complete token usage in `PromptResponse`
274+
- Context window and cost: Agents send `session/update` notifications with `sessionUpdate: "usage_update"` when data becomes available (after prompt completion, on session creation/resume, or when context state changes significantly)
275+
276+
### What about models without fixed context windows?
277+
278+
- Report effective context window size
279+
- For models with dynamic windows, report current limit
280+
- Update size if it changes
281+
- Set to `null` if truly unlimited (rare)
282+
283+
### What about rate limits and quotas?
284+
285+
This RFD focuses on token usage and context windows. Rate limits and quotas are a separate concern that could be addressed in a future RFD. However, the cost tracking here helps users understand their usage against quota limits.
286+
287+
### Should cached tokens count toward context window?
288+
289+
Yes, cached tokens still occupy context window space. They're just cheaper to process. The context window usage should include all tokens (regular + cached).
290+
291+
### Why notification instead of request?
292+
293+
Using `session/update` notifications instead of a `session/status` request provides several benefits:
294+
295+
1. **Consistency**: Follows the same pattern as other dynamic session properties (`available_commands_update`, `current_mode_update`, `session_info_update`)
296+
2. **Agent flexibility**: Agents can send updates when they have data available, whether that's immediately (for agents with on-demand APIs) or after prompts (for agents that only provide usage during active prompting)
297+
3. **No polling**: Clients receive updates reactively without needing to poll
298+
4. **Real-time updates**: Updates flow naturally as part of the session lifecycle
299+
300+
### What if the client connects mid-session?
301+
302+
When a client connects to an existing session (via `session/load` or `session/resume`), agents **SHOULD** send a context update notification if they have current usage data available. This ensures the client UI can immediately display accurate context window and cost information.
303+
304+
For agents that only provide usage during active prompting, the client UI may not show usage until after the first prompt is sent, which is acceptable given the agent's capabilities.
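A client can model this by treating usage as unknown until the first notification arrives. The following is a rough sketch; the `SessionUsageState` class and its display strings are hypothetical and only illustrate the "no data yet" case.

```python
from typing import Any, Mapping, Optional


class SessionUsageState:
    """Client-side view of usage for one session (hypothetical)."""

    def __init__(self) -> None:
        # Unknown until the agent pushes its first usage_update.
        self.used: Optional[int] = None
        self.size: Optional[int] = None
        self.cost: Optional[Mapping[str, Any]] = None

    def apply(self, update: Mapping[str, Any]) -> None:
        """Apply the `update` object of a usage_update notification."""
        self.used = update["used"]
        self.size = update["size"]
        # Cost is cumulative session state, so the latest value replaces the old.
        self.cost = update.get("cost")

    def label(self) -> str:
        """Text for a status indicator, with a placeholder before any data."""
        if self.used is None or self.size is None:
            return "Usage unavailable"
        return f"{self.used:,} of {self.size:,} tokens"
```

After `session/load` or `session/resume`, the indicator shows the placeholder until the agent sends an update, which may only happen after the first prompt.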
305+
306+
### What alternative approaches did you consider, and why did you settle on this one?
307+
308+
**Alternatives considered:**
309+
310+
1. **Everything in PromptResponse** - Simpler, but context window and cost are session state that users may want to track independently of turns.
311+
312+
2. **Request/response (`session/status`)** - Requires clients to poll, and some agents don't have APIs to query current status without a prompt. The notification approach is more flexible and consistent with other dynamic session properties.
313+
314+
3. **Client calculates everything** - Rejected because client doesn't know model's tokenizer, exact context window size, or pricing.
315+
316+
4. **Only percentage, no raw tokens** - Rejected because users want absolute numbers, clients can't verify calculations, and it's less transparent.
317+
318+
## Revision history
319+
320+
- 2025-12-07: Initial draft
321+
- 2025-12-13: Changed from `session/status` request method to `session/update` notification with `sessionUpdate: "context_update"`. Made `cost` optional and removed `remaining` field (clients can compute as `size - used`). This approach provides more flexibility for agents and follows the same pattern as other dynamic session properties.
322+
- 2025-12-17: Renamed `reasoning_tokens` to `thought_tokens` for consistency with ACP terminology. Removed `percentage` field (clients can compute as `used / size * 100`).
323+
- 2025-12-19: Renamed `sessionUpdate: "context_update"` to `sessionUpdate: "usage_update"` to better reflect the payload semantics (includes both context window info and cumulative cost).

docs/updates.mdx

Lines changed: 14 additions & 0 deletions
@@ -4,6 +4,20 @@ description: Updates and announcements about the Agent Client Protocol
 rss: true
 ---
 
+<Update label="January 1, 2026" tags={["RFD"]}>
+## Session Usage RFD moves to Draft stage
+
+The RFD for adding a new `usage_update` variant on the `session/update` notification and `usage` field on prompt responses in the protocol has been moved to Draft stage. Please review the [RFD](./rfds/session-usage) for more information on the current proposal and provide feedback as work on the implementation begins.
+
+</Update>
+
+<Update label="December 31, 2025" tags={["RFD"]}>
+## Proxy Chains RFD moves to Draft stage
+
+The RFD for adding proxy chain functionality in the protocol has been moved to Draft stage. Please review the [RFD](./rfds/proxy-chains) for more information on the current proposal and provide feedback as work on the implementation begins.
+
+</Update>
+
 <Update label="December 11, 2025" tags={["RFD"]}>
 ## Agent Telemetry Export RFD moves to Draft stage
 