Commit b95e2c8

docs(rfd): Draft: Agent Telemetry Export
1 parent 6e58ea3 commit b95e2c8

1 file changed: 163 additions and 0 deletions
@@ -0,0 +1,163 @@
---
title: "Agent Telemetry Export"
---

- Author(s): [@codefromthecrypt](https://github.com/codefromthecrypt)

## Elevator pitch

> What are you proposing to change?

Define how agents export telemetry (logs, metrics, traces) to clients without tunneling it over the ACP transport. Clients run a local telemetry receiver and pass standard OpenTelemetry environment variables when launching agents. This keeps telemetry out-of-band and enables editors to display agent activity, debug issues, and integrate with observability backends.

## Status quo

> How do things work today and what problems does this cause? Why would we change things?

ACP defines how clients launch agents as subprocesses and communicate over stdio. The [meta-propagation RFD](./meta-propagation) addresses trace context propagation via `params._meta`, enabling trace correlation. However, there is no convention for how agents should export the actual telemetry data (spans, metrics, logs).

Without a standard approach:

1. **No visibility into agent behavior** - Editors cannot display what agents are doing (token usage, tool calls, timing)
2. **Difficult debugging** - When agents fail, there's no structured way to capture diagnostics
3. **Fragmented solutions** - Each agent/client pair invents its own telemetry mechanism
4. **Credential exposure risk** - Agents that send telemetry directly to backends must be given backend credentials

Tunneling telemetry over the ACP stdio transport is problematic:

- **Head-of-line blocking** - Telemetry traffic could delay agent messages
- **Implementation burden** - ACP would need to define telemetry message formats
- **Coupling** - Agents would need ACP-specific telemetry code instead of standard SDKs

## What we propose to do about it

> What are you proposing to improve the situation?

Clients that want to receive agent telemetry run a local OTLP (OpenTelemetry Protocol) receiver and inject environment variables when launching agent subprocesses:

```
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=agent-name
```

Agents using OpenTelemetry SDKs auto-configure from these variables. The client's receiver can:

- Display telemetry in the editor UI (e.g., token counts, timing, errors)
- Forward telemetry to the client's configured observability backend
- Add client-side context before forwarding

This follows the [OpenTelemetry collector deployment pattern](https://opentelemetry.io/docs/collector/deployment/agent/) where a local receiver proxies telemetry to backends.
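A minimal sketch of the client side, assuming a Python client and the values from the example above. `launch_agent` and the stand-in agent command are hypothetical names used only for illustration; they are not part of ACP. Variables already present in the client's environment are left untouched, in line with the precedence rule this RFD gives for user-configured `OTEL_*` variables.

```python
import os
import subprocess
import sys

# Values copied from the example above; the service name is a placeholder.
OTEL_DEFAULTS = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_SERVICE_NAME": "agent-name",
}


def launch_agent(command, otel_defaults=OTEL_DEFAULTS):
    """Start an agent subprocess with OTEL_* defaults added to its environment."""
    env = dict(os.environ)
    for name, value in otel_defaults.items():
        env.setdefault(name, value)  # never override a value the user already set
    return subprocess.Popen(
        command,
        env=env,
        stdin=subprocess.PIPE,   # ACP messages still travel over stdio
        stdout=subprocess.PIPE,
        text=True,
    )


# Stand-in "agent" that only reports the endpoint it was given.
fake_agent = [
    sys.executable,
    "-c",
    "import os; print(os.environ['OTEL_EXPORTER_OTLP_ENDPOINT'])",
]
proc = launch_agent(fake_agent)
seen_endpoint, _ = proc.communicate()
```

Because the variables are plain process environment, the same approach should carry over to any language the client is written in.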

### Architecture

```
┌────────────────────────────────────────────────────────────┐
│                       Client/Editor                        │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ ACP Handler  │    │OTLP Receiver │───▶│   Exporter   │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
└────────┬─────────────────────▲──────────────────┬──────────┘
         │ stdio               │ HTTP             │
         ▼                     │                  ▼
┌─────────────────────┐        │         ┌───────────────────┐
│   Agent Process     │        │         │   Observability   │
│  ┌──────────────┐   │        │         │     Backend       │
│  │  ACP Agent   │   │        │         └───────────────────┘
│  ├──────────────┤   │        │
│  │  OTEL SDK    │────────────┘
│  └──────────────┘   │
└─────────────────────┘
```
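To make the "OTLP Receiver" box concrete, here is a rough sketch of a receiver built on Python's standard library. It assumes OTLP over HTTP with the default per-signal paths (`/v1/traces`, `/v1/metrics`, `/v1/logs`) and only records payload sizes; decoding the protobuf payload, feeding the editor UI, and forwarding to a backend are left out. `OtlpHandler`, `start_receiver`, and `received` are illustrative names, not anything defined by ACP or OpenTelemetry.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Default OTLP/HTTP request paths, one per signal.
SIGNAL_PATHS = {"/v1/traces": "traces", "/v1/metrics": "metrics", "/v1/logs": "logs"}

received = []  # (signal, payload size in bytes), kept in memory for the sketch


class OtlpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        signal = SIGNAL_PATHS.get(self.path)
        if signal is None:
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        # A real receiver would decode the protobuf payload here and hand it
        # to the editor UI or to a forwarding exporter.
        received.append((signal, len(payload)))
        # An empty body is a valid (empty) protobuf export response.
        self.send_response(200)
        self.send_header("Content-Type", "application/x-protobuf")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_receiver(port=4318):
    """Serve OTLP/HTTP on the loopback interface in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), OtlpHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage: port 0 asks the OS for a free port, which avoids clashing with
# another collector that may already be bound to 4318.
server = start_receiver(port=0)
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
conn.request(
    "POST",
    "/v1/traces",
    body=b"placeholder",  # real agents send a serialized protobuf request
    headers={"Content-Type": "application/x-protobuf"},
)
status = conn.getresponse().status
conn.close()
server.shutdown()
server.server_close()
```

A client that picks a free port this way would inject that port into `OTEL_EXPORTER_OTLP_ENDPOINT` instead of the fixed `4318` shown earlier.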

### Discovery

Environment variables must be set before launching the subprocess, but ACP capability exchange happens after connection, so a client cannot ask the agent about telemetry support before deciding what to inject. Options for discovery:

1. **Optimistic injection** - Clients inject OTEL environment variables unconditionally. Agents without OpenTelemetry support simply ignore them. This is pragmatic because environment variables are cheap to set and OTEL SDKs are designed not to crash the host process when export is misconfigured or fails.

2. **Registry metadata** - Agent registries (like the one proposed in PR #289) could include telemetry support in agent manifests, letting clients know ahead of time.

3. **Manual configuration** - Users configure their client to enable telemetry collection for specific agents.
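The RFD lists these as alternatives rather than a layered scheme. Purely as an illustration, a client could combine them by preferring the most explicit signal available. Every name in this sketch (`TelemetryPolicy`, its fields, `should_inject`) is hypothetical, and no registry manifest field for telemetry exists yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TelemetryPolicy:
    # Hypothetical client-side inputs; none of these names come from ACP.
    user_choice: Optional[bool] = None        # option 3: explicit per-agent setting
    registry_supports: Optional[bool] = None  # option 2: hint from a registry manifest
    optimistic: bool = True                   # option 1: inject when nothing else is known


def should_inject(policy: TelemetryPolicy) -> bool:
    """Pick the most specific signal available, else fall back to optimistic injection."""
    if policy.user_choice is not None:
        return policy.user_choice
    if policy.registry_supports is not None:
        return policy.registry_supports
    return policy.optimistic
```

With no information at all, `should_inject(TelemetryPolicy())` falls through to optimistic injection, which matches option 1.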

## Shiny future

> How will things play out once this feature exists?

1. **Editor integration** - Editors can show agent activity: token usage, tool call timing, model switches, errors
2. **Unified debugging** - When agents fail, structured telemetry is available for diagnosis
3. **End-to-end traces** - Combined with `params._meta` trace propagation, traces flow from client through agent to any downstream services
4. **No credential sharing** - Agents never see backend credentials; the client handles authentication
5. **Standard SDKs** - Agent authors use normal OpenTelemetry SDKs that work in any context, not ACP-specific code

## Implementation details

> Tell me more about your implementation. What is your detailed implementation plan?

### 1. Create `docs/protocol/observability.mdx`

Add a new protocol documentation page covering observability practices for ACP. This page will describe:

**For Clients/Editors:**
- Running an OTLP receiver to collect agent telemetry
- Injecting `OTEL_EXPORTER_*` environment variables when launching agent subprocesses
- Respecting user-configured `OTEL_*` variables (do not override if already set)
- Forwarding telemetry to configured backends with client credentials

**For Agent Authors:**
- Using OpenTelemetry SDKs with standard auto-configuration
- Recommended spans, metrics, and log patterns for agent operations
- How telemetry flows when `OTEL_*` variables are present vs absent
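For the "present vs absent" case, the sketch below shows how an OTLP/HTTP exporter derives the per-signal URL from the environment: a signal-specific variable is used as-is, otherwise the signal path is appended to the base endpoint. `resolve_otlp_http_endpoint` is a hypothetical helper written for this illustration; real SDKs perform this resolution internally. Returning `None` when nothing is set is a simplification to make the "absent" case visible, since the OTLP exporter specification defines a default of `http://localhost:4318` for HTTP.

```python
from typing import Mapping, Optional

SIGNAL_PATHS = {"traces": "v1/traces", "metrics": "v1/metrics", "logs": "v1/logs"}


def resolve_otlp_http_endpoint(env: Mapping[str, str], signal: str) -> Optional[str]:
    """Return the URL an OTLP/HTTP exporter would post `signal` to, or None."""
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific  # signal-specific URLs are used as-is
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if base:
        return base.rstrip("/") + "/" + SIGNAL_PATHS[signal]
    return None  # nothing configured by the client or the user


# Usage with the values a client would inject.
injected = {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318"}
traces_url = resolve_otlp_http_endpoint(injected, "traces")
```

One consequence worth documenting: if a user has set only a signal-specific variable, that signal goes to the user's endpoint while the remaining signals follow the base endpoint.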

### 2. Update `docs/protocol/extensibility.mdx`

Add a section linking to the new observability doc, similar to how the page relates extensibility concepts to other protocol features, with a brief note that observability practices (telemetry export) are documented separately.

### 3. Update `docs/docs.json`

Add `protocol/observability` to the Protocol navigation group.

## Frequently asked questions

> What questions have arisen over the course of authoring this document or during subsequent discussions?

### How does this relate to trace propagation in `params._meta`?

They are complementary:

- **Trace propagation** (`params._meta` with `traceparent`, etc.) passes trace context so spans can be correlated
- **Telemetry export** (this RFD) defines where agents send the actual span/metric/log data

Both are needed for end-to-end observability.
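As a rough illustration of the propagation half, the sketch below builds a W3C Trace Context `traceparent` value and attaches it under `params._meta`. The helper names, the request method, and the `sessionId` value are placeholders chosen for the example; the meta-propagation RFD is the authority on the actual field layout.

```python
import json
import secrets


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context `traceparent` value with fresh random IDs."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def with_trace_context(params: dict, traceparent: str) -> dict:
    """Return a copy of `params` carrying the trace context under `_meta`."""
    meta = {**params.get("_meta", {}), "traceparent": traceparent}
    return {**params, "_meta": meta}


# Illustrative JSON-RPC request; the method name and params are placeholders.
traceparent = new_traceparent()
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "session/prompt",
    "params": with_trace_context({"sessionId": "example"}, traceparent),
}
print(json.dumps(request, indent=2))
```

An agent that continues this trace in its own spans and exports them to the client's receiver lets the client line up its request with the agent's work under a single trace ID.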

### What if an agent doesn't use OpenTelemetry?

Agents without OTEL SDKs simply ignore the environment variables. No harm is done. Over time, as more agents adopt OpenTelemetry, the ecosystem benefits.

### What if the user already configured `OTEL_*` environment variables?

If `OTEL_*` variables are already set in the environment, clients should not override them. User-configured telemetry settings take precedence, allowing users to direct agent telemetry to their own backends when desired.

### Why not define ACP-specific telemetry messages?

This would duplicate OTLP functionality, add implementation burden to ACP, and force agent authors to use non-standard APIs. Using OTLP means agents work with standard tooling and documentation.

### What about agents that aren't launched as subprocesses?

This RFD focuses on the stdio transport where clients launch agents. For other transports (HTTP, etc.), agents would need alternative configuration mechanisms, which could be addressed in future RFDs.

### What alternative approaches did you consider, and why did you settle on this one?

1. **Tunneling telemetry over ACP** - Rejected due to head-of-line blocking concerns and implementation complexity
2. **Agents export directly to backends** - Rejected because it requires sharing credentials with agents
3. **File-based telemetry** - Rejected because it doesn't support real-time display and adds complexity

The environment variable approach:
- Uses existing standards (OTLP, OpenTelemetry SDK conventions)
- Keeps telemetry out-of-band from ACP messages
- Lets clients control where telemetry goes without exposing credentials
- Requires no changes to ACP message formats

## Revision history

- 2025-12-04: Initial draft
