| 1 | +--- |
| 2 | +title: "Agent Telemetry Export" |
| 3 | +--- |
| 4 | + |
| 5 | +- Author(s): [@codefromthecrypt](https://github.com/codefromthecrypt) |
| 6 | + |
| 7 | +## Elevator pitch |
| 8 | + |
| 9 | +> What are you proposing to change? |
| 10 | + |
| 11 | +Define how agents export telemetry (logs, metrics, traces) to clients without tunneling it over the ACP transport. Clients run a local telemetry receiver and pass standard OpenTelemetry environment variables when launching agents. This keeps telemetry out-of-band and enables editors to display agent activity, debug issues, and integrate with observability backends. |
| 12 | + |
| 13 | +## Status quo |
| 14 | + |
| 15 | +> How do things work today and what problems does this cause? Why would we change things? |
| 16 | + |
| 17 | +ACP defines how clients launch agents as subprocesses and communicate over stdio. The [meta-propagation RFD](./meta-propagation) addresses trace context propagation via `params._meta`, enabling trace correlation. However, there is no convention for how agents should export the actual telemetry data (spans, metrics, logs). |
| 18 | + |
| 19 | +Without a standard approach: |
| 20 | + |
| 21 | +1. **No visibility into agent behavior** - Editors cannot display what agents are doing (token usage, tool calls, timing) |
| 22 | +2. **Difficult debugging** - When agents fail, there's no structured way to capture diagnostics |
| 23 | +3. **Fragmented solutions** - Each agent/client pair invents its own telemetry mechanism |
| 24 | +4. **Credential exposure risk** - If agents send telemetry directly to backends, they must hold backend credentials |
| 25 | + |
| 26 | +Tunneling telemetry over the ACP stdio transport is problematic: |
| 27 | + |
| 28 | +- **Head-of-line blocking** - Telemetry traffic could delay agent messages |
| 29 | +- **Implementation burden** - ACP would need to define telemetry message formats |
| 30 | +- **Coupling** - Agents would need ACP-specific telemetry code instead of standard SDKs |
| 31 | + |
| 32 | +## What we propose to do about it |
| 33 | + |
| 34 | +> What are you proposing to improve the situation? |
| 35 | + |
| 36 | +Clients that want to receive agent telemetry run a local OTLP (OpenTelemetry Protocol) receiver and inject environment variables when launching agent subprocesses: |
| 37 | + |
| 38 | +``` |
| 39 | +OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 |
| 40 | +OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf |
| 41 | +OTEL_SERVICE_NAME=agent-name |
| 42 | +``` |
| 43 | + |
| 44 | +Agents using OpenTelemetry SDKs auto-configure from these variables. The client's receiver can: |
| 45 | + |
| 46 | +- Display telemetry in the editor UI (e.g., token counts, timing, errors) |
| 47 | +- Forward telemetry to the client's configured observability backend |
| 48 | +- Add client-side context before forwarding |
| 49 | + |
| 50 | +This follows the [OpenTelemetry collector deployment pattern](https://opentelemetry.io/docs/collector/deployment/agent/) where a local receiver proxies telemetry to backends. |
| 51 | + |
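As a rough sketch of the injection step, a client written in Python might build the agent's environment as follows. The helper names `build_agent_env` and `launch_agent` are hypothetical. The sketch assumes a per-variable reading of "do not override": any `OTEL_*` variable the user already set is kept, and only missing ones are filled in. A stricter client could skip injection entirely when any `OTEL_*` variable is present.

```python
import os
import subprocess

# Assumed defaults; the endpoint must match wherever the client's
# OTLP receiver is actually listening.
DEFAULT_OTEL_ENV = {
    "OTEL_EXPORTER_OTLP_ENDPOINT": "http://localhost:4318",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
}


def build_agent_env(base_env, service_name, defaults=DEFAULT_OTEL_ENV):
    """Return a copy of base_env with missing OTEL_* variables filled in.

    Variables that are already present in base_env are left untouched.
    """
    env = dict(base_env)
    for key, value in defaults.items():
        env.setdefault(key, value)
    env.setdefault("OTEL_SERVICE_NAME", service_name)
    return env


def launch_agent(command, service_name):
    """Start the agent subprocess with stdio pipes reserved for ACP traffic."""
    return subprocess.Popen(
        command,
        env=build_agent_env(os.environ, service_name),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

Because the variables are plain environment entries, an agent that does not use OpenTelemetry sees nothing unusual on stdio and can ignore them.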
| 52 | +### Architecture |
| 53 | + |
| 54 | +``` |
| 55 | +┌────────────────────────────────────────────────────────────┐ |
| 56 | +│ Client/Editor │ |
| 57 | +│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ |
| 58 | +│ │ ACP Handler │ │OTLP Receiver │───▶│ Exporter │ │ |
| 59 | +│ └──────────────┘ └──────────────┘ └──────────────┘ │ |
| 60 | +└────────┬─────────────────────▲──────────────────┬──────────┘ |
| 61 | + │ stdio │ HTTP │ |
| 62 | + ▼ │ ▼ |
| 63 | +┌─────────────────────┐ │ ┌───────────────────┐ |
| 64 | +│ Agent Process │ │ │ Observability │ |
| 65 | +│ ┌──────────────┐ │ │ │ Backend │ |
| 66 | +│ │ ACP Agent │ │ │ └───────────────────┘ |
| 67 | +│ ├──────────────┤ │ │ |
| 68 | +│ │ OTEL SDK │────────────┘ |
| 69 | +│ └──────────────┘ │ |
| 70 | +└─────────────────────┘ |
| 71 | +``` |
| 72 | + |
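To make the "OTLP Receiver" box in the diagram concrete, here is a minimal sketch of an OTLP/HTTP receiver using only the Python standard library. It assumes the `http/protobuf` protocol and listens on the default OTLP/HTTP signal paths (`/v1/traces`, `/v1/metrics`, `/v1/logs`). It only counts payload bytes instead of decoding them, and it does not handle chunked request bodies. `OtlpHandler` and `start_receiver` are illustrative names; a real client would more likely embed an existing OTLP receiver or collector.

```python
import threading
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Default OTLP/HTTP request paths, one per signal.
SIGNAL_PATHS = {"/v1/traces": "traces", "/v1/metrics": "metrics", "/v1/logs": "logs"}


class OtlpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        signal = SIGNAL_PATHS.get(self.path)
        if signal is None:
            self.send_error(404)
            return
        # A real client would decode the protobuf payload here and update
        # the UI or forward it to a backend; this sketch only counts bytes.
        with self.server.lock:
            self.server.received[signal] += len(body)
        # A zero-length body is the protobuf encoding of an empty
        # Export*ServiceResponse message, which signals full success.
        self.send_response(200)
        self.send_header("Content-Type", "application/x-protobuf")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging to stderr


def start_receiver(host="127.0.0.1", port=0):
    """Start the receiver on a daemon thread; port 0 picks a free port."""
    server = ThreadingHTTPServer((host, port), OtlpHandler)
    server.received = Counter()
    server.lock = threading.Lock()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With this sketch, the value to inject as `OTEL_EXPORTER_OTLP_ENDPOINT` would be `http://127.0.0.1:<port>`, where the port comes from `server.server_address[1]`. Binding to the loopback address keeps the receiver unreachable from other machines.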
| 73 | +### Discovery |
| 74 | + |
| 75 | +Environment variables must be set before launching the subprocess, but ACP capability exchange happens after connection. Options for discovery: |
| 76 | + |
| 77 | +1. **Optimistic injection** - Clients inject OTEL environment variables unconditionally. Agents without OpenTelemetry support simply ignore them. This is pragmatic since environment variables are low-cost and OTEL SDKs handle misconfiguration gracefully. |
| 78 | + |
| 79 | +2. **Registry metadata** - Agent registries (like the one proposed in PR #289) could include telemetry support in agent manifests, letting clients know ahead of time. |
| 80 | + |
| 81 | +3. **Manual configuration** - Users configure their client to enable telemetry collection for specific agents. |
| 82 | + |
| 83 | +## Shiny future |
| 84 | + |
| 85 | +> How will things play out once this feature exists? |
| 86 | + |
| 87 | +1. **Editor integration** - Editors can show agent activity: token usage, tool call timing, model switches, errors |
| 88 | +2. **Unified debugging** - When agents fail, structured telemetry is available for diagnosis |
| 89 | +3. **End-to-end traces** - Combined with `params._meta` trace propagation, traces flow from client through agent to any downstream services |
| 90 | +4. **No credential sharing** - Agents never see backend credentials; the client handles authentication |
| 91 | +5. **Standard SDKs** - Agent authors use normal OpenTelemetry SDKs that work in any context, not ACP-specific code |
| 92 | + |
| 93 | +## Implementation details |
| 94 | + |
| 95 | +> Tell me more about your implementation. What is your detailed implementation plan? |
| 96 | +
|
| 97 | +### 1. Create `docs/protocol/observability.mdx` |
| 98 | + |
| 99 | +Add a new protocol documentation page covering observability practices for ACP. This page will describe: |
| 100 | + |
| 101 | +**For Clients/Editors:** |
| 102 | +- Running an OTLP receiver to collect agent telemetry |
| 103 | +- Injecting `OTEL_EXPORTER_OTLP_*` and `OTEL_SERVICE_NAME` environment variables when launching agent subprocesses |
| 104 | +- Respecting user-configured `OTEL_*` variables (do not override if already set) |
| 105 | +- Forwarding telemetry to configured backends with client credentials |
| 106 | + |
| 107 | +**For Agent Authors:** |
| 108 | +- Using OpenTelemetry SDKs with standard auto-configuration |
| 109 | +- Recommended spans, metrics, and log patterns for agent operations |
| 110 | +- How telemetry flows when `OTEL_*` variables are present vs absent |
| 111 | + |
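For agent authors, the sketch below illustrates how an OTLP/HTTP exporter is expected to pick its target URL from the environment. It follows the OpenTelemetry exporter configuration rules as understood here: a signal-specific variable is used as given, otherwise the signal path is appended to `OTEL_EXPORTER_OTLP_ENDPOINT`, and the OTLP/HTTP default is `http://localhost:4318`. The function name `resolve_otlp_http_endpoint` is hypothetical. OpenTelemetry SDKs already do this internally, so agents should not need to reimplement it.

```python
import os

# Default base endpoint for OTLP over HTTP when nothing is configured.
DEFAULT_HTTP_ENDPOINT = "http://localhost:4318"


def resolve_otlp_http_endpoint(signal, env=None):
    """Return the URL an OTLP/HTTP exporter would target for one signal.

    signal is "traces", "metrics", or "logs".
    """
    env = os.environ if env is None else env
    # A signal-specific variable wins and is used exactly as given.
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific
    # Otherwise the signal path is appended to the base endpoint.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_HTTP_ENDPOINT
    return f"{base.rstrip('/')}/v1/{signal}"
```

When the client injects `OTEL_EXPORTER_OTLP_ENDPOINT`, all three signals resolve to the client's receiver. When the variables are absent, an agent that enables an OTLP exporter falls back to the default endpoint, where exports fail quietly if nothing is listening.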
| 112 | +### 2. Update `docs/protocol/extensibility.mdx` |
| 113 | + |
| 114 | +Add a short section that links to the new observability page and notes that observability practices (telemetry export) are documented separately from the extensibility mechanisms. |
| 115 | + |
| 116 | +### 3. Update `docs/docs.json` |
| 117 | + |
| 118 | +Add `protocol/observability` to the Protocol navigation group. |
| 119 | + |
| 120 | +## Frequently asked questions |
| 121 | + |
| 122 | +> What questions have arisen over the course of authoring this document or during subsequent discussions? |
| 123 | +
|
| 124 | +### How does this relate to trace propagation in `params._meta`? |
| 125 | + |
| 126 | +They are complementary: |
| 127 | + |
| 128 | +- **Trace propagation** (`params._meta` with `traceparent`, etc.) passes trace context so spans can be correlated |
| 129 | +- **Telemetry export** (this RFD) defines where agents send the actual span/metric/log data |
| 130 | + |
| 131 | +Both are needed for end-to-end observability. |
| 132 | + |
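To illustrate the propagation half, the sketch below attaches a W3C Trace Context `traceparent` value to a request's `params._meta`. The helper names are hypothetical and the request shape in the usage lines is illustrative only. A real client would take the trace and span IDs from its active span instead of generating random ones.

```python
import json
import secrets


def new_traceparent(sampled=True):
    """Build a W3C Trace Context traceparent value (version 00)."""
    trace_id = secrets.token_hex(16)  # 32 lowercase hex characters
    parent_id = secrets.token_hex(8)  # 16 lowercase hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def with_trace_context(request, traceparent):
    """Return a copy of a JSON-RPC request with traceparent in params._meta."""
    params = dict(request.get("params", {}))
    meta = dict(params.get("_meta", {}))
    meta["traceparent"] = traceparent
    params["_meta"] = meta
    return {**request, "params": params}


if __name__ == "__main__":
    # Illustrative request only; field values are placeholders.
    request = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "session/prompt",
        "params": {"sessionId": "sess-1", "prompt": []},
    }
    print(json.dumps(with_trace_context(request, new_traceparent())))
```

An agent that continues this trace and exports its spans to the client's receiver produces spans that share the client's trace ID, which is what lets the two halves line up.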
| 133 | +### What if an agent doesn't use OpenTelemetry? |
| 134 | + |
| 135 | +Agents without OTEL SDKs simply ignore the environment variables. No harm is done. Over time, as more agents adopt OpenTelemetry, the ecosystem benefits. |
| 136 | + |
| 137 | +### What if the user already configured `OTEL_*` environment variables? |
| 138 | + |
| 139 | +If `OTEL_*` variables are already set in the environment, clients should not override them. User-configured telemetry settings take precedence, allowing users to direct agent telemetry to their own backends when desired. |
| 140 | + |
| 141 | +### Why not define ACP-specific telemetry messages? |
| 142 | + |
| 143 | +This would duplicate OTLP functionality, add implementation burden to ACP, and force agent authors to use non-standard APIs. Using OTLP means agents work with standard tooling and documentation. |
| 144 | + |
| 145 | +### What about agents that aren't launched as subprocesses? |
| 146 | + |
| 147 | +This RFD focuses on the stdio transport where clients launch agents. For other transports (HTTP, etc.), agents would need alternative configuration mechanisms, which could be addressed in future RFDs. |
| 148 | + |
| 149 | +### What alternative approaches did you consider, and why did you settle on this one? |
| 150 | + |
| 151 | +1. **Tunneling telemetry over ACP** - Rejected due to head-of-line blocking concerns and implementation complexity |
| 152 | +2. **Agents export directly to backends** - Rejected because it requires sharing credentials with agents |
| 153 | +3. **File-based telemetry** - Rejected because it doesn't support real-time display and adds complexity |
| 154 | + |
| 155 | +The environment variable approach: |
| 156 | +- Uses existing standards (OTLP, OpenTelemetry SDK conventions) |
| 157 | +- Keeps telemetry out-of-band from ACP messages |
| 158 | +- Lets clients control where telemetry goes without exposing credentials |
| 159 | +- Requires no changes to ACP message formats |
| 160 | + |
| 161 | +## Revision history |
| 162 | + |
| 163 | +- 2025-12-04: Initial draft |