
Commit 4b8cfd1

swcollard authored and DaleSeo committed
Draft documentation and user guide for mcp otel telemetry
- Add telemetry docs to sidebar
- Cleaning up
- Fix title on grafana guide
- Update the overview
- Update configuration section to be more complete
- Add a quick start guide and info about prod
- Update production section with more useful ino
- Apply suggestions from code review
- Style edits
- Co-authored-by: Joseph Caudle <[email protected]>
- Remove custom from 'custom metrics'
- Remove Grafana how-to guide
- Move configuration reference to the config page
- Add a note about cardinality control using sampling and attribute filtering
- Fix typo: traaces -> traces
1 parent 6584d85 commit 4b8cfd1


3 files changed (+218, -0 lines)


docs/source/_sidebar.yaml

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,8 @@ items:
  href: "./cors"
- label: "Authorization"
  href: "./auth"
- label: "Telemetry"
  href: "./telemetry"
- label: "Best Practices"
  href: "./best-practices"
- label: "Licensing"

docs/source/config-file.mdx

Lines changed: 48 additions & 0 deletions
@@ -29,6 +29,8 @@ All fields are optional.
| `overrides` | `Overrides` | | Overrides for server behavior |
| `schema` | `SchemaSource` | | Schema configuration |
| `transport` | `Transport` | | The type of server transport to use |
| `telemetry` | `Telemetry` | | Configuration to export metrics and traces via OTLP |

### GraphOS

@@ -224,6 +226,52 @@ transport:
- profile
```

### Telemetry

| Option | Type | Default | Description |
| :-------------- | :---------- | :-------------------------- | :--------------------------------------- |
| `service_name` | `string` | `"apollo-mcp-server"` | The service name in telemetry data. |
| `version` | `string` | Current crate version | The service version in telemetry data. |
| `exporters` | `Exporters` | `null` (Telemetry disabled) | Configuration for telemetry exporters. |

#### Exporters

| Option | Type | Default | Description |
| :--------- | :---------- | :-------------------------- | :--------------------------------------- |
| `metrics` | `Metrics` | `null` (Metrics disabled) | Configuration for exporting metrics. |
| `tracing` | `Tracing` | `null` (Tracing disabled) | Configuration for exporting traces. |

#### Metrics

| Option | Type | Default | Description |
| :-------------------- | :--------------- | :-------------------------- | :--------------------------------------------- |
| `otlp` | `OTLP Exporter` | `null` (Exporting disabled) | Configuration for exporting metrics via OTLP. |
| `omitted_attributes` | `List<String>` | | List of attributes to omit from metrics. |

#### Tracing

| Option | Type | Default | Description |
| :-------------------- | :--------------- | :-------------------------- | :--------------------------------------------- |
| `otlp` | `OTLP Exporter` | `null` (Exporting disabled) | Configuration for exporting traces via OTLP. |
| `sampler` | `SamplerOption` | `always_on` | Configuration to control sampling of traces. |
| `omitted_attributes` | `List<String>` | | List of attributes to omit from traces. |

#### OTLP Exporter

| Option | Type | Default | Description |
| :--------- | :-------- | :-------------------------- | :--------------------------------------------------------------- |
| `endpoint` | `URL` | `http://localhost:4317` | URL to export data to. Requires the full path (for example, `http://localhost:4318/v1/traces` with `http/protobuf`). |
| `protocol` | `string` | `grpc` | Protocol for export. Supported values are `grpc` and `http/protobuf`. |

#### SamplerOption

| Option | Type | Description |
| :----------- | :-------- | :------------------------------------------------------- |
| `always_on` | `string` | Export every trace. |
| `always_off` | `string` | Export no traces. |
| `0.0-1.0` | `f64` | Fraction of traces to export (for example, `0.25` exports about 25% of traces). |
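
As an illustration of how the tables above nest, the following sketch sets each telemetry option once. It assumes `sampler` and `omitted_attributes` sit directly under `exporters.tracing` and `exporters.metrics`, as the tables indicate; the version string, endpoints, sampling ratio, and omitted attribute names are placeholder values, not recommendations.

```yaml
telemetry:
  service_name: "apollo-mcp-server"
  version: "1.0.0"                      # placeholder; defaults to the crate version
  exporters:
    metrics:
      otlp:
        endpoint: "http://localhost:4317"            # same as the documented default
        protocol: "grpc"                             # the default protocol
      omitted_attributes:
        - tool_name                                  # example attribute to drop
    tracing:
      otlp:
        endpoint: "http://localhost:4318/v1/traces"  # http/protobuf uses the full path
        protocol: "http/protobuf"
      sampler: 0.25                                  # or always_on / always_off
      omitted_attributes:
        - operation.id                               # example attribute to drop
```

Leaving out `exporters.tracing` or `exporters.metrics` entirely keeps that signal disabled, per the `null` defaults.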

## Example config file

The following example file sets your endpoint to `localhost:4001`, configures transport over Streamable HTTP, enables introspection, and provides two local MCP operations for the server to expose.

docs/source/telemetry.mdx

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
---
title: OpenTelemetry Integration
---

AI agents create unpredictable usage patterns and complex request flows that are hard to monitor with traditional methods. The Apollo MCP Server's OpenTelemetry integration provides the visibility you need to run a reliable service for AI agents.

## What you can monitor

- **Agent behavior**: Which tools and operations are used most frequently
- **Performance**: Response times and bottlenecks across tool executions and GraphQL operations
- **Reliability**: Error rates, failed operations, and request success patterns
- **Distributed request flows**: Complete traces from agent request through your Apollo Router and subgraphs, with automatic trace context propagation

## How it works

The server exports metrics, traces, and events using the OpenTelemetry Protocol (OTLP), ensuring compatibility with your existing observability stack and seamless integration with other instrumented Apollo services.

## Usage guide

### Quick start: Local development

The fastest way to see Apollo MCP Server telemetry in action is with a local setup that requires only Docker.

#### 5-minute setup

1. Start a local observability stack:

   `docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 --rm -ti grafana/otel-lgtm`

1. Add the telemetry config to your `config.yaml`:

   ```yaml
   telemetry:
     exporters:
       metrics:
         otlp:
           endpoint: "http://localhost:4318/v1/metrics"
           protocol: "http/protobuf"
       tracing:
         otlp:
           endpoint: "http://localhost:4318/v1/traces"
           protocol: "http/protobuf"
   ```

1. Restart your MCP server with the updated config.
1. Open Grafana at `http://localhost:3000` and explore your telemetry data. The default credentials are username `admin` and password `admin`.
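
The `grafana/otel-lgtm` container started in step 1 also accepts OTLP over gRPC on port 4317, which the command above already publishes. If you prefer gRPC, the default protocol in the configuration reference, the exporter section could instead look like this sketch. It assumes a gRPC endpoint takes only the scheme, host, and port, matching the documented default endpoint `http://localhost:4317`.

```yaml
telemetry:
  exporters:
    metrics:
      otlp:
        # gRPC endpoint: no /v1/... path (assumption based on the documented default)
        endpoint: "http://localhost:4317"
        protocol: "grpc"
    tracing:
      otlp:
        endpoint: "http://localhost:4317"
        protocol: "grpc"
```

Because these values match the documented defaults, spelling them out mainly documents intent; switching back to `http/protobuf` only requires the port 4318 endpoints with their full paths.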
### Production deployment

For production environments, configure your MCP server to send telemetry to any OTLP-compatible backend. Because the Apollo MCP Server uses the standard OpenTelemetry Protocol, it works with any observability platform that accepts OTLP.

#### Configuration example

```yaml
telemetry:
  service_name: "mcp-server-prod" # Custom service name
  exporters:
    metrics:
      otlp:
        endpoint: "https://your-metrics-endpoint"
        protocol: "http/protobuf" # or "grpc"
    tracing:
      otlp:
        endpoint: "https://your-traces-endpoint"
        protocol: "http/protobuf"
```

#### Observability platform integration

The MCP server works with any OTLP-compatible backend. Consult your provider's documentation for specific endpoint URLs and authentication:

- [Datadog OTLP Integration](https://docs.datadoghq.com/tracing/setup_overview/open_standards/otlp_ingest_in_datadog/) - Native OTLP support
- [New Relic OpenTelemetry](https://docs.newrelic.com/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/) - Direct OTLP ingestion
- [AWS Observability](https://aws-otel.github.io/docs/introduction) - Via AWS Distro for OpenTelemetry
- [Grafana Cloud](https://grafana.com/docs/grafana-cloud/send-data/otlp/) - Hosted Grafana with OTLP
- [Honeycomb](https://docs.honeycomb.io/getting-data-in/opentelemetry/) - OpenTelemetry-native platform
- [Jaeger](https://www.jaegertracing.io/docs/1.50/deployment/) - Self-hosted tracing
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/deployment/) - Self-hosted with flexible routing

#### Production configuration best practices

##### Environment and security

Set the `ENVIRONMENT` environment variable in the server's environment (for example, `export ENVIRONMENT=production` in the shell that launches it), then configure the exporters:

```yaml
telemetry:
  service_name: "apollo-mcp-server"
  version: "1.0.0" # Version for correlation
  exporters:
    metrics:
      otlp:
        endpoint: "https://secure-endpoint" # Always use HTTPS
        protocol: "http/protobuf" # Often passes through proxies and load balancers more easily than gRPC
```

##### Performance considerations

- **Protocol choice**: `http/protobuf` often passes through firewalls and load balancers more easily than `grpc`
- **Batch export**: OpenTelemetry batches telemetry data before export for efficiency
- **Network timeouts**: The default timeouts are usually sufficient; watch the server logs for export errors that point to network problems

##### Resource correlation

- The `ENVIRONMENT` variable automatically tags all telemetry with `deployment.environment.name`
- Use a consistent `service_name` naming scheme across your Apollo infrastructure (Router, subgraphs, MCP server) so each component is easy to identify
- Set `version` to track releases and correlate issues with deployments

#### Troubleshooting

##### Common issues

- **Connection refused**: Verify the endpoint URL and network connectivity
- **Authentication errors**: Check whether your provider requires API keys or special headers
- **Missing data**: Confirm your observability platform supports OTLP and is configured to receive data
- **High memory usage**: Monitor telemetry export frequency and consider trace sampling for high-volume environments

##### Verification

```bash
# Check that the OTLP endpoint is reachable; any HTTP response, even an error status, means the connection succeeded
curl -v https://your-endpoint/v1/metrics

# Monitor server logs for OpenTelemetry export errors
./apollo-mcp-server --config config.yaml 2>&1 | grep -i "otel\|telemetry"
```

## Configuration Reference

The OpenTelemetry integration is configured in the `telemetry` section of the [configuration reference page](/apollo-mcp-server/config-file#telemetry).

## Emitted Metrics

The server emits the following metrics, which you can use for monitoring and alerting. All duration metrics are in milliseconds.

| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| `apollo.mcp.initialize.count` | Counter | Incremented for each `initialize` request. | (none) |
| `apollo.mcp.list_tools.count` | Counter | Incremented for each `list_tools` request. | (none) |
| `apollo.mcp.get_info.count` | Counter | Incremented for each `get_info` request. | (none) |
| `apollo.mcp.tool.count` | Counter | Incremented for each tool call. | `tool_name`, `success` (bool) |
| `apollo.mcp.tool.duration` | Histogram | Measures the execution duration of each tool call. | `tool_name`, `success` (bool) |
| `apollo.mcp.operation.count` | Counter | Incremented for each downstream GraphQL operation executed by a tool. | `operation.id`, `operation.type` ("persisted_query" or "operation"), `success` (bool) |
| `apollo.mcp.operation.duration` | Histogram | Measures the round-trip duration of each downstream GraphQL operation. | `operation.id`, `operation.type`, `success` (bool) |

In addition to these metrics, the server also emits standard [HTTP server metrics](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/) (e.g., `http.server.duration`, `http.server.active_requests`) courtesy of the `axum-otel-metrics` library.

## Emitted Traces

Spans are generated for the following actions:

- **Incoming HTTP Requests**: A root span is created for every HTTP request to the MCP server.
- **MCP Handler Methods**: Nested spans are created for each of the main MCP protocol methods (`initialize`, `call_tool`, `list_tools`).
- **Tool Execution**: `call_tool` spans contain nested spans for the specific tool being executed (e.g., `introspect`, `search`, or a custom GraphQL operation).
- **Downstream GraphQL Calls**: The `execute` tool and custom operation tools create child spans for their outgoing `reqwest` HTTP calls, capturing the duration of the downstream request. The `traceparent` and `tracestate` headers are propagated automatically, enabling distributed traces.

### Cardinality Control

High-cardinality metrics can occur in MCP Servers with a large number of tools or when clients are allowed to generate freeform operations.
To prevent performance issues and reduce costs, the Apollo MCP Server provides two mechanisms to control telemetry volume and cardinality: trace sampling and attribute filtering.

#### Trace Sampling

Configure the Apollo MCP Server to sample traces sent to your OpenTelemetry Collector using the `sampler` field in the `telemetry.exporters.tracing` configuration:

- **always_on** - Send every trace
- **always_off** - Disable trace collection entirely
- **0.0-1.0** - Send the given fraction of traces (for example, `0.1` sends about 10%)
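
A minimal sketch of sampling about 10% of traces follows. It assumes the `sampler` field accepts either one of the keywords or a bare number, as the `SamplerOption` table in the configuration reference suggests; the endpoint is a placeholder.

```yaml
telemetry:
  exporters:
    tracing:
      otlp:
        endpoint: "https://your-traces-endpoint"   # placeholder
        protocol: "http/protobuf"
      # Export roughly 1 in 10 traces; use always_on or always_off
      # instead of a number to keep or drop every trace.
      sampler: 0.1
```

Because the sampler sits under the tracing exporter, it presumably has no effect on the metrics listed above; use attribute filtering to reduce metric cardinality.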

#### Attribute Filtering

The Apollo MCP Server configuration also lets you omit attributes, such as `tool_name` or `operation.id`, that often lead to high-cardinality metrics in systems that treat each distinct attribute value as a new metric.
Both traces and metrics have an `omitted_attributes` option that takes a list of strings. Any attribute name in the list is filtered out and not sent to the collector.
For detailed configuration options, see the [telemetry configuration reference](/apollo-mcp-server/config-file#telemetry).
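
For example, a sketch that drops the per-tool and per-operation attributes from the Emitted Metrics table might look like the following. It assumes the names in `omitted_attributes` must match the attribute names in that table exactly; the endpoints are placeholders.

```yaml
telemetry:
  exporters:
    metrics:
      otlp:
        endpoint: "https://your-metrics-endpoint"   # placeholder
        protocol: "http/protobuf"
      # Drop attributes whose values grow with the number of tools and operations
      omitted_attributes:
        - tool_name
        - operation.id
    tracing:
      otlp:
        endpoint: "https://your-traces-endpoint"    # placeholder
        protocol: "http/protobuf"
      omitted_attributes:
        - operation.id
```

To see the effect, consider a server with 50 tools: `apollo.mcp.tool.count` can produce up to 100 distinct attribute combinations (50 `tool_name` values × 2 `success` values), while omitting `tool_name` leaves at most 2.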
