---
title: OpenTelemetry Integration
---

AI agents create unpredictable usage patterns and complex request flows that are hard to monitor with traditional methods. The Apollo MCP Server's OpenTelemetry integration provides the visibility you need to run a reliable service for AI agents.

## What you can monitor

- **Agent behavior**: Which tools and operations are used most frequently
- **Performance**: Response times and bottlenecks across tool executions and GraphQL operations
- **Reliability**: Error rates, failed operations, and request success patterns
- **Distributed request flows**: Complete traces from agent request through your Apollo Router and subgraphs, with automatic trace context propagation

## How it works

The server exports metrics, traces, and events using the OpenTelemetry Protocol (OTLP), ensuring compatibility with your existing observability stack and seamless integration with other instrumented Apollo services.

## Usage guide

### Quick start: Local development

The fastest way to see Apollo MCP Server telemetry in action is with a local setup that requires only Docker.

#### 5-minute setup
1. Start a local observability stack:
   ```shell
   docker run -p 3000:3000 -p 4317:4317 -p 4318:4318 --rm -ti grafana/otel-lgtm
   ```
1. Add telemetry config to your `config.yaml`:
   ```yaml
   telemetry:
     exporters:
       metrics:
         otlp:
           endpoint: "http://localhost:4318/v1/metrics"
           protocol: "http/protobuf"
       tracing:
         otlp:
           endpoint: "http://localhost:4318/v1/traces"
           protocol: "http/protobuf"
   ```
1. Restart your MCP server with the updated config.
1. Open Grafana at `http://localhost:3000` and explore your telemetry data. Default credentials are username `admin` with password `admin`.

For detailed steps and dashboard examples, see the [complete Grafana setup guide](guides/telemetry-grafana.mdx).

### Production deployment

For production environments, configure your MCP server to send telemetry to any OTLP-compatible backend. The Apollo MCP Server uses standard OpenTelemetry protocols, ensuring compatibility with all major observability platforms.

#### Configuration example

```yaml
telemetry:
  service_name: "mcp-server-prod" # Custom service name
  exporters:
    metrics:
      otlp:
        endpoint: "https://your-metrics-endpoint"
        protocol: "http/protobuf" # or "grpc"
    tracing:
      otlp:
        endpoint: "https://your-traces-endpoint"
        protocol: "http/protobuf"
```

#### Observability platform integration

The MCP server works with any OTLP-compatible backend. Consult your provider's documentation for specific endpoint URLs and authentication:

- [Datadog OTLP Integration](https://docs.datadoghq.com/tracing/setup_overview/open_standards/otlp_ingest_in_datadog/) - Native OTLP support
- [New Relic OpenTelemetry](https://docs.newrelic.com/docs/more-integrations/open-source-telemetry-integrations/opentelemetry/) - Direct OTLP ingestion
- [AWS Observability](https://aws-otel.github.io/docs/introduction) - Via AWS Distro for OpenTelemetry
- [Grafana Cloud](https://grafana.com/docs/grafana-cloud/send-data/otlp/) - Hosted Grafana with OTLP
- [Honeycomb](https://docs.honeycomb.io/getting-data-in/opentelemetry/) - OpenTelemetry-native platform
- [Jaeger](https://www.jaegertracing.io/docs/1.50/deployment/) - Self-hosted tracing
- [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/deployment/) - Self-hosted with flexible routing

#### Production configuration best practices

##### Environment and security

Set the deployment environment via an environment variable before starting the server:

```bash
export ENVIRONMENT=production
```

```yaml
telemetry:
  service_name: "apollo-mcp-server"
  version: "1.0.0" # Version for correlation
  exporters:
    metrics:
      otlp:
        endpoint: "https://secure-endpoint" # Always use HTTPS
        protocol: "http/protobuf" # Generally more reliable than gRPC
```

##### Performance considerations
- **Protocol choice**: `http/protobuf` is often more reliable through firewalls and load balancers than `grpc`
- **Batch export**: OpenTelemetry automatically batches telemetry data for efficiency
- **Network timeouts**: Default timeouts are usually appropriate, but monitor for network issues

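If you route telemetry through a self-hosted OpenTelemetry Collector, batching can also be tuned there with the standard `batch` processor. A minimal Collector config sketch (the backend endpoint is a placeholder; adjust to your provider):

```yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    send_batch_size: 512 # flush after this many items...
    timeout: 5s          # ...or after this much time, whichever comes first
exporters:
  otlphttp:
    endpoint: https://your-backend-endpoint
service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```

With this in place, the MCP server's OTLP endpoints point at the Collector (`http://collector-host:4318/...`) instead of the backend directly.
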
##### Resource correlation
- The `ENVIRONMENT` variable automatically tags all telemetry with `deployment.environment.name`
- Use consistent `service_name` across all your Apollo infrastructure (Router, subgraphs, MCP server)
- Set `version` to track releases and correlate issues with deployments

#### Troubleshooting

##### Common issues
- **Connection refused**: Verify endpoint URL and network connectivity
- **Authentication errors**: Check if your provider requires API keys or special headers
- **Missing data**: Confirm your observability platform supports OTLP and is configured to receive data
- **High memory usage**: Monitor telemetry export frequency and consider sampling for high-volume environments

##### Verification
```bash
# Check if telemetry is being exported (look for connection attempts)
curl -v https://your-endpoint/v1/metrics

# Monitor server logs for OpenTelemetry export errors
./apollo-mcp-server --config config.yaml 2>&1 | grep -i "otel\|telemetry"
```

## Configuration Reference

The OpenTelemetry integration is configured via the `telemetry` section of the [configuration reference page](/apollo-mcp-server/config-file#telemetry).

## Emitted Metrics

The server emits the following metrics, which are invaluable for monitoring and alerting. All duration metrics are in milliseconds.

| Metric Name | Type | Description | Attributes |
|---|---|---|---|
| `apollo.mcp.initialize.count` | Counter | Incremented for each `initialize` request. | (none) |
| `apollo.mcp.list_tools.count` | Counter | Incremented for each `list_tools` request. | (none) |
| `apollo.mcp.get_info.count` | Counter | Incremented for each `get_info` request. | (none) |
| `apollo.mcp.tool.count` | Counter | Incremented for each tool call. | `tool_name`, `success` (bool) |
| `apollo.mcp.tool.duration` | Histogram | Measures the execution duration of each tool call. | `tool_name`, `success` (bool) |
| `apollo.mcp.operation.count` | Counter | Incremented for each downstream GraphQL operation executed by a tool. | `operation.id`, `operation.type` ("persisted_query" or "operation"), `success` (bool) |
| `apollo.mcp.operation.duration` | Histogram | Measures the round-trip duration of each downstream GraphQL operation. | `operation.id`, `operation.type`, `success` (bool) |

In addition to these metrics, the server also emits standard [HTTP server metrics](https://opentelemetry.io/docs/specs/semconv/http/http-metrics/) (e.g., `http.server.duration`, `http.server.active_requests`) courtesy of the `axum-otel-metrics` library.

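As one example of putting these metrics to work: if your backend is Prometheus, OTLP metric names are typically translated by replacing dots with underscores and appending `_total` to counters, so `apollo.mcp.tool.count` usually arrives as `apollo_mcp_tool_count_total` (verify the exact name in your backend). A hypothetical alerting rule on tool error rate might look like:

```yaml
groups:
  - name: apollo-mcp-server
    rules:
      - alert: McpToolErrorRateHigh
        # Fraction of failed tool calls over the last 5 minutes, using the
        # assumed Prometheus translation of apollo.mcp.tool.count.
        expr: |
          sum(rate(apollo_mcp_tool_count_total{success="false"}[5m]))
            /
          sum(rate(apollo_mcp_tool_count_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
```
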
## Emitted Traces

Spans are generated for the following actions:

- **Incoming HTTP Requests**: A root span is created for every HTTP request to the MCP server.
- **MCP Handler Methods**: Nested spans are created for each of the main MCP protocol methods (`initialize`, `call_tool`, `list_tools`).
- **Tool Execution**: `call_tool` spans contain nested spans for the specific tool being executed (e.g., `introspect`, `search`, or a custom GraphQL operation).
- **Downstream GraphQL Calls**: The `execute` tool and custom operation tools create child spans for their outgoing `reqwest` HTTP calls, capturing the duration of the downstream request. The `traceparent` and `tracestate` headers are propagated automatically, enabling distributed traces.

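Because `traceparent` is propagated automatically, one way to verify a distributed trace end to end is to attach a W3C Trace Context header to a request yourself, then search for that trace ID in your backend. A sketch that generates a valid header value (the commented `curl` target is hypothetical; adjust host, port, and path to your deployment):

```shell
# Build a random W3C traceparent: version 00, 16-byte trace id,
# 8-byte parent span id, sampled flag 01.
trace_id=$(head -c16 /dev/urandom | od -An -tx1 | tr -d ' \n')
span_id=$(head -c8 /dev/urandom | od -An -tx1 | tr -d ' \n')
traceparent="00-${trace_id}-${span_id}-01"
echo "$traceparent"

# Then attach it to a request, e.g.:
# curl -H "traceparent: $traceparent" http://localhost:5000/mcp ...
```
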
### Cardinality Control

High-cardinality metrics can occur in MCP servers with a large number of tools, or when clients are allowed to generate freeform operations.
To prevent performance issues and reduce costs, the Apollo MCP Server provides two mechanisms to control metric cardinality: trace sampling and attribute filtering.

#### Trace Sampling

Configure the Apollo MCP Server to sample traces sent to your OpenTelemetry Collector using the `sampler` field in the `telemetry.tracing` configuration:

- **always_on** - Send every trace
- **always_off** - Disable trace collection entirely
- **0.0-1.0** - Send the specified fraction of traces
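For example, to keep roughly 10% of traces (a sketch assuming the ratio can be given directly to `sampler`; confirm the exact shape in the configuration reference):

```yaml
telemetry:
  tracing:
    sampler: 0.1 # keep ~10% of traces; always_on / always_off are also accepted
```
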

#### Attribute Filtering

The Apollo MCP Server configuration also allows omitting attributes such as `tool_name` or `operation.id`, which can lead to high-cardinality metrics in systems that treat each collected attribute value as a new metric series.
Both traces and metrics have an `omitted_attributes` option that takes a list of strings. Any attribute whose name appears in the list is filtered out and not sent to the collector.
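A sketch of this option (attribute names taken from the metrics table above; the exact nesting of `omitted_attributes` follows the configuration reference):

```yaml
telemetry:
  tracing:
    omitted_attributes:
      - "tool_name"
  metrics:
    omitted_attributes:
      - "tool_name"
      - "operation.id"
```
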
For detailed configuration options, see the [telemetry configuration reference](/apollo-mcp-server/config-file#telemetry).