---
title: Telemetry (metrics and traces)
description:
  How to enable OpenTelemetry (metrics and traces) and Prometheus
  instrumentation for ToolHive MCP servers inside of Kubernetes using the
  ToolHive Operator
---

ToolHive includes built-in instrumentation using OpenTelemetry, which gives you
comprehensive observability for your MCP server interactions. You can export
traces and metrics to popular observability backends like Jaeger, Honeycomb,
Datadog, and Grafana Cloud, or expose Prometheus metrics directly.

## What you can monitor

ToolHive's telemetry captures detailed information about MCP interactions
including traces, metrics, and performance data. For a comprehensive overview of
the telemetry architecture, metrics collection, and monitoring capabilities, see
the [observability overview](../concepts/observability.md).

## Enable telemetry

You can enable telemetry when deploying an MCP server by specifying telemetry
configuration in the `MCPServer` custom resource.

This example runs the Fetch MCP server, exports traces and metrics to a deployed
instance of the
[OpenTelemetry Collector](https://opentelemetry.io/docs/collector/), and exposes
Prometheus metrics directly:

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: gofetch
  namespace: toolhive-system
spec:
  image: ghcr.io/stackloklabs/gofetch/server
  transport: streamable-http
  port: 8080
  targetPort: 8080
  ...
  ...
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true
      metrics:
        enabled: true
      tracing:
        enabled: true
        samplingRate: "0.05"
    prometheus:
      enabled: true
```

The `spec.telemetry.openTelemetry.endpoint` is the address of the OpenTelemetry
Collector deployed in your infrastructure, and the
`spec.telemetry.openTelemetry.serviceName` is the name that identifies your MCP
server in your observability stack.

### Export metrics to an OTLP endpoint

To have ToolHive export metrics to your OTel collector, enable the
`spec.telemetry.openTelemetry.metrics.enabled` flag.
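
For reference, here's a minimal sketch of a metrics-only `telemetry` stanza
inside an `MCPServer` spec, reusing the collector endpoint from the example
above:

```yaml
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true
      metrics:
        enabled: true # OTLP metrics export on; tracing stays off unless enabled
```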

### Export traces to an OTLP endpoint

To have ToolHive export tracing information, enable the
`spec.telemetry.openTelemetry.tracing.enabled` flag.

You can also set the sampling rate of your traces by setting the
`spec.telemetry.openTelemetry.tracing.samplingRate` option to a number between
`0` and `1.0`. By default this is `"0.05"`, which equates to 5% of all requests.
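
For example, a tracing-only stanza that samples 25% of requests (an illustrative
value) might look like this:

```yaml
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true
      tracing:
        enabled: true
        samplingRate: "0.25" # illustrative: trace 25% of requests
```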
| 75 | + |
| 76 | +:::note |
| 77 | + |
| 78 | +The `spec.telemetry.openTelemetry.endpoint` is provided as a hostname and |
| 79 | +optional port, without a scheme or path (e.g., use `api.honeycomb.io` or |
| 80 | +`api.honeycomb.io:443`, not `https://api.honeycomb.io`). ToolHive automatically |
| 81 | +uses HTTPS unless `--otel-insecure` is specified. |
| 82 | + |
| 83 | +::: |
| 84 | + |
| 85 | +By default, the service name is set to `toolhive-mcp-proxy`, and the sampling |
| 86 | +rate is `0.05` (5%). |
| 87 | + |
| 88 | +:::tip[Recommendation] |
| 89 | + |
| 90 | +Set the `spec.telemetry.openTelemetry.serviceName` flag to a meaningful name for |
| 91 | +each MCP server. This helps you identify the server in your observability |
| 92 | +backend. |
| 93 | + |
| 94 | +::: |
| 95 | + |
| 96 | +### Enable Prometheus metrics |
| 97 | + |
| 98 | +You can expose Prometheus-style metrics at `/metrics` on the main transport port |
| 99 | +for local scraping by enabling the `spec.telemetry.prometheus.enabled` flag. |
| 100 | + |
| 101 | +To access the metrics, you can use `curl` or any Prometheus-compatible scraper. |
| 102 | +The metrics are available at `http://<HOST>:<PORT>/metrics`, where `<HOST>` is |
| 103 | +resolvable address of the ToolHive ProxyRunner fronting your MCP server pod and |
| 104 | +`<PORT>` is the port of which the ProxyRunner service is configured to expose |
| 105 | +for traffic. |
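
A minimal sketch, if you only want the Prometheus endpoint and no OTLP export:

```yaml
  telemetry:
    prometheus:
      enabled: true # serves /metrics on the main transport port
```

With the `gofetch` example above, the metrics would then be served at `/metrics`
on port `8080` of the proxy Service.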

### Dual export

You can export to an OTLP endpoint and expose Prometheus metrics simultaneously.

The `MCPServer` example at the top of this page has dual export enabled.

## Observability backends

ToolHive can export telemetry data to many different observability backends. It
supports exporting traces and metrics to any backend that implements OTLP (the
OpenTelemetry Protocol). Some common examples are listed below, but specific
configurations will vary based on your environment and requirements.

### OpenTelemetry Collector (recommended)

The OpenTelemetry Collector is a vendor-agnostic way to receive, process, and
export telemetry data. It supports many backend services, scalable deployment
options, and advanced processing capabilities.

```mermaid
graph LR
  A[ToolHive] -->|traces & metrics| B[OpenTelemetry Collector]
  B --> C[AWS CloudWatch]
  B --> D[Splunk]
  B --> E[New Relic]
  B <--> F[Prometheus]
  B --> G[Other OTLP backends]
```

You can run the OpenTelemetry Collector inside of a Kubernetes cluster; see the
[OpenTelemetry Collector documentation](https://opentelemetry.io/docs/collector/)
for more information.
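
If you install the Collector with its official Helm chart, a minimal values
sketch along these lines can get you started (hedged: `mode` and `config` are
the chart's value names at the time of writing; newer chart versions may require
additional values such as the image repository):

```yaml
mode: deployment
config:
  receivers:
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
  exporters:
    # The debug exporter only logs what it receives; swap in a real backend
    # exporter (Jaeger, Prometheus, Datadog, ...) for anything beyond testing.
    debug: {}
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [debug]
      metrics:
        receivers: [otlp]
        exporters: [debug]
```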

To export data to an in-cluster OpenTelemetry Collector, set your OTLP endpoint
to the collector's OTLP HTTP receiver port (the default is `4318`):

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: gofetch
  namespace: toolhive-system
spec:
  ...
  ...
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true
      metrics:
        enabled: true
```

### Prometheus

To collect metrics using Prometheus, run your MCP server with
`spec.telemetry.prometheus.enabled` set to `true` and add the following to your
Prometheus configuration:

```yaml title="prometheus.yml"
scrape_configs:
  - job_name: 'toolhive-mcp-proxy'
    static_configs:
      - targets: ['<MCP_SERVER_PROXY_SVC_URL>:<MCP_SERVER_PORT>']
    scrape_interval: 15s
    metrics_path: /metrics
```

You can add multiple MCP servers to the `targets` list, as shown in the sketch
below. Replace `<MCP_SERVER_PROXY_SVC_URL>` with the ProxyRunner Service name
and `<MCP_SERVER_PORT>` with the port number exposed by that Service.
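
For example, with two hypothetical proxy Services named `mcp-gofetch-proxy` and
`mcp-github-proxy` (substitute your own Service names and ports):

```yaml title="prometheus.yml"
scrape_configs:
  - job_name: 'toolhive-mcp-proxy'
    static_configs:
      - targets:
          - 'mcp-gofetch-proxy.toolhive-system.svc.cluster.local:8080'
          - 'mcp-github-proxy.toolhive-system.svc.cluster.local:8080'
    scrape_interval: 15s
    metrics_path: /metrics
```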

### Jaeger

[Jaeger](https://www.jaegertracing.io) is a popular open-source distributed
tracing system. You can run it inside of a Kubernetes cluster to store tracing
telemetry data exported by the ToolHive proxy.

You can export traces to Jaeger by setting the OTLP endpoint to an OpenTelemetry
Collector, then configuring the Collector to export tracing data to Jaeger.

```yaml
apiVersion: toolhive.stacklok.dev/v1alpha1
kind: MCPServer
metadata:
  name: gofetch
  namespace: toolhive-system
spec:
  ...
  ...
  telemetry:
    openTelemetry:
      enabled: true
      endpoint: otel-collector-opentelemetry-collector.monitoring.svc.cluster.local:4318
      serviceName: mcp-fetch-server
      insecure: true
      tracing:
        enabled: true
```

Then, in your OpenTelemetry Collector configuration, define a Jaeger exporter
and route traces to it:

```yaml
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

  processors:
    # The batch processor is referenced by the traces pipeline below.
    batch: {}

  exporters:
    otlp/jaeger:
      endpoint: http://jaeger-all-in-one-collector.monitoring:4317

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/jaeger]
```

### Honeycomb

Coming soon.

You'll need your Honeycomb API key, which you can find in your
[Honeycomb account settings](https://ui.honeycomb.io/account).
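
In the meantime, a sketch of routing traces to Honeycomb through your
OpenTelemetry Collector, using Honeycomb's OTLP endpoint and API-key header
(verify against Honeycomb's current docs; this assumes the `otlp` receiver and
`batch` processor from the Jaeger example above):

```yaml
config:
  exporters:
    otlp/honeycomb:
      # Honeycomb accepts OTLP over gRPC; the API key travels in a header.
      endpoint: api.honeycomb.io:443
      headers:
        x-honeycomb-team: <HONEYCOMB_API_KEY>

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/honeycomb]
```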

### Datadog

Datadog has [multiple options](https://docs.datadoghq.com/opentelemetry/) for
collecting OpenTelemetry data:

- The
  [**OpenTelemetry Collector**](https://docs.datadoghq.com/opentelemetry/setup/collector_exporter/)
  is recommended for existing OpenTelemetry users or users wanting a
  vendor-neutral solution.

- The [**Datadog Agent**](https://docs.datadoghq.com/opentelemetry/setup/agent)
  is recommended for existing Datadog users.
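
If you take the Collector route, a hedged sketch using the `datadog` exporter
from the Collector's contrib distribution (field names per the contrib docs;
`<DD_API_KEY>` and the `site` value depend on your Datadog account):

```yaml
config:
  exporters:
    datadog:
      api:
        key: <DD_API_KEY>
        site: datadoghq.com

  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [datadog]
```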

### Grafana Cloud

Coming soon.

## Performance considerations

### Sampling rates

Adjust sampling rates based on your environment:

- **Development**: `spec.telemetry.openTelemetry.tracing.samplingRate: "1.0"`
  (100% sampling)
- **Production**: `spec.telemetry.openTelemetry.tracing.samplingRate: "0.01"`
  (1% sampling for high-traffic systems)
- **Default**: `spec.telemetry.openTelemetry.tracing.samplingRate: "0.05"` (5%
  sampling)

### Network overhead

Telemetry adds minimal overhead when properly configured:

- Use appropriate sampling rates for your traffic volume
- Monitor your observability backend costs and adjust sampling accordingly

## Related information

- [Kubernetes CRD reference](../reference/crd-spec.mdx) - Reference for the
  `MCPServer` Custom Resource Definition (CRD)
- [Deploy the operator using Helm](./deploy-operator-helm.md) - Install the
  ToolHive operator