| 1 | +## Instrumenting Kubernetes with Traces |
| 2 | + |
| 3 | +This document outlines general guidelines for trace instrumentation |
| 4 | +in Kubernetes components. Components are instrumented using the |
| 5 | +[OpenTelemetry Go client library](https://github.com/open-telemetry/opentelemetry-go). |
| 6 | +For non-Go components, [libraries in other languages](https://opentelemetry.io/docs/languages/) |
| 7 | +are available. |
| 8 | + |
| 9 | +Traces are exposed via gRPC using the [OpenTelemetry Protocol](https://opentelemetry.io/docs/specs/otel/protocol/) |
| 10 | +(OTLP), which is open and well-understood by a wide range of third-party |
| 11 | +applications and vendors in the cloud-native ecosystem. |
| 12 | + |
| 13 | +The [general instrumentation advice](https://opentelemetry.io/docs/concepts/instrumentation/libraries/) |
| 14 | +from the OpenTelemetry documentation applies. This document highlights common pitfalls and some |
| 15 | +Kubernetes-specific considerations. |
| 16 | + |
| 17 | +### When to instrument |
| 18 | + |
| 19 | +While spans are sampled to avoid high costs, recording too many spans will |
| 20 | +force consumers to lower the sampling rate, and will "drown out" important |
| 21 | +spans. If your component has more than two or three nested spans, you are |
| 22 | +likely over-using trace instrumentation. Most trace instrumentation in |
| 23 | +Kubernetes components falls into one of two categories: |
| 24 | + |
| 25 | +1. Spans for incoming or outgoing network calls. |
| 26 | +2. Spans when initiating new work, such as reconciling an object, which may result in network calls. |
| 27 | + |
| 28 | +For network-based telemetry, Kubernetes components should use OpenTelemetry |
| 29 | +instrumentation libraries for |
| 30 | +[HTTP](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp) and |
| 31 | +[gRPC](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc). |
| 32 | + |
| 33 | +**Note:** When creating spans at the start of reconciling an object, only |
| 34 | +create the span if changes are actually required. Avoid creating "empty" spans |
| 35 | +which simply compare the desired and actual state of an object without |
| 36 | +performing any real work, or making any network requests. |
| 37 | + |
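The note above can be illustrated with a minimal sketch. It uses only the Go standard library, and `spanRecorder` is a hypothetical stand-in for a real tracer (it is not an OpenTelemetry or Kubernetes API). The point is only the control flow: the no-op comparison path returns before any span is started.

```go
package main

import "fmt"

// spanRecorder is a hypothetical stand-in for a tracer. It only records
// the names of the spans that were started.
type spanRecorder struct{ started []string }

// start records a span and returns a function that would end it.
func (r *spanRecorder) start(name string) (end func()) {
	r.started = append(r.started, name)
	return func() {}
}

// reconcile compares desired and actual replica counts and starts a span
// only when there is real work to do.
func reconcile(rec *spanRecorder, desired, actual int) int {
	if desired == actual {
		// Nothing to change: return without creating an "empty" span.
		return actual
	}
	end := rec.start("reconcile")
	defer end()
	// Real code would issue API requests here, passing along the
	// context that carries the span.
	return desired
}

func main() {
	rec := &spanRecorder{}
	reconcile(rec, 3, 3) // already in the desired state, no span
	reconcile(rec, 3, 1) // work required, one span
	fmt.Println("spans started:", len(rec.started))
}
```

With a real tracer, the same early return would come before the call that starts the span.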
| 38 | +### Configuration and Setup |
| 39 | + |
| 40 | +Kubernetes components should expose a flag, `--tracing-config-file`, which accepts a |
| 41 | +[TracingConfiguration](https://kubernetes.io/docs/reference/config-api/apiserver-config.v1beta1/#apiserver-k8s-io-v1beta1-TracingConfiguration) |
| 42 | +object. The `component-base/tracing` library provides a `NewProvider()` helper |
| 43 | +to convert a TracingConfiguration to a TracerProvider, which can be used to |
| 44 | +record spans. Components should avoid using OpenTelemetry globals, and instead |
| 45 | +pass the configured TracerProvider to the libraries that use it. Components |
| 46 | +should use the W3C Traceparent and Baggage propagators, as provided by the |
| 47 | +`Propagators()` helper. |
| 48 | + |
| 49 | +### Context Propagation |
| 50 | + |
| 51 | +Generally, components should not interact directly with OpenTelemetry |
| 52 | +Propagators, other than by passing them to libraries. Context propagation |
| 53 | +across network boundaries is handled by the |
| 54 | +[HTTP](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp) and |
| 55 | +[gRPC](https://pkg.go.dev/go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc) |
| 56 | +network client and server instrumentation libraries. |
| 57 | + |
| 58 | +Components need to propagate Go's `context.Context` from incoming network |
| 59 | +calls, or from spans created when initiating new work, to any outgoing network |
| 60 | +calls to ensure spans are properly connected into traces. |
| 61 | + |
| 62 | +### Naming and Style |
| 63 | + |
| 64 | +Follow the OpenTelemetry [guidelines for span naming](https://opentelemetry.io/docs/specs/otel/trace/api/#span), and the OpenTelemetry [guidelines for attributes](https://opentelemetry.io/docs/specs/semconv/general/attribute-naming/). |
| 65 | + |
| 66 | +### Tracing stability |
| 67 | + |
| 68 | +Tracing instrumentation in Kubernetes components does not currently have |
| 69 | +stability guarantees, but component owners should be aware of which changes are |
| 70 | +breaking for users so that such changes are made with proper consideration. In |
| 71 | +particular, it is a breaking change for users when a component stops propagating |
| 72 | +context in a way that breaks parent/child relationships for spans, removes |
| 73 | +spans without replacement, or removes an attribute from a span without |
| 74 | +replacement. Component owners should not treat general modifications to spans |
| 75 | +(e.g. renaming the span, or renaming an attribute) as breaking. |