Commit c35f925

switch to file-based configuration
1 parent 2fca880 commit c35f925

1 file changed

+63
-9
lines changed

keps/sig-instrumentation/0034-distributed-tracing-kep.md

Lines changed: 63 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -3,20 +3,21 @@ title: Tracing API Server Requests
33
authors:
44
- "@Monkeyanator"
55
- "@dashpole"
6+
- "@logicalhan"
67
editor: "@dashpole"
78
owning-sig: sig-instrumentation
89
participating-sigs:
910
- sig-architecture
1011
- sig-api-machinery
1112
- sig-scalability
12-
- sig-cli
1313
reviewers:
1414
- "@logicalhan"
15+
- "@caesarxuchao"
1516
approvers:
1617
- "@brancz"
1718
- "@lavalamp"
1819
creation-date: 2018-12-04
19-
last-updated: 2020-04-29
20+
last-updated: 2020-09-30
2021
status: implementable
2122
---
2223

@@ -32,10 +33,13 @@ status: implementable
3233
- [Non-Goals](#non-goals)
3334
- [Proposal](#proposal)
3435
- [Tracing API Requests](#tracing-api-requests)
35-
- [Vendor OpenTelemetry and the OT Exporter](#vendor-opentelemetry-and-the-ot-exporter)
36+
- [Exporting Spans](#exporting-spans)
37+
- [Running the OpenTelemetry Collector](#running-the-opentelemetry-collector)
38+
- [APIServer Configuration and EgressSelectors](#apiserver-configuration-and-egressselectors)
3639
- [Controlling use of the OpenTelemetry library](#controlling-use-of-the-opentelemetry-library)
3740
- [Graduation requirements](#graduation-requirements)
3841
- [Alternatives considered](#alternatives-considered)
42+
- [Introducing a new EgressSelector type](#introducing-a-new-egressselector-type)
3943
- [Other OpenTelemetry Exporters](#other-opentelemetry-exporters)
4044
- [Production Readiness Survey](#production-readiness-survey)
4145
- [Implementation History](#implementation-history)
@@ -45,7 +49,6 @@ status: implementable
4549

4650
This Kubernetes Enhancement Proposal (KEP) proposes enhancing the API Server to allow tracing requests.
4751

48-
4952
## Motivation
5053

5154
Along with metrics and logs, traces are a useful form of telemetry to aid with debugging incoming requests. The API Server currently uses a poor-man's form of tracing (see [github.com/kubernetes/utils/trace](https://github.com/kubernetes/utils/tree/master/trace)), but we can make use of distributed tracing to improve the ease of use and enable easier analysis of trace data. Trace data is structured, providing the detail necessary to debug requests, and context propagation allows plugins, such as admission webhooks, to add to API Server requests.
@@ -60,7 +63,6 @@ Along with metrics and logs, traces are a useful form of telemetry to aid with d
6063

6164
* The API Server generates and exports spans for incoming and outgoing requests.
6265
* The API Server propagates context from incoming requests to outgoing requests.
63-
* Kubectl clients can easily specify that a request should be traced.
6466

6567
### Non-Goals
6668

@@ -69,22 +71,69 @@ Along with metrics and logs, traces are a useful form of telemetry to aid with d
6971
* Trace operations from all Kubernetes resource types in a generic manner (i.e. without manual instrumentation)
7072
* Change metrics or logging (e.g. to support trace-metric correlation)
7173
* Access control to tracing backends
74+
* Add tracing to components outside Kubernetes (e.g. the etcd client library).
7275

7376
## Proposal
7477

7578
### Tracing API Requests
7679

77-
We will wrap the API Server's http server and http clients with [othttp](https://github.com/open-telemetry/opentelemetry-go/tree/master/plugin/othttp) to get spans for incoming and outgoing http requests, and add the [otgrpc](https://github.com/grpc-ecosystem/grpc-opentracing/tree/master/go/otgrpc) DialOption to the etcd grpc client. This generates spans for all sampled incoming requests and propagates context with all client requests. For incoming requests, this would go below [WithRequestInfo](https://github.com/kubernetes/kubernetes/blob/9eb097c4b07ea59c674a69e19c1519f0d10f2fa8/staging/src/k8s.io/apiserver/pkg/server/config.go#L676) in the filter stack, as it must be after authentication and authorization, before the panic filter, and is closest in function to the WithRequestInfo filter.
80+
We will wrap the API Server's http server and http clients with [othttp](https://github.com/open-telemetry/opentelemetry-go/tree/master/plugin/othttp) to get spans for incoming and outgoing http requests. This generates spans for all sampled incoming requests and propagates context with all client requests. For incoming requests, this would go below [WithRequestInfo](https://github.com/kubernetes/kubernetes/blob/9eb097c4b07ea59c674a69e19c1519f0d10f2fa8/staging/src/k8s.io/apiserver/pkg/server/config.go#L676) in the filter stack, as it must be after authentication and authorization, before the panic filter, and is closest in function to the WithRequestInfo filter.
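The context propagation described in this paragraph can be sketched with the Go standard library alone. This is a minimal illustration, not the `othttp` API: `withTraceContext`, `propagatingTransport`, `roundTripperFunc`, and `propagate` are hypothetical names, the `traceparent` header is assumed as the carrier, and the header is forwarded verbatim rather than creating a child span as a real tracing library would.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// traceHeader is the assumed carrier header (W3C Trace Context name).
const traceHeader = "traceparent"

// traceKey is an unexported context key type to avoid collisions.
type traceKey struct{}

// withTraceContext is a hypothetical server-side filter: it copies the
// incoming trace header into the request context.
func withTraceContext(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if tp := r.Header.Get(traceHeader); tp != "" {
			r = r.WithContext(context.WithValue(r.Context(), traceKey{}, tp))
		}
		next.ServeHTTP(w, r)
	})
}

// propagatingTransport is a hypothetical client-side wrapper: it injects the
// trace header from the request context into outgoing requests.
type propagatingTransport struct{ base http.RoundTripper }

func (t propagatingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if tp, ok := req.Context().Value(traceKey{}).(string); ok {
		req = req.Clone(req.Context()) // a RoundTripper must not mutate the caller's request
		req.Header.Set(traceHeader, tp)
	}
	return t.base.RoundTrip(req)
}

// roundTripperFunc adapts a function to http.RoundTripper so the sketch
// needs no network access.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// propagate sends one request through the wrapped handler, which makes an
// outgoing call (standing in for a webhook), and returns the trace header
// that the outgoing call carried.
func propagate(traceparent string) string {
	var seen string
	backend := roundTripperFunc(func(r *http.Request) (*http.Response, error) {
		seen = r.Header.Get(traceHeader)
		return &http.Response{StatusCode: http.StatusOK, Header: http.Header{}, Body: http.NoBody, Request: r}, nil
	})
	client := &http.Client{Transport: propagatingTransport{base: backend}}

	handler := withTraceContext(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// The outgoing request reuses the incoming request's context.
		out, err := http.NewRequestWithContext(r.Context(), http.MethodGet, "http://webhook.invalid/", nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		resp, err := client.Do(out)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		resp.Body.Close()
	}))

	in := httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil)
	if traceparent != "" {
		in.Header.Set(traceHeader, traceparent)
	}
	handler.ServeHTTP(httptest.NewRecorder(), in)
	return seen
}

func main() {
	fmt.Println(propagate("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
}
```

The same shape applies to reentrant clients such as webhooks: as long as the outgoing request is built from the incoming request's context, the trace identifier travels with it.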
7881

7982
Note that some clients of the API Server, such as webhooks, may make reentrant calls to the API Server. To gain the full benefit of tracing, such clients should propagate context with requests back to the API Server.
8083

81-
### Vendor OpenTelemetry and the OT Exporter
84+
### Exporting Spans
8285

8386
This KEP proposes the use of the [OpenTelemetry tracing framework](https://opentelemetry.io/) to create and export spans to configured backends.
8487

85-
The API Server will use the [OpenTelemetry exporter format](https://github.com/open-telemetry/opentelemetry-proto), which exports traces to a local port. This format is compatible with the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector), which allows importing and configuring exporters for trace storage backends to be done out-of-tree in addition to other useful features. The exporter stores spans in memory, and uses the [batching processor](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk.md#batching-processor) to batch requests and send them asynchronously.
88+
The API Server will use the [OpenTelemetry exporter format](https://github.com/open-telemetry/opentelemetry-proto) and the [OTLP exporter](https://github.com/open-telemetry/opentelemetry-go/tree/master/exporters/otlp#opentelemetry-collector-go-exporter) to export traces. This format is easy to use with the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector), which allows importing and configuring exporters for trace storage backends to be done out-of-tree, in addition to other useful features.
89+
90+
### Running the OpenTelemetry Collector
91+
92+
The [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector) can be run as a sidecar, a daemonset, a deployment, or a combination in which the daemonset buffers telemetry and forwards it to the deployment for aggregation (e.g. tail-based sampling) and routing to a telemetry backend. To support these various setups, the API Server should be able to send traffic either to a local (on the master) collector, or to a cluster service (in the cluster).
93+
94+
### APIServer Configuration and EgressSelectors
95+
96+
The API Server controls where traffic is sent using an [EgressSelector](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190226-network-proxy.md), and has separate controls for `Master`, `Cluster`, and `Etcd` traffic. As described above, we would like to support either sending telemetry to a url using the `Master` egress, or a service using the `Cluster` egress. To accomplish this, we will introduce a flag, `--opentelemetry-config-file`, that will point to the file that defines the opentelemetry exporter configuration. That file will have the following format:
97+
98+
```golang
99+
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
86100

87-
Add configuration to the API Server required to configure the opentelemetry exporter, including the address and egress proxy to send spans to. The [egress proxy](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190226-network-proxy.md) can be added to the opentelemetry exporter by adding a ContextDialer grpc DialOption similar to the one used by the apiserver's etcd client. This will add a new "OpenTelemetry" [EgressType](https://github.com/kubernetes/kubernetes/blob/4b9b9ab75376b7b53876ab6b2be42d0940c7eb26/staging/src/k8s.io/apiserver/pkg/server/egressselector/egress_selector.go#L53) to the API Server's configuration.
101+
// OpenTelemetryClientConfiguration provides versioned configuration for opentelemetry clients.
102+
type OpenTelemetryClientConfiguration struct {
103+
metav1.TypeMeta `json:",inline"`
104+
105+
// +optional
106+
// URL of the collector that's running on the master.
107+
// if URL is specified, APIServer uses the egressType Master when sending tracing data to the collector.
108+
URL *string `json:"url,omitempty" protobuf:"bytes,3,opt,name=url"`
109+
110+
// +optional
111+
// Service that's the frontend of the collector deployment running in the cluster.
112+
// If Service is specified, APIServer uses the egressType Cluster when sending tracing data to the collector.
113+
Service *ServiceReference `json:"service,omitempty" protobuf:"bytes,1,opt,name=service"`
114+
}
115+
116+
// ServiceReference holds a reference to Service.legacy.k8s.io
117+
type ServiceReference struct {
118+
// `namespace` is the namespace of the service.
119+
// Required
120+
Namespace string `json:"namespace" protobuf:"bytes,1,opt,name=namespace"`
121+
// `name` is the name of the service.
122+
// Required
123+
Name string `json:"name" protobuf:"bytes,2,opt,name=name"`
124+
125+
// `path` is an optional URL path which will be sent in any request to
126+
// this service.
127+
// +optional
128+
Path *string `json:"path,omitempty" protobuf:"bytes,3,opt,name=path"`
129+
130+
// If specified, the port on the service that hosting webhook.
131+
// Default to 443 for backward compatibility.
132+
// `port` should be a valid port number (1-65535, inclusive).
133+
// +optional
134+
Port *int32 `json:"port,omitempty" protobuf:"varint,4,opt,name=port"`
135+
}
136+
```
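As a rough sketch of how such a file might be consumed, the example below decodes a JSON document into simplified local copies of the types above and picks the egress type from whichever field is set. Several things here are assumptions rather than part of the proposal: the helper name `egressTypeFor`, the omission of `metav1.TypeMeta`, JSON as the on-disk encoding, the collector address and service names (placeholders), and the choice to reject a file that sets both `url` and `service`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ServiceReference is a simplified local copy of the proposed type.
type ServiceReference struct {
	Namespace string  `json:"namespace"`
	Name      string  `json:"name"`
	Path      *string `json:"path,omitempty"`
	Port      *int32  `json:"port,omitempty"`
}

// OpenTelemetryClientConfiguration is a simplified local copy of the
// proposed type; metav1.TypeMeta is omitted to keep the sketch stdlib-only.
type OpenTelemetryClientConfiguration struct {
	URL     *string           `json:"url,omitempty"`
	Service *ServiceReference `json:"service,omitempty"`
}

// egressTypeFor is a hypothetical helper: it decodes the configuration and
// returns "Master" when a URL is set or "Cluster" when a Service is set.
// Treating "both set" and "neither set" as errors is an assumption.
func egressTypeFor(raw []byte) (string, error) {
	var cfg OpenTelemetryClientConfiguration
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return "", fmt.Errorf("decoding opentelemetry config: %w", err)
	}
	switch {
	case cfg.URL != nil && cfg.Service != nil:
		return "", errors.New("only one of url or service may be set")
	case cfg.URL != nil:
		return "Master", nil
	case cfg.Service != nil:
		if cfg.Service.Namespace == "" || cfg.Service.Name == "" {
			return "", errors.New("service requires both namespace and name")
		}
		return "Cluster", nil
	default:
		return "", errors.New("one of url or service must be set")
	}
}

func main() {
	// Placeholder values; the address, namespace, and name are illustrative only.
	examples := []string{
		`{"url": "localhost:55680"}`,
		`{"service": {"namespace": "observability", "name": "otel-collector", "port": 55680}}`,
	}
	for _, raw := range examples {
		egress, err := egressTypeFor([]byte(raw))
		fmt.Println(egress, err)
	}
}
```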
88137

89138
### Controlling use of the OpenTelemetry library
90139

@@ -101,11 +150,16 @@ Alpha
101150
Beta
102151

103152
- [] Tracing 100% of requests does not break scalability tests.
153+
- [] OpenTelemetry reaches GA
104154
- [] Publish documentation on examples of how to use the OT Collector with kubernetes
105155

106156

107157
## Alternatives considered
108158

159+
### Introducing a new EgressSelector type
160+
161+
Instead of a configuration file to choose between a url on the `Master` network, or a service on the `Cluster` network, we considered introducing a new `OpenTelemetry` egress type, which could be configured separately. However, we aren't actually introducing a new destination for traffic, so it is more conventional to make use of existing egress types. We will also likely want to add additional configuration for the OpenTelemetry client in the future.
162+
109163
### Other OpenTelemetry Exporters
110164

111165
This KEP suggests that we utilize the OpenTelemetry exporter format in all components. Alternative options include:
