Commit 36fa4b8: add user stories section

1 parent 438436a commit 36fa4b8

File tree

1 file changed: +24 −0 lines changed
  • keps/sig-instrumentation/647-apiserver-tracing


keps/sig-instrumentation/647-apiserver-tracing/README.md

Lines changed: 24 additions & 0 deletions
@@ -8,6 +8,9 @@
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
+  - [User Stories](#user-stories)
+    - [Steady-State trace collection](#steady-state-trace-collection)
+    - [On-Demand trace collection](#on-demand-trace-collection)
 - [Tracing API Requests](#tracing-api-requests)
 - [Exporting Spans](#exporting-spans)
 - [Running the OpenTelemetry Collector](#running-the-opentelemetry-collector)
@@ -73,6 +76,27 @@ Along with metrics and logs, traces are a useful form of telemetry to aid with d
 
 ## Proposal
 
+### User Stories
+
+Since this feature is for diagnosing problems with the kube-apiserver, it is targeted at cluster operators and cloud vendors that manage Kubernetes control planes.
+
+For the following use cases, I can deploy an OpenTelemetry collector as a sidecar to the API Server. I can use the API Server's `--opentelemetry-config-file` flag with the default URL to make the API Server send its spans to the sidecar collector. Alternatively, I can point the API Server at an OpenTelemetry collector listening on a different port or URL if I need to.
+
+#### Steady-State trace collection
+
+As a cluster operator or cloud provider, I would like to collect traces for API requests to the API Server to help debug a variety of control-plane problems. I can set the `SamplingRatePerMillion` in the configuration file to a non-zero number to have spans collected for a small fraction of requests. Depending on the symptoms I need to debug, I can search span metadata to find a trace that exhibits those symptoms. Even for issues that occur non-deterministically, a low sampling rate is generally still enough to surface a representative trace over time.
+
+#### On-Demand trace collection
+
+As a cluster operator or cloud provider, I would like to collect a trace for a specific request to the API Server. This will often happen when debugging a live problem. In such cases, I don't want to change the `SamplingRatePerMillion` to collect a high percentage of requests, which would be expensive and would collect many things I don't care about. I also don't want to restart the API Server, which may fix the very problem I am trying to debug. Instead, I can make sure the incoming request to the API Server is sampled. The tooling to do this easily doesn't exist today, but could be added in the future.
+
+For example, to trace a request to list nodes, with traceid=4bf92f3577b34da6a3ce929d0e0e4737, no parent span, and sampled=true:
+
+```bash
+kubectl proxy --port=8080 &
+curl http://localhost:8080/api/v1/nodes -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4737-0000000000000000-01"
+```
+
 ### Tracing API Requests
 
 We will wrap the API Server's http server and http clients with [otelhttp](https://github.com/open-telemetry/opentelemetry-go-contrib/tree/master/instrumentation/net/http/otelhttp) to get spans for incoming and outgoing http requests. This generates spans for all sampled incoming requests and propagates context with all client requests. For incoming requests, this would go below [WithRequestInfo](https://github.com/kubernetes/kubernetes/blob/9eb097c4b07ea59c674a69e19c1519f0d10f2fa8/staging/src/k8s.io/apiserver/pkg/server/config.go#L676) in the filter stack, as it must come after authentication and authorization, before the panic filter, and is closest in function to the WithRequestInfo filter.
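The steady-state story above names two knobs: the collector URL and `SamplingRatePerMillion`. The exact schema of the file passed via `--opentelemetry-config-file` is not spelled out in this excerpt, so the following is a hypothetical sketch only; the field names and the 4317 OTLP gRPC port are assumptions, not confirmed by this KEP.

```shell
# Hypothetical sketch -- the real schema is defined elsewhere in the KEP.
# Only the collector URL and SamplingRatePerMillion are named in this excerpt.
CFG="$(mktemp)"   # in practice this would live somewhere like /etc/kubernetes/
cat > "$CFG" <<'EOF'
url: localhost:4317          # assumed default OTLP gRPC port of a sidecar collector
samplingRatePerMillion: 100  # sample roughly 1 in 10,000 requests
EOF
# The API server would then be started with something like:
#   kube-apiserver --opentelemetry-config-file="$CFG" ...
```

A rate of 100 per million samples about one request in 10,000, which keeps overhead low while, per the story above, still surfacing a representative trace over time.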

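The on-demand story relies on hand-constructing a W3C `traceparent` header. A small sketch of doing that with a fresh random trace id, assuming `openssl` is available; the `00-…-01` version and flag bytes follow the W3C Trace Context format, and the zeroed parent span id mirrors the KEP's own example:

```shell
# Build a traceparent header: 00-<16-byte trace id>-<8-byte parent span id>-<flags>.
# A random trace id makes the resulting trace easy to find in the backend;
# an all-zero parent id means "no parent span", and flags 01 means "sampled".
TRACE_ID="$(openssl rand -hex 16)"
TRACEPARENT="00-${TRACE_ID}-0000000000000000-01"
echo "traceparent: ${TRACEPARENT}"

# With a cluster available, replay the KEP's node-listing example:
#   kubectl proxy --port=8080 &
#   curl http://localhost:8080/api/v1/nodes -H "traceparent: ${TRACEPARENT}"
```

After the request, the trace id can be pasted into the tracing backend's search to pull up exactly the sampled request.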