Commit 715bda6

KEP-2831: update kubelet tracing KEP to target GA in 1.33 (kubernetes#5134)
* update kubelet tracing KEP to target GA in 1.33
* update to latest template
* add progress for otel self-observability metrics
* add explicit steps to test upgrade-downgrade-upgrade flow
1 parent 49cff2e commit 715bda6

3 files changed

+71
-39
lines changed

keps/prod-readiness/sig-instrumentation/2831.yaml

Lines changed: 2 additions & 1 deletion
@@ -3,4 +3,5 @@ alpha:
   approver: "@ehashman"
 beta:
   approver: "@wojtek-t"
-
+stable:
+  approver: "@wojtek-t"

keps/sig-instrumentation/2831-kubelet-tracing/README.md

Lines changed: 66 additions & 33 deletions
@@ -22,6 +22,8 @@
 - [Integration tests](#integration-tests)
 - [e2e tests](#e2e-tests)
 - [Graduation Requirements](#graduation-requirements)
+- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+- [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
 - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
 - [Does enabling the feature change any default behavior?](#does-enabling-the-feature-change-any-default-behavior)
@@ -47,14 +49,23 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [X] (R) KEP approvers have approved the KEP status as `implementable`
 - [X] (R) Design details are appropriately documented
-- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [X] e2e Tests for all Beta API Operations (endpoints)
+- [X] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [X] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [X] (R) Graduation criteria is in place
+- [X] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [X] (R) Production readiness review completed
-- [X] Production readiness review approved
+- [X] (R) Production readiness review approved
 - [X] "Implementation History" section is up-to-date for milestone
 - [X] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [X] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
 ## Summary

 This Kubernetes Enhancement Proposal (KEP) is to enhance the kubelet to allow tracing gRPC and HTTP API requests.
@@ -242,16 +253,26 @@ Beta
 - [X] OpenTelemetry reaches GA
 - [X] Publish examples of how to use the OT Collector with kubernetes
 - [X] Allow time for feedback
-- [ ] Test and document results of upgrade and rollback while feature-gate is enabled.
-- [ ] Add top level traces to connect spans in sync loops, incoming requests, and outgoing requests.
-- [ ] Unit/integration test to verify connected traces in kubelet.
-- [ ] Revisit the format used to export spans.
-- [ ] Parity with the old text-based Traces
-- [ ] Connecting traces from container runtimes via the Container Runtime Interface
+- [X] Test and document results of upgrade and rollback while feature-gate is enabled.
+- [X] Add top level traces to connect spans in sync loops, incoming requests, and outgoing requests.
+- [X] Unit/integration test to verify connected traces in kubelet.
+- [X] Revisit the format used to export spans.
+- [X] Parity with the old text-based Traces
+- [X] Connecting traces from container runtimes via the Container Runtime Interface
 - https://github.com/kubernetes/kubernetes/pull/114504

 GA

+- [X] Feedback from users collected and incorporated over multiple releases
+
+### Upgrade / Downgrade Strategy
+
+Tracing will work if the kubelet version supports the feature, and will not export spans if it doesn't. It does not impact the ability to upgrade or rollback kubelet versions.
+
+### Version Skew Strategy
+
+Version skew isn't applicable because this feature only involves the kubelet.
+
 ## Production Readiness Review Questionnaire

 ### Feature Enablement and Rollback
@@ -319,20 +340,37 @@ _This section must be completed when targeting beta graduation to a release._


 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
-Upgrades and rollbacks will be tested while feature-gate is experimental
+
+Yes. These were tested on a 1.27 kind cluster by enabling, disabling, and re-enabling the feature-gate on the kubelet.
+
+Use a basic config.yaml:
+
+```yaml
+kind: Cluster
+apiVersion: kind.x-k8s.io/v1alpha4
+```
+
+1. Create kind cluster: `kind create cluster --name kubelet-tracing --image kindest/node:v1.27.0 --config config.yaml`
+2. exec into kind "node" container: `docker exec -it <container id for node> sh`
+3. use apt-get to install vim
+4. Edit the kubelet configuration to enable the `KubeletTracing` feature gate: `vim /var/lib/kubelet/config.yaml`
+5. Restart the kubelet: `systemctl restart kubelet`.
+6. Verify that the node status is still being updated (not from within docker container): `kubectl describe no`
+7. Repeat steps 2-6, but disable `KubeletTracing`, and verify that the kubelet works after the restart.
+8. Repeat steps 2-6, but re-enable `KubeletTracing`, and verify that the kubelet works after the restart.
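
Editor's note on step 4: the edit to `/var/lib/kubelet/config.yaml` might look like the fragment below. This is a minimal sketch and is not part of the commit. It assumes the `featureGates` and `tracing` fields of the `kubelet.config.k8s.io/v1beta1` `KubeletConfiguration`, and the endpoint and sampling rate are illustrative values. The fields would be merged into the existing file, not used to replace it.

```yaml
# Assumed fragment of /var/lib/kubelet/config.yaml; values are illustrative.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletTracing: true               # the gate toggled in steps 4, 7, and 8
tracing:
  endpoint: localhost:4317           # OTLP gRPC endpoint of a node-local collector (assumed)
  samplingRatePerMillion: 1000000    # sample every trace; lower this outside of testing
```

When the gate is turned off again in step 7, the `tracing` block may need to be removed as well. Check this against the kubelet version under test.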

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 No

 ### Monitoring Requirements

-_This section must be completed when targeting beta graduation to a release._
-
 ###### How can an operator determine if the feature is in use by workloads?

 Operators are expected to have access to and/or control of the OpenTelemetry agent deployment and trace storage backend.
 KubeletConfiguration will show the FeatureGate and TracingConfiguration.

+Workloads do not directly use this feature.
+
 ###### How can someone using this feature know that it is working for their instance?

 The tracing backend will display the traces with a service "kubelet".
@@ -344,34 +382,26 @@ _This section must be completed when targeting beta graduation to a release._

 ##### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

-- [] Metrics
-- Metric name: tbd [opentelemetry-go issue #2547](https://github.com/open-telemetry/opentelemetry-go/issues/2547)
-- Components exposing the metric: kubelet
+None. Operators can use the absence of traces, which is an observability signal in its own right.

-##### Are there any missing metrics that would be useful to have to improve observability
-To be determined.
+##### Are there any missing metrics that would be useful to have to improve observability

+It would be helpful to have metrics about span generation and export: [opentelemetry-go issue #2547](https://github.com/open-telemetry/opentelemetry-go/issues/2547)

-### Dependencies
+There is progress on defining and implementing OpenTelemetry trace SDK self-observability metrics:

-_This section must be completed when targeting beta graduation to a release._
+* Proposal for names: https://github.com/open-telemetry/semantic-conventions/pull/1631
+* Prototype for OpenTelemetry-Go: https://github.com/open-telemetry/opentelemetry-go/pull/6153
+
+### Dependencies

 ###### Does this feature depend on any specific services running in the cluster?

 Yes. In the current version of the proposal, users must run the [OpenTelemetry Collector](https://github.com/open-telemetry/opentelemetry-collector)
-as a daemonset and configure a backend trace visualization tool (jaeger, zipkin, etc).
-
+as a daemonset and configure a backend trace visualization tool (jaeger, zipkin, etc). There are also a wide variety of vendors and cloud providers which support OTLP.
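
Editor's note on the collector dependency: a minimal OpenTelemetry Collector configuration that accepts OTLP over gRPC and prints the received spans could look like the following. This is a sketch under assumptions and is not part of the commit. The listen address lines up with a kubelet exporting to `localhost:4317` only if the collector shares the node's host network, and the `debug` exporter (present in recent collector releases) stands in for a real backend such as Jaeger or a vendor OTLP endpoint.

```yaml
# Assumed minimal collector config; replace the debug exporter with a real backend.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # where the kubelet's OTLP gRPC exporter connects
exporters:
  debug:
    verbosity: detailed          # print each received span to the collector log
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```

Seeing spans with the service name "kubelet" in the collector log is a quick check that the kubelet side is configured correctly before a backend is wired in.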

 ### Scalability

-_For alpha, this section is encouraged: reviewers should consider these questions
-and attempt to answer them._
-
-_For beta, this section is required: reviewers must answer these questions._
-
-_For GA, this section is required: approvers should be able to confirm the
-previous answers based on experience in the field._
-
 ###### Will enabling / using this feature result in any new API calls?

 This will not add any additional API calls.
@@ -401,28 +431,30 @@ previous answers based on experience in the field._

 The tracing client library has a small, in-memory cache for outgoing spans.

+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
 ### Troubleshooting

 The Troubleshooting section currently serves the `Playbook` role. We may consider
 splitting it into a dedicated `Playbook` document (potentially with some monitoring
 details). For now, we leave it here.

-_This section must be completed when targeting beta graduation to a release._
-
 ###### How does this feature react if the API server and/or etcd is unavailable?

 No reaction specific to this feature if API server and/or etcd is unavailable.

 ###### What are other known failure modes?

-- [The controller is misconfigured and cannot talk to the collector or the collector cannot send traces to the backend]
+- [The kubelet is misconfigured and cannot talk to the collector or the kubelet cannot send traces to the backend]
 - Detection: How can it be detected via metrics? Stated another way:
 how can an operator troubleshoot without logging into a master or worker node?
 **kubelet logs, component logs, collector logs**
-- Mitigations: **Disable KubeletTracing, update collector, backend configuration**
+- Mitigations: **Fix the kubelet configuration, update collector, backend configuration**
 - Diagnostics: What are the useful log messages and their required logging
 levels that could help debug the issue? **go-opentelemetry sdk provides logs indicating failure**
-- Testing: To be added.
+- Testing: It isn't particularly useful to test misconfigurations.

 ## Implementation History

@@ -431,6 +463,7 @@ _This section must be completed when targeting beta graduation to a release._
 - 2022-03-29: KEP deemed not ready for Alpha in 1.24
 - 2022-06-09: KEP targeted at Alpha in 1.25
 - 2023-01-09: KEP targeted at Beta in 1.27
+- 2025-02-07: KEP targeted at Stable in 1.33

 ## Drawbacks

keps/sig-instrumentation/2831-kubelet-tracing/kep.yaml

Lines changed: 3 additions & 5 deletions
@@ -23,16 +23,14 @@ approvers:
 see-also:
   - "https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/647-apiserver-tracing"
 replaces:
-stage: beta
-latest-milestone: "v1.27"
+stage: stable
+latest-milestone: "v1.33"
 milestone:
   alpha: "v1.25"
   beta: "v1.27"
-  stable: "v1.28"
+  stable: "v1.33"
 feature-gates:
   - name: KubeletTracing
     components:
       - kubelet
 disable-supported: true
-metrics:
-  - "tbd"
