Skip to content

Commit 97465c4

Browse files
authored
Merge pull request #4219 from pohly/contextual-logging
KEP-3077: contextual logging: promotion to beta in 1.30
2 parents 866854e + 3b20681 commit 97465c4

File tree

3 files changed

+176
-5
lines changed

3 files changed

+176
-5
lines changed

keps/prod-readiness/sig-instrumentation/3077.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,3 +4,5 @@
44
kep-number: 3077
55
alpha:
66
approver: "@ehashman"
7+
beta:
8+
approver: "@johnbelamaric"

keps/sig-instrumentation/3077-contextual-logging/README.md

Lines changed: 172 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@
3737
- [Unit testing](#unit-testing)
3838
- [Injecting common value, logger passed through existing ctx parameter or new parameter](#injecting-common-value-logger-passed-through-existing-ctx-parameter-or-new-parameter)
3939
- [Resulting output](#resulting-output)
40+
- [Integration with log/slog](#integration-with-logslog)
4041
- [Test Plan](#test-plan)
4142
- [Graduation Criteria](#graduation-criteria)
4243
- [Alpha](#alpha)
@@ -52,12 +53,14 @@
5253
- [Scalability](#scalability)
5354
- [Troubleshooting](#troubleshooting)
5455
- [Implementation History](#implementation-history)
56+
- [Status and next steps](#status-and-next-steps)
5557
- [Drawbacks](#drawbacks)
5658
- [Alternatives](#alternatives)
5759
- [Per-component logger](#per-component-logger)
5860
- [Propagating a logger to init code](#propagating-a-logger-to-init-code)
5961
- [Panic when FromContext is called before setting a logger](#panic-when-fromcontext-is-called-before-setting-a-logger)
6062
- [Clean separation of contextual logging and traditional klog logging](#clean-separation-of-contextual-logging-and-traditional-klog-logging)
63+
- [Use log/slog instead of klog+logr](#use-logslog-instead-of-kloglogr)
6164
<!-- /toc -->
6265

6366
## Release Signoff Checklist
@@ -507,6 +510,36 @@ The logcheck static code analysis tool will warn about code in Kubernetes which
507510
calls the underlying functions directly. Once the feature gate is no longer needed,
508511
a global search/replace can remove the usage of these wrapper functions again.
509512

513+
Because the feature gate is off during alpha, log calls have to repeat
514+
important key/value pairs even if those also got passed to `WithValues`:
515+
516+
```
517+
logger := logger.WithValues("pod", klog.KObj(pod))
518+
...
519+
logger.Info("Processing", "pod", klog.KObj(pod))
520+
...
521+
logger.Info("Done", "pod", klog.KObj(pod))
522+
```
523+
524+
Starting with GA, the feature will always be enabled and code can be written
525+
without such duplication:
526+
527+
```
528+
logger := logger.WithValues("pod", klog.KObj(pod))
529+
...
530+
logger.Info("Processing")
531+
...
532+
logger.Info("Done")
533+
```
534+
535+
Documentation of APIs has to make it clear which values will always be included
536+
in log entries and thus don't need to be repeated. If in doubt, repeating them
537+
is okay: the text format will filter out duplicates if log call parameters
538+
overlap with `WithValues` parameters. For performance reasons it will not do
539+
that for duplicates between different `WithValues` calls. In JSON, repeating
540+
keys increases log volume size because there is no de-duplication, but the
541+
semantic is the same ("most recent wins").
542+
510543
### Text format
511544

512545
The formatting and verbosity code will be moved into `internal` packages where
@@ -839,6 +872,15 @@ I1026 16:21:00.461886 801139 scheduler.go:464] "Status after running PostFilter
839872
I1026 16:21:00.461918 801139 factory.go:209] "Unable to schedule pod; no fit; waiting" pod="default/my-csi-app-inline-volume" err="0/1 nodes are available: 1 node(s) did not have enough free storage."
840873
```
841874

875+
### Integration with log/slog
876+
877+
[`log/slog`](https://pkg.go.dev/log/slog) got added in Go
878+
1.21. Interoperability with slog is [provided by
879+
logr](https://github.com/go-logr/logr/pull/222). Applications which use slog
880+
can route log output from Kubernetes packages into their `slog.Handler` and
881+
vice versa, as demonstrated with [`component-base/logs`
882+
examples](https://github.com/kubernetes/kubernetes/pull/120696).
883+
842884
### Test Plan
843885

844886
The new code will be covered by unit tests that execute as part of
@@ -870,7 +912,7 @@ logging.
870912

871913
#### Beta
872914

873-
- All of kube-scheduler (in-tree) and CSI external-provisioner (out-of-tree) converted
915+
- [All of kube-controller-manager](https://github.com/kubernetes/kubernetes/pull/119250) and some [parts of kube-scheduler](https://github.com/kubernetes/kubernetes/pull/115588) converted (in-tree), conversion of out-of-tree components possible, whether they use pflag ([external-provisioner](https://github.com/kubernetes-csi/external-provisioner/pull/639)] or plain Go flags ([node-driver-registrar](https://github.com/kubernetes-csi/node-driver-registrar/pull/259))
874916
- Gathered feedback from developers and surveys
875917
- New APIs in `k8s.io/klog/v2` no longer marked as experimental
876918

@@ -1033,10 +1075,131 @@ None besides bugs that could cause a program to panic (null logger).
10331075

10341076
###### What steps should be taken if SLOs are not being met to determine the problem?
10351077

1036-
Revert commits that changed log calls.
1078+
A cluster operator can disable the feature via the feature gate.
1079+
1080+
Kubernetes developers can revert individual commits that changed log calls once
1081+
it has been determined that they introduce too much overhead.
10371082

10381083
## Implementation History
10391084

1085+
* Kubernetes 1.24: initial alpha
1086+
* Kubernetes 1.27: parts of kube-controller-manager converted
1087+
* Kubernetes 1.28: kube-controller-manager converted completely, relationship
1088+
with log/slog in Go 1.21 clarified
1089+
* Kubernetes 1.29: kube-scheduler converted completely
1090+
1091+
## Status and next steps
1092+
1093+
As of Kubernetes 1.29.1, kube-controller-manager and kube-scheduler have been
1094+
converted. The logcheck tool can be used to count remaining log calls that need
1095+
to be converted:
1096+
1097+
```
1098+
go install sigs.k8s.io/logtools/logcheck@latest
1099+
1100+
echo "Component | Non-Structured Logging | Non-Contextual Logging " && echo "------ | ------- | -------" && for i in $(find pkg/* cmd/* staging/src/k8s.io/* -maxdepth 0 -type d | sort); do echo "$i | $(cd $i; ${GOPATH}/bin/logcheck -check-structured -check-deprecations=false 2>&1 ./... | wc -l ) | $(cd $i; ${GOPATH}/bin/logcheck -check-structured -check-deprecations=false -check-contextual ./... 2>&1 | wc -l )"; done
1101+
```
1102+
1103+
Note that this also counts calls where it was decided to not convert them. The
1104+
actual check with golangci-lint ignores those because of a `//nolint:logcheck`
1105+
suppression comment.
1106+
1107+
Component | Non-Structured Logging | Non-Contextual Logging
1108+
------ | ------- | -------
1109+
cmd/clicheck | 0 | 0
1110+
cmd/cloud-controller-manager | 6 | 8
1111+
cmd/dependencycheck | 0 | 0
1112+
cmd/dependencyverifier | 0 | 0
1113+
cmd/fieldnamedocscheck | 1 | 1
1114+
cmd/gendocs | 0 | 0
1115+
cmd/genkubedocs | 0 | 0
1116+
cmd/genman | 0 | 0
1117+
cmd/genswaggertypedocs | 2 | 2
1118+
cmd/genutils | 0 | 0
1119+
cmd/genyaml | 0 | 0
1120+
cmd/gotemplate | 0 | 0
1121+
cmd/importverifier | 0 | 0
1122+
cmd/kubeadm | 264 | 463
1123+
cmd/kube-apiserver | 6 | 7
1124+
cmd/kube-controller-manager | 0 | 0
1125+
cmd/kubectl | 0 | 0
1126+
cmd/kubectl-convert | 0 | 0
1127+
cmd/kubelet | 0 | 52
1128+
cmd/kubemark | 1 | 1
1129+
cmd/kube-proxy | 0 | 42
1130+
cmd/kube-scheduler | 0 | 0
1131+
cmd/preferredimports | 0 | 0
1132+
cmd/prune-junit-xml | 0 | 0
1133+
cmd/yamlfmt | 0 | 0
1134+
pkg/api | 0 | 0
1135+
pkg/apis | 0 | 0
1136+
pkg/auth | 1 | 1
1137+
pkg/capabilities | 0 | 0
1138+
pkg/client | 0 | 0
1139+
pkg/cloudprovider | 0 | 0
1140+
pkg/cluster | 0 | 0
1141+
pkg/controller | 0 | 3
1142+
pkg/controlplane | 53 | 69
1143+
pkg/credentialprovider | 48 | 77
1144+
pkg/features | 0 | 0
1145+
pkg/fieldpath | 0 | 0
1146+
pkg/generated | 0 | 0
1147+
pkg/kubeapiserver | 4 | 4
1148+
pkg/kubectl | 1 | 2
1149+
pkg/kubelet | 2 | 1983
1150+
pkg/kubemark | 7 | 7
1151+
pkg/printers | 0 | 0
1152+
pkg/probe | 7 | 24
1153+
pkg/proxy | 0 | 360
1154+
pkg/quota | 0 | 0
1155+
pkg/registry | 46 | 99
1156+
pkg/routes | 2 | 2
1157+
pkg/scheduler | 0 | 0
1158+
pkg/security | 0 | 0
1159+
pkg/securitycontext | 0 | 0
1160+
pkg/serviceaccount | 25 | 44
1161+
pkg/util | 20 | 57
1162+
pkg/volume | 704 | 1110
1163+
pkg/windows | 1 | 1
1164+
staging/src/k8s.io/api | 0 | 0
1165+
staging/src/k8s.io/apiextensions-apiserver | 58 | 89
1166+
staging/src/k8s.io/apimachinery | 80 | 125
1167+
staging/src/k8s.io/apiserver | 285 | 655
1168+
staging/src/k8s.io/client-go | 163 | 283
1169+
staging/src/k8s.io/cli-runtime | 1 | 2
1170+
staging/src/k8s.io/cloud-provider | 122 | 162
1171+
staging/src/k8s.io/cluster-bootstrap | 2 | 4
1172+
staging/src/k8s.io/code-generator | 108 | 155
1173+
staging/src/k8s.io/component-base | 33 | 64
1174+
staging/src/k8s.io/component-helpers | 2 | 4
1175+
staging/src/k8s.io/controller-manager | 10 | 10
1176+
staging/src/k8s.io/cri-api | 0 | 0
1177+
staging/src/k8s.io/csi-translation-lib | 3 | 4
1178+
staging/src/k8s.io/dynamic-resource-allocation | 0 | 0
1179+
staging/src/k8s.io/endpointslice | 0 | 0
1180+
staging/src/k8s.io/kms | 0 | 0
1181+
staging/src/k8s.io/kube-aggregator | 45 | 62
1182+
staging/src/k8s.io/kube-controller-manager | 0 | 0
1183+
staging/src/k8s.io/kubectl | 96 | 160
1184+
staging/src/k8s.io/kubelet | 0 | 32
1185+
staging/src/k8s.io/kube-proxy | 0 | 0
1186+
staging/src/k8s.io/kube-scheduler | 0 | 0
1187+
staging/src/k8s.io/legacy-cloud-providers | 1281 | 2015
1188+
staging/src/k8s.io/metrics | 0 | 0
1189+
staging/src/k8s.io/mount-utils | 55 | 95
1190+
staging/src/k8s.io/pod-security-admission | 0 | 1
1191+
staging/src/k8s.io/sample-apiserver | 0 | 0
1192+
staging/src/k8s.io/sample-cli-plugin | 0 | 0
1193+
staging/src/k8s.io/sample-controller | 0 | 0
1194+
1195+
For Kubernetes 1.30, the focus is on client-go. APIs need to be extended
1196+
carefully without breaking existing code so that a context can be provided for
1197+
log calls. In some cases, this also makes a context available to code which
1198+
currently uses `context.TODO` as a stop-gap measure. Currently there are over
1199+
300 of those in `staging/src/k8s.io/client-go`. Whenever new APIs get
1200+
introduced, components which were already converted to contextual logging get
1201+
updated to use those.
1202+
10401203
## Drawbacks
10411204

10421205
Supporting contextual logging is a key design decision that has implications
@@ -1098,3 +1261,10 @@ would have removed all legacy code from Kubernetes. However, that transition
10981261
would have been complicated and forced all consumers of Kubernetes code to
10991262
adjust their code. Therefore the scope of the KEP was reduced from "remove
11001263
dependency on klog" to "remove dependency on global logger in klog".
1264+
1265+
### Use log/slog instead of klog+logr
1266+
1267+
This isn't viable because `slog` doesn't provide a mechanism to pass a logger
1268+
through a context. Therefore it would not be possible to support contextual
1269+
logging in packages like client-go where adding an explicit logger parameter
1270+
would be a major API break.

keps/sig-instrumentation/3077-contextual-logging/kep.yaml

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -24,13 +24,12 @@ stage: alpha
2424
# The most recent milestone for which work toward delivery of this KEP has been
2525
# done. This can be the current (upcoming) milestone, if it is being actively
2626
# worked on.
27-
latest-milestone: "v1.24"
27+
latest-milestone: "v1.30"
2828

2929
# The milestone at which this feature was, or is targeted to be, at each stage.
3030
milestone:
3131
alpha: "v1.24"
32-
beta: "v1.28"
33-
stable: "v1.30"
32+
beta: "v1.30"
3433

3534
feature-gates:
3635
- name: ContextualLogging

0 commit comments

Comments
 (0)