# KEP-2845: Deprecate klog specific flags in Kubernetes Components

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Removed klog flags](#removed-klog-flags)
  - [Logging defaults](#logging-defaults)
    - [Split stdout and stderr](#split-stdout-and-stderr)
    - [Logging headers](#logging-headers)
  - [User Stories](#user-stories)
    - [Writing logs to files](#writing-logs-to-files)
  - [Caveats](#caveats)
  - [Risks and Mitigations](#risks-and-mitigations)
    - [Users don't want to use go-runner as replacement.](#users-dont-want-to-use-go-runner-as-replacement)
    - [Log processing in parent process causes performance problems](#log-processing-in-parent-process-causes-performance-problems)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha](#alpha)
    - [Beta](#beta)
    - [GA](#ga)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
  - [Continue supporting all klog features](#continue-supporting-all-klog-features)
  - [Release klog 3.0 with removed features](#release-klog-30-with-removed-features)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP proposes to deprecate, and later remove, a subset of the klog
command line flags from Kubernetes components, with the goal of making logging
in Kubernetes core components simpler and easier for the community to maintain
and extend.

## Motivation

Early on, Kubernetes adopted the glog logging library. There was no particular
reason for picking glog: the Go ecosystem was in its infancy at that time and
there were no alternatives. As the needs of the Kubernetes community grew, glog
proved not flexible enough, which prompted the creation of its fork, klog. By
forking, we inherited many glog features that we never intended to support. The
introduction of alternative log formats like JSON created a conundrum: should we
implement all klog features for JSON? Most of them don't make sense, and the way
they are configured leaves much to be desired. Klog features are controlled by a
set of global flags that remain the last bastion of global state in the k/k
repository. Those flags don't follow a single naming standard (some start with a
log prefix, some don't), don't comply with k8s flag naming (they use underscores
instead of hyphens), and have many other problems. We need to revisit how
logging configuration is done in klog, so that it can work with alternative log
formats and comply with current best practices.

Lack of investment and a growing number of klog features have hurt project
quality. Klog has multiple problems, including:
* performance much worse than the alternatives, for example 7-8x slower than
  the [JSON format](https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging#logger-implementation-performance)
* insufficient throughput to fulfill Kubernetes scalability requirements
  [kubernetes/kubernetes#90804](https://github.com/kubernetes/kubernetes/pull/90804)
* complexity and confusion caused by maintaining backward compatibility for
  legacy glog features and flags, for example
  [kubernetes/klog#54](https://github.com/kubernetes/klog/issues/54)

Fixing all those issues would require a big investment into logging, but would
not solve the underlying problem of having to maintain a logging library. We
have already seen cases like [kubernetes/kubernetes#90804](https://github.com/kubernetes/kubernetes/pull/90804)
where it was easier to reimplement a klog feature in an external project than to
fix the problem in klog. To conclude, we should reduce maintenance cost and
improve quality by narrowing the scope of the logging library.

As for which configuration options should be standardized across all logging
formats, I would look to the 12 factor app standard (https://12factor.net/). It
defines logs as streams of events and discourages applications from taking on
responsibility for log file management, log rotation and any other processing
that can be done externally. Kubernetes already encourages this by collecting
stdout and stderr logs and making them available via `kubectl logs`. It's
somewhat confusing that K8s components don't comply with K8s best practices.

### Goals

* Unblock development of alternative logging formats
* Narrow the scope of logging with a more opinionated approach and a smaller set of features
* Reduce complexity of logging configuration and follow the standard component configuration mechanism

### Non-Goals

* Change klog output format

## Proposal

I propose to remove klog specific feature flags from Kubernetes core components
(kube-apiserver, kube-scheduler, kube-controller-manager, kubelet) and set them
to agreed-upon good defaults. Of the klog flags, we would remove all except
"-v" and "--vmodule". With the removal of the flags that route logs based on
type, we want to change the default routing to one that works better as a
default. Changing the defaults will be done via a multi-release process that
introduces some temporary flags, which will be removed at the same time as the
other klog flags.

### Removed klog flags

To adopt the 12 factor app standard for logging, we would drop all flags that
extend logging beyond event streams. This change should be scoped to only those
components and not affect the broader klog community.

Flags that should be deprecated:

* --log-dir, --log-file, --log-flush-frequency - responsible for writing to
  files and syncing them to disk.
  Motivation: Not critical, as there are easy-to-set-up alternatives like
  shell redirection, systemd service management or the docker log driver.
  Removing them reduces complexity and allows development of non-text loggers,
  like one writing to journal.
* --logtostderr, --alsologtostderr, --one-output, --stderrthreshold -
  responsible for enabling/disabling writing to stderr (vs file).
  Motivation: Routing logs can be easily implemented by any log processor, like
  Fluentd, Fluentbit or Logstash.
* --log-file-max-size, --skip-log-headers - responsible for configuring log
  files (size-based rotation and file headers).
  Motivation: Not needed if writing to files is removed.
* --add-dir-header, --skip-headers - klog format specific flags.
  Motivation: They don't apply to other log formats.
* --log-backtrace-at - A legacy glog feature.
  Motivation: No trace of anyone using this feature.

Flag deprecation should comply with the standard k8s deprecation policy and
require 3 releases before removal.

This leaves two flags that should be implemented by all log formats:

* -v - control global log verbosity of Info logs
* --vmodule - control log verbosity of Info logs on a per-file level

Those flags were chosen because they affect which logs are written, directly
impacting log volume and component performance.
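
To make the remaining surface concrete, below is a minimal Go sketch that uses only the standard library `flag` package. It shows a flag set that registers just `-v` and `-vmodule` and applies them to Info logs, while any removed flag such as `-log-dir` is rejected as undefined. This is not klog's implementation. The names `verbosity`, `parseVModule` and `newVerbosityFlags` are hypothetical. The matching rule is an assumption modeled on klog's `pattern=N` syntax: a glob against the source file name without `.go`, where the first matching pattern wins.

```go
// Hypothetical sketch, not klog's implementation.
package main

import (
	"flag"
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// vmoduleRule is one "pattern=N" entry of a --vmodule value.
type vmoduleRule struct {
	pattern string
	level   int
}

// verbosity holds the only two logging settings left after the deprecation.
type verbosity struct {
	v       int
	vmodule []vmoduleRule
}

// parseVModule parses a comma-separated list of pattern=N entries.
func parseVModule(s string) ([]vmoduleRule, error) {
	var rules []vmoduleRule
	if s == "" {
		return rules, nil
	}
	for _, entry := range strings.Split(s, ",") {
		pattern, levelStr, ok := strings.Cut(entry, "=")
		if !ok || pattern == "" {
			return nil, fmt.Errorf("invalid vmodule entry %q", entry)
		}
		level, err := strconv.Atoi(levelStr)
		if err != nil || level < 0 {
			return nil, fmt.Errorf("invalid level in vmodule entry %q", entry)
		}
		rules = append(rules, vmoduleRule{pattern: pattern, level: level})
	}
	return rules, nil
}

// enabled reports whether an Info log at the given level, emitted from
// sourceFile, should be written. Assumption: the first --vmodule pattern that
// matches the file name (without ".go") overrides the global -v value.
func (c *verbosity) enabled(level int, sourceFile string) bool {
	module := strings.TrimSuffix(filepath.Base(sourceFile), ".go")
	for _, r := range c.vmodule {
		if ok, _ := filepath.Match(r.pattern, module); ok {
			return level <= r.level
		}
	}
	return level <= c.v
}

// newVerbosityFlags registers only -v and -vmodule; removed klog flags such
// as -log-dir fail to parse with "flag provided but not defined".
func newVerbosityFlags(c *verbosity) *flag.FlagSet {
	fs := flag.NewFlagSet("logging", flag.ContinueOnError)
	fs.IntVar(&c.v, "v", 0, "number for the log level verbosity")
	fs.Func("vmodule", "comma-separated list of pattern=N settings", func(s string) error {
		rules, err := parseVModule(s)
		if err != nil {
			return err
		}
		c.vmodule = rules
		return nil
	})
	return fs
}

func main() {
	var c verbosity
	fs := newVerbosityFlags(&c)
	if err := fs.Parse([]string{"-v=2", "--vmodule=gopher*=4"}); err != nil {
		panic(err)
	}
	fmt.Println(c.enabled(3, "pkg/server.go"))  // false: global -v=2 applies
	fmt.Println(c.enabled(3, "pkg/gophers.go")) // true: gopher*=4 matches
}
```

The standard `flag` package accepts both `-vmodule` and `--vmodule`, so the sketch handles both spellings used in this KEP. A real component would more likely register these settings on its own pflag-based flag set or component configuration.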

### Logging defaults

With the removal of configuration alternatives, we need to make sure that the
defaults make sense. List of logging features implemented by klog and proposed
actions:
* Routing logs based on type/verbosity - Should be reconsidered.
* Writing logs to file - Feature removed.
* Log file rotation based on file size - Feature removed.
* Configuration of log headers - Use the current defaults.
* Adding stacktrace - Feature removed.

For log routing, I propose to adopt the UNIX convention of writing info logs to
stdout and errors to stderr. For log headers, I propose to use the current
default.

#### Split stdout and stderr

As logs should be treated as event streams, I propose that we separate two main
streams, "info" and "error", based on the log method called. As error logs
should usually be treated with higher priority, having two streams prevents a
single pipeline from being clogged (for example
[kubernetes/klog#209](https://github.com/kubernetes/klog/issues/209)).
For logging formats writing to standard streams, we should follow the UNIX
convention of mapping "info" logs to stdout and "error" logs to stderr.

Splitting stdout from stderr would be a breaking change in both klog and
Kubernetes components. However, we expect only minimal impact on users, as
redirecting both streams is a common practice. In the rare cases that will be
impacted, adapting to this change should be a one-line change. Still, we want
to give users a proper heads-up before making this change, so we will hide the
change behind a new logging flag, `--logtostdout`. This flag will also be used
to avoid introducing a breaking change in klog.

With this flag we can follow a multi-release plan to minimize user impact (each
step should be done in a separate Kubernetes release):
1. Introduce the flag in a disabled state and start using it in tests.
1. Announce flag availability and encourage users to adopt it.
1. Enable the flag by default and deprecate it (allows users to flip back to the previous behavior).
1. Remove the flag following the deprecation policy.

#### Logging headers

The default logging headers configuration results in klog writing information
about the log type (error/info), the timestamp when the log was created and the
code line responsible for generating it. All this information is useful and
should be utilized by modern logging solutions. Log type is useful for filtering
logs when looking for an issue. The log generation timestamp is useful to
preserve the ordering of logs and should always be preferred over the time of
ingestion, which can be much later. Source code location is important to
identify how a log line was generated.

Example:
```
I0605 22:03:07.224378 3228948 logger.go:59] "Log using InfoS" key="value"
```

### User Stories

#### Writing logs to files

We should use go-runner as the official fallback for users that want to keep
writing logs to files. go-runner runs as the parent process of the component
binary, reads its stdout/stderr and is able to route them to files. As go-runner
is already released as part of the official K8s images, migrating should be as
simple as changing:

```
/usr/local/bin/kube-apiserver --log-file=/var/log/kube-apiserver.log
```

to

```
/go-runner --log-file=/var/log/kube-apiserver.log /usr/local/bin/kube-apiserver
```

### Caveats

Is it ok for K8s components to drop support for a subset of klog flags?

Technically, K8s already doesn't support klog flags as-is. Klog flags are
renamed to comply with the K8s flag naming convention (underscores are replaced
with hyphens). Full klog support was never promised to users, and removal of
those flags should be treated like the removal of any other flag.

Is it ok for K8s components to drop support for writing to files?

Writing directly to files is an important feature still used by users, but this
doesn't necessitate direct support in components. By providing an external
solution like go-runner, we allow the community to develop more advanced
features while maintaining a high quality implementation within components.
A more extensible solution developed externally should be more beneficial to the
community than forcing a closed list of features on everyone.

### Risks and Mitigations

#### Users don't want to use go-runner as replacement.

There are multiple alternatives that allow users to redirect logs to a file.
The exact solution depends on the user's preferred way to run the process, with
one shared property: all of them support consuming stdout/stderr. Examples
include shell redirection, systemd service management or the
[docker logging driver](https://docs.docker.com/config/containers/logging/configure/).
Not all of them support log rotation, but it is the user's responsibility to
pick complementary tooling that provides it, for example tools like
[logrotate](https://linux.die.net/man/8/logrotate).

#### Log processing in parent process causes performance problems

Passing logs through a parent process is a common Linux pattern used by
systemd-run, docker or containerd. In Kubernetes, we already use go-runner in
scalability testing to read apiserver logs and write them to a file. Before we
reach Beta, we should conduct detailed throughput testing of go-runner to
validate its upper limit, but based on the architecture alone we don't expect
any performance problems.

## Design Details

### Test Plan

Go-runner is already used for scalability tests. We should ensure that our
tests cover all existing klog features.

### Graduation Criteria

#### Alpha

- Klog can be configured without registering flags
- Kubernetes logging configuration drops global state
- Go-runner is feature-complementary to the klog flags planned for deprecation
- Projects in the Kubernetes org are migrated to go-runner
- Add a --logtostdout flag to klog, disabled by default
- Use --logtostdout in Kubernetes tests

#### Beta

- Go-runner project is well maintained and documented
- Documentation on migrating off klog flags is publicly available
- Kubernetes klog flags are marked as deprecated
- Enable --logtostdout in Kubernetes components by default

#### GA

- Kubernetes klog specific flags are removed (including --logtostdout)

### Upgrade / Downgrade Strategy

N/A

### Version Skew Strategy

N/A

## Implementation History

- 20/06/2021 - Original proposal created in https://github.com/kubernetes/kubernetes/issues/99270
- 30/07/2021 - First KEP draft was created

## Drawbacks

Deprecating klog features outside of klog might create confusion in the
community. A large part of the community doesn't know that klog was created out
of necessity and is not the end goal for logging in Kubernetes. We should do due
diligence to let the community know about our plans and their impact on
external components depending on klog.

## Alternatives

### Continue supporting all klog features

At some point we should migrate all logging configuration to Options or
Configuration. Doing so while supporting all klog features would make their
future removal much harder.

### Release klog 3.0 with removed features

Removing those features from klog itself cannot be done without involving the
whole k8s community, instead of just the k8s core components.