@@ -226,34 +226,33 @@ tests will not be possible for this feature.

  ##### e2e tests

- We will add a test that query the kubelet service logs on Windows and Linux nodes.
- On Windows node, the same kubelet service logs will queried by explicitly
- specifying the log file. In Linux the explicit log file query will be tested by
- querying a random file in present in /var/log.
+ Tests have been added that query the kubelet service logs on Linux nodes and
+ Microsoft-Windows-Security-SPP logs on Windows nodes with various options.

- On the Linux side tests will be added to [kubelet node](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/node/kubelet.go)
- e2e tests. For Windows a new set of tests will be added to the existing
- [e2e tests](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/windows).
+ These tests are part of the [kubelet node](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/node/kubelet.go)
+ e2e tests that are run as a daily periodic job:
+ - https://testgrid.k8s.io/sig-windows-master-release#capz-master-windows-alpha-nodelogquery

- - node: https://storage.googleapis.com/k8s-triage/index.html?sig=node
- - windows: https://storage.googleapis.com/k8s-triage/index.html?sig=windows
+ This job runs tests against both Windows and Linux nodes.

  ### Graduation Criteria

  The plan is to introduce the feature as alpha in the v1.27 time frame behind the
- `NodeLogQuery` kubelet feature gate and using the `kubectl alpha node-logs`
- sub-command.
+ `NodeLogQuery` kubelet feature gate and `enableSystemLogQuery` kubelet option.

  #### Alpha -> Beta Graduation

- The plan is to graduate the feature to beta in the v1.29 time frame. At that
- point we would have collected feedback from cluster administrators and
- developers who have enabled the feature. In addition we will provide a kubectl
- plugin for querying the logs more elegantly instead of using raw API calls.
+ The plan is to graduate the feature to beta in the v1.30 time frame. So far we
+ have not received any negative feedback from cluster administrators and
+ developers who have enabled the feature.
+
+ A [kubectl plugin](https://github.com/aravindhp/kubectl-node-logs) has been released
+ and added to the Krew [index](https://github.com/kubernetes-sigs/krew-index/blob/master/plugins/node-logs.yaml)
+ for querying the logs more elegantly instead of using raw API calls.

  #### Beta -> GA Graduation

- The plan is to graduate the feature to GA in the v1.30 time frame at which point
+ The plan is to graduate the feature to GA in the v1.32 time frame at which point
  any major issues should have been surfaced and addressed during the alpha and
  beta phases.

@@ -287,15 +286,47 @@ a 404 will be returned.

  ### Rollout, Upgrade and Rollback Planning

- _This section must be completed when targeting beta graduation to a release._
+ ###### How can a rollout or rollback fail? Can it impact already running workloads?
+ A rollout can fail on enabling the feature if there is a bug in the node log query code
+ which can cause the kubelet to crash. However, this has not been observed in practice or
+ in the end-to-end tests. When the kubelet comes up successfully on enabling the feature,
+ it will have no impact on workloads.
+ There should be no impact on rolling back this feature.
+
+ ###### What specific metrics should inform a rollback?
+ A kubelet crash on enabling just this feature would be an indicator that a rollback is
+ required. So far no CPU or memory spikes have been observed on enabling this feature, but
+ that could be another indicator.
+
+ ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+ Yes. The following manual tests were done:
+ - Brought up a 1.30-alpha cluster without the kubelet feature gate and kubelet option. Enabled
+ the feature and ensured that it worked. Disabled the feature and ensured that the
+ log proxy endpoint worked as before.
+ - Brought up a 1.29 cluster and enabled the feature. Upgraded the kubelet to 1.30-alpha and ensured
+ that the feature continued to work. Downgraded the kubelet to 1.29 and ensured that the feature
+ continued to work. Upgraded the kubelet again to 1.30 and ensured that the feature worked.
+ - Brought up a 1.29 cluster and enabled the feature. Upgraded the kubelet to 1.30-alpha and ensured
+ that the feature continued to work. Disabled the feature and downgraded the kubelet to 1.29 and
+ ensured that the log proxy endpoint worked as before. Upgraded the kubelet to 1.30-alpha and
+ ensured that the log proxy endpoint worked as before. Enabled the feature again and ensured it worked
+ as advertised.
+
+ ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+ No

  ### Monitoring Requirements

- _This section must be completed when targeting beta graduation to a release._
+ ###### How can an operator determine if the feature is in use by workloads?
+ While this feature does not affect any workloads, an operator can determine if this feature
+ is enabled by checking the kubelet logs for "feature gates: {map[NodeLogQuery: true]}".

- ### Dependencies
+ ###### How can someone using this feature know that it is working for their instance?
+ - [x] Other
+ - Details: The cluster administrator can confirm that this feature works by querying the kubelet log proxy
+ endpoint. Example: `kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet"`

- _This section must be completed when targeting beta graduation to a release._
+ ### Dependencies

  * **Does this feature depend on any specific services running in the cluster?**
  - kubelet
@@ -312,8 +343,8 @@ _This section must be completed when targeting beta graduation to a release._
  No

  * **Will enabling / using this feature result in introducing new API types?**
- Yes. We will need to add a `NodeLogOptions` counterpart to
- [PodLogOptions](https://github.com/kubernetes/kubernetes/blob/548ad1b8d35d51e6d33ea21dcc75d60a789b00e6/pkg/apis/core/types.go#L4409)
+ The feature does not introduce a new API from an API server perspective but
+ the existing kubelet proxy/log endpoint will have new features built into it.

  * **Will enabling / using this feature result in any new calls to the cloud
  provider?**
@@ -330,9 +361,8 @@ operations covered by [existing SLIs/SLOs]?**
  * **Will enabling / using this feature result in non-negligible increase of
  resource usage (CPU, RAM, disk, IO, ...) in any components?**
  In the case of large logs, there is potential for an increase in RAM and CPU
- usage on the node when an attempt is made to stream them. Feedback from the
- field during alpha will provide more clarity as we graduate from alpha to
- beta.
+ usage on the node when an attempt is made to stream them. However, so far no
+ CPU or memory spikes have been reported from the field.

  ### Troubleshooting

@@ -342,6 +372,7 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
  - Updated on May 5th, 2021
  - Updated on Dec 13th, 2022
  - Updated on May 2nd, 2023
+ - Updated on Feb 5th, 2024

  ## Drawbacks
