Commit 9b9ffef

Merge pull request #4480 from aravindhp/nlq-promote-beta
KEP-2258: Promote NodeLogQuery to beta
2 parents 808cafb + 7371410 commit 9b9ffef

3 files changed, +61 -27 lines changed

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 2258
 alpha:
   approver: "@johnbelamaric"
+beta:
+  approver: "@johnbelamaric"

keps/sig-windows/2258-node-log-query/README.md

Lines changed: 56 additions & 25 deletions
@@ -226,34 +226,33 @@ tests will not be possible for this feature.

 ##### e2e tests

-We will add a test that query the kubelet service logs on Windows and Linux nodes.
-On Windows node, the same kubelet service logs will queried by explicitly
-specifying the log file. In Linux the explicit log file query will be tested by
-querying a random file in present in /var/log.
+Tests have been added that query the kubelet service logs on Linux nodes and
+Microsoft-Windows-Security-SPP logs on Windows nodes with various options.

-On the Linux side tests will be added to [kubelet node](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/node/kubelet.go)
-e2e tests. For Windows a new set of tests will be added to the existing
-[e2e tests](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/windows).
+These tests are part of the [kubelet node](https://github.com/kubernetes/kubernetes/blob/master/test/e2e/node/kubelet.go)
+e2e tests that are run as a daily periodic job:
+- https://testgrid.k8s.io/sig-windows-master-release#capz-master-windows-alpha-nodelogquery

-- node: https://storage.googleapis.com/k8s-triage/index.html?sig=node
-- windows: https://storage.googleapis.com/k8s-triage/index.html?sig=windows
+This job runs tests against both Windows and Linux nodes.

 ### Graduation Criteria

 The plan is to introduce the feature as alpha in the v1.27 time frame behind the
-`NodeLogQuery` kubelet feature gate and using the `kubectl alpha node-logs`
-sub-command.
+`NodeLogQuery` kubelet feature gate and `enableSystemLogQuery` kubelet option.

 #### Alpha -> Beta Graduation

-The plan is to graduate the feature to beta in the v1.29 time frame. At that
-point we would have collected feedback from cluster administrators and
-developers who have enabled the feature. In addition we will provide a kubectl
-plugin for querying the logs more elegantly instead of using raw API calls.
+The plan is to graduate the feature to beta in the v1.30 time frame. So far we
+have not received any negative feedback from cluster administrators and
+developers who have enabled the feature.
+
+A [kubectl plugin](https://github.com/aravindhp/kubectl-node-logs) has been released
+and added to the Krew [index](https://github.com/kubernetes-sigs/krew-index/blob/master/plugins/node-logs.yaml)
+for querying the logs more elegantly instead of using raw API calls.

 #### Beta -> GA Graduation

-The plan is to graduate the feature to GA in the v1.30 time frame at which point
+The plan is to graduate the feature to GA in the v1.32 time frame at which point
 any major issues should have been surfaced and addressed during the alpha and
 beta phases.

@@ -287,15 +286,47 @@ a 404 will be returned.

 ### Rollout, Upgrade and Rollback Planning

-_This section must be completed when targeting beta graduation to a release._
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+A rollout can fail on enabling the feature if there is a bug in the node log query code
+which can cause the kubelet to crash. However this has not been observed in practice or
+in the end to end tests. When the kubelet comes up successfully on enabling the feature,
+it will have no impact on workloads.
+There should be no impact on rolling back this feature.
+
+###### What specific metrics should inform a rollback?
+A kubelet crash on enabling just this feature would be an indicator that a rollback is
+required. So far no CPU or memory spikes have been observed on enabling this feature but
+that could be another indicator.
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+Yes. The following manual tests were done:
+- Brought up a 1.30-alpha cluster without the kubelet feature gate and kubelet option. Enabled it
+the feature and ensured that the feature worked. Disabled the feature and ensured that the
+log proxy endpoint worked as before.
+- Brought up a 1.29 cluster and enabled the feature. Upgraded the kubelet to 1.30-alpha and ensured
+that the feature continued to work. Downgraded the kubelet to 1.29 and ensured that the feature
+continued to work. Upgraded the kubelet again to 1.30 and ensured that the feature worked.
+- Brought up a 1.29 cluster and enabled the feature. Upgraded the kubelet to 1.30-alpha and ensured
+that the feature continued to work. Disabled the feature and downgraded the kubelet to 1.29 and
+ensured that the log proxy endpoint worked as before. Upgraded the kubelet to 1.30-alpha and
+ensured that the log proxy endpoint worked as before. Enabled the feature again and ensured it worked
+as advertised.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+No

 ### Monitoring Requirements

-_This section must be completed when targeting beta graduation to a release._
+###### How can an operator determine if the feature is in use by workloads?
+While this feature does not affect any workloads an operator can determine if this feature
+is enabled by checking the kubelet logs for "feature gates: {map[NodeLogQuery:true]}".

-### Dependencies
+###### How can someone using this feature know that it is working for their instance?
+- [x] Other
+- Details: The cluster administrator can confirm that this feature works by querying the kubelet log proxy
+endpoint. Example: "kubectl get --raw "/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet"

-_This section must be completed when targeting beta graduation to a release._
+### Dependencies

 * **Does this feature depend on any specific services running in the cluster?**
 - kubelet
@@ -312,8 +343,8 @@ _This section must be completed when targeting beta graduation to a release._
 No

 * **Will enabling / using this feature result in introducing new API types?**
-Yes. We will need to add a `NodeLogOptions` counterpart to
-[PodLogOptions](https://github.com/kubernetes/kubernetes/blob/548ad1b8d35d51e6d33ea21dcc75d60a789b00e6/pkg/apis/core/types.go#L4409)
+The feature does not introduce a new API from an API server perspective but
+the existing kubelet proxy/log endpoint will have new features built into it.

 * **Will enabling / using this feature result in any new calls to the cloud
 provider?**
@@ -330,9 +361,8 @@ operations covered by [existing SLIs/SLOs]?**
 * **Will enabling / using this feature result in non-negligible increase of
 resource usage (CPU, RAM, disk, IO, ...) in any components?**
 In the case of large logs, there is potential for an increase in RAM and CPU
-usage on the node when an attempt is made to stream them. Feedback from the
-field during alpha will provide more clarity as we graduate from alpha to
-beta.
+usage on the node when an attempt is made to stream them. However, so far no
+CPU or memory spikes have been reported from the field.

 ### Troubleshooting

@@ -342,6 +372,7 @@ resource usage (CPU, RAM, disk, IO, ...) in any components?**
 - Updated on May 5th, 2021
 - Updated on Dec 13th, 2022
 - Updated on May 2nd, 2023
+- Updated on Feb 5th, 2024

 ## Drawbacks

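The README changes above verify the feature with a raw call to the kubelet log proxy endpoint (`/api/v1/nodes/node-1.example/proxy/logs/?query=kubelet`). As a rough illustration only, the sketch below shows how such a request path could be assembled. The helper name `node_log_query_path` is hypothetical and not part of Kubernetes or this commit, and only the `query` parameter is taken from the diff.

```python
from urllib.parse import quote, urlencode


def node_log_query_path(node: str, **params: str) -> str:
    """Build the API server proxy path for a kubelet node log query.

    Hypothetical helper for illustration; the path layout mirrors the
    example in the README diff above.
    """
    # Percent-encode the node name so it is safe as a single path segment.
    base = f"/api/v1/nodes/{quote(node, safe='')}/proxy/logs/"
    # Append query parameters only when some were supplied.
    return f"{base}?{urlencode(params)}" if params else base


if __name__ == "__main__":
    print(node_log_query_path("node-1.example", query="kubelet"))
```

The resulting string could then be passed to `kubectl get --raw`, as in the README example.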
keps/sig-windows/2258-node-log-query/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -18,16 +18,17 @@ approvers:
 creation-date: 2021-01-14
 last-updated: 2023-05-02
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.27"
+latest-milestone: "v1.30"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.27"
+  beta: "v1.30"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
