22
22
- [ API] ( #api )
23
23
- [ Test Plan] ( #test-plan )
24
24
- [ Graduation Criteria] ( #graduation-criteria )
25
+ - [ Alpha Criteria] ( #alpha-criteria )
25
26
- [ Alpha -> Beta Graduation] ( #alpha---beta-graduation )
26
27
- [ Beta -> GA Graduation] ( #beta---ga-graduation )
27
28
- [ Version Skew Strategy] ( #version-skew-strategy )
41
42
42
43
Items marked with (R) are required * prior to targeting to a milestone / release* .
43
44
44
- - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [ kubernetes/enhancements] (not the initial KEP PR)
45
- - [ ] (R) KEP approvers have approved the KEP status as ` implementable `
46
- - [ ] (R) Design details are appropriately documented
47
- - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
48
- - [ ] (R) Graduation criteria is in place
49
- - [ ] (R) Production readiness review completed
45
+ - [x ] (R) Enhancement issue in release milestone, which links to KEP dir in [ kubernetes/enhancements] (not the initial KEP PR)
46
+ - [x ] (R) KEP approvers have approved the KEP status as ` implementable `
47
+ - [x ] (R) Design details are appropriately documented
48
+ - [x ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
49
+ - [x ] (R) Graduation criteria is in place
50
+ - [x ] (R) Production readiness review completed
50
51
- [ ] Production readiness review approved
51
52
- [ ] "Implementation History" section is up-to-date for milestone
52
53
- [ ] User-facing documentation has been created in [ kubernetes/website] , for publication to [ kubernetes.io]
@@ -110,17 +111,17 @@ policy](https://kubernetes.io/docs/setup/release/version-skew-policy/#kubelet) r
110
111
Kubelet not be newer than the kube-apiserver, so backwards compatibility with older apiservers is
111
112
not required.
112
113
114
+ In case there are non-apiserver clients, all the backwards-incompatible Kubelet changes described
115
+ below will be guarded by the feature gate ` DeprecatedKubeletStreamingAPI ` . This feature gate will
116
+ start in the default-disabled state and ` Deprecated ` prerelease channel, and serves to provide a
117
+ temporary escape hatch for these API changes.
118
+
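A minimal sketch of how the `DeprecatedKubeletStreamingAPI` escape hatch could gate the Kubelet's request handling, assuming only the method and endpoint checks described in this KEP. The function name `kubelet_accepts` and the example paths are hypothetical, and the sketch deliberately ignores the UID-specific sub-paths and the request-body requirement.

```python
# Hypothetical model of the Kubelet's routing decision; not the real Kubelet code.
STREAMING_ENDPOINTS = {"exec", "attach", "portforward"}
LEGACY_ENDPOINTS = STREAMING_ENDPOINTS | {"run"}


def kubelet_accepts(method: str, path: str, legacy_gate_enabled: bool) -> bool:
    """Return True if this sketch of the Kubelet would serve the request."""
    # The first path segment names the endpoint, e.g. "/exec/ns/pod/ctr" -> "exec".
    endpoint = path.strip("/").split("/")[0]
    if legacy_gate_enabled:
        # Escape hatch on: keep the old behavior (GET or POST, /run still served).
        return endpoint in LEGACY_ENDPOINTS and method in {"GET", "POST"}
    # Default (gate disabled): /run is removed and streaming endpoints require POST.
    return endpoint in STREAMING_ENDPOINTS and method == "POST"
```

With the gate at its default (disabled), only `POST` requests to `exec`, `attach`, and `portforward` pass; enabling the gate restores `GET` and the `run` endpoint for clients that have not yet migrated.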
113
119
#### 1. Remove the ` /run ` endpoint
114
120
115
121
The ` run ` endpoint provides the option to run a command in a container with a synchronous
116
122
request. This endpoint is not used by any Kubernetes components, and should be removed to reduce the
117
123
attack surface.
118
124
119
- ** Risk:** Although this endpoint is not exposed in the Kubernetes API, it is still reachable via a
120
- proxy request to a node, or by connecting to the Kubelet directly. If desired, we could tie the
121
- removal to the deprecation feature gate discussed
122
- [ below] ( #1-require-options-to-be-included-in-the-post-request-body-for-exec-requests ) .
123
-
124
125
#### 2. Require a ` POST ` request to the streaming endpoints
125
126
126
127
Exec, attach, and port-forward (and run) all respond to either GET or POST requests. Currently the
@@ -184,7 +185,7 @@ supported clients have been updated with the new request format.
184
185
185
186
<<[ UNRESOLVED] >>
186
187
187
- OPEN QUESTION: How many releases should we wait before requiring request body parameters ?
188
+ OPEN QUESTION: How many releases should we wait before graduating ` HardenedExecRequests ` to Beta ?
188
189
189
190
<<[ /UNRESOLVED] >>
190
191
@@ -202,7 +203,8 @@ be provided in the request body](#4-require-the-request-options-to-be-provided-i
202
203
#### 3. Require the websocket protocol for GET requests.
203
204
204
205
We cannot completely remove support for GET exec requests without breaking websockets. However, we
205
- can require the websocket protocol be used to perform a GET exec.
206
+ can require the websocket protocol be used to perform a GET exec. This requirement is guarded by the
207
+ ` HardenedExecRequests ` feature gate.
206
208
207
209
** Risk:** This is a breaking change for non-websocket clients, and leaves websocket clients exposed
208
210
to SSRF risks.
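The rule in this section can be sketched as a small validation function. This is an illustration under stated assumptions, not the kube-apiserver implementation: the names `is_websocket_upgrade` and `exec_rejection` are hypothetical, websocket detection is approximated by the standard `Connection: Upgrade` and `Upgrade: websocket` headers, and the 400 status is assumed from the troubleshooting notes elsewhere in this KEP.

```python
from typing import Mapping, Optional


def is_websocket_upgrade(headers: Mapping[str, str]) -> bool:
    """Detect a websocket handshake from the Connection and Upgrade headers."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Connection may carry several comma-separated tokens, e.g. "keep-alive, Upgrade".
    tokens = [t.strip() for t in lowered.get("connection", "").split(",")]
    return "upgrade" in tokens and lowered.get("upgrade", "") == "websocket"


def exec_rejection(method: str, headers: Mapping[str, str],
                   has_body_options: bool, hardened: bool) -> Optional[int]:
    """Return an HTTP status to reject with, or None if the request may proceed."""
    if not hardened:
        return None  # Gate off: GET and POST are both accepted as before.
    if method == "GET" and is_websocket_upgrade(headers):
        return None  # Websocket clients may keep using GET.
    if method == "POST" and has_body_options:
        return None  # Non-websocket clients must POST with options in the body.
    return 400  # Assumed rejection status (Bad Request).
```

A plain `GET` without the upgrade headers is the case that breaks when the gate is enabled, which is why non-websocket clients need to move to `POST` first.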
@@ -290,6 +292,20 @@ The tests should be implemented under `test/e2e/common` for inclusion in the `e2
290
292
291
293
### Graduation Criteria
292
294
295
+ #### Alpha Criteria
296
+
297
+ - Update ` PodExecOptions ` with pod reference
298
+ - Update Kubelet API (guarded by ` DeprecatedKubeletStreamingAPI ` )
299
+ - Remove the kubelet's ` /run ` and UID-specific endpoints
300
+ - Require POST request for kubelet streaming endpoints
301
+ - Require options in request body
302
+ - Update kube-apiserver:
303
+ - Always use POST for streaming requests to Kubelet
304
+ - Send options in request body (but also query params)
305
+ - Require POST with request body for non-websocket ` exec ` requests, guarded by ** alpha** ` HardenedExecRequests `
306
+ - Update client-go to send exec POST requests with options in the body (and also in query params)
307
+ - Expand E2E test coverage - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1898-hardened-exec#test-plan
308
+
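As an illustration of the "send options in both places" step above, here is a rough sketch of a client building an exec request. The helper `build_exec_request` is hypothetical, and the JSON encoding of `PodExecOptions` in the body is an assumption made for the example; the wire format of the body is not specified in this section.

```python
import json
from urllib.parse import urlencode


def build_exec_request(namespace: str, pod: str, container: str, command: list):
    """Build (method, url, body) with exec options in BOTH query and body."""
    # Query parameters keep older servers working; "command" repeats per argument.
    query = urlencode(
        [("container", container)]
        + [("command", arg) for arg in command]
        + [("stdout", "true"), ("stderr", "true")]
    )
    url = f"/api/v1/namespaces/{namespace}/pods/{pod}/exec?{query}"
    # Assumed body encoding: the same options serialized as a PodExecOptions object.
    body = json.dumps({
        "kind": "PodExecOptions",
        "apiVersion": "v1",
        "container": container,
        "command": command,
        "stdout": True,
        "stderr": True,
    })
    return "POST", url, body
```

Sending the options twice is redundant by design: a server that only reads query parameters and one that only reads the body would both see the same command.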
293
309
#### Alpha -> Beta Graduation
294
310
295
311
- Clients have been updated for a sufficient amount of time.
@@ -307,64 +323,36 @@ send both query & body parameters, both old and new Kubelets will accept the req
307
323
308
324
## Production Readiness Review Questionnaire
309
325
310
- <!--
311
-
312
- Production readiness reviews are intended to ensure that features merging into
313
- Kubernetes are observable, scalable and supportable; can be safely operated in
314
- production environments, and can be disabled or rolled back in the event they
315
- cause increased failures in production. See more in the PRR KEP at
316
- https://git.k8s.io/enhancements/keps/sig-architecture/20190731-production-readiness-review-process.md.
317
-
318
- The production readiness review questionnaire must be completed for features in
319
- v1.19 or later, but is non-blocking at this time. That is, approval is not
320
- required in order to be in the release.
321
-
322
- In some cases, the questions below should also have answers in `kep.yaml`. This
323
- is to enable automation to verify the presence of the review, and to reduce review
324
- burden and latency.
325
-
326
- The KEP must have a approver from the
327
- [`prod-readiness-approvers`](http://git.k8s.io/enhancements/OWNERS_ALIASES)
328
- team. Please reach out on the
329
- [#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) channel if
330
- you need any help or guidance.
331
-
332
- -->
333
-
334
- TODO
335
-
336
326
### Feature Enablement and Rollback
337
327
338
328
_ This section must be completed when targeting alpha to a release._
339
329
340
330
* ** How can this feature be enabled / disabled in a live cluster?**
341
- - [ ] Feature gate (also fill in values in ` kep.yaml ` )
342
- - Feature gate name:
343
- - Components depending on the feature gate:
344
- - [ ] Other
345
- - Describe the mechanism:
346
- - Will enabling / disabling the feature require downtime of the control
347
- plane?
348
- - Will enabling / disabling the feature require downtime or reprovisioning
349
- of a node? (Do not assume ` Dynamic Kubelet Config ` feature is enabled).
331
+ - Feature gate ` HardenedExecRequests `
332
+ - Components depending on the feature gate: kube-apiserver
333
+ - Description: Guards new backwards-incompatible requirements on pod exec requests to the
334
+ kube-apiserver
335
+ - Feature gate: ` DeprecatedKubeletStreamingAPI `
336
+ - Components depending on the feature gate: kubelet
337
+ - Description: Enables the unused (by kube-apiserver) kubelet streaming APIs. Default disabled.
350
338
351
339
* ** Does enabling the feature change any default behavior?**
352
- Any change of default behavior may be surprising to users or break existing
353
- automations, so be extremely careful here.
340
+ Yes, enabling ` HardenedExecRequests ` alters the pod exec API by adding additional constraints.
341
+ Disabling ` DeprecatedKubeletStreamingAPI ` (default) also removes several request paths from the
342
+ Kubelet API in addition to further constraining several remaining APIs. These APIs are only
343
+ intended for use by the kube-apiserver, and are only required to be forwards-compatible by the
344
+ [ Kubernetes version skew
345
+ policy] ( https://kubernetes.io/docs/setup/release/version-skew-policy/#kubelet ) .
354
346
355
- * ** Can the feature be disabled once it has been enabled (i.e. can we roll back
356
- the enablement)?**
357
- Also set ` disable-supported ` to ` true ` or ` false ` in ` kep.yaml ` .
358
- Describe the consequences on existing workloads (e.g., if this is a runtime
359
- feature, can it break the existing applications?).
347
+ * ** Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**
348
+ Yes, both features are stateless and can be enabled or disabled without any consequence outside of
349
+ those explicitly controlled by the feature gates.
360
350
361
351
* ** What happens if we reenable the feature if it was previously rolled back?**
352
+ Nothing. See previous question.
362
353
363
354
* ** Are there any tests for feature enablement/disablement?**
364
- The e2e framework does not currently support enabling or disabling feature
365
- gates. However, unit tests in each component dealing with managing data, created
366
- with and without the feature, are necessary. At the very least, think about
367
- conversion tests if API types are being modified.
355
+ See [ Test Plan] ( #test-plan )
368
356
369
357
### Rollout, Upgrade and Rollback Planning
370
358
@@ -390,9 +378,25 @@ fields of API types, flags, etc.?**
390
378
_ This section must be completed when targeting beta graduation to a release._
391
379
392
380
* ** How can an operator determine if the feature is in use by workloads?**
393
- Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
394
- checking if there are objects with field X set) may be a last resort. Avoid
395
- logs or events for this purpose.
381
+
382
+ Any request counts for the following metrics indicate that the deprecated Kubelet APIs are in use,
383
+ and clients may be broken by disabling ` DeprecatedKubeletStreamingAPI ` (` * ` indicates any value):
384
+
385
+ - ` kubelet_http_requests_total{long_running=*,method="GET",path="exec",server_type="*"} `
386
+ - ` kubelet_http_requests_total{long_running=*,method="GET",path="attach",server_type="*"} `
387
+ - ` kubelet_http_requests_total{long_running=*,method="GET",path="portforward",server_type="*"} `
388
+ - ` kubelet_http_requests_total{long_running=*,method=*,path="run",server_type="*"} ` _ Note the method wildcard_
389
+
390
+ There are no metrics identifying requests missing body parameters, or metrics that break out the UID
391
+ sub-paths. These requests, along with rejected requests, can be identified in the Kubelet's logs.
392
+
393
+ Unfortunately there are no metrics recording Kubelet response status, so requests that are broken by
394
+ disabling ` DeprecatedKubeletStreamingAPI ` will need to be detected client-side. See
395
+ https://github.com/kubernetes/kubernetes/issues/95307 .
396
+
397
+ On the API server side, GET exec requests can be identified from the audit logs, which can also be
398
+ used to identify the client. An increase in 400 Bad Request codes to exec may indicate a breakage by
399
+ ` HardenedExecRequests ` .
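An operator could check the metrics listed above with a short script along these lines. This is a sketch, assuming the Kubelet's metrics have already been fetched as Prometheus text format; the function `deprecated_usage` and the sample values in `SAMPLE` are made up for illustration.

```python
import re

# Matches 'name{labels} value' sample lines in the Prometheus text format.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)\{([^}]*)\}\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

# Illustrative input only; real label values and counts will differ.
SAMPLE = """\
# TYPE kubelet_http_requests_total counter
kubelet_http_requests_total{long_running="true",method="GET",path="exec",server_type="readwrite"} 3
kubelet_http_requests_total{long_running="true",method="POST",path="exec",server_type="readwrite"} 12
kubelet_http_requests_total{long_running="false",method="POST",path="run",server_type="readwrite"} 1
kubelet_http_requests_total{long_running="false",method="GET",path="metrics",server_type="readonly"} 40
"""


def deprecated_usage(metrics_text: str) -> list:
    """Return (method, path, count) for samples that indicate deprecated API use."""
    hits = []
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match.group(1) != "kubelet_http_requests_total":
            continue  # Skips comments and unrelated metrics.
        labels = dict(LABEL_RE.findall(match.group(2)))
        count = float(match.group(3))
        # GET on a streaming endpoint, or any method on the removed /run endpoint.
        legacy_get = (labels.get("method") == "GET"
                      and labels.get("path") in {"exec", "attach", "portforward"})
        legacy_run = labels.get("path") == "run"
        if count > 0 and (legacy_get or legacy_run):
            hits.append((labels.get("method"), labels.get("path"), count))
    return hits
```

Any non-empty result suggests that some client still depends on the deprecated Kubelet APIs. As noted above, this does not cover requests missing body parameters or the UID sub-paths, which have no dedicated metrics.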
396
400
397
401
* ** What are the SLIs (Service Level Indicators) an operator can use to determine
398
402
the health of the service?**
@@ -447,45 +451,26 @@ _For GA, this section is required: approvers should be able to confirm the
447
451
previous answers based on experience in the field._
448
452
449
453
* ** Will enabling / using this feature result in any new API calls?**
450
- Describe them, providing:
451
- - API call type (e.g. PATCH pods)
452
- - estimated throughput
453
- - originating component(s) (e.g. Kubelet, Feature-X-controller)
454
- focusing mostly on:
455
- - components listing and/or watching resources they didn't before
456
- - API calls that may be triggered by changes of some Kubernetes resources
457
- (e.g. update of object X triggers new updates of object Y)
458
- - periodic API calls to reconcile state (e.g. periodic fetching state,
459
- heartbeats, leader election, etc.)
454
+ No.
460
455
461
456
* ** Will enabling / using this feature result in introducing new API types?**
462
- Describe them, providing:
463
- - API type
464
- - Supported number of objects per cluster
465
- - Supported number of objects per namespace (for namespace-scoped objects)
457
+ No.
466
458
467
459
* ** Will enabling / using this feature result in any new calls to the cloud
468
460
provider?**
461
+ No.
469
462
470
463
* ** Will enabling / using this feature result in increasing size or count of
471
464
the existing API objects?**
472
- Describe them, providing:
473
- - API type(s):
474
- - Estimated increase in size: (e.g., new annotation of size 32B)
475
- - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
465
+ This will expand the ` PodExecOptions ` API, but this API is not stored.
476
466
477
467
* ** Will enabling / using this feature result in increasing time taken by any
478
468
operations covered by [ existing SLIs/SLOs] ?**
479
- Think about adding additional work or introducing new steps in between
480
- (e.g. need to do X to start a container), etc. Please describe the details.
469
+ No.
481
470
482
471
* ** Will enabling / using this feature result in non-negligible increase of
483
472
resource usage (CPU, RAM, disk, IO, ...) in any components?**
484
- Things to keep in mind include: additional in-memory state, additional
485
- non-trivial computations, excessive access to disks (including increased log
486
- volume), significant amount of data sent and/or received over network, etc.
487
- This through this both in small and large cases, again with respect to the
488
- [ supported limits] .
473
+ No.
489
474
490
475
### Troubleshooting
491
476