Commit 7db0451

Hardened Exec Requests: Fill in PRR, add Kubelet feature gate, mark implementable (kubernetes#2062)
* Fill in PRR, add feature gate, mark implementable
* Add alpha criteria
* Add note on identifying breakages
1 parent 15cd22b commit 7db0451

2 files changed: +81 -93 lines changed

keps/sig-node/1898-hardened-exec/README.md

Lines changed: 72 additions & 87 deletions
Original file line number | Diff line number | Diff line change
@@ -22,6 +22,7 @@
2222
- [API](#api)
2323
- [Test Plan](#test-plan)
2424
- [Graduation Criteria](#graduation-criteria)
25+
- [Alpha Criteria](#alpha-criteria)
2526
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
2627
- [Beta -> GA Graduation](#beta---ga-graduation)
2728
- [Version Skew Strategy](#version-skew-strategy)
@@ -41,12 +42,12 @@
4142

4243
Items marked with (R) are required *prior to targeting to a milestone / release*.
4344

44-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
45-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
46-
- [ ] (R) Design details are appropriately documented
47-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
48-
- [ ] (R) Graduation criteria is in place
49-
- [ ] (R) Production readiness review completed
45+
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
46+
- [x] (R) KEP approvers have approved the KEP status as `implementable`
47+
- [x] (R) Design details are appropriately documented
48+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
49+
- [x] (R) Graduation criteria is in place
50+
- [x] (R) Production readiness review completed
5051
- [ ] Production readiness review approved
5152
- [ ] "Implementation History" section is up-to-date for milestone
5253
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -110,17 +111,17 @@ policy](https://kubernetes.io/docs/setup/release/version-skew-policy/#kubelet) r
110111
Kubelet not be newer than the kube-apiserver, so backwards compatibility with older apiservers is
111112
not required.
112113

114+
In case there are non-apiserver clients, all the backwards-incompatible Kubelet changes described
115+
below will be guarded by the feature gate `DeprecatedKubeletStreamingAPI`. This feature gate will
116+
start in the default-disabled state and `Deprecated` prerelease channel, and serves to provide a
117+
temporary escape hatch for these API changes.
118+
113119
#### 1. Remove the `/run` endpoint
114120

115121
The `run` endpoint provides the option to run a command in a container with a synchronous
116122
request. This endpoint is not used by any Kubernetes components, and should be removed to reduce the
117123
attack surface.
118124

119-
**Risk:** Although this endpoint is not exposed in the Kubernetes API, it is still reachable via a
120-
proxy request to a node, or by connecting to the Kubelet directly. If desired, we could tie the
121-
removal to the deprecation feature gate discussed
122-
[below](#1-require-options-to-be-included-in-the-post-request-body-for-exec-requests).
123-
124125
#### 2. Require a `POST` request to the streaming endpoints
125126

126127
Exec, attach, and port-forward (and run) all respond to either GET or POST requests. Currently the
@@ -184,7 +185,7 @@ supported clients have been updated with the new request format.
184185

185186
<<[UNRESOLVED]>>
186187

187-
OPEN QUESTION: How many releases should we wait before requiring request body parameters?
188+
OPEN QUESTION: How many releases should we wait before graduating `HardenedExecRequests` to Beta?
188189

189190
<<[/UNRESOLVED]>>
190191

@@ -202,7 +203,8 @@ be provided in the request body](#4-require-the-request-options-to-be-provided-i
202203
#### 3. Require the websocket protocol for GET requests.
203204

204205
We cannot completely remove support for GET exec requests without breaking websockets. However, we
205-
can require the websocket protocol be used to perform a GET exec.
206+
can require the websocket protocol be used to perform a GET exec. This requirement is guarded by the
207+
`HardenedExecRequests` feature gate.
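
As an illustration of the websocket requirement, the following is a minimal Go sketch of the check, not the actual kube-apiserver code. The function names (`headerHasToken`, `isWebsocketUpgrade`, `allowExec`) are hypothetical, and the sketch assumes a websocket handshake is recognized by the standard `Connection: Upgrade` and `Upgrade: websocket` request headers. It covers only the GET rule; validation of POST requests is out of scope here.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// headerHasToken reports whether a comma-separated header value contains
// the given token, compared case-insensitively.
func headerHasToken(value, token string) bool {
	for _, part := range strings.Split(value, ",") {
		if strings.EqualFold(strings.TrimSpace(part), token) {
			return true
		}
	}
	return false
}

// isWebsocketUpgrade reports whether the Connection and Upgrade header
// values describe a websocket handshake.
func isWebsocketUpgrade(connection, upgrade string) bool {
	return headerHasToken(connection, "upgrade") && headerHasToken(upgrade, "websocket")
}

// allowExec is a hypothetical admission check. With the gate enabled, a GET
// exec request is accepted only when it is a websocket handshake. Requests
// using other methods are not examined by this sketch.
func allowExec(gateEnabled bool, method, connection, upgrade string) bool {
	if !gateEnabled || method != http.MethodGet {
		return true
	}
	return isWebsocketUpgrade(connection, upgrade)
}

func main() {
	// A websocket GET, a plain GET, and a plain GET with the gate disabled.
	fmt.Println(allowExec(true, "GET", "Upgrade", "websocket"))
	fmt.Println(allowExec(true, "GET", "", ""))
	fmt.Println(allowExec(false, "GET", "", ""))
}
```

Taking the header values as plain strings keeps the rule easy to unit test; a real handler would read them from the incoming `*http.Request`.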
206208

207209
**Risk:** This is a breaking change for non-websocket clients, and leaves websocket clients exposed
208210
to SSRF risks.
@@ -290,6 +292,20 @@ The tests should be implemented under `test/e2e/common` for inclusion in the `e2
290292

291293
### Graduation Criteria
292294

295+
#### Alpha Criteria
296+
297+
- Update `PodExecOptions` with pod reference
298+
- Update Kubelet API (guarded by `DeprecatedKubeletStreamingAPI`)
299+
- Remove the kubelet's `/run` and UID-specific endpoints
300+
- Require POST request for kubelet streaming endpoints
301+
- Require options in request body
302+
- Update kube-apiserver:
303+
- Always use POST for streaming requests to Kubelet
304+
- Send options in request body (but also query params)
305+
- Require POST with request body for non-websocket `exec` requests, guarded by **alpha** `HardenedExecRequests`
306+
- Update go-client to send exec POST requests with options in the body (and also in query params)
307+
- Expand E2E test coverage - https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/1898-hardened-exec#test-plan
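
The Kubelet-side criteria above (removing `/run` and requiring POST on the streaming endpoints unless `DeprecatedKubeletStreamingAPI` is enabled) could be sketched roughly as follows. This is an illustration under stated assumptions, not the Kubelet implementation: `kubeletStreamingStatus` is a hypothetical helper, the 404 and 405 status codes are assumed rather than specified by the KEP, and the UID-specific endpoints and the request-body requirement are not modeled.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// kubeletStreamingStatus returns the status a hardened Kubelet might answer
// with before dispatching a request. http.StatusOK here only means
// "continue to the real handler". The status codes are assumptions.
func kubeletStreamingStatus(legacyEnabled bool, method, path string) int {
	if legacyEnabled {
		// Escape hatch: the deprecated behavior is kept as-is.
		return http.StatusOK
	}
	// The first path segment selects the endpoint,
	// e.g. "/exec/ns/pod/ctr" yields "exec".
	first := strings.SplitN(strings.TrimPrefix(path, "/"), "/", 2)[0]
	switch strings.ToLower(first) {
	case "run":
		// The /run endpoint is removed entirely.
		return http.StatusNotFound
	case "exec", "attach", "portforward":
		// Streaming endpoints only accept POST.
		if method != http.MethodPost {
			return http.StatusMethodNotAllowed
		}
	}
	return http.StatusOK
}

func main() {
	fmt.Println(kubeletStreamingStatus(false, "POST", "/run/ns/pod/ctr"))
	fmt.Println(kubeletStreamingStatus(false, "GET", "/exec/ns/pod/ctr"))
	fmt.Println(kubeletStreamingStatus(false, "POST", "/exec/ns/pod/ctr"))
}
```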
308+
293309
#### Alpha -> Beta Graduation
294310

295311
- Clients have been updated for a sufficient amount of time.
@@ -307,64 +323,36 @@ send both query & body parameters, both old and new Kubelets will accept the req
307323

308324
## Production Readiness Review Questionnaire
309325

310-
<!--
311-
312-
Production readiness reviews are intended to ensure that features merging into
313-
Kubernetes are observable, scalable and supportable; can be safely operated in
314-
production environments, and can be disabled or rolled back in the event they
315-
cause increased failures in production. See more in the PRR KEP at
316-
https://git.k8s.io/enhancements/keps/sig-architecture/20190731-production-readiness-review-process.md.
317-
318-
The production readiness review questionnaire must be completed for features in
319-
v1.19 or later, but is non-blocking at this time. That is, approval is not
320-
required in order to be in the release.
321-
322-
In some cases, the questions below should also have answers in `kep.yaml`. This
323-
is to enable automation to verify the presence of the review, and to reduce review
324-
burden and latency.
325-
326-
The KEP must have a approver from the
327-
[`prod-readiness-approvers`](http://git.k8s.io/enhancements/OWNERS_ALIASES)
328-
team. Please reach out on the
329-
[#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) channel if
330-
you need any help or guidance.
331-
332-
-->
333-
334-
TODO
335-
336326
### Feature Enablement and Rollback
337327

338328
_This section must be completed when targeting alpha to a release._
339329

340330
* **How can this feature be enabled / disabled in a live cluster?**
341-
- [ ] Feature gate (also fill in values in `kep.yaml`)
342-
- Feature gate name:
343-
- Components depending on the feature gate:
344-
- [ ] Other
345-
- Describe the mechanism:
346-
- Will enabling / disabling the feature require downtime of the control
347-
plane?
348-
- Will enabling / disabling the feature require downtime or reprovisioning
349-
of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).
331+
- Feature gate: `HardenedExecRequests`
332+
- Components depending on the feature gate: kube-apiserver
333+
- Description: Guards new backwards-incompatible requirements on pod exec requests to the
334+
kube-apiserver
335+
- Feature gate: `DeprecatedKubeletStreamingAPI`
336+
- Components depending on the feature gate: kubelet
337+
- Description: Enables the unused (by kube-apiserver) kubelet streaming APIs. Default disabled.
350338

351339
* **Does enabling the feature change any default behavior?**
352-
Any change of default behavior may be surprising to users or break existing
353-
automations, so be extremely careful here.
340+
Yes, enabling `HardenedExecRequests` alters the pod exec API by adding additional constraints.
341+
Disabling `DeprecatedKubeletStreamingAPI` (default) also removes several request paths from the
342+
Kubelet API in addition to further constraining several remaining APIs. These APIs are only
343+
intended for use by the kube-apiserver, and are only required to be forwards-compatible by the
344+
[Kubernetes version skew
345+
policy](https://kubernetes.io/docs/setup/release/version-skew-policy/#kubelet).
354346

355-
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
356-
the enablement)?**
357-
Also set `disable-supported` to `true` or `false` in `kep.yaml`.
358-
Describe the consequences on existing workloads (e.g., if this is a runtime
359-
feature, can it break the existing applications?).
347+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**
348+
Yes, both features are stateless and can be enabled or disabled without any consequence outside of
349+
those explicitly controlled by the feature gates.
360350

361351
* **What happens if we reenable the feature if it was previously rolled back?**
352+
Nothing. See previous question.
362353

363354
* **Are there any tests for feature enablement/disablement?**
364-
The e2e framework does not currently support enabling or disabling feature
365-
gates. However, unit tests in each component dealing with managing data, created
366-
with and without the feature, are necessary. At the very least, think about
367-
conversion tests if API types are being modified.
355+
See [Test Plan](#test-plan)
368356

369357
### Rollout, Upgrade and Rollback Planning
370358

@@ -390,9 +378,25 @@ fields of API types, flags, etc.?**
390378
_This section must be completed when targeting beta graduation to a release._
391379

392380
* **How can an operator determine if the feature is in use by workloads?**
393-
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
394-
checking if there are objects with field X set) may be a last resort. Avoid
395-
logs or events for this purpose.
381+
382+
Any request counts for the following metrics indicate that the deprecated Kubelet APIs are in use,
383+
and clients may be broken by disabling `DeprecatedKubeletStreamingAPI` (`*` indicates any value):
384+
385+
- `kubelet_http_requests_total{long_running=*,method="GET",path="exec",server_type="*"}`
386+
- `kubelet_http_requests_total{long_running=*,method="GET",path="attach",server_type="*"}`
387+
- `kubelet_http_requests_total{long_running=*,method="GET",path="portforward",server_type="*"}`
388+
- `kubelet_http_requests_total{long_running=*,method=*,path="run",server_type="*"}` _Note the method wildcard_
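
The four selectors above reduce to a simple rule on the `method` and `path` labels. A minimal Go sketch of that rule, assuming the samples have already been scraped and parsed elsewhere; the `sample` type and both function names are hypothetical:

```go
package main

import "fmt"

// sample is a hypothetical, simplified view of one
// kubelet_http_requests_total series: the two labels that matter here
// plus the counter value.
type sample struct {
	method string
	path   string
	count  float64
}

// usesDeprecatedKubeletAPI mirrors the selectors listed above: any method
// on "run", and GET on "exec", "attach", or "portforward".
func usesDeprecatedKubeletAPI(method, path string) bool {
	switch path {
	case "run":
		return true // method wildcard
	case "exec", "attach", "portforward":
		return method == "GET"
	}
	return false
}

// deprecatedRequestCount sums the counters of all matching series.
func deprecatedRequestCount(samples []sample) float64 {
	var total float64
	for _, s := range samples {
		if usesDeprecatedKubeletAPI(s.method, s.path) {
			total += s.count
		}
	}
	return total
}

func main() {
	samples := []sample{
		{"POST", "exec", 40},
		{"GET", "exec", 3},
		{"POST", "run", 1},
		{"GET", "pods", 12},
	}
	fmt.Println(deprecatedRequestCount(samples))
}
```

A non-zero result suggests that some client still depends on the deprecated request forms.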
389+
390+
There are no metrics identifying requests missing body parameters, or metrics that break out the UID
391+
sub-paths. These requests, along with rejected requests, can be identified in the Kubelet's logs.
392+
393+
Unfortunately there are no metrics recording Kubelet response status, so requests that are broken by
394+
disabling `DeprecatedKubeletStreamingAPI` will need to be detected client-side. See
395+
https://github.com/kubernetes/kubernetes/issues/95307.
396+
397+
On the API server side, GET exec requests can be identified from the audit logs, which can also be
398+
used to identify the client. An increase in 400 Bad Request codes to exec may indicate a breakage by
399+
`HardenedExecRequests`.
396400

397401
* **What are the SLIs (Service Level Indicators) an operator can use to determine
398402
the health of the service?**
@@ -447,45 +451,26 @@ _For GA, this section is required: approvers should be able to confirm the
447451
previous answers based on experience in the field._
448452

449453
* **Will enabling / using this feature result in any new API calls?**
450-
Describe them, providing:
451-
- API call type (e.g. PATCH pods)
452-
- estimated throughput
453-
- originating component(s) (e.g. Kubelet, Feature-X-controller)
454-
focusing mostly on:
455-
- components listing and/or watching resources they didn't before
456-
- API calls that may be triggered by changes of some Kubernetes resources
457-
(e.g. update of object X triggers new updates of object Y)
458-
- periodic API calls to reconcile state (e.g. periodic fetching state,
459-
heartbeats, leader election, etc.)
454+
No.
460455

461456
* **Will enabling / using this feature result in introducing new API types?**
462-
Describe them, providing:
463-
- API type
464-
- Supported number of objects per cluster
465-
- Supported number of objects per namespace (for namespace-scoped objects)
457+
No.
466458

467459
* **Will enabling / using this feature result in any new calls to the cloud
468460
provider?**
461+
No.
469462

470463
* **Will enabling / using this feature result in increasing size or count of
471464
the existing API objects?**
472-
Describe them, providing:
473-
- API type(s):
474-
- Estimated increase in size: (e.g., new annotation of size 32B)
475-
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
465+
This will expand the `PodExecOptions` API, but this API is not stored.
476466

477467
* **Will enabling / using this feature result in increasing time taken by any
478468
operations covered by [existing SLIs/SLOs]?**
479-
Think about adding additional work or introducing new steps in between
480-
(e.g. need to do X to start a container), etc. Please describe the details.
469+
No.
481470

482471
* **Will enabling / using this feature result in non-negligible increase of
483472
resource usage (CPU, RAM, disk, IO, ...) in any components?**
484-
Things to keep in mind include: additional in-memory state, additional
485-
non-trivial computations, excessive access to disks (including increased log
486-
volume), significant amount of data sent and/or received over network, etc.
487-
This through this both in small and large cases, again with respect to the
488-
[supported limits].
473+
No.
489474

490475
### Troubleshooting
491476

keps/sig-node/1898-hardened-exec/kep.yaml

Lines changed: 9 additions & 6 deletions
Original file line number | Diff line number | Diff line change
@@ -6,12 +6,15 @@ owning-sig: sig-node
66
participating-sigs:
77
- sig-api-machinery
88
- sig-auth
9-
status: provisional
9+
status: implementable
1010
creation-date: 2020-07-08
1111
reviewers:
12-
- TBD
12+
- derekwaynecarr
13+
- liggitt
14+
- sftim
1315
approvers:
14-
- TBD
16+
- derekwaynecarr
17+
- liggitt
1518
prr-approvers:
1619
- deads2k
1720
see-also:
@@ -36,10 +39,10 @@ milestone:
3639
feature-gates:
3740
- name: HardenedExecRequests
3841
components:
39-
- kubelet
4042
- kube-apiserver
41-
- client-go
42-
- kubectl
43+
- name: DeprecatedKubeletStreamingAPI
44+
components:
45+
- kubelet
4346
disable-supported: true
4447

4548
# The following PRR answers are required at beta release
