Commit 849e2af: Merge pull request #4942 from tssurya/psa-host-field (Add PSA to block host field in probe/lifecycle handlers; parents 323c4de + 6d0e061)

# KEP-4940: Add Pod Security Admission (PSA) to block setting `.host` field from ProbeHandler and LifecycleHandler

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

<!--
**ACTION REQUIRED:** In order to merge code into a release, there must be an
issue in [kubernetes/enhancements] referencing this KEP and targeting a release
milestone **before the [Enhancement Freeze](https://git.k8s.io/sig-release/releases)
of the targeted release**.

For enhancements that make changes to code or processes/procedures in core
Kubernetes—i.e., [kubernetes/kubernetes], we require the following Release
Signoff checklist to be completed.

Check these off as they are completed for the Release Team to track. These
checklist items _must_ be updated for the enhancement to be released.
-->

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

The `HTTPGetAction` and `TCPSocketAction` types, used by the
`ProbeHandler` and `LifecycleHandler` structs (which in turn are used
by the `Containers` and `InitContainers` fields of `PodSpec`), have a
`Host` field. The `Host` field allows users to specify a target other
than the pod IP (the default) that the kubelet should probe.
However, this opens the door to security attacks, since the `Host`
field can be set to nearly any value, including security-sensitive
external hosts or localhost on the node.
The kubelet will probe whatever `Host` value is set, which can
lead to blind server-side request forgery (SSRF) attacks.

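
To make the risk concrete, the sketch below models how a probe target is chosen. An empty `Host` falls back to the pod IP. A non-empty `Host` is used verbatim, so a pod author can point the kubelet at the node's own loopback interface. This is a minimal illustration, not kubelet code. `probeURL` is a hypothetical helper, and the `scheme://host:port/path` assembly is an assumption based on the behavior described above.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// probeURL is a hypothetical helper mirroring the defaulting rule in the
// Summary: an empty Host falls back to the pod IP.
func probeURL(scheme, host, podIP string, port int, path string) string {
	if host == "" {
		host = podIP // default target: the pod itself
	}
	u := url.URL{
		Scheme: scheme,
		// JoinHostPort adds brackets around IPv6 literals.
		Host: net.JoinHostPort(host, strconv.Itoa(port)),
		Path: path,
	}
	return u.String()
}

func main() {
	// Default: the request goes to the pod's own IP.
	fmt.Println(probeURL("http", "", "10.244.1.7", 8080, "/healthz"))
	// Host set by the pod author: the kubelet, running on the node,
	// would send the request to the node's loopback interface instead.
	fmt.Println(probeURL("http", "127.0.0.1", "10.244.1.7", 8080, "/admin"))
}
```

The point of the second call is that the destination is chosen by the pod author but the request is issued from the node's network position, which is what makes blind SSRF possible.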

### Goals

* Add a Pod Security Admission (PSA) check that enables admins to restrict
  users from creating probes with the `Host` field set.
* Update the Baseline Pod Security Standard (PSS) to block this field.
  This makes the restriction easier for workload operators to adopt,
  given that this is a known issue we want to prevent.

### Non-Goals

* Removing the `.Host` field from the API and dropping support for it. It is
  an unwritten rule that nothing can be removed from the core Kubernetes API.

## Proposal

There is a long-term plan to deprecate the existing TCP and HTTP probe
types in the API and replace them with types that have slightly different semantics.
See [KEP-4559](https://github.com/kubernetes/enhancements/pull/4558) for more
details. Given the unsolvable security problems with the `Host` field,
we do not plan to offer it in the new types.

However, the older API is never going to go away. We therefore also want to
add a PSA check that allows admins to restrict users from creating
probes with the `Host` field set when they use the (soon to be deprecated) API.
This is implemented by [kubernetes PR 125271](https://github.com/kubernetes/kubernetes/pull/125271).

### Risks and Mitigations

Some users may depend on the `Host` field in their existing probes.
Those probes will continue to work. If newly created probes also need
the `Host` field to point to an external destination, the admin can
choose not to enforce the PSA check that blocks it.

## Design Details

Add a check to the Baseline level of Pod Security Admission that allows cluster
admins to block users from setting the `.host` field in:

* `spec.containers[*].LivenessProbe.ProbeHandler.HTTPGet.Host`
* `spec.containers[*].ReadinessProbe.ProbeHandler.HTTPGet.Host`
* `spec.containers[*].StartupProbe.ProbeHandler.HTTPGet.Host`
* `spec.containers[*].LivenessProbe.ProbeHandler.TCPSocket.Host`
* `spec.containers[*].ReadinessProbe.ProbeHandler.TCPSocket.Host`
* `spec.containers[*].StartupProbe.ProbeHandler.TCPSocket.Host`
* `spec.containers[*].Lifecycle.PostStart.TCPSocket.Host` (deprecated: `TCPSocket` is NOT supported as a `LifecycleHandler` and is kept only for backward compatibility)
* `spec.containers[*].Lifecycle.PreStop.TCPSocket.Host` (deprecated: `TCPSocket` is NOT supported as a `LifecycleHandler` and is kept only for backward compatibility)
* `spec.containers[*].Lifecycle.PostStart.HTTPGet.Host`
* `spec.containers[*].Lifecycle.PreStop.HTTPGet.Host`
* `spec.initContainers[*].LivenessProbe.ProbeHandler.HTTPGet.Host`
* `spec.initContainers[*].ReadinessProbe.ProbeHandler.HTTPGet.Host`
* `spec.initContainers[*].StartupProbe.ProbeHandler.HTTPGet.Host`
* `spec.initContainers[*].LivenessProbe.ProbeHandler.TCPSocket.Host`
* `spec.initContainers[*].ReadinessProbe.ProbeHandler.TCPSocket.Host`
* `spec.initContainers[*].StartupProbe.ProbeHandler.TCPSocket.Host`
* `spec.initContainers[*].Lifecycle.PostStart.TCPSocket.Host` (deprecated: `TCPSocket` is NOT supported as a `LifecycleHandler` and is kept only for backward compatibility)
* `spec.initContainers[*].Lifecycle.PreStop.TCPSocket.Host` (deprecated: `TCPSocket` is NOT supported as a `LifecycleHandler` and is kept only for backward compatibility)
* `spec.initContainers[*].Lifecycle.PostStart.HTTPGet.Host`
* `spec.initContainers[*].Lifecycle.PreStop.HTTPGet.Host`

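
The field list above can be sketched as a walk over every init container and container that collects the non-empty `Host` values from probes and lifecycle handlers. This is a minimal, self-contained illustration, not the code from kubernetes PR 125271. The structs are simplified, hypothetical stand-ins for the `core/v1` types, keeping only the fields needed here. `probeHostViolations` and `hostsOf` are hypothetical names. The message wording imitates the `container "..." uses probeHost ...` text in the error output shown under Monitoring Requirements.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the core/v1 types; only the
// fields needed to locate Host values are kept.
type HTTPGetAction struct{ Host string }
type TCPSocketAction struct{ Host string }

type ProbeHandler struct {
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction
}

// Probe embeds ProbeHandler, as the real type does.
type Probe struct{ ProbeHandler }

type LifecycleHandler struct {
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction // deprecated for lifecycle hooks
}

type Lifecycle struct{ PostStart, PreStop *LifecycleHandler }

type Container struct {
	Name                                        string
	LivenessProbe, ReadinessProbe, StartupProbe *Probe
	Lifecycle                                   *Lifecycle
}

type PodSpec struct{ InitContainers, Containers []Container }

// hostsOf returns the non-empty Host values of one handler's actions.
func hostsOf(h *HTTPGetAction, t *TCPSocketAction) []string {
	var out []string
	if h != nil && h.Host != "" {
		out = append(out, h.Host)
	}
	if t != nil && t.Host != "" {
		out = append(out, t.Host)
	}
	return out
}

// probeHostViolations returns one message per non-empty Host value found
// in any probe or lifecycle handler of any (init) container.
func probeHostViolations(spec PodSpec) []string {
	var msgs []string
	all := append(append([]Container{}, spec.InitContainers...), spec.Containers...)
	for _, c := range all {
		var hosts []string
		for _, p := range []*Probe{c.LivenessProbe, c.ReadinessProbe, c.StartupProbe} {
			if p != nil {
				hosts = append(hosts, hostsOf(p.HTTPGet, p.TCPSocket)...)
			}
		}
		if c.Lifecycle != nil {
			for _, lh := range []*LifecycleHandler{c.Lifecycle.PostStart, c.Lifecycle.PreStop} {
				if lh != nil {
					hosts = append(hosts, hostsOf(lh.HTTPGet, lh.TCPSocket)...)
				}
			}
		}
		for _, h := range hosts {
			msgs = append(msgs, fmt.Sprintf("container %q uses probeHost %s", c.Name, h))
		}
	}
	return msgs
}

func main() {
	spec := PodSpec{Containers: []Container{{
		Name:          "liveness",
		LivenessProbe: &Probe{ProbeHandler{HTTPGet: &HTTPGetAction{Host: "135.45.63.4"}}},
	}}}
	fmt.Println(probeHostViolations(spec))
}
```

A pod is allowed by this check only when the returned slice is empty. Collecting every offending value, instead of stopping at the first one, gives the user a complete list to fix in a single pass.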

### Test Plan

* Unit and integration tests will be added to ensure the PSA check works as expected.

##### Prerequisite testing updates

None

##### Unit tests

Necessary unit tests will be added to the [PSA package] to
test the new code.
Current test coverage status for the package is:

- `k8s.io/pod-security-admission/policy`: `2025-05-06` - `89.9%`
- `k8s.io/pod-security-admission/test`: `TBD` - `TBD`

[PSA package]: https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/pod-security-admission/policy

##### Integration tests

The following integration tests will be added to verify the PSA validation logic:

1. Test that pods with the `.host` field set in probes are rejected when PSA is enforced at the baseline level
2. Test that pods without the `.host` field set in probes are allowed when PSA is enforced at the baseline level
3. Test that existing pods with the `.host` field set continue to work when PSA is enabled
4. Test that pods with the `.host` field set are allowed when PSA is disabled or pinned to an older policy version

These tests will be added to:

- `test/integration/auth/podsecurity_test.go`: [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=TestPodSecurity)

The integration tests will verify the PSA policy validation logic by:

- Creating test cases for each probe type (HTTPGet, TCPSocket) in a pod
- Testing each probe location (LivenessProbe, ReadinessProbe, StartupProbe, LifecycleHandler)
- Verifying the PSA policy enforcement at the baseline level
- Testing the behavior with different PSA configurations

##### e2e tests

There are no Pod Security-specific E2E tests (we rely on integration test coverage instead),
but the Pod Security admission controller is enabled in E2E clusters,
and all E2E test namespaces are labeled with the enforcement label for Pod Security.

### Graduation Criteria

The PSA check will be added within a single release. Because there
is no feature gate for it, no multi-release graduation criteria
are needed. All related code will land in the same release.

### Upgrade / Downgrade Strategy

Existing pods with this field set are not affected by this change.
Only newly created pods that set the field are evaluated against the
policy. They are rejected in `enforce` mode and surfaced as warnings or
audit annotations in `warn` and `audit` modes.

Users who rely on this field can switch to exec probes going forward.
Exec probes can provide the same functionality, so this should unblock them.


### Version Skew Strategy

N/A, since the change is contained within a single component (pod-security-admission)
and does not span multiple components.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

We decided not to use a feature gate and to rely on PSA policy versioning instead.
If the admin sets `pod-security.kubernetes.io/enforce-version: v1.34`
along with `pod-security.kubernetes.io/enforce: <LEVEL>`
on a namespace, this feature is enabled for that namespace.

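
The enablement rule above can be sketched as a small decision function over namespace labels. `enforcesProbeHost` and `minorOf` are hypothetical helpers, not PSA code. The sketch assumes the following:

* The check first appears in policy version `v1.34`, as stated above.
* The API server runs a release that includes the check.
* Cluster-wide PSA defaults are left at `privileged` / `latest`, so an unset `enforce` label enforces nothing and an omitted version label means `latest`.

Unparseable versions are treated as not enforcing, which is a simplification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	enforceLabel        = "pod-security.kubernetes.io/enforce"
	enforceVersionLabel = "pod-security.kubernetes.io/enforce-version"
)

// minorOf parses a "v1.N" policy version and returns N.
func minorOf(v string) (int, bool) {
	rest, ok := strings.CutPrefix(v, "v1.")
	if !ok {
		return 0, false
	}
	n, err := strconv.Atoi(rest)
	return n, err == nil
}

// enforcesProbeHost reports whether pods in a namespace with these labels
// would be rejected for setting a probe Host, under the assumptions above.
func enforcesProbeHost(labels map[string]string) bool {
	level := labels[enforceLabel]
	if level != "baseline" && level != "restricted" {
		return false // unset or "privileged": assumed to enforce nothing
	}
	version := labels[enforceVersionLabel]
	if version == "" || version == "latest" {
		return true // omitted version assumed to default to "latest"
	}
	minor, ok := minorOf(version)
	return ok && minor >= 34 // check assumed to ship in v1.34
}

func main() {
	fmt.Println(enforcesProbeHost(map[string]string{enforceLabel: "baseline", enforceVersionLabel: "v1.34"})) // true
	fmt.Println(enforcesProbeHost(map[string]string{enforceLabel: "baseline", enforceVersionLabel: "v1.33"})) // false
}
```

The `restricted` level is included because it builds on `baseline`, so a namespace enforcing `restricted` also picks up the new check.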

###### Does enabling the feature change any default behavior?

* There is no effect on clusters where PSA is not enabled or where an older
  PSA policy version is used.

* There is no effect on clusters where probes with `.Host` set are not used.

* If users create a new pod with the `.Host` probe field set and the admin
  has set the baseline PSA level in `enforce` mode, the request will be
  actively rejected. Existing pods with `.Host` probes that are recreated
  during an upgrade are not impacted unless the PSA level is set in
  `enforce` mode.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

No

###### What happens if we reenable the feature if it was previously rolled back?

N/A

###### Are there any tests for feature enablement/disablement?

N/A, since there is no feature gate.

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

* Running workloads/deployments that have `.Host` probes set will fail to get
  new pods created when they are rolled out after an upgrade to the latest
  version, if the PSA enforce label is placed on the workload's namespace.
* If the Pod Security label is not set on the namespace, there is no
  impact on running workloads.

###### What specific metrics should inform a rollback?

If workloads are not rolling out because the policy rejects the request,
cluster admins can use the [PSA denial metrics]. For example, the `pod_security_evaluations_total`
metric indicates how many policy evaluations resulted in a "deny" decision.

[PSA denial metrics]: https://kubernetes.io/docs/concepts/security/pod-security-admission/#metrics


###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

N/A

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

N/A

### Monitoring Requirements

N/A

###### How can an operator determine if the feature is in use by workloads?

If pods have probes with the `.Host` field set and the PSA label on the pod's namespace
is set to a version in which the new check has been added, the feature is in effect for those workloads.

###### How can someone using this feature know that it is working for their instance?

Trying to create a pod with the `.Host` field set in its probes will fail
like this:

```
Error from server (Forbidden): error when creating "psa/fail-case-pod.yaml": pods "liveness-http-pass" is forbidden: violates PodSecurity "restricted:latest": probeHost (container "liveness" uses probeHost 135.45.63.4)
```

Trying to roll out a deployment with the `.Host` field set in its probes will fail with the following status:

```
- lastTransitionTime: "2025-06-17T06:17:36Z"
  lastUpdateTime: "2025-06-17T06:17:36Z"
  message: 'pods "hello-world-577c86d6dd-bs7nt" is forbidden: violates PodSecurity
    "restricted:latest": probeHost (container "hello-world" uses probeHost 135.45.63.4)'
  reason: FailedCreate
  status: "True"
  type: ReplicaFailure
observedGeneration: 1
unavailableReplicas: 1
```

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

Cluster admins can use the [PSA denial metrics] to determine whether workloads
are failing to roll out, and services are therefore not serving properly,
because of policy enforcement.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

N/A

### Dependencies

None

###### Does this feature depend on any specific services running in the cluster?

No

### Scalability

N/A

###### Will enabling / using this feature result in any new API calls?

No

###### Will enabling / using this feature result in introducing new API types?

No

###### Will enabling / using this feature result in any new calls to the cloud provider?

No

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

No

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

No

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

No

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

No

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

N/A

###### What are other known failure modes?

N/A

###### What steps should be taken if SLOs are not being met to determine the problem?

If the admin sets `pod-security.kubernetes.io/enforce-version: v1.34`
on a namespace, this feature is enabled, and workloads rolling out
with `.Host` probes set will be impacted. One remediation procedure to
get workloads back into a healthy state is:

* Pin the [PSA namespace label] to a version prior to the one in which this
  check was introduced (for example, set it to v1.33).
* Restart your workloads.

[PSA namespace label]: https://kubernetes.io/docs/concepts/security/pod-security-admission/#pod-security-admission-labels-for-namespaces

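
For the pinning step, one option is a JSON merge patch on the namespace that sets only the version label. The sketch below just builds that patch body with `encoding/json`. Applying it, for example with `kubectl patch namespace <name> --type=merge -p '<patch>'`, is left to the operator, and `<name>` is a placeholder. `pinEnforceVersion` is a hypothetical helper. A merge patch leaves the other labels on the namespace, including the `pod-security.kubernetes.io/enforce` level, untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pinEnforceVersion builds a JSON merge patch that sets only the
// enforce-version label, leaving the namespace's other labels as-is.
func pinEnforceVersion(version string) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"labels": map[string]string{
				"pod-security.kubernetes.io/enforce-version": version,
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	b, err := pinEnforceVersion("v1.33")
	if err != nil {
		panic(err)
	}
	// Prints: {"metadata":{"labels":{"pod-security.kubernetes.io/enforce-version":"v1.33"}}}
	fmt.Println(string(b))
}
```

Once the workloads have been fixed (for example, by moving to exec probes), the label can be set back to `v1.34` or `latest` in the same way.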

## Implementation History

## Drawbacks

N/A

## Alternatives

The alternative is to remove this field from the API after it
is deprecated, but that is not a supported API action.