Commit 3d565d0

KEP 2200: Block service ExternalIPs via admission
This KEP proposes to add a surgical admission controller (optional) to block use of the `Service.spec.externalIPs` misfeature. See CVE-2020-8554: Man in the middle using LoadBalancer or ExternalIPs (https://www.first.org/cvss/calculator/3.0#CVSS:3.0/AV:N/AC:L/PR:L/UI:N/S:U/C:L/I:L/A:L) for details.
1 parent d9384f5 commit 3d565d0

3 files changed

+322
-0
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 2200
2+
stable:
3+
  approver: "@johnbelamaric"
Lines changed: 277 additions & 0 deletions
@@ -0,0 +1,277 @@
1+
# KEP-2200: Deny use of ExternalIPs via admission control
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [User Stories (Optional)](#user-stories-optional)
11+
- [Risks and Mitigations](#risks-and-mitigations)
12+
- [Design Details](#design-details)
13+
- [Test Plan](#test-plan)
14+
- [Graduation Criteria](#graduation-criteria)
15+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
16+
- [Version Skew Strategy](#version-skew-strategy)
17+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
18+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
19+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
20+
- [Monitoring Requirements](#monitoring-requirements)
21+
- [Dependencies](#dependencies)
22+
- [Scalability](#scalability)
23+
- [Troubleshooting](#troubleshooting)
24+
- [Implementation History](#implementation-history)
25+
- [Drawbacks](#drawbacks)
26+
- [Alternatives](#alternatives)
27+
<!-- /toc -->
28+
29+
## Release Signoff Checklist
30+
31+
Items marked with (R) are required *prior to targeting to a milestone / release*.
32+
33+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
34+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
35+
- [X] (R) Design details are appropriately documented
36+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
37+
- [X] (R) Graduation criteria is in place
38+
- [ ] (R) Production readiness review completed
39+
- [ ] Production readiness review approved
40+
- [ ] "Implementation History" section is up-to-date for milestone
41+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
42+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
43+
44+
[kubernetes.io]: https://kubernetes.io/
45+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
46+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
47+
[kubernetes/website]: https://git.k8s.io/website
48+
49+
## Summary
50+
51+
This proposal is in response to CVE-2020-8554: "Man in the middle using
52+
LoadBalancer or ExternalIPs".
53+
54+
Fundamentally the `Service.spec.externalIPs[]` feature is bad. It predates
55+
`Service.spec.type=LoadBalancer` and, now that we have that, has very few
56+
use cases. In short, an unprivileged user can hijack an IP address via a
57+
Service spec. In contrast, `type=LoadBalancer` uses Service status, which most
58+
normal users should not be allowed to write.
59+
60+
This KEP proposes to block the use of ExternalIPs via a built-in admission
61+
controller. The justification for this, as opposed to a webhook, is that 99%
62+
of users will never use this feature, and making them ALL run a webhook seems
63+
terrible.
64+
65+
## Motivation
66+
67+
https://github.com/kubernetes/kubernetes/issues/97110
68+
69+
### Goals
70+
71+
Make it possible to disable an insecure feature for the vast majority of users
72+
very quickly.
73+
74+
### Non-Goals
75+
76+
* Make this the default (that would be a breaking change).
77+
* Make the feature safe to use.
78+
79+
## Proposal
80+
81+
This KEP proposes to add a built-in admission controller
82+
"DenyServiceExternalIPs", which rejects any CREATE or UPDATE operation which
83+
adds a new value to `Service.spec.externalIPs`. Existing values will be
84+
tolerated and may be removed.
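
The rule above can be sketched in Go as a comparison between the stored and submitted `externalIPs` lists. This is a minimal illustration, not the plugin's actual code: the names `newExternalIPs` and `admit` are hypothetical, and comparing values as plain strings is an assumption.

```go
package main

import "fmt"

// newExternalIPs returns the values in updated that are not present in
// existing. For a CREATE there is no stored object, so pass nil as existing.
func newExternalIPs(existing, updated []string) []string {
	seen := make(map[string]bool, len(existing))
	for _, ip := range existing {
		seen[ip] = true
	}
	var added []string
	for _, ip := range updated {
		if !seen[ip] {
			added = append(added, ip)
		}
	}
	return added
}

// admit is a hypothetical stand-in for the admission decision: it rejects
// any write that introduces an externalIPs value that was not already stored.
func admit(existing, updated []string) error {
	if added := newExternalIPs(existing, updated); len(added) > 0 {
		return fmt.Errorf("may not add new externalIPs: %v", added)
	}
	return nil
}

func main() {
	// CREATE with externalIPs set: rejected.
	fmt.Println(admit(nil, []string{"192.0.2.1"}))
	// UPDATE that leaves existing values untouched: allowed.
	fmt.Println(admit([]string{"192.0.2.1"}, []string{"192.0.2.1"}))
	// UPDATE that only removes a value: allowed.
	fmt.Println(admit([]string{"192.0.2.1", "192.0.2.2"}, []string{"192.0.2.2"}))
}
```

Treating the check as "no value outside the stored set" is what lets existing Services keep working while still blocking every new use of the field.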
85+
86+
The number of rejected operations will be exposed by the standard admission
87+
metrics (`apiserver_admission_controller_admission_duration_seconds_bucket{name="DenyServiceExternalIPs",rejected="true", ...}`).
88+
89+
### User Stories (Optional)
90+
91+
Alice the admin does not want her users using this insecure feature. She
92+
enables this admission controller and knows no user can add new `externalIPs`. She can then
93+
audit existing users and make them stop.
94+
95+
### Risks and Mitigations
96+
97+
Some installations may want to use this feature in a more controlled way. They
98+
can use a custom webhook admission controller or a policy controller to enforce
99+
their own rules.
100+
101+
This is a precedent we should not set lightly. In this case the VAST majority
102+
of users do not need this feature and this proposal is very surgical in nature.
103+
As far as we know, there are few other unprivileged fields with this much
104+
power anywhere in our API, and most of those already have some form of controls
105+
on them.
106+
107+
## Design Details
108+
109+
One simple admission controller should be enough to disable this misfeature.
110+
Unfortunately, it cannot be on by default (that would be a breaking change).
111+
112+
This means that platform-providers may need to expose an option to control
113+
this. While we generally try to avoid mixing knobs that cluster-users would
114+
set with knobs that cluster-providers own, it seems reasonable to close this as
115+
soon as possible and consider better answers when we have more cases to
116+
generalize from. See "Alternatives" below for more.
117+
118+
See "Proposal" above.
119+
120+
### Test Plan
121+
122+
* Unit tests to ensure CREATE and UPDATE operations are rejected when adding
123+
new `externalIPs`.
124+
* Unit tests to ensure UPDATE operations allow existing `externalIPs`.
125+
126+
### Graduation Criteria
127+
128+
This feature will debut as "GA", bypassing alpha and beta. It's already opt-in
129+
and very small in scope.
130+
131+
### Upgrade / Downgrade Strategy
132+
133+
Cluster upgrades/downgrades should not be an issue.
134+
135+
### Version Skew Strategy
136+
137+
N/A
138+
139+
## Production Readiness Review Questionnaire
140+
141+
### Feature Enablement and Rollback
142+
143+
* **How can this feature be enabled / disabled in a live cluster?**
144+
- [X] Other flag
145+
- Flag name: --enable-admission-plugins (existing)
146+
147+
* **Does enabling the feature change any default behavior?**
148+
Yes. The `externalIPs` field will not be allowed to mutate, except to remove
149+
existing values.
150+
151+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**
152+
Yes.
153+
154+
* **What happens if we reenable the feature if it was previously rolled back?**
155+
No problem.
156+
157+
* **Are there any tests for feature enablement/disablement?**
158+
Unit tests should suffice.
159+
160+
### Rollout, Upgrade and Rollback Planning
161+
162+
* **How can a rollout fail? Can it impact already running workloads?**
163+
It could start disallowing all Service operations, if the controller was
164+
buggy.
165+
166+
* **What specific metrics should inform a rollback?**
167+
`apiserver_admission_controller_admission_duration_seconds_bucket{name="DenyServiceExternalIPs",rejected="true", ...}`
168+
169+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
170+
Manual testing:
171+
* Create a service "extip" with 2 `externalIPs` values
172+
* Upgrade to new apiserver and enable new admission controller
173+
* Try to create a new service using `externalIPs` -> fail
174+
* Try to change the "extip" service in an unrelated way -> OK
175+
* Try to change the value of one `externalIPs` value in extip -> fail
176+
* Try to remove the [0] value of `externalIPs` -> OK
177+
* Try to add the removed value back -> fail
178+
* Remove the last `externalIPs` value -> OK
179+
* Try to add the removed value back -> fail
180+
* Revert to "standard" apiserver
181+
* Try to add the removed value back -> OK
182+
183+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?**
184+
No.
185+
186+
### Monitoring Requirements
187+
188+
* **How can an operator determine if the feature is in use by workloads?**
189+
There are two possible facets of this: 1) Is the admission control enabled?
190+
and 2) Are any users using externalIPs?
191+
192+
To point 1, admins can look at their admission control config
193+
(--enable-admission-plugins) and look for `DenyServiceExternalIPs` in that
194+
list.
195+
196+
To point 2, admins can look at all services in the cluster for use of
197+
the `externalIPs` field. Via kubectl:
198+
199+
```
200+
kubectl get svc --all-namespaces -o go-template='
201+
{{- range .items -}}
202+
{{if .spec.externalIPs -}}
203+
{{.metadata.namespace}}/{{.metadata.name}}: {{.spec.externalIPs}}{{"\n"}}
204+
{{- end}}
205+
{{- end -}}
206+
'
207+
```
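
For admins who would rather post-process JSON than use a go-template, the same audit can be sketched in Go. This is an illustrative sketch only: it assumes the input is the output of `kubectl get svc --all-namespaces -o json`, decodes just the fields it needs, and uses the hypothetical names `serviceList` and `findExternalIPs`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// serviceList mirrors only the parts of a Service list needed for the audit.
type serviceList struct {
	Items []struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
		Spec struct {
			ExternalIPs []string `json:"externalIPs"`
		} `json:"spec"`
	} `json:"items"`
}

// findExternalIPs decodes a JSON Service list and returns a map from
// "namespace/name" to externalIPs for every Service that sets the field.
func findExternalIPs(data []byte) (map[string][]string, error) {
	var list serviceList
	if err := json.Unmarshal(data, &list); err != nil {
		return nil, err
	}
	found := make(map[string][]string)
	for _, item := range list.Items {
		if len(item.Spec.ExternalIPs) > 0 {
			key := item.Metadata.Namespace + "/" + item.Metadata.Name
			found[key] = item.Spec.ExternalIPs
		}
	}
	return found, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	found, err := findExternalIPs(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode error:", err)
		os.Exit(1)
	}
	for svc, ips := range found {
		fmt.Printf("%s: %v\n", svc, ips)
	}
}
```

Saved under a name such as `audit.go` (a hypothetical filename), it could be run as `kubectl get svc --all-namespaces -o json | go run audit.go`.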
208+
209+
* **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
210+
N/A
211+
212+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
213+
N/A
214+
215+
* **Are there any missing metrics that would be useful to have to improve observability of this feature?**
216+
This proposes to use the existing
217+
`apiserver_admission_controller_admission_duration_seconds_bucket{name="DenyServiceExternalIPs", ...}` metrics.
218+
219+
### Dependencies
220+
221+
* **Does this feature depend on any specific services running in the cluster?**
222+
No.
223+
224+
### Scalability
225+
226+
* **Will enabling / using this feature result in any new API calls?**
227+
No.
228+
229+
* **Will enabling / using this feature result in introducing new API types?**
230+
No.
231+
232+
* **Will enabling / using this feature result in any new calls to the cloud provider?**
233+
No.
234+
235+
* **Will enabling / using this feature result in increasing size or count of the existing API objects?**
236+
No.
237+
238+
* **Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs]?**
239+
No.
240+
241+
* **Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?**
242+
No.
243+
244+
### Troubleshooting
245+
246+
* **How does this feature react if the API server and/or etcd is unavailable?**
247+
It is part of the apiserver REST path, so it is unavailable whenever the apiserver is.
248+
249+
* **What are other known failure modes?**
250+
None.
251+
252+
* **What steps should be taken if SLOs are not being met to determine the problem?**
253+
N/A
254+
255+
## Implementation History
256+
257+
* 2020-12-07: First draft
258+
* 2021-01-04: Edits to PRR section.
259+
* 2021-01-15: Edits from feedback.
260+
261+
## Drawbacks
262+
263+
It is a slippery slope to other ad hoc policies. Counter: this is very
264+
surgical and overwhelmingly not a useful feature.
265+
266+
Users who REALLY need this feature can enable it and apply whatever bespoke
267+
admission policies they need (or not).
268+
269+
## Alternatives
270+
271+
* Force users to use policy controllers as webhooks. Forever.
272+
* Make a breaking API change and disable or rip out the feature.
273+
* Add a new flag telling validation logic to disallow this field.
274+
* Make a more complex API to define which namespaces can use this feature
275+
and/or which IPs they can use.
276+
* Make a new API that allows cluster-users to enable this sort of field-block
277+
without changing admission-control flags on apiserver.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
1+
title: Block ExternalIPs via Admission Control
2+
kep-number: 2200
3+
authors:
4+
- "@thockin"
5+
owning-sig: sig-network
6+
participating-sigs:
7+
- sig-auth
8+
- sig-security
9+
- sig-api-machinery
10+
status: implementable
11+
creation-date: 2020-12-07
12+
reviewers:
13+
- "@IanColdwater"
14+
- "@tabbysable"
15+
approvers:
16+
- "@tallclair"
17+
- "@lavalamp"
18+
prr-approvers:
19+
- "@johnbelamaric"
20+
see-also:
21+
- "https://github.com/kubernetes/kubernetes/issues/97110"
22+
23+
# The target maturity stage in the current dev cycle for this KEP.
24+
stage: stable
25+
26+
# The most recent milestone for which work toward delivery of this KEP has been
27+
# done. This can be the current (upcoming) milestone, if it is being actively
28+
# worked on.
29+
latest-milestone: "v1.21"
30+
31+
# The milestone at which this feature was, or is targeted to be, at each stage.
32+
milestone:
33+
  stable: "v1.21"
34+
35+
# The following PRR answers are required at alpha release
36+
# List the feature gate name and the components for which it must be enabled
37+
feature-gates: []
38+
disable-supported: true
39+
40+
# The following PRR answers are required at beta release
41+
metrics:
42+
- apiserver_admission_controller_admission_duration_seconds_bucket{name="DenyServiceExternalIPs", ...}
