Skip to content

Commit c149454

Browse files
authored
KEP 2527: Clarify meaning of status (kubernetes#2537)
Since basically the beginning of Kubernetes, we've had this sort of "litmus test" for status fields: "If I erased the whole status struct, would everything in it be reconstructed (or at least be reconstructible) from observation?". The goal was to ensure that the delineation between "what I asked for" and "what it actually is" was clear and to encourage active reconciliation of state. Many of our APIs pass this test (sometimes we fudge it and say yes "in theory"), but not all of them do. This KEP proposes to clarify or remove this guidance, especially as it pertains to state that is not derived from observation. One of the emergent uses of the spec/status split is access control. It is assumed that, for most resources, users own (can write to) all of spec and controllers own all of status, and not the reverse. This allows patterns like Services which set `spec.type: LoadBalancer`, where the controller writes the LB's IP address to status, and kube-proxy can trust that IP address (because it came from a controller, not a user). Compare that with Services which use `spec.externalIPs`. The behavior is kube-proxy is roughly the same, but because non-trusted users can write to `spec.externalIPs` and that does not require a trusted controller to ACK, that behavior was declared a CVE. This KEP further proposes to add guidance for APIs that want to implement an "allocation" or "binding" pattern which requires trusted ACK.
1 parent 5042518 commit c149454

File tree

2 files changed

+329
-0
lines changed

2 files changed

+329
-0
lines changed
Lines changed: 288 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,288 @@
1+
# KEP-2527: Clarify if/how controllers can use status to track non-observable state
2+
3+
<!-- toc -->
4+
- [Release Signoff Checklist](#release-signoff-checklist)
5+
- [Summary](#summary)
6+
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
9+
- [Proposal](#proposal)
10+
- [Risks and Mitigations](#risks-and-mitigations)
11+
- [Design Details](#design-details)
12+
- [Part 1: Loosen and clarify <code>status</code>](#part-1-loosen-and-clarify-)
13+
- [Examples](#examples)
14+
- [Part 2: Clarify when it makes sense to use 2 objects](#part-2-clarify-when-it-makes-sense-to-use-2-objects)
15+
- [Test Plan](#test-plan)
16+
- [Graduation Criteria](#graduation-criteria)
17+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
18+
- [Version Skew Strategy](#version-skew-strategy)
19+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
20+
- [Implementation History](#implementation-history)
21+
- [Drawbacks](#drawbacks)
22+
- [Alternatives](#alternatives)
23+
- [Alternative 1: Add a new top-level stanza to spec/status resources](#alternative-1-add-a-new-top-level-stanza-to-specstatus-resources)
24+
- [Examples](#examples-1)
25+
- [Tradeoffs](#tradeoffs)
26+
- [Alternative 2: Sub-divide spec](#alternative-2-sub-divide-spec)
27+
- [Examples](#examples-2)
28+
- [Tradeoffs](#tradeoffs-1)
29+
- [Notes](#notes)
30+
<!-- /toc -->
31+
32+
## Release Signoff Checklist
33+
34+
Items marked with (R) are required *prior to targeting to a milestone / release*.
35+
36+
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
37+
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
38+
- [ ] (R) Design details are appropriately documented
39+
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
40+
- [ ] (R) Graduation criteria is in place
41+
- [ ] (R) Production readiness review completed
42+
- [ ] (R) Production readiness review approved
43+
- [ ] "Implementation History" section is up-to-date for milestone
44+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
45+
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
46+
47+
[kubernetes.io]: https://kubernetes.io/
48+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
49+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
50+
[kubernetes/website]: https://git.k8s.io/website
51+
52+
## Summary
53+
54+
Since basically the beginning of Kubernetes, we've had this sort of "litmus
55+
test" for status fields: "If I erased the whole status struct, would everything
56+
in it be reconstructed (or at least be reconstructible) from observation?". The
57+
goal was to ensure that the delineation between "what I asked for" and "what it
58+
actually is" was clear and to encourage active reconciliation of state.
59+
60+
Another reason for this split was an idea which, as far as I know, has never
61+
been implemented by anyone: that an object's spec and status blocks could be
62+
stored in different etcd instances and the status could have a TTL. At this
63+
point in the project, I expect that wiping status out like this would end in
64+
fireworks, and not the fun kind. Status is, effectively, as durable as spec.
65+
66+
Many of our APIs pass this test (sometimes we fudge it and say yes "in
67+
theory"), but not all of them do. This KEP proposes to clarify or remove this
68+
guidance, especially as it pertains to state that is not derived from
69+
observation.
70+
71+
One of the emergent uses of the spec/status split is access control. It is
72+
assumed that, for most resources, users own (can write to) all of spec and
73+
controllers own all of status, and not the reverse. This allows patterns like
74+
Services which set `spec.type: LoadBalancer`, where the controller writes the
75+
LB's IP address to status, and kube-proxy can trust that IP address (because it
76+
came from a controller, not a user). Compare that with Services which use
77+
`spec.externalIPs`. The behavior in kube-proxy is roughly the same, but
78+
because non-trusted users can write to `spec.externalIPs` and that does not
79+
require a trusted controller to ACK, that behavior was declared a CVE.
80+
81+
This KEP further proposes to add guidance for APIs that want to implement an
82+
"allocation" or "binding" pattern which requires trusted ACK.
83+
84+
## Motivation
85+
86+
As an API reviewer, I have seen many different patterns in use. I have shot
87+
down APIs that run afoul of the rebuild-from-observation test, and forced the
88+
implementations to be more complicated. I no longer find this to be useful to
89+
our APIs, and in fact I find it to be a detriment.
90+
91+
I suspect that several APIs would have come out differently if not for this.
92+
93+
### Goals
94+
95+
* Clarify or remove the from-observation rule to better match reality.
96+
* Provide guidance for APIs on how to save controller results.
97+
98+
### Non-Goals
99+
100+
* Retroactively apply this pattern to change existing GA APIs.
101+
* Necessarily apply this pattern to change pre-GA APIs (though they may choose
102+
to follow it).
103+
* Provide arbitrary storage for controller state.
104+
* Provide distinct access control for subsets of spec or status.
105+
106+
## Proposal
107+
108+
This KEP proposes to:
109+
110+
1) Get rid of the [if it got lost](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status)
111+
litmus test for status fields
112+
2) Acknowledge that spec/status are a useful and meaningful split for access control
113+
3) Document one or more patterns for APIs that need a trusted acknowledgement
114+
of the spec
115+
116+
Depending on feedback, there may be other approaches to solving the same
117+
problems.
118+
119+
### Risks and Mitigations
120+
121+
The current guidance exists for a reason. It does encourage a sort of
122+
ideological cleanliness. However, it is not universally adhered to and it
123+
neglects the reality of these request/acknowledge APIs. On the whole, this KEP
124+
posits that the current guidance is a net negative.
125+
126+
## Design Details
127+
128+
### Part 1: Loosen and clarify `status`
129+
130+
Remove the idea that status fields _must_ be from observation. Allow controllers
131+
to write values to status that represent allocations or acknowledged requests.
132+
Document that status fields are best when they represent observed state, but do
133+
not _require_ it.
134+
135+
This has [already happened](https://github.com/kubernetes/enhancements/pull/2308/files#r567809465)
136+
at least once.
137+
138+
Additionally, discuss the risks of using `status` for non-observation data -
139+
specifically how to think about write sequencing with regards to observers and
140+
how to think about idempotency. These problems are not new to this revised
141+
definition of status, but "status is from observation" tends to be more
142+
one-way, and relaxing that will likely lead to more variety of controller
143+
patterns.
144+
145+
#### Examples
146+
147+
Some of these are variants of existing APIs and some are hypothetical.
148+
149+
1) User creates a Service and sets `Service.spec.type = LoadBalancer`. The
150+
cloud controller sets `Service.status.loadBalancer.ingress.ip` in response. If
151+
the user sets `Service.spec.loadBalancerIP` to a specific value, the cloud
152+
controller will either successfully use that IP and and set it as before (ACK),
153+
or it will not set any value in `Service.status.loadBalancer.ingress` (NAK).
154+
155+
2) Given a Pod, user patches `Pod.spec.containers[0].resources.requests[cpu]`
156+
to a new value. Kubelet sees this as a request and, if viable, sets
157+
`Pod.status.containerStatuses[0].resources.requests[cpu]` to the same value.
158+
159+
3) User creates a `Bucket`, setting `Bucket.spec.instance = artifacts-prod`.
160+
The bucket controller verifies that the namespace is allowed to use the
161+
artifacts-prod bucket and, if so, sets `Bucket.status.instance` to the same
162+
value.
163+
164+
### Part 2: Clarify when it makes sense to use 2 objects
165+
166+
Offer API developers guidance on when it makes sense to use 2 objects instead
167+
of status. The exact wording is TBD, but it has to consider authorization
168+
granularity vs. many small objects, cardinality, cohesion, and extensibility.
169+
170+
### Test Plan
171+
172+
N/A (docs)
173+
174+
### Graduation Criteria
175+
176+
N/A (docs)
177+
178+
### Upgrade / Downgrade Strategy
179+
180+
N/A (docs)
181+
182+
### Version Skew Strategy
183+
184+
N/A (docs)
185+
186+
## Production Readiness Review Questionnaire
187+
188+
N/A (docs)
189+
190+
## Implementation History
191+
192+
Feb 22, 2021: First draft
193+
Jun 16, 2021: Incorporate feedback, pick a preferred model
194+
195+
## Drawbacks
196+
197+
There is some value in keeping `status` purely from-observation. Unfortunately
198+
that ideal does not seem to be surviving contact with the real world.
199+
200+
Using `status` as an RBAC scope is coarse, and this makes that worse by
201+
allowing more uses of `status` (and thereby more controllers RBAC'ed to write
202+
to it).
203+
204+
## Alternatives
205+
206+
### Alternative 1: Add a new top-level stanza to spec/status resources
207+
208+
Keep and strengthen the idea that status fields must be from observation.
209+
Segregate controller-owned fields to a new stanza, parallel to `spec` and
210+
`status`, which can be RBAC'ed explicitly. For the sake of this doc, let's
211+
call it `control`.
212+
213+
To make this viable, we would need to add `control` as a "standard" subresource
214+
and apply similar rules to `spec` and `status` with regards to writes (can't
215+
write to `control` through the main resource, can't write to `spec` or `status`
216+
through the `control` subresource). We would also need to add it to CRDs as an
217+
available subresource.
218+
219+
#### Examples
220+
221+
1) User creates a Service and sets `Service.spec.type = LoadBalancer`. The
222+
cloud controller sets `Service.control.loadBalancer.ingress.ip` in response. If
223+
the user sets `Service.spec.loadBalancerIP` to a specific value, the cloud
224+
controller will either successfully use that IP and and set it as before (ACK),
225+
or it will not set any value in `Service.control.loadBalancer.ingress` (NAK).
226+
227+
2) Given a Pod, user patches `Pod.spec.containers[0].resources.requests[cpu]`
228+
to a new value. Kubelet sees this as a request and, if viable, sets
229+
`Pod.control.containerStatuses[0].resources.requests[cpu]` to the same value.
230+
231+
3) User creates a `Bucket`, setting `Bucket.spec.instance = artifacts-prod`.
232+
The bucket controller verifies that the namespace is allowed to use the
233+
artifacts-prod bucket and, if so, sets `Bucket.control.instance` to the same
234+
value.
235+
236+
#### Tradeoffs
237+
238+
Pro: Clarifies the meaning of status.
239+
240+
Pro: Possibly clarifies the roles acting on a resource.
241+
242+
Con: Requires a lot of implementation and possibly steps on existing uses of the
243+
field name.
244+
245+
Con: Net-new concept requires new documentation and socialization.
246+
247+
Con: Incompatible with existing uses of status for this.
248+
249+
### Alternative 2: Sub-divide spec
250+
251+
Keep and strengthen the idea that status fields must be from observation.
252+
Segregate controller-owned fields to a sub-stanza of spec. Create a new
253+
access-control mechanism (or extend RBAC) to provide field-by-field access.
254+
255+
#### Examples
256+
257+
1) User creates a Service and sets `Service.spec.type = LoadBalancer`. The
258+
cloud controller sets `Service.spec.control.loadBalancer.ingress.ip` in
259+
response. If the user sets `Service.spec.loadBalancerIP` to a specific value,
260+
the cloud controller will either successfully use that IP and and set it as
261+
before (ACK), or it will not set any value in
262+
`Service.spec.control.loadBalancer.ingress` (NAK).
263+
264+
2) Given a Pod, user patches `Pod.spec.containers[0].resources.requests[cpu]`
265+
to a new value. Kubelet sees this as a request and, if viable, sets
266+
`Pod.spec.control.containerStatuses[0].resources.requests[cpu]` to the same
267+
value.
268+
269+
3) User creates a `Bucket`, setting `Bucket.spec.instance = artifacts-prod`.
270+
The bucket controller verifies that the namespace is allowed to use the
271+
artifacts-prod bucket and, if so, sets `Bucket.spec.control.instance` to the same
272+
value.
273+
274+
#### Tradeoffs
275+
276+
Pro: Retains purity of status.
277+
278+
Con: Confuses the meaning of spec.
279+
280+
Con: Can collide with existing uses of the field name.
281+
282+
Con: Needs a whole new access model.
283+
284+
Con: Likely needs a new subresource.
285+
286+
#### Notes
287+
288+
This model is included for completeness. I do not expect ANYONE to endorse it.
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
title: Clarify if/how controllers can use status to track non-observable state
2+
kep-number: 2527
3+
authors:
4+
- "@thockin"
5+
owning-sig: sig-architecture
6+
participating-sigs: []
7+
status: provisional
8+
creation-date: 2021-02-19
9+
reviewers:
10+
- "@lavalamp"
11+
- "@deads2k"
12+
- "@smarterclayton"
13+
approvers:
14+
- "@liggitt"
15+
prr-approvers:
16+
- "@johnbelamaric"
17+
see-also:
18+
- https://groups.google.com/g/kubernetes-sig-architecture/c/d96tt2Mw69s
19+
replaces: []
20+
21+
# The target maturity stage in the current dev cycle for this KEP.
22+
stage: alpha #beta|stable
23+
24+
# The most recent milestone for which work toward delivery of this KEP has been
25+
# done. This can be the current (upcoming) milestone, if it is being actively
26+
# worked on.
27+
latest-milestone: "v1.21"
28+
29+
# The milestone at which this feature was, or is targeted to be, at each stage.
30+
milestone:
31+
alpha: "v1.22"
32+
beta: "v1.22"
33+
stable: "v1.22"
34+
35+
# The following PRR answers are required at alpha release
36+
# List the feature gate name and the components for which it must be enabled
37+
feature-gates: []
38+
disable-supported: false
39+
40+
# The following PRR answers are required at beta release
41+
metrics: []

0 commit comments

Comments
 (0)