|
| 1 | +# KEP-2527: Clarify if/how controllers can use status to track non-observable state |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Release Signoff Checklist](#release-signoff-checklist) |
| 5 | +- [Summary](#summary) |
| 6 | +- [Motivation](#motivation) |
| 7 | + - [Goals](#goals) |
| 8 | + - [Non-Goals](#non-goals) |
| 9 | +- [Proposal](#proposal) |
| 10 | + - [Risks and Mitigations](#risks-and-mitigations) |
| 11 | +- [Design Details](#design-details) |
| 12 | + - [Part 1: Loosen and clarify <code>status</code>](#part-1-loosen-and-clarify-) |
| 13 | + - [Examples](#examples) |
| 14 | + - [Part 2: Clarify when it makes sense to use 2 objects](#part-2-clarify-when-it-makes-sense-to-use-2-objects) |
| 15 | + - [Test Plan](#test-plan) |
| 16 | + - [Graduation Criteria](#graduation-criteria) |
| 17 | + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) |
| 18 | + - [Version Skew Strategy](#version-skew-strategy) |
| 19 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 20 | +- [Implementation History](#implementation-history) |
| 21 | +- [Drawbacks](#drawbacks) |
| 22 | +- [Alternatives](#alternatives) |
| 23 | + - [Alternative 1: Add a new top-level stanza to spec/status resources](#alternative-1-add-a-new-top-level-stanza-to-specstatus-resources) |
| 24 | + - [Examples](#examples-1) |
| 25 | + - [Tradeoffs](#tradeoffs) |
| 26 | + - [Alternative 2: Sub-divide spec](#alternative-2-sub-divide-spec) |
| 27 | + - [Examples](#examples-2) |
| 28 | + - [Tradeoffs](#tradeoffs-1) |
| 29 | + - [Notes](#notes) |
| 30 | +<!-- /toc --> |
| 31 | + |
| 32 | +## Release Signoff Checklist |
| 33 | + |
| 34 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 35 | + |
| 36 | +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 37 | +- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
| 38 | +- [ ] (R) Design details are appropriately documented |
| 39 | +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input |
| 40 | +- [ ] (R) Graduation criteria is in place |
| 41 | +- [ ] (R) Production readiness review completed |
| 42 | +- [ ] (R) Production readiness review approved |
| 43 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 44 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 45 | +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 46 | + |
| 47 | +[kubernetes.io]: https://kubernetes.io/ |
| 48 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 49 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 50 | +[kubernetes/website]: https://git.k8s.io/website |
| 51 | + |
| 52 | +## Summary |
| 53 | + |
| 54 | +Since basically the beginning of Kubernetes, we've had this sort of "litmus |
| 55 | +test" for status fields: "If I erased the whole status struct, would everything |
| 56 | +in it be reconstructed (or at least be reconstructible) from observation?". The |
| 57 | +goal was to ensure that the delineation between "what I asked for" and "what it |
| 58 | +actually is" was clear and to encourage active reconciliation of state. |
| 59 | + |
| 60 | +Another reason for this split was an idea which, as far as I know, has never |
| 61 | +been implemented by anyone: that an object's spec and status blocks could be |
| 62 | +stored in different etcd instances and the status could have a TTL. At this |
| 63 | +point in the project, I expect that wiping status out like this would end in |
| 64 | +fireworks, and not the fun kind. Status is, effectively, as durable as spec. |
| 65 | + |
| 66 | +Many of our APIs pass this test (sometimes we fudge it and say yes "in |
| 67 | +theory"), but not all of them do. This KEP proposes to clarify or remove this |
| 68 | +guidance, especially as it pertains to state that is not derived from |
| 69 | +observation. |
| 70 | + |
| 71 | +One of the emergent uses of the spec/status split is access control. It is |
| 72 | +assumed that, for most resources, users own (can write to) all of spec and |
| 73 | +controllers own all of status, and not the reverse. This allows patterns like |
| 74 | +Services which set `spec.type: LoadBalancer`, where the controller writes the |
| 75 | +LB's IP address to status, and kube-proxy can trust that IP address (because it |
| 76 | +came from a controller, not a user). Compare that with Services which use |
| 77 | +`spec.externalIPs`. The behavior in kube-proxy is roughly the same, but |
| 78 | +because non-trusted users can write to `spec.externalIPs` and that does not |
| 79 | +require a trusted controller to ACK, that behavior was declared a CVE. |
| 80 | + |
| 81 | +This KEP further proposes to add guidance for APIs that want to implement an |
| 82 | +"allocation" or "binding" pattern which requires trusted ACK. |
| 83 | + |
| 84 | +## Motivation |
| 85 | + |
| 86 | +As an API reviewer, I have seen many different patterns in use. I have shot |
| 87 | +down APIs that run afoul of the rebuild-from-observation test, and forced the |
| 88 | +implementations to be more complicated. I no longer find this to be useful to |
| 89 | +our APIs, and in fact I find it to be a detriment. |
| 90 | + |
| 91 | +I suspect that several APIs would have come out differently if not for this. |
| 92 | + |
| 93 | +### Goals |
| 94 | + |
| 95 | +* Clarify or remove the from-observation rule to better match reality. |
| 96 | +* Provide guidance for APIs on how to save controller results. |
| 97 | + |
| 98 | +### Non-Goals |
| 99 | + |
| 100 | +* Retroactively apply this pattern to change existing GA APIs. |
| 101 | +* Necessarily apply this pattern to change pre-GA APIs (though they may choose |
| 102 | + to follow it). |
| 103 | +* Provide arbitrary storage for controller state. |
| 104 | +* Provide distinct access control for subsets of spec or status. |
| 105 | + |
| 106 | +## Proposal |
| 107 | + |
| 108 | +This KEP proposes to: |
| 109 | + |
| 110 | +1) Get rid of the [if it got lost](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#spec-and-status) |
| 111 | +litmus test for status fields |
| 112 | +2) Acknowledge that spec/status are a useful and meaningful split for access control |
| 113 | +3) Document one or more patterns for APIs that need a trusted acknowledgement |
| 114 | + of the spec |
| 115 | + |
| 116 | +Depending on feedback, there may be other approaches to solving the same |
| 117 | +problems. |
| 118 | + |
| 119 | +### Risks and Mitigations |
| 120 | + |
| 121 | +The current guidance exists for a reason. It does encourage a sort of |
| 122 | +ideological cleanliness. However, it is not universally adhered to and it |
| 123 | +neglects the reality of these request/acknowledge APIs. On the whole, this KEP |
| 124 | +posits that the current guidance is a net negative. |
| 125 | + |
| 126 | +## Design Details |
| 127 | + |
| 128 | +### Part 1: Loosen and clarify `status` |
| 129 | + |
| 130 | +Remove the idea that status fields _must_ be from observation. Allow controllers |
| 131 | +to write values to status that represent allocations or acknowledged requests. |
| 132 | +Document that status fields are best when they represent observed state, but do |
| 133 | +not _require_ it. |
| 134 | + |
| 135 | +This has [already happened](https://github.com/kubernetes/enhancements/pull/2308/files#r567809465) |
| 136 | +at least once. |
| 137 | + |
| 138 | +Additionally, discuss the risks of using `status` for non-observation data - |
| 139 | +specifically how to think about write sequencing with regards to observers and |
| 140 | +how to think about idempotency. These problems are not new to this revised |
| 141 | +definition of status, but "status is from observation" tends to be more |
| 142 | +one-way, and relaxing that will likely lead to more variety of controller |
| 143 | +patterns. |
| 144 | + |
| 145 | +#### Examples |
| 146 | + |
| 147 | +Some of these are variants of existing APIs and some are hypothetical. |
| 148 | + |
| 149 | +1) User creates a Service and sets `Service.spec.type = LoadBalancer`. The |
| 150 | +cloud controller sets `Service.status.loadBalancer.ingress.ip` in response. If |
| 151 | +the user sets `Service.spec.loadBalancerIP` to a specific value, the cloud |
| 152 | +controller will either successfully use that IP and and set it as before (ACK), |
| 153 | +or it will not set any value in `Service.status.loadBalancer.ingress` (NAK). |
| 154 | + |
| 155 | +2) Given a Pod, user patches `Pod.spec.containers[0].resources.requests[cpu]` |
| 156 | +to a new value. Kubelet sees this as a request and, if viable, sets |
| 157 | +`Pod.status.containerStatuses[0].resources.requests[cpu]` to the same value. |
| 158 | + |
| 159 | +3) User creates a `Bucket`, setting `Bucket.spec.instance = artifacts-prod`. |
| 160 | +The bucket controller verifies that the namespace is allowed to use the |
| 161 | +artifacts-prod bucket and, if so, sets `Bucket.status.instance` to the same |
| 162 | +value. |
| 163 | + |
| 164 | +### Part 2: Clarify when it makes sense to use 2 objects |
| 165 | + |
| 166 | +Offer API developers guidance on when it makes sense to use 2 objects instead |
| 167 | +of status. The exact wording is TBD, but it has to consider authorization |
| 168 | +granularity vs. many small objects, cardinality, cohesion, and extensibility. |
| 169 | + |
| 170 | +### Test Plan |
| 171 | + |
| 172 | +N/A (docs) |
| 173 | + |
| 174 | +### Graduation Criteria |
| 175 | + |
| 176 | +N/A (docs) |
| 177 | + |
| 178 | +### Upgrade / Downgrade Strategy |
| 179 | + |
| 180 | +N/A (docs) |
| 181 | + |
| 182 | +### Version Skew Strategy |
| 183 | + |
| 184 | +N/A (docs) |
| 185 | + |
| 186 | +## Production Readiness Review Questionnaire |
| 187 | + |
| 188 | +N/A (docs) |
| 189 | + |
| 190 | +## Implementation History |
| 191 | + |
| 192 | +Feb 22, 2021: First draft |
| 193 | +Jun 16, 2021: Incorporate feedback, pick a preferred model |
| 194 | + |
| 195 | +## Drawbacks |
| 196 | + |
| 197 | +There is some value in keeping `status` purely from-observation. Unfortunately |
| 198 | +that ideal does not seem to be surviving contact with the real world. |
| 199 | + |
| 200 | +Using `status` as an RBAC scope is coarse, and this makes that worse by |
| 201 | +allowing more uses of `status` (and thereby more controllers RBAC'ed to write |
| 202 | +to it). |
| 203 | + |
| 204 | +## Alternatives |
| 205 | + |
| 206 | +### Alternative 1: Add a new top-level stanza to spec/status resources |
| 207 | + |
| 208 | +Keep and strengthen the idea that status fields must be from observation. |
| 209 | +Segregate controller-owned fields to a new stanza, parallel to `spec` and |
| 210 | +`status`, which can be RBAC'ed explicitly. For the sake of this doc, let's |
| 211 | +call it `control`. |
| 212 | + |
| 213 | +To make this viable, we would need to add `control` as a "standard" subresource |
| 214 | +and apply similar rules to `spec` and `status` with regards to writes (can't |
| 215 | +write to `control` through the main resource, can't write to `spec` or `status` |
| 216 | +through the `control` subresource). We would also need to add it to CRDs as an |
| 217 | +available subresource. |
| 218 | + |
| 219 | +#### Examples |
| 220 | + |
| 221 | +1) User creates a Service and sets `Service.spec.type = LoadBalancer`. The |
| 222 | +cloud controller sets `Service.control.loadBalancer.ingress.ip` in response. If |
| 223 | +the user sets `Service.spec.loadBalancerIP` to a specific value, the cloud |
| 224 | +controller will either successfully use that IP and and set it as before (ACK), |
| 225 | +or it will not set any value in `Service.control.loadBalancer.ingress` (NAK). |
| 226 | + |
| 227 | +2) Given a Pod, user patches `Pod.spec.containers[0].resources.requests[cpu]` |
| 228 | +to a new value. Kubelet sees this as a request and, if viable, sets |
| 229 | +`Pod.control.containerStatuses[0].resources.requests[cpu]` to the same value. |
| 230 | + |
| 231 | +3) User creates a `Bucket`, setting `Bucket.spec.instance = artifacts-prod`. |
| 232 | +The bucket controller verifies that the namespace is allowed to use the |
| 233 | +artifacts-prod bucket and, if so, sets `Bucket.control.instance` to the same |
| 234 | +value. |
| 235 | + |
| 236 | +#### Tradeoffs |
| 237 | + |
| 238 | +Pro: Clarifies the meaning of status. |
| 239 | + |
| 240 | +Pro: Possibly clarifies the roles acting on a resource. |
| 241 | + |
| 242 | +Con: Requires a lot of implementation and possibly steps on existing uses of the |
| 243 | +field name. |
| 244 | + |
| 245 | +Con: Net-new concept requires new documentation and socialization. |
| 246 | + |
| 247 | +Con: Incompatible with existing uses of status for this. |
| 248 | + |
| 249 | +### Alternative 2: Sub-divide spec |
| 250 | + |
| 251 | +Keep and strengthen the idea that status fields must be from observation. |
| 252 | +Segregate controller-owned fields to a sub-stanza of spec. Create a new |
| 253 | +access-control mechanism (or extend RBAC) to provide field-by-field access. |
| 254 | + |
| 255 | +#### Examples |
| 256 | + |
| 257 | +1) User creates a Service and sets `Service.spec.type = LoadBalancer`. The |
| 258 | +cloud controller sets `Service.spec.control.loadBalancer.ingress.ip` in |
| 259 | +response. If the user sets `Service.spec.loadBalancerIP` to a specific value, |
| 260 | +the cloud controller will either successfully use that IP and and set it as |
| 261 | +before (ACK), or it will not set any value in |
| 262 | +`Service.spec.control.loadBalancer.ingress` (NAK). |
| 263 | + |
| 264 | +2) Given a Pod, user patches `Pod.spec.containers[0].resources.requests[cpu]` |
| 265 | +to a new value. Kubelet sees this as a request and, if viable, sets |
| 266 | +`Pod.spec.control.containerStatuses[0].resources.requests[cpu]` to the same |
| 267 | +value. |
| 268 | + |
| 269 | +3) User creates a `Bucket`, setting `Bucket.spec.instance = artifacts-prod`. |
| 270 | +The bucket controller verifies that the namespace is allowed to use the |
| 271 | +artifacts-prod bucket and, if so, sets `Bucket.spec.control.instance` to the same |
| 272 | +value. |
| 273 | + |
| 274 | +#### Tradeoffs |
| 275 | + |
| 276 | +Pro: Retains purity of status. |
| 277 | + |
| 278 | +Con: Confuses the meaning of spec. |
| 279 | + |
| 280 | +Con: Can collide with existing uses of the field name. |
| 281 | + |
| 282 | +Con: Needs a whole new access model. |
| 283 | + |
| 284 | +Con: Likely needs a new subresource. |
| 285 | + |
| 286 | +#### Notes |
| 287 | + |
| 288 | +This model is included for completeness. I do not expect ANYONE to endorse it. |
0 commit comments