<!--
**Note:** When your KEP is complete, all of these comment blocks should be removed.

To get started with this template:

- [X] **Pick a hosting SIG.**
  Make sure that the problem space is something the SIG is interested in taking
  up. KEPs should not be checked in without a sponsoring SIG.
- [X] **Create an issue in kubernetes/enhancements**
  When filing an enhancement tracking issue, please ensure to complete all
  fields in that template. One of the fields asks for a link to the KEP. You
  can leave that blank until this KEP is filed, and then go back to the
  enhancement and add the link.
- [X] **Make a copy of this template directory.**
  Copy this template into the owning SIG's directory and name it
  `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
  leading-zero padding) assigned to your enhancement above.
- [X] **Fill out as much of the kep.yaml file as you can.**
  At minimum, you should fill in the "title", "authors", "owning-sig",
  "status", and date-related fields.
- [X] **Fill out this file as best you can.**
  At minimum, you should fill in the "Summary", and "Motivation" sections.
  These should be easy if you've preflighted the idea of the KEP with the
  appropriate SIG(s).
- [X] **Create a PR for this KEP.**
  Assign it to people in the SIG that are sponsoring this process.
- [ ] **Merge early and iterate.**
  Avoid getting hung up on specific details and instead aim to get the goals of
  the KEP clarified and merged quickly. The best way to do this is to just
  start with the high-level sections and fill out details incrementally in
  subsequent PRs.
-->
# KEP-NNNN: Standard Topology Labels

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Reserve a label prefix](#reserve-a-label-prefix)
  - [Defining the meaning of existing labels](#defining-the-meaning-of-existing-labels)
  - [Defining a third key (or not)](#defining-a-third-key-or-not)
  - [Followup work (or optionally part of this)](#followup-work-or-optionally-part-of-this)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature enablement and rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
<!-- /toc -->

## Release Signoff Checklist

- [ ] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] KEP approvers have approved the KEP status as `implementable`
- [ ] Design details are appropriately documented
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] Graduation criteria is in place
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

Kubernetes has always taken the position that "topology is arbitrary", and
designs dealing with topology have had to take that into account. Even so, the
project has two commonly assumed labels, `topology.kubernetes.io/region` and
`topology.kubernetes.io/zone`, which are used in many components, where they
are generally hard-coded and not extensible. Those labels have relatively
well-understood meanings and (so far) have been sufficient to represent what
most people need.

This KEP proposes to declare those labels, and possibly one more, as "standard"
and give them more well-defined meanings and semantics. APIs that handle
topology can still handle arbitrary topology keys, but these common ones may be
handled automatically.

## Motivation

As we considered problems like cross-zone network traffic being a chargeable
resource in most public clouds, we started to build an API for topology in
Services. We tried to think through how that API would map to existing
load-balancer implementations which may already understand topology, and we
realized three things:

 1) Cloud load-balancers generally do not have arbitrary topology APIs and
    cannot easily adapt to that.
 2) Other systems have standardized on two or three levels of topology (e.g. the [Envoy locality API]).
 3) Nobody is really complaining about the lack of arbitrary topology in those systems.

In trying to simplify the way Service topology might work, we propose that
standardizing on a small set of well-defined topology concepts would be a net
win for the project at almost no cost to what users are actually doing with
Kubernetes.

[Envoy locality API]: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/base.proto#envoy-v3-api-msg-config-core-v3-locality

### Goals

The goals of this KEP are to:
 * build consensus that the two topology labels that ALREADY EXIST in Kubernetes are enough for most users
 * determine whether a third level of topology is required or not
 * produce short, descriptive, canonical documentation for these labels

### Non-Goals

This KEP does NOT seek to:
 * add new functionality that uses topology
 * change existing functionality that uses topology
 * solve the service topology problem

## Proposal

Kubernetes has always taken the position that "topology is arbitrary", and
designs dealing with topology have had to take that into account. Even so, the
project has two commonly assumed labels, `topology.kubernetes.io/region` and
`topology.kubernetes.io/zone`, which are used in many components, where they
are generally hard-coded and not extensible. Those labels have relatively
well-understood meanings and (so far) have been sufficient to represent what
most people need.

This KEP proposes to document those labels as "standard" and give them more
rigorous definitions. It also proposes that we discuss and decide whether a
third level of topology is needed and, if so, define it in the same manner as
the existing labels.

The resulting definitions should be specific enough that users and implementors
understand what they mean, but not so rigid that implementors cannot map them
to the nearest constructs available in most environments.

### Risks and Mitigations

The primary risks here are:

1) That we define these too loosely, such that users cannot derive sufficient
value from them.

2) That we define these too specifically, such that implementors cannot use
them to represent natural concepts in their environments.

3) That we define these in a way that is incompatible with the ways they are
already being used.

4) That we preclude or design out other uses of topology that users rely on
today.

## Design Details

### Reserve a label prefix

Label prefixes allow us to group labels by common origin and meaning. We
propose to document somewhere (TBD) that the prefix "topology.kubernetes.io" is
explicitly reserved for use in defining metadata about the physical or logical
connectivity and grouping of Kubernetes nodes (and other things), and the
associated behavioral and failure properties of those groups.

This prefix is already in use. This KEP just aims to formalize it.
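
As an illustration only, a consumer could decide whether a label belongs to the reserved group by comparing the part of the key before the first "/" with the prefix. A label key is an optional DNS-subdomain prefix followed by "/" and a name, so that split is unambiguous. This is a minimal Go sketch under that assumption. `isTopologyKey` is a hypothetical helper, not an existing Kubernetes function.

```go
package main

import (
	"fmt"
	"strings"
)

// topologyPrefix is the label prefix this KEP proposes to reserve.
const topologyPrefix = "topology.kubernetes.io"

// isTopologyKey reports whether a label key uses the reserved topology prefix.
// The prefix of a label key is everything before the first "/"; keys with no
// prefix (e.g. "zone") never match. Hypothetical helper, not a Kubernetes API.
func isTopologyKey(key string) bool {
	prefix, _, found := strings.Cut(key, "/")
	return found && prefix == topologyPrefix
}

func main() {
	for _, k := range []string{
		"topology.kubernetes.io/zone",
		"topology.kubernetes.io/region",
		"kubernetes.io/hostname",
		"topology.kubernetes.io.example.com/zone", // lookalike prefix
		"zone",
	} {
		fmt.Printf("%-42s %v\n", k, isTopologyKey(k))
	}
}
```

The sketch compares the whole prefix instead of checking `strings.HasPrefix(key, "topology.kubernetes.io")`. A plain prefix check would also accept lookalike keys such as `topology.kubernetes.io.example.com/zone`.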

### Defining the meaning of existing labels

This KEP proposes to define the meaning and semantics of the following labels:

 * `topology.kubernetes.io/region`
 * `topology.kubernetes.io/zone`

The exact wording is TBD, but it must be specific enough to be useful to users
and loose enough to allow implementors sufficient freedom.

This will also include defining that "region" and "zone" are strictly
hierarchical ("zones" are subsets of "regions") and that zone names are unique
across regions. For example, AWS documents "us-east-1a" as a zone under region
"us-east-1".
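
The hierarchy and uniqueness rules are concrete enough to check mechanically across a set of nodes. The Go sketch below assumes each node is represented simply as its label map. `checkZoneHierarchy` is hypothetical and not an existing API. It reports every zone name observed under more than one region.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	regionLabel = "topology.kubernetes.io/region"
	zoneLabel   = "topology.kubernetes.io/zone"
)

// checkZoneHierarchy returns, sorted, the zone names that appear under more
// than one region across the given node label maps, i.e. violations of
// "zones are subsets of regions and zone names are unique across regions".
// Nodes missing either label are skipped. Hypothetical helper.
func checkZoneHierarchy(nodes []map[string]string) []string {
	zoneToRegion := map[string]string{}
	conflicts := map[string]bool{}
	for _, labels := range nodes {
		region, hasRegion := labels[regionLabel]
		zone, hasZone := labels[zoneLabel]
		if !hasRegion || !hasZone {
			continue
		}
		if prev, seen := zoneToRegion[zone]; seen && prev != region {
			conflicts[zone] = true
			continue
		}
		zoneToRegion[zone] = region
	}
	var out []string
	for z := range conflicts {
		out = append(out, z)
	}
	sort.Strings(out)
	return out
}

func main() {
	nodes := []map[string]string{
		{regionLabel: "us-east-1", zoneLabel: "us-east-1a"},
		{regionLabel: "us-east-1", zoneLabel: "us-east-1b"},
		{regionLabel: "us-west-2", zoneLabel: "us-east-1a"}, // violates the rule
	}
	fmt.Println(checkZoneHierarchy(nodes))
}
```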

This will also define that, while labels are generally mutable, the topology
labels should be assumed immutable and that any changes to them may be ignored
by downstream consumers of topology.
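
A downstream consumer that treats topology labels as immutable still needs some way to notice that they changed. For example, it might log the change and keep its original view. A minimal sketch follows. It assumes the consumer sees old and new label maps, as an update handler would. `changedTopologyKeys` and `standardTopologyKeys` are hypothetical names.

```go
package main

import "fmt"

// standardTopologyKeys lists the labels this KEP proposes to standardize.
var standardTopologyKeys = []string{
	"topology.kubernetes.io/region",
	"topology.kubernetes.io/zone",
}

// changedTopologyKeys returns the standard topology keys whose value differs
// between the old and new label sets, counting additions and removals as
// changes. Non-topology labels are not examined. Hypothetical helper.
func changedTopologyKeys(oldLabels, newLabels map[string]string) []string {
	var changed []string
	for _, k := range standardTopologyKeys {
		oldV, oldOK := oldLabels[k]
		newV, newOK := newLabels[k]
		if oldOK != newOK || oldV != newV {
			changed = append(changed, k)
		}
	}
	return changed
}

func main() {
	before := map[string]string{
		"topology.kubernetes.io/region": "us-east-1",
		"topology.kubernetes.io/zone":   "us-east-1a",
		"app":                           "web",
	}
	after := map[string]string{
		"topology.kubernetes.io/region": "us-east-1",
		"topology.kubernetes.io/zone":   "us-east-1b",
		"app":                           "api",
	}
	// Only the zone is reported; the change to "app" is not a topology label.
	fmt.Println(changedTopologyKeys(before, after))
}
```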

<<[UNRESOLVED]>>
Should we also try to standardize "kubernetes.io/hostname" as "topology.kubernetes.io/node"?
<<[/UNRESOLVED]>>

### Defining a third key (or not)

Some systems define topology in two levels (e.g. public clouds) and others use three
levels (e.g. Envoy adds "sub-zone"). This KEP proposes that we standardize on
two levels for now, while reserving the right to expand that to three (or more)
if and when we have strong demand.
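
Two standardized levels can still feed a three-level system. To illustrate, the sketch below maps the two labels onto a struct that mirrors the `region`, `zone`, and `sub_zone` fields of Envoy's `Locality` message. The struct is a local stand-in, not the type from Envoy's Go bindings. Leaving `SubZone` empty is an assumption about how such a mapping might behave.

```go
package main

import "fmt"

// Locality mirrors the three fields of Envoy's Locality message
// (region, zone, sub_zone). It is a local stand-in, not Envoy's Go type.
type Locality struct {
	Region  string
	Zone    string
	SubZone string
}

// localityFromLabels maps the two standard Kubernetes topology labels onto a
// three-level locality. With only two standardized levels, SubZone is left
// empty; a future third key could populate it. Hypothetical helper.
func localityFromLabels(labels map[string]string) Locality {
	return Locality{
		Region: labels["topology.kubernetes.io/region"],
		Zone:   labels["topology.kubernetes.io/zone"],
	}
}

func main() {
	loc := localityFromLabels(map[string]string{
		"topology.kubernetes.io/region": "us-east-1",
		"topology.kubernetes.io/zone":   "us-east-1a",
	})
	fmt.Printf("%+v\n", loc)
}
```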

### Followup work (or optionally part of this)

For a Pod to know its own topology today, it must be authorized to read
Nodes. This is tedious, given that we already have downward-API support for
labels and we know that these topology labels are not likely to change at
run time.

If we standardize topology keys, it would be reasonable to copy those
well-known keys from the Node to the Pod at startup, so Pods could extract that
information without bouncing through a Node object.

As long as "topology is arbitrary", we would need more information about which
keys to copy, which makes this feature request less feasible.
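
With a fixed set of standard keys, the copy step is simple because it needs no per-cluster list of keys. A hypothetical Go sketch follows. `copyTopologyLabels` is not a kubelet function. Letting the Node's value win over a pre-existing Pod label with the same key is a design assumption.

```go
package main

import "fmt"

// wellKnownTopologyKeys is the fixed set of keys that would be copied.
var wellKnownTopologyKeys = []string{
	"topology.kubernetes.io/region",
	"topology.kubernetes.io/zone",
}

// copyTopologyLabels returns a new map holding podLabels plus the well-known
// topology labels taken from nodeLabels. Only keys present on the Node are
// copied; other Node labels (e.g. kubernetes.io/hostname) are not. The input
// maps are not modified. Hypothetical helper.
func copyTopologyLabels(nodeLabels, podLabels map[string]string) map[string]string {
	out := make(map[string]string, len(podLabels)+len(wellKnownTopologyKeys))
	for k, v := range podLabels {
		out[k] = v
	}
	for _, k := range wellKnownTopologyKeys {
		if v, ok := nodeLabels[k]; ok {
			out[k] = v // the Node's value takes precedence
		}
	}
	return out
}

func main() {
	node := map[string]string{
		"topology.kubernetes.io/region": "us-east-1",
		"topology.kubernetes.io/zone":   "us-east-1a",
		"kubernetes.io/hostname":        "node-1",
	}
	pod := map[string]string{"app": "web"}
	fmt.Println(copyTopologyLabels(node, pod))
}
```

Once such labels were present on the Pod, a container could read one through the downward API. For example, it could use an environment variable whose `fieldRef.fieldPath` is `metadata.labels['topology.kubernetes.io/zone']`.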

### Test Plan

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Graduation Criteria

<!--
**Note:** *Not required until targeted at a release.*

Define graduation milestones.

These may be defined in terms of API maturity, or as something else. The KEP
should keep this high-level with a focus on what signals will be looked at to
determine graduation.

Consider the following in developing the graduation criteria for this enhancement:
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
- [Deprecation policy][deprecation-policy]

Clearly define what graduation means by either linking to the [API doc
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
or by redefining what graduation means.

In general, we try to use the same stages (alpha, beta, GA), regardless how the
functionality is accessed.

[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].

#### Alpha -> Beta Graduation

- Gather feedback from developers and surveys
- Complete features A, B, C
- Tests are in Testgrid and linked in KEP

#### Beta -> GA Graduation

- N examples of real world usage
- N installs
- More rigorous forms of testing e.g., downgrade tests and scalability tests
- Allowing time for feedback

**Note:** Generally we also wait at least 2 releases between beta and
GA/stable, since there's no opportunity for user feedback, or even bug reports,
in back-to-back releases.

#### Removing a deprecated flag

- Announce deprecation and support policy of the existing flag
- Two versions passed since introducing the functionality which deprecates the flag (to address version skew)
- Address feedback on usage/changed behavior, provided on GitHub issues
- Deprecate the flag

**For non-optional features moving to GA, the graduation criteria must include [conformance tests].**

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
-->

### Upgrade / Downgrade Strategy

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Version Skew Strategy

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

## Production Readiness Review Questionnaire

### Feature enablement and rollback

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Rollout, Upgrade and Rollback Planning

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Monitoring requirements

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Dependencies

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Scalability

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

### Troubleshooting

NOT APPLICABLE.

This KEP does not plan to change code, just documentation.

## Implementation History

* 2020-03-31: First draft

## Drawbacks

Topology being arbitrary has a certain abstract elegance to it, and it forces
consumers of topology to be flexible in their designs. Moving away from that
brings risks of over-specifying and missing the mark for some users.

## Alternatives

The main alternative is the status quo: topology is arbitrary. The main drivers
for abandoning it are described above under "Motivation".