Commit 0d40180

Add KEP 1659 - standard topology labels
1 parent 689f7c6 commit 0d40180

2 files changed

+371
-0
lines changed

Lines changed: 342 additions & 0 deletions
@@ -0,0 +1,342 @@
1+
<!--
2+
**Note:** When your KEP is complete, all of these comment blocks should be removed.
3+
4+
To get started with this template:
5+
6+
- [X] **Pick a hosting SIG.**
7+
Make sure that the problem space is something the SIG is interested in taking
8+
up. KEPs should not be checked in without a sponsoring SIG.
9+
- [X] **Create an issue in kubernetes/enhancements**
10+
When filing an enhancement tracking issue, please ensure to complete all
11+
fields in that template. One of the fields asks for a link to the KEP. You
12+
can leave that blank until this KEP is filed, and then go back to the
13+
enhancement and add the link.
14+
- [X] **Make a copy of this template directory.**
15+
Copy this template into the owning SIG's directory and name it
16+
`NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
17+
leading-zero padding) assigned to your enhancement above.
18+
- [X] **Fill out as much of the kep.yaml file as you can.**
19+
At minimum, you should fill in the "title", "authors", "owning-sig",
20+
"status", and date-related fields.
21+
- [X] **Fill out this file as best you can.**
22+
At minimum, you should fill in the "Summary", and "Motivation" sections.
23+
These should be easy if you've preflighted the idea of the KEP with the
24+
appropriate SIG(s).
25+
- [X] **Create a PR for this KEP.**
26+
Assign it to people in the SIG that are sponsoring this process.
27+
- [ ] **Merge early and iterate.**
28+
Avoid getting hung up on specific details and instead aim to get the goals of
29+
the KEP clarified and merged quickly. The best way to do this is to just
30+
start with the high-level sections and fill out details incrementally in
31+
subsequent PRs.
32+
-->
33+
# KEP-1659: Standard Topology Labels
34+
35+
<!-- toc -->
36+
- [Release Signoff Checklist](#release-signoff-checklist)
37+
- [Summary](#summary)
38+
- [Motivation](#motivation)
39+
- [Goals](#goals)
40+
- [Non-Goals](#non-goals)
41+
- [Proposal](#proposal)
42+
- [Risks and Mitigations](#risks-and-mitigations)
43+
- [Design Details](#design-details)
44+
- [Reserve a label prefix](#reserve-a-label-prefix)
45+
- [Defining the meaning of existing labels](#defining-the-meaning-of-existing-labels)
46+
- [Defining a third key (or not)](#defining-a-third-key-or-not)
47+
- [Followup work (or optionally part of this)](#followup-work-or-optionally-part-of-this)
48+
- [Test Plan](#test-plan)
49+
- [Graduation Criteria](#graduation-criteria)
50+
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
51+
- [Version Skew Strategy](#version-skew-strategy)
52+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
53+
- [Feature enablement and rollback](#feature-enablement-and-rollback)
54+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
55+
- [Monitoring requirements](#monitoring-requirements)
56+
- [Dependencies](#dependencies)
57+
- [Scalability](#scalability)
58+
- [Troubleshooting](#troubleshooting)
59+
- [Implementation History](#implementation-history)
60+
- [Drawbacks](#drawbacks)
61+
- [Alternatives](#alternatives)
62+
<!-- /toc -->
63+
64+
## Release Signoff Checklist
65+
66+
- [ ] Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
67+
- [ ] KEP approvers have approved the KEP status as `implementable`
68+
- [ ] Design details are appropriately documented
69+
- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
70+
- [ ] Graduation criteria is in place
71+
- [ ] "Implementation History" section is up-to-date for milestone
72+
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
73+
- [ ] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
74+
75+
[kubernetes.io]: https://kubernetes.io/
76+
[kubernetes/enhancements]: https://git.k8s.io/enhancements
77+
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
78+
[kubernetes/website]: https://git.k8s.io/website
79+
80+
## Summary
81+
82+
Kubernetes has always taken the position that "topology is arbitrary", and
83+
designs dealing with topology have had to take that into account. Even so, the
84+
project has two commonly assumed labels - `topology.kubernetes.io/region` and
85+
`topology.kubernetes.io/zone` - which are used in many components, generally
86+
hard-coded and not extensible. Those labels have relatively well understood
87+
meanings, and (so far) have been sufficient to represent what most people need.
88+
89+
This KEP proposes to declare those labels, and possibly one more, as "standard"
90+
and give them more well-defined meanings and semantics. APIs that handle
91+
topology can still handle arbitrary topology keys, but these common ones may be
92+
handled automatically.
93+
94+
## Motivation
95+
96+
As we consider problems like cross-zone network traffic being a chargeable
97+
resource in most public clouds, we started to build an API for topology in
98+
Services. We tried to think through how that API would map to existing
99+
load-balancer implementations which may already understand topology, and we
100+
realized 3 things.
101+
102+
1) Cloud-ish load-balancers do not have arbitrary topology APIs and can't
103+
easily adapt to that.
104+
2) Other systems have standardized on two or three levels of topology (e.g. the [Envoy locality API]).
105+
3) Nobody is really complaining about this.
106+
107+
In trying to simplify the way Service topology might work, we are proposing
108+
that standardizing on a small set of well-defined topology concepts will be a
109+
net win for the project at almost no cost to what users are actually doing with
110+
Kubernetes.
111+
112+
[Envoy locality API]: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/base.proto#envoy-v3-api-msg-config-core-v3-locality
113+
114+
### Goals
115+
116+
The goals of this KEP are to:
117+
* build consensus that the two topology labels that ALREADY EXIST in Kubernetes are enough for most users
118+
* determine whether a third level of topology is required or not
119+
* produce short, descriptive, canonical documentation for these labels
120+
121+
### Non-Goals
122+
123+
This KEP does NOT seek to:
124+
* add new functionality that uses topology
125+
* change existing functionality that uses topology
126+
* solve the service topology problem
127+
128+
## Proposal
129+
130+
Kubernetes has always taken the position that "topology is arbitrary", and
131+
designs dealing with topology have had to take that into account. Even so, the
132+
project has two commonly assumed labels - `topology.kubernetes.io/region` and
133+
`topology.kubernetes.io/zone` - which are used in many components, generally
134+
hard-coded and not extensible. Those labels have relatively well understood
135+
meanings, and (so far) have been sufficient to represent what most people need.
136+
137+
This KEP proposes to document those labels as "standard" and give them more
138+
rigorous definitions. This also proposes that we discuss and decide whether a
139+
third level of topology is needed and if so, define it in the same manner as
140+
the existing labels.
141+
142+
The resulting definitions should be specific enough that users and implementors
143+
understand what they mean, but not so rigid that they cannot map them to the
144+
nearest constructs available in most environments.
145+
146+
### Risks and Mitigations
147+
148+
The primary risks here are:
149+
150+
1) That we define these too loosely, such that users cannot derive sufficient
151+
value from their use.
152+
153+
2) That we define these too specifically, such that implementors cannot use
154+
them to represent natural concepts in their environments.
155+
156+
3) That we define these in a way that is incompatible with the ways they are
157+
already being used.
158+
159+
4) That we preclude or design-out other uses of topology that users are using
160+
today.
161+
162+
## Design Details
163+
164+
### Reserve a label prefix
165+
166+
Label prefixes allow us to group labels on common origin and meaning. We
167+
propose to document somewhere (TBD) that the prefix "topology.kubernetes.io" is
168+
explicitly reserved for use in defining metadata about the physical or logical
169+
connectivity and grouping of Kubernetes nodes (and other things), and the
170+
associated behavioral and failure properties of those groups.
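
As a rough illustration of what reserving the prefix means for tooling, the sketch below splits a label key into its prefix and name and checks whether it falls under the reserved prefix. The helper names (`label_prefix`, `is_topology_label`) are hypothetical and not part of any Kubernetes API; the only assumption carried over from Kubernetes is that a label key has an optional prefix separated from the name by a single `/`.

```python
# Hypothetical helpers for illustration only; not a Kubernetes API.
TOPOLOGY_PREFIX = "topology.kubernetes.io"


def label_prefix(key: str) -> str:
    """Return the prefix part of a label key, or "" when the key has no prefix."""
    prefix, sep, _name = key.partition("/")
    # Without a "/" separator the whole key is the name, so there is no prefix.
    return prefix if sep else ""


def is_topology_label(key: str) -> bool:
    """Report whether a label key lives under the reserved topology prefix."""
    return label_prefix(key) == TOPOLOGY_PREFIX
```

A linter or admission check could use a test like this to flag keys under the prefix that are not among the documented ones, though this KEP does not propose any such enforcement.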
171+
172+
This prefix is already in use. This KEP just aims to formalize it.
173+
174+
### Defining the meaning of existing labels
175+
176+
This KEP proposes to define the meaning and semantics of the following labels:
177+
178+
* topology.kubernetes.io/region
179+
* topology.kubernetes.io/zone
180+
181+
The exact wording is TBD, but it must be specific enough to be useful to users
182+
and loose enough to allow implementors sufficient freedom.
183+
184+
This will also include defining that "region" and "zone" are strictly
185+
hierarchical ("zones" are subsets of "regions") and that zone names are unique
186+
across regions. For example, AWS documents "us-east-1a" as a zone under region
187+
"us-east-1".
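
The hierarchy rule can be made concrete with a small check. The sketch below assumes node labels are available as plain dictionaries keyed by node name (a simplification, not how any Kubernetes client returns them) and reports every zone name that shows up under more than one region. The function name `zones_in_multiple_regions` is hypothetical.

```python
# Illustrative sketch; the input shape and function name are assumptions.
REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"


def zones_in_multiple_regions(nodes):
    """Return the sorted zone names that appear under more than one region.

    `nodes` maps a node name to that node's labels (a dict of str to str).
    Nodes missing either label are skipped rather than treated as errors.
    """
    regions_by_zone = {}
    for labels in nodes.values():
        region = labels.get(REGION_KEY)
        zone = labels.get(ZONE_KEY)
        if region is None or zone is None:
            continue
        regions_by_zone.setdefault(zone, set()).add(region)
    return sorted(z for z, regions in regions_by_zone.items() if len(regions) > 1)
```

An empty result means every zone maps to exactly one region, which is the property the proposed definition would require.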
188+
189+
This will also define that, while labels are generally mutable, the topology
190+
labels should be assumed immutable and that any changes to them may be ignored
191+
by downstream consumers of topology.
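
One way a downstream consumer might act on the "assumed immutable" rule is to record a node's topology the first time it is seen and ignore later changes. This is a minimal sketch under that assumption; `TopologyCache` is a hypothetical name, and real consumers may choose different behavior.

```python
# Illustrative sketch of a consumer that treats topology labels as immutable.
REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"


class TopologyCache:
    """Remembers the first topology observed for each node."""

    def __init__(self):
        self._first_seen = {}

    def observe(self, node_name, labels):
        """Return the recorded topology for a node, recording it on first sight.

        Observations with no topology labels are not recorded, so a node that
        is labeled slightly after registration can still be picked up later.
        """
        if node_name not in self._first_seen:
            topo = {k: labels[k] for k in (REGION_KEY, ZONE_KEY) if k in labels}
            if not topo:
                return {}
            self._first_seen[node_name] = topo
        return self._first_seen[node_name]
```

With this cache, a later relabel of the same node has no effect on what the consumer reports, which matches the "changes may be ignored" wording above.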
192+
193+
<<[UNRESOLVED]>>
194+
Should we also try to standardize "kubernetes.io/hostname" as "topology.kubernetes.io/node"?
195+
<<[/UNRESOLVED]>>
196+
197+
### Defining a third key (or not)
198+
199+
Some systems define topology in two levels (e.g. public clouds) and others use three
200+
levels (e.g. Envoy adds "sub-zone"). This KEP proposes that we standardize on
201+
two levels for now, while reserving the right to expand that to three (or more)
202+
if and when we have strong demand.
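
To show how two standard levels could still feed a three-level system, here is a sketch that maps the two labels onto a structure with `region`, `zone`, and `sub_zone` fields, which are the field names of the Envoy locality message as I understand it. The function name `to_locality` and the choice to leave `sub_zone` empty are assumptions made for illustration.

```python
# Illustrative mapping from the two standard labels to a three-level locality.
REGION_KEY = "topology.kubernetes.io/region"
ZONE_KEY = "topology.kubernetes.io/zone"


def to_locality(labels):
    """Build a three-field locality dict from a node's labels.

    Missing labels become empty strings. The third level is left empty
    because this KEP standardizes only two levels for now.
    """
    return {
        "region": labels.get(REGION_KEY, ""),
        "zone": labels.get(ZONE_KEY, ""),
        "sub_zone": "",
    }
```

If a third key were standardized later, only the `sub_zone` line would need to change.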
203+
204+
### Followup work (or optionally part of this)
205+
206+
For a Pod to know its own topology today, it must be authorized to look at
207+
Nodes. This is somewhat tedious, when we have downward-API support for labels
208+
already, and we know that these topology labels are not likely to change at
209+
run-time.
210+
211+
If we standardize topology keys, it would be reasonable to copy those
212+
well-known keys from the Node to the Pod at startup, so Pods could extract that
213+
information without bouncing through a Node object.
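
The copy step described above could look roughly like the following. This is only a sketch of the idea, with labels modeled as plain dictionaries and a hypothetical function name; it is not a description of how the kubelet or any other component behaves today.

```python
# Illustrative sketch of copying well-known topology keys from Node to Pod.
STANDARD_TOPOLOGY_KEYS = (
    "topology.kubernetes.io/region",
    "topology.kubernetes.io/zone",
)


def pod_labels_with_topology(pod_labels, node_labels):
    """Return a copy of the Pod's labels with the Node's standard topology keys added.

    Only the well-known keys are copied; other Node labels are left alone,
    and neither input dict is modified.
    """
    merged = dict(pod_labels)
    for key in STANDARD_TOPOLOGY_KEYS:
        if key in node_labels:
            merged[key] = node_labels[key]
    return merged
```

The fixed `STANDARD_TOPOLOGY_KEYS` tuple is the point of the example: with a known, small set of keys, the copy needs no extra configuration.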
214+
215+
As long as "topology is arbitrary", we need more information about which keys to
216+
copy, which makes this feature request less feasible.
217+
218+
### Test Plan
219+
220+
NOT APPLICABLE.
221+
222+
This KEP does not plan to change code, just documentation.
223+
224+
### Graduation Criteria
225+
226+
<!--
227+
**Note:** *Not required until targeted at a release.*
228+
229+
Define graduation milestones.
230+
231+
These may be defined in terms of API maturity, or as something else. The KEP
232+
should keep this high-level with a focus on what signals will be looked at to
233+
determine graduation.
234+
235+
Consider the following in developing the graduation criteria for this enhancement:
236+
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
237+
- [Deprecation policy][deprecation-policy]
238+
239+
Clearly define what graduation means by either linking to the [API doc
240+
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning),
241+
or by redefining what graduation means.
242+
243+
In general, we try to use the same stages (alpha, beta, GA), regardless how the
244+
functionality is accessed.
245+
246+
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
247+
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
248+
249+
Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
250+
251+
#### Alpha -> Beta Graduation
252+
253+
- Gather feedback from developers and surveys
254+
- Complete features A, B, C
255+
- Tests are in Testgrid and linked in KEP
256+
257+
#### Beta -> GA Graduation
258+
259+
- N examples of real world usage
260+
- N installs
261+
- More rigorous forms of testing e.g., downgrade tests and scalability tests
262+
- Allowing time for feedback
263+
264+
**Note:** Generally we also wait at least 2 releases between beta and
265+
GA/stable, since there's no opportunity for user feedback, or even bug reports,
266+
in back-to-back releases.
267+
268+
#### Removing a deprecated flag
269+
270+
- Announce deprecation and support policy of the existing flag
271+
- Two versions passed since introducing the functionality which deprecates the flag (to address version skew)
272+
- Address feedback on usage/changed behavior, provided on GitHub issues
273+
- Deprecate the flag
274+
275+
**For non-optional features moving to GA, the graduation criteria must include [conformance tests].**
276+
277+
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
278+
-->
279+
280+
### Upgrade / Downgrade Strategy
281+
282+
NOT APPLICABLE.
283+
284+
This KEP does not plan to change code, just documentation.
285+
286+
### Version Skew Strategy
287+
288+
NOT APPLICABLE.
289+
290+
This KEP does not plan to change code, just documentation.
291+
292+
## Production Readiness Review Questionnaire
293+
294+
### Feature enablement and rollback
295+
296+
NOT APPLICABLE.
297+
298+
This KEP does not plan to change code, just documentation.
299+
300+
### Rollout, Upgrade and Rollback Planning
301+
302+
NOT APPLICABLE.
303+
304+
This KEP does not plan to change code, just documentation.
305+
306+
### Monitoring requirements
307+
308+
NOT APPLICABLE.
309+
310+
This KEP does not plan to change code, just documentation.
311+
312+
### Dependencies
313+
314+
NOT APPLICABLE.
315+
316+
This KEP does not plan to change code, just documentation.
317+
318+
### Scalability
319+
320+
NOT APPLICABLE.
321+
322+
This KEP does not plan to change code, just documentation.
323+
324+
### Troubleshooting
325+
326+
NOT APPLICABLE.
327+
328+
This KEP does not plan to change code, just documentation.
329+
## Implementation History
330+
331+
* 2020-03-31: First draft
332+
333+
## Drawbacks
334+
335+
Topology being arbitrary has a certain abstract elegance to it, and it forces
336+
consumers of topology to be flexible in their designs. Moving away from that
337+
brings risks of over-specifying and missing the mark for some users.
338+
339+
## Alternatives
340+
341+
The main alternative is status quo - topology is arbitrary. The main drivers
342+
for abandoning this are described above under "Motivation".
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
title: Standard Topology Labels
2+
kep-number: 1659
3+
authors:
4+
- "@thockin"
5+
owning-sig: sig-architecture
6+
participating-sigs:
7+
- sig-cloud-provider
8+
- sig-network
9+
- sig-scheduling
10+
status: provisional
11+
creation-date: 2020-03-31
12+
reviewers:
13+
- "@andrewsykim"
14+
approvers:
15+
- "@ahg-g"
16+
see-also: []
17+
replaces: []
18+
19+
prr-approvers:
20+
- "@johnbelamaric"
21+
stage: stable
22+
latest-milestone: "v1.19"
23+
milestone:
24+
alpha: "v1.19"
25+
beta: "v1.19"
26+
stable: "v1.19"
27+
feature-gates: []
28+
disable-supported: false
29+
metrics: []
