Commit 41e5084

add kube-apiserver identity kep
1 parent 6928d9f commit 41e5084

2 files changed

+320
-0
lines changed

Lines changed: 281 additions & 0 deletions
@@ -0,0 +1,281 @@
# KEP-1965: kube-apiserver identity

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Proposal](#proposal)
  - [Caveats](#caveats)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha -> Beta Graduation](#alpha---beta-graduation)
    - [Beta -> GA Graduation](#beta---ga-graduation)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
  - [Alternative 1: new API + storage TTL](#alternative-1-new-api--storage-ttl)
  - [Alternative 2: using storage interface directly](#alternative-2-using-storage-interface-directly)
  - [Alternative 3: storage interface + Lease API](#alternative-3-storage-interface--lease-api)
  - [Alternative 4: storage interface + new API](#alternative-4-storage-interface--new-api)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

In an HA cluster, each kube-apiserver has an ID. Controllers have access to the
list of IDs of the living kube-apiservers in the cluster.

## Motivation

The [dynamic coordinated storage version API](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190802-dynamic-coordinated-storage-version.md#curating-a-list-of-participating-api-servers-in-ha-master)
needs such a list to garbage collect stale records. The
[API priority and fairness feature](https://github.com/kubernetes/kubernetes/pull/91389)
needs a unique identifier for each apiserver that reports its concurrency limit.

Currently, such a list is already maintained in the “kubernetes” endpoints,
where the kube-apiservers’ advertised IP addresses are the IDs. However, this
does not work in all flavors of Kubernetes deployments. For example, in a
cluster of three kube-apiservers behind a load balancer, where the advertised
IP address is set to the IP address of the load balancer, all three
kube-apiservers will have the same advertised IP address.

## Proposal

We will use “hostname+PID+random suffix (e.g. 6 base58 digits)” as the ID.

Similar to the [node heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md),
a kube-apiserver will store its ID in a Lease object. All kube-apiserver Leases
will be stored in a special namespace “kube-apiserver-lease”. A controller will
garbage collect expired Leases.

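
As a rough illustration of the ID format above, the following minimal sketch builds an ID from the hostname, the PID, and a 6-character base58 suffix. The `-` separator, the function names, and the choice of the Bitcoin-style base58 alphabet are assumptions made for this example; the proposal only fixes the three components.

```python
import os
import secrets
import socket
from typing import Optional

# Bitcoin-style base58 alphabet (assumption): no 0, O, I, or l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def random_base58(length: int = 6) -> str:
    """Return `length` characters drawn uniformly from the base58 alphabet."""
    return "".join(secrets.choice(BASE58_ALPHABET) for _ in range(length))


def make_apiserver_id(hostname: Optional[str] = None, pid: Optional[int] = None) -> str:
    """Build a hypothetical ID of the form <hostname>-<pid>-<suffix>.

    The "-" separator is an assumption; the proposal does not specify one.
    """
    if hostname is None:
        hostname = socket.gethostname()
    if pid is None:
        pid = os.getpid()
    return f"{hostname}-{pid}-{random_base58()}"
```

The random suffix keeps IDs distinct even when the hostname and PID repeat, for example when a containerized kube-apiserver restarts with the same PID.
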
### Caveats

In this proposal we focus on kube-apiservers. Aggregated apiservers don’t have
the same problem, because their records are already exposed via the service. By
listing the pods selected by the service, an aggregated server can learn the
list of living servers with distinct podIPs. A server can get its own ID via
the downward API.

We prefer false positives (a dead kube-apiserver still listed as living) over
false negatives (a living kube-apiserver missing from the list), because false
negatives are more harmful. In the storage version API scenario, if a
kube-apiserver accidentally misses a heartbeat and gets its Lease garbage
collected, its StorageVersion can be falsely garbage collected as a
consequence. In this case, the storage migrator won’t be able to migrate the
storage unless this kube-apiserver gets restarted and re-registers its
StorageVersion. On the other hand, if a kube-apiserver is gone and its Lease
stays around for an hour or two, it will only delay the storage migration for
the same period of time.

### Risks and Mitigations

A new namespace will be reserved for storing kube-apiserver identity Lease
objects. There is a chance that existing clusters may already be using the
namespace. We mitigate the risk by documenting the namespace in the release
notes and by using a feature gate so that the behavior can be disabled in the
alpha release.

## Design Details

The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md)
logic [already written](https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/nodelease)
will be re-used. The heartbeat controller will be added to kube-apiserver in a
post-start hook.

Each kube-apiserver will refresh its Lease every 10s by default. A GC controller
will watch the Lease API using an informer, and periodically resync its local
cache. On processing an item, the controller will delete the Lease if the last
`renewTime` was more than `leaseDurationSeconds` ago (1h by default). The
default `leaseDurationSeconds` is chosen to be much longer than the default
refresh period, to tolerate clock skew and/or accidental refresh failures. The
default resync period is 1h. By default, assuming negligible clock skew, a Lease
will be deleted once the kube-apiserver has failed to refresh it for one to two
hours: the Lease expires after 1h, and the resync that notices the expiry can
come up to 1h later.

The refresh rate and the lease duration will be configurable through
kube-apiserver flags. The resync period will be configurable through a
kube-controller-manager flag.

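
The expiry rule the GC controller applies can be sketched as follows. This is only an illustration of the `renewTime` / `leaseDurationSeconds` comparison described above, not the controller itself: the `Lease` dataclass and both function names are hypothetical stand-ins, and the informer, resync loop, and API calls are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Lease:
    """Hypothetical stand-in for the fields of a Lease that the GC rule reads."""
    holder_identity: str
    renew_time: datetime
    lease_duration_seconds: int = 3600  # default from the text above: 1h


def is_expired(lease: Lease, now: datetime) -> bool:
    """True if the last renewal was more than leaseDurationSeconds ago."""
    return now - lease.renew_time > timedelta(seconds=lease.lease_duration_seconds)


def expired_leases(leases: List[Lease], now: datetime) -> List[str]:
    """Return the identities whose Leases one resync pass would delete."""
    return [lease.holder_identity for lease in leases if is_expired(lease, now)]
```

Because the comparison uses the controller's clock against a `renewTime` written by the kube-apiserver, clock skew shifts the effective expiry, which is one reason the default lease duration is so much longer than the refresh period.
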
### Test Plan

- integration test for creating the Namespace and the Lease on kube-apiserver
  startup
- integration test for not creating the StorageVersions after creating the
  Lease
- integration test for garbage collecting a Lease that isn't refreshed
- integration test for not garbage collecting a Lease that is refreshed

### Graduation Criteria

Alpha should provide basic functionality covered with tests described above.

#### Alpha -> Beta Graduation

- Appropriate metrics are agreed on and implemented

#### Beta -> GA Graduation

- Conformance tests are agreed on and implemented

**For non-optional features moving to GA, the graduation criteria must include
[conformance tests].**

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md

### Version Skew Strategy

- This feature is proposed for control-plane internal use. Master-node skew is
  not considered.
- During a rolling update, an HA cluster may have old and new masters. Old masters
  will neither create nor garbage collect Leases.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

* **How can this feature be enabled / disabled in a live cluster?**
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: APIServerIdentity
    - Components depending on the feature gate: kube-apiserver

* **Does enabling the feature change any default behavior?**
  A namespace "kube-apiserver-lease" will be used to store kube-apiserver
  identity Leases.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?**
  Yes. Stale Lease objects will remain stale (`renewTime` won't get updated).

* **What happens if we reenable the feature if it was previously rolled back?**
  Stale Lease objects will be garbage collected.

### Rollout, Upgrade and Rollback Planning

_This section must be completed when targeting beta graduation to a release._

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

### Dependencies

_This section must be completed when targeting beta graduation to a release._

### Scalability

* **Will enabling / using this feature result in any new API calls?**
  Describe them, providing:
  - API call type (e.g. PATCH pods): UPDATE leases
  - estimated throughput:
  - originating component(s) (e.g. Kubelet, Feature-X-controller):
    kube-apiserver

  focusing mostly on:
  - components listing and/or watching resources they didn't before:
    kube-controller-manager
  - periodic API calls to reconcile state (e.g. periodic fetching state,
    heartbeats, leader election, etc.): kube-apiserver heartbeat every 10s

* **Will enabling / using this feature result in increasing size or count of
  the existing API objects?**
  Describe them, providing:
  - API type(s): leases
  - Estimated amount of new objects: one per living kube-apiserver

* **Will enabling / using this feature result in increasing time taken by any
  operations covered by [existing SLIs/SLOs]?**
  No.

[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

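
The heartbeat cadence listed under Scalability implies a steady-state write rate that is simple to work out. The sketch below is a back-of-the-envelope calculation, not a measured figure: the function name is hypothetical, and it assumes one Lease UPDATE per kube-apiserver per refresh period with no retries.

```python
def lease_updates_per_second(apiserver_count: int,
                             refresh_period_seconds: float = 10.0) -> float:
    """Steady-state Lease UPDATE rate summed over all kube-apiservers.

    Assumes exactly one UPDATE per kube-apiserver per refresh period.
    """
    if apiserver_count < 0 or refresh_period_seconds <= 0:
        raise ValueError("count must be >= 0 and period must be > 0")
    return apiserver_count / refresh_period_seconds
```

Under these assumptions, a cluster with three kube-apiservers and the default 10s refresh period issues about 0.3 Lease updates per second in total.
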
## Implementation History

## Alternatives

### Alternative 1: new API + storage TTL

We define a new API for kube-apiserver identity. Similar to [Event](https://github.com/kubernetes/kubernetes/blob/9062c43b76c8562062e454a190a948f1370f8eb3/pkg/registry/core/rest/storage_core.go#L128),
we make the storage path for the new object type [tack on the TTL](https://github.com/kubernetes/kubernetes/blob/9062c43b76c8562062e454a190a948f1370f8eb3/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L1173).
etcd will delete objects whose TTL is not refreshed in time.

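
To make the TTL behavior in this alternative concrete, here is a toy in-memory model of "a record disappears unless its TTL is refreshed". It is not etcd's API: the `TTLStore` class and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Dict, List


class TTLStore:
    """Toy model of TTL-based expiry (hypothetical, not etcd's API)."""

    def __init__(self) -> None:
        self._expiry: Dict[str, float] = {}  # key -> absolute expiry time

    def put(self, key: str, ttl_seconds: float, now: float) -> None:
        """Create the record, or refresh its TTL if it already exists."""
        self._expiry[key] = now + ttl_seconds

    def live_keys(self, now: float) -> List[str]:
        """Drop records whose TTL has lapsed and return the survivors, sorted."""
        self._expiry = {k: t for k, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

For example, if two servers each write a record with a 30s TTL at time 0 and only one of them refreshes at time 20, the other record is gone by time 40 with no garbage-collection controller involved.
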
- Pros:
  - We don’t need to write a controller to garbage collect expired records, nor
    worry about client-server clock skew.
  - We can extend the API in the future to include more information (e.g.
    version, feature, config)
- Cons:
  - We need a new dedicated API

Note that the proposed solution doesn't prevent us from switching to a new API
in the future, similar to how node heartbeats switched from node status to
Leases.

### Alternative 2: using storage interface directly

The existing “kubernetes” Endpoints [mechanism](https://github.com/kubernetes/community/pull/939)
can be inherited to solve the kube-apiserver identity problem. There are two
parts of the mechanism:
1. Each kube-apiserver periodically writes a lease of its ID (address) with a
   TTL to etcd through the storage interface. The lease object itself is an
   Endpoints object. etcd deletes the leases of servers that fail to refresh
   the TTL in time.
2. A controller reads the leases through the storage interface, to collect the
   list of IP addresses. The controller updates the “kubernetes” Endpoints to
   match the IP address list.

We inherit the first part of the existing mechanism (the etcd TTL lease), but
change the key and value. The key will be the new ID. All the keys will be
stored under a special prefix “/apiserverleases/” (similar to the [existing mechanism](https://github.com/kubernetes/kubernetes/blob/14a11060a0775ed609f0810898ebdbe737c59441/pkg/master/master.go#L265)).
The value will be a Lease object. A kube-apiserver obtains the list of IDs by
directly listing/watching the leases through the storage interface.

- Cons:
  - We depend on a side-channel API, which is against Kubernetes philosophy
  - Clients like the kube-controller-manager cannot access the storage
    interface. For the storage version API, if we put the garbage collector in
    kube-apiserver instead of kube-controller-manager, the lack of leader
    election may cause update conflicts.

### Alternative 3: storage interface + Lease API

The kube-apiservers still write the master leases to etcd, but a controller will
watch the master leases and update an existing public API (e.g. store it in a
defined way in a Lease). Note that we cannot use the endpoints API like the
“kubernetes” endpoints, because the endpoints API is designed to store a list of
addresses, but our IDs are not IP addresses.

- Cons:
  - We depend on a side-channel API, which is against Kubernetes philosophy

### Alternative 4: storage interface + new API

Similar to Alternative 1, the kube-apiservers write the master leases to etcd,
and a controller watches the master leases, but updates a new public API
specifically designed to host information about the API servers, including
their IDs, enabled feature gates, etc.

- Cons:
  - We depend on a side-channel API, which is against Kubernetes philosophy
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
title: kube-apiserver identity
kep-number: 1965
authors:
  - "@roycaihw"
owning-sig: sig-api-machinery
status: provisional
creation-date: 2020-09-02
reviewers:
  - "@caesarxuchao"
  - "@lavalamp"
  - "@MikeSpreitzer"
  - "@deads2k"
approvers:
  - "@lavalamp"
  - "@deads2k"
see-also:
  - "https://docs.google.com/document/d/1ed7miqlFY7-9lZxE7gzoyx_MFQCtFEDqtcKMpaAmHys/edit?usp=sharing"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.20"
  beta: "v1.21"
  stable: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: APIServerIdentity
    components:
      - kube-apiserver
disable-supported: true
