
Commit 37b65e1

Merge pull request kubernetes#1966 from roycaihw/kube-apiserver-identity
Adding KEP for kube-apiserver identity
2 parents c7d895d + 0ab9e87 commit 37b65e1


2 files changed: +317 -0 lines changed

Lines changed: 278 additions & 0 deletions
# KEP-1965: kube-apiserver identity

<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Proposal](#proposal)
  - [Caveats](#caveats)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
    - [Alpha -&gt; Beta Graduation](#alpha---beta-graduation)
    - [Beta -&gt; GA Graduation](#beta---ga-graduation)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
- [Implementation History](#implementation-history)
- [Alternatives](#alternatives)
  - [Alternative 1: new API + storage TTL](#alternative-1-new-api--storage-ttl)
  - [Alternative 2: using storage interface directly](#alternative-2-using-storage-interface-directly)
  - [Alternative 3: storage interface + Lease API](#alternative-3-storage-interface--lease-api)
  - [Alternative 4: storage interface + new API](#alternative-4-storage-interface--new-api)
<!-- /toc -->

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation (e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes)

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

In an HA cluster, each kube-apiserver has an ID. Controllers have access to the
list of IDs of the live kube-apiservers in the cluster.

## Motivation

The [dynamic coordinated storage version API](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190802-dynamic-coordinated-storage-version.md#curating-a-list-of-participating-api-servers-in-ha-master)
needs such a list to garbage collect stale records. The
[API priority and fairness feature](https://github.com/kubernetes/kubernetes/pull/91389)
needs a unique identifier for an apiserver reporting its concurrency limit.

Currently, such a list is already maintained in the “kubernetes” endpoints,
where the kube-apiservers’ advertised IP addresses are the IDs. However, this
does not work in all flavors of Kubernetes deployments. For example, if the
cluster has a load balancer and the advertise IP address is set to the IP
address of the load balancer, all the kube-apiservers will have the same
advertise IP address, so the IDs are no longer unique.

## Proposal

We will use “hostname+PID+random suffix (e.g. 6 base58 digits)” as the ID.

Similar to the [node heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md),
a kube-apiserver will store its ID in a Lease object. All kube-apiserver Leases
will be stored in a special namespace, “kube-apiserver-lease”. A controller will
garbage collect expired Leases.

### Caveats

In this proposal we focus on kube-apiservers. Aggregated apiservers don’t have
the same problem, because their records are already exposed via their Service.
By listing the pods selected by the Service, an aggregated apiserver can learn
the list of live servers, distinguished by their pod IPs. A server can get its
own ID via the downward API.

We prefer that expired Leases remain for a longer duration rather than being
collected quickly, because a Lease that is garbage collected by mistake can do
more damage than one that lingers. Take the storage version API as an example:
if a kube-apiserver accidentally misses a heartbeat and its Lease gets garbage
collected, its StorageVersion can be falsely garbage collected as a consequence.
In that case, the storage migrator won’t be able to migrate the storage until
this kube-apiserver is restarted and re-registers its StorageVersion. On the
other hand, if a kube-apiserver is gone and its Lease stays around for an hour
or two, it will only delay the storage migration by the same amount of time.

## Design Details

The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md)
logic, which is [already written](https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/nodelease),
will be reused. The heartbeat controller will be added to kube-apiserver in a
post-start hook.

Each kube-apiserver will refresh its Lease every 10s by default. A GC controller
will watch the Lease API using an informer, and periodically resync its local
cache. On processing an item, the controller will delete the Lease if the last
`renewTime` was more than `leaseDurationSeconds` ago (defaults to 1h). The
default `leaseDurationSeconds` is chosen to be much longer than the default
refresh period, to tolerate clock skew and/or accidental refresh failures. The
default resync period is 1h. By default, assuming negligible clock skew, a Lease
will be deleted one to two hours after its kube-apiserver stops refreshing it:
the Lease expires after one hour, and the controller deletes it at the next
resync, which comes within another hour. The GC controller will run in
kube-controller-manager, to leverage leader election and reduce conflicts.

The refresh rate and lease duration will be configurable through kube-apiserver
flags. The resync period will be configurable through a kube-controller-manager
flag.

### Test Plan

- integration test for creating the Namespace and the Lease on kube-apiserver
  startup
- integration test for not creating the StorageVersions after creating the
  Lease
- integration test for garbage collecting a Lease that isn't refreshed
- integration test for not garbage collecting a Lease that is refreshed

### Graduation Criteria

Alpha should provide basic functionality covered by the tests described above.

#### Alpha -> Beta Graduation

- Appropriate metrics are agreed on and implemented
- An e2e test plan is agreed on and implemented (e.g. chaosmonkey in a regional
  cluster)

#### Beta -> GA Graduation

- Conformance tests are agreed on and implemented

**For non-optional features moving to GA, the graduation criteria must include
[conformance tests].**

[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md

### Version Skew Strategy

- This feature is intended for internal use by the control plane, so
  master-node skew is not considered.
- During a rolling update, an HA cluster may have both old and new masters. Old
  masters will neither create nor garbage collect Leases.

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

* **How can this feature be enabled / disabled in a live cluster?**
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: APIServerIdentity
    - Components depending on the feature gate: kube-apiserver

* **Does enabling the feature change any default behavior?**
  A namespace "kube-apiserver-lease" will be used to store kube-apiserver
  identity Leases.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back
  the enablement)?**
  Yes. Stale Lease objects will remain stale (their `renewTime` won't get
  updated).

* **What happens if we re-enable the feature if it was previously rolled back?**
  Stale Lease objects will be garbage collected.

### Rollout, Upgrade and Rollback Planning

_This section must be completed when targeting beta graduation to a release._

### Monitoring Requirements

_This section must be completed when targeting beta graduation to a release._

### Dependencies

_This section must be completed when targeting beta graduation to a release._

### Scalability

* **Will enabling / using this feature result in any new API calls?**
  Describe them, providing:
  - API call type (e.g. PATCH pods): UPDATE leases
  - estimated throughput:
  - originating component(s) (e.g. Kubelet, Feature-X-controller):
    kube-apiserver

  focusing mostly on:
  - components listing and/or watching resources they didn't before:
    kube-controller-manager
  - periodic API calls to reconcile state (e.g. periodic fetching state,
    heartbeats, leader election, etc.): kube-apiserver heartbeat every 10s

* **Will enabling / using this feature result in increasing size or count of
  the existing API objects?**
  Describe them, providing:
  - API type(s): leases
  - Estimated amount of new objects: one per living kube-apiserver

* **Will enabling / using this feature result in increasing time taken by any
  operations covered by [existing SLIs/SLOs]?**
  No.

[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

## Implementation History

## Alternatives

### Alternative 1: new API + storage TTL

We define a new API for kube-apiserver identity. Similar to [Event](https://github.com/kubernetes/kubernetes/blob/9062c43b76c8562062e454a190a948f1370f8eb3/pkg/registry/core/rest/storage_core.go#L128),
we make the storage path for the new object type [tack on the TTL](https://github.com/kubernetes/kubernetes/blob/9062c43b76c8562062e454a190a948f1370f8eb3/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L1173).
Etcd will delete objects that don’t get their TTL refreshed in time.

- Pros:
  - We don’t need to write a controller to garbage collect expired records, nor
    worry about client-server clock skew.
  - We can extend the API in the future to include more information (e.g.
    version, feature, config).
- Cons:
  - We need a new dedicated API.

Note that the proposed solution doesn't prevent us from switching to a new API
in the future, similar to how node heartbeats switched from node status to
leases.

### Alternative 2: using storage interface directly

The existing “kubernetes” Endpoints [mechanism](https://github.com/kubernetes/community/pull/939)
can be inherited to solve the kube-apiserver identity problem. There are two
parts to the mechanism:

1. Each kube-apiserver periodically writes a lease of its ID (address) with a
   TTL to etcd through the storage interface. The lease object itself is an
   Endpoints object. Leases will be deleted by etcd for servers that fail to
   refresh the TTL in time.
2. A controller reads the leases through the storage interface, to collect the
   list of IP addresses. The controller updates the “kubernetes” Endpoints to
   match the IP address list.

We inherit the first part of the existing mechanism (the etcd TTL lease), but
change the key and value. The key will be the new ID. All the keys will be
stored under a special prefix “/apiserverleases/” (similar to the [existing mechanism](https://github.com/kubernetes/kubernetes/blob/14a11060a0775ed609f0810898ebdbe737c59441/pkg/master/master.go#L265)).
The value will be a Lease object. A kube-apiserver obtains the list of IDs by
directly listing/watching the leases through the storage interface.

- Cons:
  - We depend on a side-channel API, which is against Kubernetes philosophy.
  - Clients like the kube-controller-manager cannot access the storage
    interface. For the storage version API, if we put the garbage collector in
    kube-apiserver instead of kube-controller-manager, the lack of leader
    election may cause update conflicts.

### Alternative 3: storage interface + Lease API

The kube-apiservers still write the master leases to etcd, but a controller will
watch the master leases and update an existing public API (e.g. store it in a
defined way in a Lease). Note that we cannot use the endpoints API like the
“kubernetes” endpoints, because the endpoints API is designed to store a list of
addresses, but our IDs are not IP addresses.

- Cons:
  - We depend on a side-channel API, which is against Kubernetes philosophy.

### Alternative 4: storage interface + new API

Similar to Alternative 3, the kube-apiservers write the master leases to etcd
and a controller watches the master leases, but the controller updates a new
public API specifically designed to host information about the API servers,
including their IDs, enabled feature gates, etc.

- Cons:
  - We depend on a side-channel API, which is against Kubernetes philosophy.
Lines changed: 39 additions & 0 deletions
```yaml
title: kube-apiserver identity
kep-number: 1965
authors:
  - "@roycaihw"
owning-sig: sig-api-machinery
status: provisional
creation-date: 2020-09-02
reviewers:
  - "@caesarxuchao"
  - "@lavalamp"
  - "@MikeSpreitzer"
  - "@deads2k"
approvers:
  - "@lavalamp"
  - "@deads2k"
see-also:
  - "https://docs.google.com/document/d/1ed7miqlFY7-9lZxE7gzoyx_MFQCtFEDqtcKMpaAmHys/edit?usp=sharing"

# The target maturity stage in the current dev cycle for this KEP.
stage: alpha

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.20"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
  alpha: "v1.20"
  beta: "v1.21"
  stable: "v1.22"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled
feature-gates:
  - name: APIServerIdentity
    components:
      - kube-apiserver
disable-supported: true
```

0 commit comments

Comments
 (0)