
Commit 3c01537

andrewsykim and enj committed
KEP-1965: update with Beta criteria/milestone and PRR questions answered
Signed-off-by: Andrew Sy Kim <[email protected]>
Co-authored-by: Monis Khan <[email protected]>
1 parent b6222e7 commit 3c01537


3 files changed

+179
-59
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 1965
2+
beta:
3+
approver: "@deads2k"

keps/sig-api-machinery/1965-kube-apiserver-identity/README.md

Lines changed: 166 additions & 55 deletions
@@ -4,10 +4,16 @@
44
- [Release Signoff Checklist](#release-signoff-checklist)
55
- [Summary](#summary)
66
- [Motivation](#motivation)
7+
- [Goals](#goals)
8+
- [Non-Goals](#non-goals)
79
- [Proposal](#proposal)
810
- [Caveats](#caveats)
911
- [Design Details](#design-details)
1012
- [Test Plan](#test-plan)
13+
- [Prerequisite testing updates](#prerequisite-testing-updates)
14+
- [Unit tests](#unit-tests)
15+
- [Integration tests](#integration-tests)
16+
- [e2e tests](#e2e-tests)
1117
- [Graduation Criteria](#graduation-criteria)
1218
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
1319
- [Beta -> GA Graduation](#beta---ga-graduation)
@@ -18,6 +24,7 @@
1824
- [Monitoring Requirements](#monitoring-requirements)
1925
- [Dependencies](#dependencies)
2026
- [Scalability](#scalability)
27+
- [Troubleshooting](#troubleshooting)
2128
- [Implementation History](#implementation-history)
2229
- [Alternatives](#alternatives)
2330
- [Alternative 1: new API + storage TTL](#alternative-1-new-api--storage-ttl)
@@ -30,17 +37,25 @@
3037

3138
Items marked with (R) are required *prior to targeting to a milestone / release*.
3239

33-
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
34-
- [x] (R) KEP approvers have approved the KEP status as `implementable`
35-
- [x] (R) Design details are appropriately documented
36-
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
37-
- [x] (R) Graduation criteria is in place
40+
- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
41+
- [X] (R) KEP approvers have approved the KEP status as `implementable`
42+
- [X] (R) Design details are appropriately documented
43+
- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
44+
- [ ] e2e Tests for all Beta API Operations (endpoints)
45+
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
46+
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
47+
- [X] (R) Graduation criteria is in place
48+
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
3849
- [ ] (R) Production readiness review completed
39-
- [ ] Production readiness review approved
50+
- [ ] (R) Production readiness review approved
4051
- [ ] "Implementation History" section is up-to-date for milestone
4152
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
4253
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
4354

55+
<!--
56+
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
57+
-->
58+
4459
[kubernetes.io]: https://kubernetes.io/
4560
[kubernetes/enhancements]: https://git.k8s.io/enhancements
4661
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
@@ -65,14 +80,24 @@ load balancer for the cluster, where the advertise IP address is set to the IP
6580
address of the load balancer, all three kube-apiservers will have the same
6681
advertise IP address.
6782

83+
### Goals
84+
85+
* Provide a mechanism by which controllers can uniquely identify kube-apiservers in a cluster.
86+
87+
### Non-Goals
88+
89+
* improving the availability of kube-apiserver
90+
6891
## Proposal
6992

7093
We will use “hostname+PID+random suffix (e.g. 6 base58 digits)” as the ID.
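
As a rough illustration of the ID format described above, the following Go sketch builds an identifier from a hostname, a PID, and a 6-character base58 suffix. The function names (`randomSuffix`, `newServerID`), the `-` separator, and the choice of the Bitcoin-style base58 alphabet are assumptions made for this example; the KEP text does not specify them, and this is not the kube-apiserver implementation.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	"os"
)

// base58Alphabet is the Bitcoin-style base58 alphabet (no 0, O, I, or l).
// Assumption: the KEP does not say which base58 alphabet is meant.
const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// randomSuffix returns n characters drawn uniformly from base58Alphabet.
func randomSuffix(n int) (string, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(base58Alphabet)))
	for i := range out {
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = base58Alphabet[idx.Int64()]
	}
	return string(out), nil
}

// newServerID joins hostname, PID, and a 6-character random suffix.
// The "-" separator is a hypothetical choice for readability.
func newServerID(hostname string, pid int) (string, error) {
	suffix, err := randomSuffix(6)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%d-%s", hostname, pid, suffix), nil
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	id, err := newServerID(hostname, os.Getpid())
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(id)
}
```

Because the suffix is regenerated on every start, a restarted kube-apiserver on the same host would get a new ID even if the PID happened to repeat.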
7194

72-
Similar to the [node heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md),
95+
Similar to the [node heartbeats](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/589-efficient-node-heartbeats),
7396
a kube-apiserver will store its ID in a Lease object. All kube-apiserver Leases
74-
will be stored in a special namespace “kube-apiserver-lease”. A controller will
75-
garbage collect expired Leases.
97+
will be stored in a special namespace `kube-apiserver-lease`. The Lease creation
98+
and heartbeat will be managed by a controller that is started in kube-apiserver's
99+
post-start hook. A separate controller in kube-controller-manager will be responsible
100+
for garbage collecting expired Leases.
76101

77102
### Caveats
78103

@@ -95,7 +120,7 @@ will only delay the storage migration for the same period of time.
95120

96121
## Design Details
97122

98-
The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md)
123+
The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/589-efficient-node-heartbeats)
99124
logic [already written](https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/nodelease)
100125
will be re-used. The heartbeat controller will be added to kube-apiserver in a
101126
post-start hook.
@@ -117,12 +142,30 @@ flag.
117142

118143
### Test Plan
119144

120-
- integration test for creating the Namespace and the Lease on kube-apiserver
121-
startup
122-
- integration test for not creating the StorageVersions after creating the
123-
Lease
124-
- integration test for garbage collecting a Lease that isn't refreshed
125-
- integration test for not garbage collecting a Lease that is refreshed
145+
[X] I/we understand the owners of the involved components may require updates to
146+
existing tests to make this code solid enough prior to committing the changes necessary
147+
to implement this enhancement.
148+
149+
##### Prerequisite testing updates
150+
151+
##### Unit tests
152+
153+
- `staging/src/k8s.io/apiserver/pkg/endpoints`
154+
155+
##### Integration tests
156+
157+
[apiserver_identity_test.go](https://github.com/kubernetes/kubernetes/blob/24238425492227fdbb55c687fd4e94c8b58c1ee3/test/integration/controlplane/apiserver_identity_test.go)
158+
- integration test for creating the Namespace and the Lease on kube-apiserver startup
159+
- integration test for not creating the StorageVersions after creating the Lease
160+
- integration test for garbage collecting a Lease that isn't refreshed
161+
- integration test for not garbage collecting a Lease that is refreshed
162+
163+
##### e2e tests
164+
165+
Proposed e2e tests:
166+
- an e2e test that validates the existence of the Lease objects per kube-apiserver
167+
- an e2e test that restarts a kube-apiserver and validates that a new Lease is created
168+
with a newly generated ID and the old lease is garbage collected
126169

127170
### Graduation Criteria
128171

@@ -131,14 +174,14 @@ Alpha should provide basic functionality covered with tests described above.
131174
#### Alpha -> Beta Graduation
132175

133176
- Appropriate metrics are agreed on and implemented
134-
- An e2e test plan is agreed and implemented (e.g. chaosmonkey in a regional
135-
cluster)
177+
- Sufficient integration tests covering basic functionality of this enhancement.
178+
- e2e tests outlined in the test plan are implemented
136179

137180
#### Beta -> GA Graduation
138181

139-
- Conformance tests are agreed on and implemented
182+
N/A
140183

141-
**For non-optional features moving to GA, the graduation criteria must include
184+
**For non-optional features moving to GA, the graduation criteria must include
142185
[conformance tests].**
143186

144187
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
@@ -154,64 +197,132 @@ Alpha should provide basic functionality covered with tests described above.
154197

155198
### Feature Enablement and Rollback
156199

157-
* **How can this feature be enabled / disabled in a live cluster?**
158-
- [x] Feature gate (also fill in values in `kep.yaml`)
159-
- Feature gate name: APIServerIdentity
160-
- Components depending on the feature gate: kube-apiserver
200+
###### How can this feature be enabled / disabled in a live cluster?
161201

162-
* **Does enabling the feature change any default behavior?**
163-
A namespace "kube-apiserver-lease" will be used to store kube-apiserver
164-
identity Leases.
202+
- [X] Feature gate (also fill in values in `kep.yaml`)
203+
- Feature gate name: APIServerIdentity
204+
- Components depending on the feature gate: kube-apiserver
165205

166-
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
167-
the enablement)?**
168-
Yes. Stale Lease objects will remain stale (`renewTime` won't get updated)
206+
###### Does enabling the feature change any default behavior?
169207

170-
* **What happens if we reenable the feature if it was previously rolled back?**
171-
Stale Lease objects will be garbage collected.
208+
A namespace "kube-apiserver-lease" will be used to store kube-apiserver identity Leases.
209+
210+
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
211+
212+
Yes. Stale Lease objects will remain stale (`renewTime` won't get updated).
213+
214+
###### What happens if we reenable the feature if it was previously rolled back?
215+
216+
Stale Lease objects will be garbage collected.
217+
218+
###### Are there any tests for feature enablement/disablement?
219+
220+
Yes, see [apiserver_identity_test.go](https://github.com/kubernetes/kubernetes/blob/24238425492227fdbb55c687fd4e94c8b58c1ee3/test/integration/controlplane/apiserver_identity_test.go).
172221

173222
### Rollout, Upgrade and Rollback Planning
174223

175-
_This section must be completed when targeting beta graduation to a release._
224+
###### How can a rollout or rollback fail? Can it impact already running workloads?
225+
226+
Existing workloads should not be impacted by this feature, unless they were
227+
looking for Lease objects in the `kube-apiserver-lease` namespace.
228+
229+
###### What specific metrics should inform a rollback?
230+
231+
The recently added [healthcheck metrics for apiserver](https://github.com/kubernetes/kubernetes/pull/112741), which include
232+
the health of the post-start hook, can be used to inform rollback, specifically `kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-identity-lease-controller"}`.
233+
234+
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
235+
236+
Manual testing for upgrade/rollback will be done prior to Beta. Steps taken for manual tests will be updated here.
237+
238+
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
239+
240+
No.
176241

177242
### Monitoring Requirements
178243

179-
_This section must be completed when targeting beta graduation to a release._
244+
###### How can an operator determine if the feature is in use by workloads?
245+
246+
The existence of the `kube-apiserver-lease` namespace and Lease objects in the namespace
247+
indicates whether the feature is working. Operators can check for clients that are accessing
248+
the Lease object to see if workloads or other controllers are relying on this feature.
249+
250+
###### How can someone using this feature know that it is working for their instance?
251+
252+
- [ ] Events
253+
- Event Reason:
254+
- [X] API .status
255+
- Condition name:
256+
- Other field:
257+
- [X] Other (treat as last resort)
258+
- Details: audit logs for clients that are reading the Lease objects
259+
260+
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
261+
262+
A rough SLO here is that kube-apiserver updates leases at the same frequency as kubelet node heartbeats,
263+
since the same mechanism is being used.
264+
265+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
266+
267+
- [X] Metrics
268+
- Metric name: kubernetes_healthcheck
269+
- [Optional] Aggregation method: name="poststarthook/start-kube-apiserver-identity-lease-controller"
270+
- Components exposing the metric: kube-apiserver
271+
272+
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
273+
274+
Yes, heartbeat latency could be useful.
180275

181276
### Dependencies
182277

183-
_This section must be completed when targeting beta graduation to a release._
278+
###### Does this feature depend on any specific services running in the cluster?
279+
280+
No
184281

185282
### Scalability
186283

187-
* **Will enabling / using this feature result in any new API calls?**
188-
Describe them, providing:
189-
- API call type (e.g. PATCH pods): UPDATE leases
190-
- estimated throughput:
191-
- originating component(s) (e.g. Kubelet, Feature-X-controller):
192-
kube-apiserver
284+
###### Will enabling / using this feature result in any new API calls?
285+
286+
Yes, kube-apiserver will be making new API calls as part of the lease controller.
287+
288+
###### Will enabling / using this feature result in introducing new API types?
289+
290+
No, the feature will use the existing Lease API.
291+
292+
###### Will enabling / using this feature result in any new calls to the cloud provider?
293+
294+
No
295+
296+
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
297+
298+
Yes, it will increase the number of Leases in a cluster by the number of kube-apiserver instances (one Lease per running kube-apiserver).
299+
300+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
301+
302+
No.
303+
304+
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
305+
306+
The lease controller may use additional resources in kube-apiserver, but it is likely negligible.
307+
308+
### Troubleshooting
309+
310+
###### How does this feature react if the API server and/or etcd is unavailable?
193311

194-
focusing mostly on:
195-
- components listing and/or watching resources they didn't before:
196-
kube-controller-manager
197-
- periodic API calls to reconcile state (e.g. periodic fetching state,
198-
heartbeats, leader election, etc.): kube-apiserver heartbeat every 10s
312+
Lease objects for a given kube-apiserver may become stale if the kube-apiserver or etcd is non-responsive. Clients should
313+
be able to respond accordingly by checking the lease expiration.
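
One way a client might check lease expiration is sketched below. It assumes the usual reading of a Lease as expired once the current time is past `renewTime` plus `leaseDurationSeconds`. The `lease` struct here is a simplified, hypothetical stand-in that mirrors only those two fields; a real client would read them from the `coordination.k8s.io/v1` Lease object.

```go
package main

import (
	"fmt"
	"time"
)

// lease is a hypothetical, simplified stand-in for the two Lease spec
// fields relevant to expiration (spec.renewTime, spec.leaseDurationSeconds).
type lease struct {
	RenewTime            time.Time
	LeaseDurationSeconds int32
}

// isExpired reports whether now is strictly after renewTime + duration.
func isExpired(l lease, now time.Time) bool {
	expiry := l.RenewTime.Add(time.Duration(l.LeaseDurationSeconds) * time.Second)
	return now.After(expiry)
}

func main() {
	now := time.Now()
	// Renewed 5 seconds ago with a one-hour duration: still valid.
	fresh := lease{RenewTime: now.Add(-5 * time.Second), LeaseDurationSeconds: 3600}
	// Renewed two hours ago with a one-hour duration: stale.
	stale := lease{RenewTime: now.Add(-2 * time.Hour), LeaseDurationSeconds: 3600}
	fmt.Println("fresh expired:", isExpired(fresh, now))
	fmt.Println("stale expired:", isExpired(stale, now))
}
```

A client that treats stale Leases as absent kube-apiservers would not be misled by Leases that can no longer be renewed or garbage collected while etcd is unavailable.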
199314

200-
* **Will enabling / using this feature result in increasing size or count of
201-
the existing API objects?**
202-
Describe them, providing:
203-
- API type(s): leases
204-
- Estimated amount of new objects: one per living kube-apiserver
315+
###### What are other known failure modes?
205316

206-
* **Will enabling / using this feature result in increasing time taken by any
207-
operations covered by [existing SLIs/SLOs]?**
208-
No.
317+
* lease objects can become stale if etcd is unavailable and clients do not check lease expiration.
318+
* kube-apiserver heartbeats consuming too many resources (unlikely but possible)
209319

210-
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
320+
###### What steps should be taken if SLOs are not being met to determine the problem?
211321

212322
## Implementation History
213323

214324
- 2020-09-18: KEP introduced
325+
- 2022-10-05: KEP updated with Beta criteria and all PRR questions answered.
215326

216327
## Alternatives
217328

keps/sig-api-machinery/1965-kube-apiserver-identity/kep.yaml

Lines changed: 10 additions & 4 deletions
@@ -2,6 +2,8 @@ title: kube-apiserver identity
22
kep-number: 1965
33
authors:
44
- "@roycaihw"
5+
- "@andrewsykim"
6+
- "@enj"
57
owning-sig: sig-api-machinery
68
status: implementable
79
creation-date: 2020-09-02
@@ -17,18 +19,17 @@ see-also:
1719
- "https://docs.google.com/document/d/1ed7miqlFY7-9lZxE7gzoyx_MFQCtFEDqtcKMpaAmHys/edit?usp=sharing"
1820

1921
# The target maturity stage in the current dev cycle for this KEP.
20-
stage: alpha
22+
stage: beta
2123

2224
# The most recent milestone for which work toward delivery of this KEP has been
2325
# done. This can be the current (upcoming) milestone, if it is being actively
2426
# worked on.
25-
latest-milestone: "v1.20"
27+
latest-milestone: "v1.26"
2628

2729
# The milestone at which this feature was, or is targeted to be, at each stage.
2830
milestone:
2931
alpha: "v1.20"
30-
beta: "v1.21"
31-
stable: "v1.22"
32+
beta: "v1.26"
3233

3334
# The following PRR answers are required at alpha release
3435
# List the feature gate name and the components for which it must be enabled
@@ -37,3 +38,8 @@ feature-gates:
3738
components:
3839
- kube-apiserver
3940
disable-supported: true
41+
42+
metrics:
43+
- kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-identity-lease-controller",type="healthz"}
44+
- kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-identity-lease-controller",type="readyz"}
45+
- kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-identity-lease-controller",type="livez"}
