 - [Alternative 1: new API + storage TTL](#alternative-1-new-api--storage-ttl)
@@ -30,17 +37,25 @@

 Items marked with (R) are required *prior to targeting to a milestone / release*.

-- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [x] (R) KEP approvers have approved the KEP status as `implementable`
-- [x] (R) Design details are appropriately documented
-- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [x] (R) Graduation criteria is in place
+- [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [X] (R) Graduation criteria is in place
+- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Production readiness review completed
-- [ ] Production readiness review approved
+- [ ] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

+<!--
+**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
 will be re-used. The heartbeat controller will be added to kube-apiserver in a
 post-start hook.

-Each kube-apiserver will refresh its Lease every 10s by default. A GC controller
-will watch the Lease API using an informer, and periodically resync its local
-cache. On processing an item, the controller will delete the Lease if the last
-`renewTime` was more than `leaseDurationSeconds` ago (default to 1h). The
-default `leaseDurationSeconds` is chosen to be way longer than the default
+Each kube-apiserver will run a lease controller in a post-start hook to refresh
+its Lease every 10s by default. A separate controller named [storageversiongc](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go)
+running in kube-controller-manager will watch the Lease API using an informer and
+periodically resync its local cache. On processing an item, the `storageversiongc` controller
+will delete the Lease if the last `renewTime` was more than `leaseDurationSeconds` ago (defaults to 1h).
+The default `leaseDurationSeconds` is chosen to be much longer than the default
 refresh period, to tolerate clock skew and/or accidental refresh failure. The
 default resync period is 1h. By default, assuming negligible clock skew, a Lease
 will be deleted if the kube-apiserver fails to refresh its Lease for one to two
-hours. The GC controller will run in kube-controller-manager, to leverage leader
+hours. The `storageversiongc` controller will run in kube-controller-manager, to leverage leader
 election and reduce conflicts.

 The refresh rate and lease duration will be configurable through kube-apiserver
@@ -117,12 +143,30 @@ flag.

 ### Test Plan

-- integration test for creating the Namespace and the Lease on kube-apiserver
-  startup
-- integration test for not creating the StorageVersions after creating the
-  Lease
-- integration test for garbage collecting a Lease that isn't refreshed
-- integration test for not garbage collecting a Lease that is refreshed
+[X] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
@@ -154,64 +200,138 @@ Alpha should provide basic functionality covered with tests described above.

 ### Feature Enablement and Rollback

-* **How can this feature be enabled / disabled in a live cluster?**
-  - [x] Feature gate (also fill in values in `kep.yaml`)
-    - Feature gate name: APIServerIdentity
-    - Components depending on the feature gate: kube-apiserver
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [X] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: APIServerIdentity
+  - Components depending on the feature gate: kube-apiserver, kube-controller-manager
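Enabling the gate means passing it to both components listed above, for example `--feature-gates=APIServerIdentity=true` on kube-apiserver and on kube-controller-manager. The Go sketch below only illustrates the comma-separated `name=bool` shape of that flag value. `parseFeatureGates` is a hypothetical, simplified helper and `SomeOtherGate` is a placeholder name. Kubernetes' own parsing lives in `k8s.io/component-base/featuregate` and also validates gate names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates is a hypothetical, simplified parser for a --feature-gates
// value such as "APIServerIdentity=true,SomeOtherGate=false". Unlike the real
// Kubernetes implementation, it does not check that gate names are known.
func parseFeatureGates(value string) (map[string]bool, error) {
	gates := map[string]bool{}
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		name, raw, ok := strings.Cut(entry, "=")
		if !ok {
			return nil, fmt.Errorf("missing value for feature gate %q", entry)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(raw))
		if err != nil {
			return nil, fmt.Errorf("invalid value for feature gate %q: %w", name, err)
		}
		gates[strings.TrimSpace(name)] = enabled
	}
	return gates, nil
}

func main() {
	// Per the design above, kube-apiserver runs the lease controller and
	// kube-controller-manager runs storageversiongc, so both need the gate.
	components := []struct{ name, flagValue string }{
		{"kube-apiserver", "APIServerIdentity=true"},
		{"kube-controller-manager", "APIServerIdentity=true,SomeOtherGate=false"},
	}
	for _, c := range components {
		gates, err := parseFeatureGates(c.flagValue)
		if err != nil {
			fmt.Println(c.name, "error:", err)
			continue
		}
		fmt.Printf("%s: APIServerIdentity=%v\n", c.name, gates["APIServerIdentity"])
	}
}
```

If the gate is enabled on only one of the two components, half of the mechanism is missing. With kube-apiserver alone, Leases are created but never garbage collected. With kube-controller-manager alone, there are no Leases to collect.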

-* **Does enabling the feature change any default behavior?**
-  A namespace "kube-apiserver-lease" will be used to store kube-apiserver
-  identity Leases.
+###### Does enabling the feature change any default behavior?

-* **Can the feature be disabled once it has been enabled (i.e. can we roll back
-  the enablement)?**
-  Yes. Stale Lease objects will remain stale (`renewTime` won't get updated)
+A namespace `kube-apiserver-lease` will be created to store kube-apiserver identity Leases.
+Stale Leases will be actively garbage collected by kube-controller-manager.

-* **What happens if we reenable the feature if it was previously rolled back?**
-  Stale Lease objects will be garbage collected.
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes. Existing Lease objects will remain in place but go stale (`renewTime` won't get updated).
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+Stale Lease objects will be garbage collected.
+
+###### Are there any tests for feature enablement/disablement?
+
+There are some tests that require enabling the feature gate in [apiserver_identity_test.go](https://github.com/kubernetes/kubernetes/blob/24238425492227fdbb55c687fd4e94c8b58c1ee3/test/integration/controlplane/apiserver_identity_test.go).
+However, there are no tests validating feature enablement/disablement based on the gate. These tests should be added prior to Beta.

 ### Rollout, Upgrade and Rollback Planning

-_This section must be completed when targeting beta graduation to a release._
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+Existing workloads should not be impacted by this feature, unless they were
+looking for Lease objects in the `kube-apiserver-lease` namespace.
+
+###### What specific metrics should inform a rollback?
+
+The recently added [healthcheck metrics for apiserver](https://github.com/kubernetes/kubernetes/pull/112741), which include
+the health of post-start hooks, can be used to inform a rollback, specifically `kubernetes_healthcheck{name="poststarthook/start-kube-apiserver-identity-lease-controller"}`.
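To act on that signal, an operator could scrape kube-apiserver's `/metrics` endpoint and check the hook's gauge. The Go sketch below makes three assumptions: the Prometheus text exposition format, a `kubernetes_healthcheck` series that carries the check name in a `name` label, and a value of 1 for a passing check. `hookHealthy` and the sample input are hypothetical and were not captured from a real cluster.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const hookName = "poststarthook/start-kube-apiserver-identity-lease-controller"

// hookHealthy scans Prometheus text-format metrics and reports whether every
// kubernetes_healthcheck sample whose name label equals check has value 1.
// found is false when no matching sample exists, in which case healthy is false.
func hookHealthy(metrics, check string) (healthy, found bool, err error) {
	want := `name="` + check + `"`
	allPassing := true
	sc := bufio.NewScanner(strings.NewReader(metrics))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip comments, blank lines, and other series such as kubernetes_healthchecks_total.
		if !strings.HasPrefix(line, "kubernetes_healthcheck{") || !strings.Contains(line, want) {
			continue
		}
		// A sample line looks like: metric{labels} value [timestamp]
		rest := strings.Fields(line[strings.LastIndex(line, "}")+1:])
		if len(rest) == 0 {
			return false, found, fmt.Errorf("no value in %q", line)
		}
		v, perr := strconv.ParseFloat(rest[0], 64)
		if perr != nil {
			return false, found, fmt.Errorf("bad value in %q: %w", line, perr)
		}
		found = true
		if v != 1 {
			allPassing = false
		}
	}
	if serr := sc.Err(); serr != nil {
		return false, found, serr
	}
	return found && allPassing, found, nil
}

func main() {
	// Hypothetical excerpt of kube-apiserver /metrics output.
	sample := `# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="` + hookName + `",type="healthz"} 1
kubernetes_healthcheck{name="` + hookName + `",type="readyz"} 1
`
	healthy, found, err := hookHealthy(sample, hookName)
	fmt.Println(healthy, found, err) // prints: true true <nil>
}
```

The helper reports "not found" separately from "unhealthy". A missing series usually means the feature gate is off or the metric is not exposed, which is a different rollback signal from a failing hook.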
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+Manual testing for upgrade/rollback will be done prior to Beta. Steps taken for manual tests will be updated here.
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+No.

 ### Monitoring Requirements

-_This section must be completed when targeting beta graduation to a release._
+###### How can an operator determine if the feature is in use by workloads?
+
+The existence of the `kube-apiserver-lease` namespace and of Lease objects in that namespace
+indicates that the feature is working. Operators can check for clients that are accessing
+the Lease objects to see if workloads or other controllers are relying on this feature.
+
+###### How can someone using this feature know that it is working for their instance?
+
+- [ ] Events
+  - Event Reason:
+- [X] API .status
+  - Condition name:
+  - Other field: `.spec.holderIdentity`, `.spec.acquireTime`, `.spec.renewTime`, `.spec.leaseTransitions`
+- [X] Other (treat as last resort)
+  - Details: audit logs for clients that are reading the Lease objects
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+Some reasonable SLOs could be:
+* The number of non-expired Leases in `kube-apiserver-lease` equals the number of expected kube-apiservers 95% of the time.
+* Each kube-apiserver holds a Lease whose `renewTime` is no older than twice the heartbeat period (20s with the default 10s refresh) 95% of the time.
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?