Commit e3365ff

Merge pull request kubernetes#3621 from andrewsykim/kep-1965
KEP-1965: update kube-apiserver identity KEP to reflect current state
2 parents c77da27 + 0bbed62 commit e3365ff

1 file changed

+33
-31
lines changed
  • keps/sig-api-machinery/1965-kube-apiserver-identity

keps/sig-api-machinery/1965-kube-apiserver-identity/README.md

Lines changed: 33 additions & 31 deletions
Original file line number, diff line number, diff line change
@@ -90,14 +90,14 @@ advertise IP address.
9090

9191
## Proposal
9292

93-
We will use “hostname+PID+random suffix (e.g. 6 base58 digits)” as the ID.
94-
9593
Similar to the [node heartbeats](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/589-efficient-node-heartbeats),
9694
a kube-apiserver will store its ID in a Lease object. All kube-apiserver Leases
97-
will be stored in a special namespace `kube-apiserver-lease`. The Lease creation
98-
and heart beat will be managed by a controller that is started in kube-apiserver's
99-
post startup hook. A separate controller in kube-controller-manager will be responsible
100-
for garbage collecting expired Leases.
95+
will be stored in the `kube-system` namespace.
96+
97+
The lease creation and heart beat
98+
will be managed by the `start-kube-apiserver-identity-lease-controller` post start hook
99+
and expired leases will be garbage collected by the `start-kube-apiserver-identity-lease-garbage-collector`
100+
post start hook in kube-apiserver.
101101

102102
### Caveats
103103

@@ -122,24 +122,22 @@ will only delay the storage migration for the same period of time.
122122

123123
The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/589-efficient-node-heartbeats)
124124
logic [already written](https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/nodelease)
125-
will be re-used. The heartbeat controller will be added to kube-apiserver in a
126-
post-start hook.
127-
128-
Each kube-apiserver will run a lease controller in a post-start-hook to refresh
129-
its Lease every 10s by default. A separate controller named [storageversiongc](https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/storageversiongc/gc_controller.go)
130-
running in kube-controller-manager will watch the Lease API using an informer, and
131-
periodically resync its local cache. On processing an item, the `storageversiongc` controller
132-
will delete the Lease if the last `renewTime` was more than `leaseDurationSeconds` ago (default to 1h).
133-
The default `leaseDurationSeconds` is chosen to be way longer than the default
134-
refresh period, to tolerate clock skew and/or accidental refresh failure. The
135-
default resync period is 1h. By default, assuming negligible clock skew, a Lease
136-
will be deleted if the kube-apiserver fails to refresh its Lease for one to two
137-
hours. The `storageversiongc` controller will run in kube-controller-manager, to leverage leader
138-
election and reduce conflicts.
139-
140-
The refresh rate, lease duration will be configurable through kube-apiserver
141-
flags. The resync period will be configurable through a kube-controller-manager
142-
flag.
125+
will be re-used. The lease creation and heart beat will be managed by the `start-kube-apiserver-identity-lease-controller`
126+
post-start-hook and expired leases will be garbage collected by the `start-kube-apiserver-identity-lease-garbage-collector`
127+
post-start-hook in kube-apiserver. The refresh rate, lease duration will be configurable through kube-apiserver
128+
flags
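The garbage collection step can be illustrated with a minimal Python sketch. It assumes the expiry rule from the removed text still applies to the new post-start-hook: a Lease is deleted once its last `renewTime` is more than `leaseDurationSeconds` in the past. The function name `is_expired` and the constants are hypothetical and only mirror the defaults quoted in the earlier text (10s refresh, 1h lease duration); they are not kube-apiserver identifiers.

```python
from datetime import datetime, timedelta, timezone

# Assumed defaults, taken from the earlier revision of this section.
REFRESH_PERIOD_SECONDS = 10
LEASE_DURATION_SECONDS = 3600


def is_expired(renew_time: datetime, lease_duration_seconds: int, now: datetime) -> bool:
    """Return True if the lease was last renewed more than lease_duration_seconds ago."""
    return renew_time + timedelta(seconds=lease_duration_seconds) < now


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_renewal = now - timedelta(seconds=REFRESH_PERIOD_SECONDS)
    # A lease renewed one refresh period ago is still well within its duration.
    print(is_expired(last_renewal, LEASE_DURATION_SECONDS, now))
```

Keeping the lease duration far longer than the refresh period means a few missed renewals or modest clock skew do not cause a live kube-apiserver's Lease to be collected.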
129+
130+
The format of the lease ID will be `kube-apiserver-<UUID>`. The UUID is newly generated on every start-up. This ID format is preferred
131+
for the following reasons:
132+
* No two kube-apiservers on the same host can share the same lease identity.
133+
* Revealing the hostname of kube-apiserver may not be desirable for some Kubernetes platforms.
134+
* The kube-apiserver version may change between restarts, which can trigger a storage version migration (see KEP on StorageVersionAPI)
135+
136+
In some cases it can be desirable to use a predictable ID format (e.g. kube-apiserver-<hostname>). We may consider providing
137+
a flag in `kube-apiserver` to override the lease identity.
138+
139+
All kube-apiserver leases will also have a component label `k8s.io/component=kube-apiserver`.
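As a rough illustration of the ID format and label described above, the following Python sketch builds an identity of the form `kube-apiserver-<UUID>` and the metadata a Lease might carry. The helper names `new_lease_identity` and `lease_metadata` are hypothetical, the use of a random (version 4) UUID is an assumption, and naming the Lease object after the identity is likewise an assumption; none of this is the kube-apiserver implementation.

```python
import uuid

LEASE_NAMESPACE = "kube-system"
COMPONENT_LABEL = {"k8s.io/component": "kube-apiserver"}


def new_lease_identity() -> str:
    """Generate a fresh identity; called once per process start-up."""
    # uuid4 is random, so two servers on one host get distinct identities
    # and no hostname is revealed.
    return f"kube-apiserver-{uuid.uuid4()}"


def lease_metadata(identity: str) -> dict:
    """Build illustrative Lease metadata (assumes the Lease is named after the identity)."""
    return {
        "name": identity,
        "namespace": LEASE_NAMESPACE,
        "labels": dict(COMPONENT_LABEL),
    }


if __name__ == "__main__":
    print(lease_metadata(new_lease_identity()))
```

Because the identity is regenerated on every start-up, a restarted kube-apiserver appears as a new Lease, and the previous one remains until it expires and is garbage collected.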
140+
143141

144142
### Test Plan
145143

@@ -208,8 +206,8 @@ Alpha should provide basic functionality covered with tests described above.
208206

209207
###### Does enabling the feature change any default behavior?
210208

211-
A namespace `kube-apiserver-lease` will be created to store kube-apiserver identity Leases.
212-
Old leases will be actively garbage collected by kube-controller-manager.
209+
kube-apiserver will store identity Leases in the `kube-system` namespace.
210+
Expired leases will be actively garbage collected by a post-start-hook in kube-apiserver.
213211

214212
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
215213

@@ -229,12 +227,14 @@ However, there are no tests validating feature enablement/disablement based on t
229227
###### How can a rollout or rollback fail? Can it impact already running workloads?
230228

231229
Existing workloads should not be impacted by this feature, unless they were
232-
looking for Lease objects in the `kube-apiserver-lease` namespace.
230+
looking for kube-apiserver Lease objects in the `kube-system` namespace, which can be
231+
found using the `k8s.io/component=kube-apiserver` label.
233232

234233
###### What specific metrics should inform a rollback?
235234

236235
Recently added [healthcheck metrics for apiserver](https://github.com/kubernetes/kubernetes/pull/112741), which includes
237236
the health of the post start hook can be used to inform rollback, specifically `kubernetes_healthcheck{poststarthook/start-kube-apiserver-identity-lease-controller}`
237+
and `kubernetes_healthcheck{poststarthook/start-kube-apiserver-identity-lease-garbage-collector}`
238238

239239
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
240240

@@ -248,7 +248,7 @@ No.
248248

249249
###### How can an operator determine if the feature is in use by workloads?
250250

251-
The existence of the `kube-apiserver-lease` namespace and Lease objects in the namespace
251+
The existence of kube-apiserver Lease objects in the `kube-system` namespace
252252
will determine if the feature is working. Operators can check for clients that are accessing
253253
the Lease object to see if workloads or other controllers are relying on this feature.
254254

@@ -265,22 +265,24 @@ the Lease object to see if workloads or other controllers are relying on this fe
265265
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
266266

267267
Some reasonable SLOs could be:
268-
* Number of (non-expired) Leases in `kube-apiserver-leases` is equal to the number of expected kube-apiservers 95% of the time.
268+
* Number of (non-expired) Leases in `kube-system` is equal to the number of expected kube-apiservers 95% of the time.
269269
* kube-apiservers hold a lease which is not older than 2 times the frequency of the lease heart beat 95% of time.
270270

271+
All leases owned by kube-apiservers can be found using the `k8s.io/component=kube-apiserver` label.
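One way to evaluate the second SLO is sketched below in Python. It assumes an operator has already collected periodic samples of lease age (seconds since the last renewal) for the Leases selected by the `k8s.io/component=kube-apiserver` label. The function `freshness_slo_met` and its parameters are hypothetical and not part of any Kubernetes tooling.

```python
def freshness_slo_met(lease_ages_seconds, heartbeat_period_seconds, target=0.95):
    """Return True if at least `target` of the samples are no older than
    two heartbeat periods."""
    if not lease_ages_seconds:
        # No samples means there is nothing to show the SLO was met.
        return False
    limit = 2 * heartbeat_period_seconds
    fresh = sum(1 for age in lease_ages_seconds if age <= limit)
    return fresh / len(lease_ages_seconds) >= target


if __name__ == "__main__":
    samples = [4, 9, 12, 7, 45, 8, 3, 11, 6, 10]
    print(freshness_slo_met(samples, heartbeat_period_seconds=10))
```

The same pattern applies to the first SLO by comparing the count of non-expired Leases against the expected number of kube-apiservers in each sample.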
272+
271273
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
272274

273275
- [X] Metrics
274276
- Metric name: kubernetes_healthcheck
275-
- [Optional] Aggregation method: name="poststarthook/start-kube-apiserver-identity-lease-controller"
277+
- [Optional] Aggregation method: name="poststarthook/start-kube-apiserver-identity-lease-controller", name="poststarthook/start-kube-apiserver-identity-lease-garbage-collector"
276278
- Components exposing the metric: kube-apiserver
277279

278280
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
279281

280282
A metric measuring the last updated time for a lease could be useful, but it could introduce cardinality problems
281283
since the lease is changed on every restart of kube-apiserver.
282284

283-
We may consider adding a metric exposing the count of leases in `kube-apiserver-lease`.
285+
We may consider adding a metric exposing the count of leases in `kube-system`.
284286

285287
### Dependencies
286288
