-`test/integration/apiserver/coordinatedleaderelection`: New file
765
+
766
+
-`test/integration/apiserver/coordinatedleaderelection/*`: New directory
764
767
765
768
##### e2e tests
766
769
@@ -841,14 +844,31 @@ in back-to-back releases.
841
844
-->
842
845
843
846
#### Alpha
847
+
844
848
- Feature implemented behind a feature flag
845
-
- The strategy `MinimumCompatibilityVersionStrategy` is implemented
849
+
- The strategy `OldestEmulationVersion` is implemented
850
+
851
+
#### Beta
852
+
853
+
- e2e & integration tests for coordinated leader election on various scenarios
854
+
+ single leasecandidate
855
+
+ multiple leasecandidates
856
+
+ lease is preempted when another more suitable candidate is found
857
+
+ Components that don't know about coordination mixed with those that do
858
+
+ Downgrade to components that do not know about coordination
859
+
+ Custom third party strategy controller
860
+
- Lease pings are parallelized
861
+
- Tests are included for third party strategies
862
+
- Tests for disablement of the feature gate
863
+
864
+
#### GA
865
+
866
+
- Load test Coordinated Leader Election
867
+
- Feature is enabled by default
846
868
847
869
### Upgrade / Downgrade Strategy
848
870
849
-
If the `--leader-elect-resource-lock=coordinatedleases` flag is set and a
850
-
component is downgraded from beta to alpha, it will need to either remove the
851
-
flag or enable the alpha feature. All other upgrades and downgrades are safe.
871
+
Upgrading requires enabling the feature gate `CoordinatedLeaderElection` and the group version `coordination.k8s.io/v1alpha2`. Downgrading reverts to the old leader election mechanism, but may leave `LeaseCandidate` objects from the `coordination.k8s.io/v1alpha2` group version behind in etcd.
852
872
853
873
<!--
854
874
If applicable, how will the component be upgraded and downgraded? Make sure
@@ -930,16 +950,12 @@ well as the [existing list] of feature gates.
930
950
- kube-apiserver
931
951
- kube-controller-manager
932
952
- kube-scheduler
933
-
-[ ] Other
934
-
- Describe the mechanism:
935
-
- Will enabling / disabling the feature require downtime of the control plane?
936
-
- Will enabling / disabling the feature require downtime or reprovisioning of
937
-
a node?
938
953
939
954
###### Does enabling the feature change any default behavior?
940
955
941
-
No, even when the feature is enabled, a component must be configured with
942
-
`--leader-elect-resource-lock=coordinatedleases` to use the feature.
956
+
Yes, kube-scheduler and kube-controller-manager will use coordinated leader
957
+
election instead of the default leader election mechanism if the feature is
958
+
enabled.
943
959
944
960
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
945
961
@@ -991,13 +1007,27 @@ rollout. Similarly, consider large clusters and how enablement/disablement
991
1007
will rollout across nodes.
992
1008
-->
993
1009
1010
+
Rollouts and rollbacks can fail in many ways. During the first rollout of the
1011
+
feature, there will be a mixed state of control planes using and not using
1012
+
coordinated leader election (CLE). Components not using CLE will race to acquire the
1013
+
Lease, while the ones using CLE will defer to the CLE controller to assign
1014
+
a leader. We cannot guarantee the best leader is elected during
1015
+
mixed version states, but leader election will still be done.
1016
+
1017
+
If the CLE controller has bugs, it may fail to select a leader or select the
1018
+
wrong one, which could lead to disruptions.
1019
+
1020
+
If LeaseCandidate objects have incorrect version information, the CLE controller may select the wrong leader, potentially causing version skew violations.
1021
+
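To illustrate how incorrect version information can change the outcome, here is a minimal sketch of a version-based selection. It assumes that `OldestEmulationVersion` prefers the lowest emulation version and uses the binary version as a tie-breaker; the tie-break rule, the `Candidate` class, and `pick_leader` are hypothetical names for illustration and not the controller's actual code. Versions are assumed to be plain dotted numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical stand-in for the version fields of a LeaseCandidate."""
    name: str
    emulation_version: str
    binary_version: str


def parse_version(version: str) -> tuple:
    # Compare versions numerically, so that "1.9" sorts before "1.10".
    return tuple(int(part) for part in version.split("."))


def pick_leader(candidates: list) -> Candidate:
    # Assumed rule: lowest emulation version wins, binary version breaks ties.
    return min(
        candidates,
        key=lambda c: (parse_version(c.emulation_version),
                       parse_version(c.binary_version)),
    )


correct = [
    Candidate("cp-a", emulation_version="1.30", binary_version="1.31"),
    Candidate("cp-b", emulation_version="1.31", binary_version="1.31"),
]
# Same candidates, but cp-b reports a wrong (too old) emulation version.
misreported = [
    correct[0],
    Candidate("cp-b", emulation_version="1.29", binary_version="1.31"),
]
```

With the correct data `pick_leader(correct)` selects `cp-a`; with the misreported data it selects `cp-b`, which is the kind of incorrect selection described above.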
994
1022
###### What specific metrics should inform a rollback?
995
1023
996
1024
<!--
997
1025
What signals should users be paying attention to when the feature is young
998
1026
that might indicate a serious problem?
999
1027
-->
1000
1028
1029
+
Leases that fail to renew are a signal to roll back.
1030
+
1001
1031
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
1002
1032
1003
1033
<!--
@@ -1006,12 +1036,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
1006
1036
are missing a bunch of machinery and tooling and can't do that now.
1007
1037
-->
1008
1038
1039
+
Integration tests include testing for skew scenarios.
1040
+
1009
1041
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
1010
1042
1011
1043
<!--
1012
1044
Even if applying deprecation policies, they may still surprise some users.
1013
1045
-->
1014
1046
1047
+
No.
1048
+
1015
1049
### Monitoring Requirements
1016
1050
1017
1051
<!--
@@ -1029,6 +1063,11 @@ checking if there are objects with field X set) may be a last resort. Avoid
1029
1063
logs or events for this purpose.
1030
1064
-->
1031
1065
1066
+
The LeaseCandidate resource and the feature gate
1067
+
`CoordinatedLeaderElection` will be enabled. On the Lease object, a new field
1068
+
`Strategy` will be populated indicating the strategy used by coordinated leader
1069
+
election for selecting the most suitable leader.
1070
+
1032
1071
###### How can someone using this feature know that it is working for their instance?
1033
1072
1034
1073
<!--
@@ -1040,13 +1079,10 @@ and operation of this feature.
1040
1079
Recall that end users cannot usually observe component logs or access metrics.
1041
1080
-->
1042
1081
1043
-
-[ ] Events
1044
-
- Event Reason:
1045
-
-[ ] API .status
1046
-
- Condition name:
1047
-
- Other field:
1048
-
-[ ] Other (treat as last resort)
1049
-
- Details:
1082
+
- LeaseCandidate objects will exist for leader elected components, and the
1083
+
`RenewTime` and `PingTime` fields will be recent (within 30 minutes).
1084
+
- Lease objects for leader elected components will be assigned and actively
1085
+
renewing.
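The freshness check on LeaseCandidate timestamps could be scripted roughly as follows. This is a sketch under assumptions: the JSON field names `renewTime` and `pingTime` and the microsecond timestamp format are assumed to match what the API server returns, and `stale_fields` is a hypothetical helper, not part of any Kubernetes client.

```python
from datetime import datetime, timedelta, timezone

# Assumed serialization of the timestamp fields, e.g. 2024-01-01T12:00:00.000000Z
MICROTIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"
MAX_AGE = timedelta(minutes=30)


def parse_microtime(value: str) -> datetime:
    return datetime.strptime(value, MICROTIME_FORMAT).replace(tzinfo=timezone.utc)


def stale_fields(spec: dict, now: datetime, max_age: timedelta = MAX_AGE) -> list:
    """Return the timestamp fields that are missing or older than max_age."""
    stale = []
    for field in ("renewTime", "pingTime"):
        value = spec.get(field)
        if value is None or now - parse_microtime(value) > max_age:
            stale.append(field)
    return stale


now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
spec = {
    "renewTime": "2024-01-01T11:45:00.000000Z",  # 15 minutes old
    "pingTime": "2024-01-01T10:00:00.000000Z",   # 2 hours old
}
print(stale_fields(spec, now))
```

An empty result means both timestamps fall within the 30 minute window mentioned above.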
1050
1086
1051
1087
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
1052
1088
@@ -1065,18 +1101,22 @@ These goals will help you determine what you need to measure (SLIs) in the next
1065
1101
question.
1066
1102
-->
1067
1103
1104
+
When leader elected components are in the cluster, a leader must be selected
1105
+
promptly and propagated via the Lease object. The lease must be actively
1106
+
renewed.
1107
+
1068
1108
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
1082
1122
@@ -1085,6 +1125,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
1085
1125
implementation difficulties, etc.).
1086
1126
-->
1087
1127
1128
+
n/a.
1129
+
1088
1130
### Dependencies
1089
1131
1090
1132
<!--
@@ -1108,6 +1150,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
1108
1150
- Impact of its degraded performance or high-error rates on the feature:
1109
1151
-->
1110
1152
1153
+
No.
1154
+
1111
1155
### Scalability
1112
1156
1113
1157
<!--
@@ -1135,6 +1179,17 @@ Focusing mostly on:
1135
1179
heartbeats, leader election, etc.)
1136
1180
-->
1137
1181
1182
+
Yes.
1183
+
1184
+
- API call type: PUT
1185
+
- estimated throughput: Steady state is 3 requests per leader elected component
1186
+
every 30 minutes to renew the LeaseCandidate. If there is churn in the control
1187
+
plane, an extra 2N requests are performed on every change per leader elected
1188
+
component, N representing the number of available control planes. The number
1189
+
is 2N because N requests will be sent by the apiserver to ping all candidates,
1190
+
and every request should be acknowledged by the client.
1191
+
- watch on LeaseCandidate resources
1192
+
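The throughput estimate above can be worked through as simple arithmetic. The sketch below reads the text literally (3 requests per leader elected component per 30 minute interval, plus 2N requests per component on each control plane change); the function names and the example cluster size are assumptions for illustration only.

```python
def steady_state_requests_per_hour(components: int,
                                   requests_per_interval: int = 3,
                                   interval_minutes: int = 30) -> float:
    """Renewal requests per hour across all leader elected components."""
    intervals_per_hour = 60 / interval_minutes
    return components * requests_per_interval * intervals_per_hour


def churn_requests(components: int, control_planes: int, changes: int = 1) -> int:
    """Extra requests caused by churn: N pings plus N acks per component per change."""
    return 2 * control_planes * components * changes


# Hypothetical cluster: 2 leader elected components, 3 control plane instances.
print(steady_state_requests_per_hour(components=2))    # 12.0
print(churn_requests(components=2, control_planes=3))  # 12
```

Under these assumptions the steady-state load is small, and the churn cost grows linearly with the number of control plane instances.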
1138
1193
###### Will enabling / using this feature result in introducing new API types?
1139
1194
1140
1195
<!--
@@ -1144,6 +1199,9 @@ Describe them, providing:
1144
1199
- Supported number of objects per namespace (for namespace-scoped objects)
1145
1200
-->
1146
1201
1202
+
- coordination.k8s.io/LeaseCandidate
1203
+
- One candidate will exist for each leader elected component for each control plane. The total number is `# leader elected components` * `# control plane instances`.
1204
+
1147
1205
###### Will enabling / using this feature result in any new calls to the cloud provider?
1148
1206
1149
1207
<!--
@@ -1152,6 +1210,8 @@ Describe them, providing:
1152
1210
- Estimated increase:
1153
1211
-->
1154
1212
1213
+
No.
1214
+
1155
1215
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
1156
1216
1157
1217
<!--
@@ -1161,6 +1221,8 @@ Describe them, providing:
1161
1221
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
1162
1222
-->
1163
1223
1224
+
An additional `Strategy` field will be populated on every Lease whose leader is elected by CLE. This is a string enum.
1225
+
1164
1226
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
1165
1227
1166
1228
<!--
@@ -1172,6 +1234,8 @@ Think about adding additional work or introducing new steps in between
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
1188
1254
1189
1255
<!--
@@ -1196,6 +1262,8 @@ Are there any tests that were run/should be run to understand performance charac
1196
1262
and validate the declared limits?
1197
1263
-->
1198
1264
1265
+
This is a control plane feature and does not affect nodes.
1266
+
1199
1267
### Troubleshooting
1200
1268
1201
1269
<!--
@@ -1211,6 +1279,15 @@ details). For now, we leave it here.
1211
1279
1212
1280
###### How does this feature react if the API server and/or etcd is unavailable?
1213
1281
1282
+
If the API server becomes unavailable, the CLE controller cannot function as it is built on
1283
+
top of the API server. It cannot monitor LeaseCandidates, update Leases, or
1284
+
elect new leaders. Existing leaders will continue to function until their Leases
1285
+
expire, but no new leaders will be elected until the API server recovers.
1286
+
1287
+
If etcd is unavailable, similar issues arise. The underlying lease mechanism
1288
+
relies on etcd for storage and coordination. Without etcd, Leases cannot be
1289
+
created, renewed, or monitored.
1290
+
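How long an existing leader keeps its role during an outage depends on when its Lease expires. As a rough sketch, assuming expiry is computed from the Lease's last renewal time plus its duration in seconds (real clients may rely on locally observed times instead), the remaining window could be estimated as below; `lease_expiry` and `is_lease_expired` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def lease_expiry(renew_time: datetime, lease_duration_seconds: int) -> datetime:
    """Point in time after which the lease is no longer considered held."""
    return renew_time + timedelta(seconds=lease_duration_seconds)


def is_lease_expired(renew_time: datetime, lease_duration_seconds: int,
                     now: datetime) -> bool:
    return now >= lease_expiry(renew_time, lease_duration_seconds)


# Hypothetical values: last renewal at 12:00:00 with a 15 second lease duration.
renewed = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_lease_expired(renewed, 15, renewed + timedelta(seconds=10)))  # False
print(is_lease_expired(renewed, 15, renewed + timedelta(seconds=20)))  # True
```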
1214
1291
###### What are other known failure modes?
1215
1292
1216
1293
<!--
@@ -1226,8 +1303,21 @@ For each of them, fill in the following information by copying the below templat
1226
1303
- Testing: Are there any tests for failure mode? If not, describe why.
1227
1304
-->
1228
1305
1306
+
- Leader election controller fails to elect a leader
1307
+
- Detection: The metric `apiserver_coordinated_leader_election_failures_total` increases and the Lease object has no holder.
1308
+
- Mitigations: Operators can disable the feature gate.
1309
+
- Diagnostics: Check kube-apiserver logs for messages on failing to elect the leader. Look at the lease object renewal times and holder, along with leasecandidate objects for the particular component.
1310
+
- Testing: Integration test exists that prevents write access for the CLE controller and ensures that another controller takes over.
1311
+
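One way to apply the detection step is to compare two scrapes of the API server metrics and check whether the failure counter grew. The sketch below parses the Prometheus text exposition format by hand; the metric name is taken from the text above, while `read_counter`, `failures_increased`, and the sample label are illustrative assumptions.

```python
METRIC = "apiserver_coordinated_leader_election_failures_total"


def read_counter(exposition: str, name: str) -> float:
    """Sum all samples of a counter in Prometheus text format."""
    total = 0.0
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            base, _, tail = line.partition("{")
            # Everything after the last "}" is the value (and optional timestamp).
            fields = tail.rpartition("}")[2].split()
        else:
            parts = line.split()
            base, fields = parts[0], parts[1:]
        if base == name and fields:
            total += float(fields[0])
    return total


def failures_increased(before: str, after: str, name: str = METRIC) -> bool:
    return read_counter(after, name) > read_counter(before, name)


# Hypothetical scrapes; the "name" label is made up for the example.
scrape_1 = f"# TYPE {METRIC} counter\n{METRIC}{{name=\"kube-scheduler\"}} 2\n"
scrape_2 = f"# TYPE {METRIC} counter\n{METRIC}{{name=\"kube-scheduler\"}} 5\n"
print(failures_increased(scrape_1, scrape_2))  # True
```

A growing counter on its own is only a hint; the Lease holder and the LeaseCandidate objects should be checked as described in the diagnostics step.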
1229
1312
###### What steps should be taken if SLOs are not being met to determine the problem?
1230
1313
1314
+
Check whether the CLE controller is operating properly and whether the API server is
1315
+
overloaded; in the worst case, disable the feature by explicitly setting
1316
+
the feature gate to false. This information can be found in the controller and
1317
+
API server logs `kube-apiserver.log`. Additionally, looking through the `lease`
1318
+
and `leasecandidate` objects will provide insight on whether the leases and