- [Motivation](#motivation)
- [Proposal](#proposal)
- [Caveats](#caveats)
- - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
@@ -83,22 +82,17 @@ listing the pods selected by the service, an aggregated server can learn the
list of living servers with distinct podIPs. A server can get its own IDs via
the downward API.
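The discovery step above can be sketched as follows. This is a minimal illustration, not the proposal's implementation. It assumes the server's pod name and IP are injected through the downward API as environment variables (the names `POD_NAME` and `POD_IP` are hypothetical, mapped in the pod spec to the `metadata.name` and `status.podIP` fields). It also assumes the pods selected by the service have already been listed into simplified dictionaries. Treating only `Running` pods with an assigned IP as "living" is likewise an assumption.

```python
import os
from typing import Dict, List, Mapping


def own_identity(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read this server's own identity from downward API environment variables.

    POD_NAME and POD_IP are hypothetical variable names; a pod spec would map
    them to the metadata.name and status.podIP fields.
    """
    return {"name": env["POD_NAME"], "ip": env["POD_IP"]}


def living_peers(pods: List[Dict[str, str]]) -> List[str]:
    """Return the distinct IPs of running pods selected by the service.

    Each pod is a simplified dict with "name", "ip" and "phase" keys. Pods that
    are not Running or have no IP assigned yet are skipped (an assumption about
    what counts as a living server).
    """
    seen = set()
    ips: List[str] = []
    for pod in pods:
        ip = pod.get("ip", "")
        if pod.get("phase") != "Running" or not ip or ip in seen:
            continue
        seen.add(ip)
        ips.append(ip)
    return ips


if __name__ == "__main__":
    pods = [
        {"name": "agg-0", "ip": "10.0.0.4", "phase": "Running"},
        {"name": "agg-1", "ip": "10.0.0.5", "phase": "Running"},
        {"name": "agg-2", "ip": "", "phase": "Pending"},
    ]
    me = own_identity({"POD_NAME": "agg-0", "POD_IP": "10.0.0.4"})
    # Peers are the living servers other than this one.
    peers = [ip for ip in living_peers(pods) if ip != me["ip"]]
    print(peers)  # ['10.0.0.5']
```

Because each living server has a distinct podIP, the IP alone is enough to tell peers apart in this sketch. A real server would read the pod list from the API rather than from a literal.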

- We prefer false positives over false negatives, because false negatives are more
- harmful. In the storage version API scenario, if a kube-apiserver accidentally
+ We prefer that expired Leases remain for a longer duration rather than be
+ collected quickly, because a Lease that is collected by mistake can do more
+ damage than one that lingers. Take the storage version API scenario as an
+ example: if a kube-apiserver accidentally
missed a heartbeat and got its Lease garbage collected, its StorageVersion can
be falsely garbage collected as a consequence. In this case, the storage
migrator won’t be able to migrate the storage unless this kube-apiserver gets
restarted and re-registers its StorageVersion. On the other hand, if a
kube-apiserver is gone and its Lease still stays around for an hour or two, it
will only delay the storage migration for the same period of time.
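To make the trade-off concrete, the sketch below checks whether a Lease would count as expired after its holder misses renewals for a few minutes. The expiry rule (a Lease is collectable once the time since its last renewal exceeds its lease duration) and the 30-second and one-hour durations are illustrative assumptions. They are not values stated in this section.

```python
from datetime import datetime, timedelta


def is_expired(renew_time: datetime, lease_duration: timedelta, now: datetime) -> bool:
    """Treat a Lease as expired once it has gone unrenewed for longer than its
    lease duration (assumed rule, for illustration)."""
    return now - renew_time > lease_duration


last_renew = datetime(2021, 1, 1, 12, 0, 0)
# The kube-apiserver is alive but misses its renewals for five minutes.
now = last_renew + timedelta(minutes=5)

short_duration = timedelta(seconds=30)  # aggressive: a live server loses its Lease
long_duration = timedelta(hours=1)      # conservative: the Lease survives the gap

print(is_expired(last_renew, short_duration, now))  # True
print(is_expired(last_renew, long_duration, now))   # False
```

With the short duration, a transient gap turns into a false collection, with the StorageVersion fallout described above. With the long duration, the only cost is that a Lease belonging to a server that really is gone lingers longer.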

- ### Risks and Mitigations
-
- A new namespace will be reserved for storing kube-apiserver identity Lease
- objects. There is a chance that existing clusters may already be using the
- namespace. We mitigate the risk by documenting the namespace in the release
- note and use a feature gate to disable the behavior in alpha release.
-
## Design Details

The [kubelet heartbeat](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0009-node-heartbeat.md)
@@ -114,7 +108,8 @@ default `leaseDurationSeconds` is chosen to be way longer than the default
refresh period, to tolerate clock skew and/or accidental refresh failure. The
default resync period is 1h. By default, assuming negligible clock skew, a Lease
will be deleted if the kube-apiserver fails to refresh its Lease for one to two
- hours.
+ hours. The GC controller will run in kube-controller-manager to leverage leader
+ election: only the elected instance collects Leases, which reduces conflicts.
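The one-to-two-hour figure follows from the two periods involved. A Lease becomes eligible for deletion `leaseDurationSeconds` after its last renewal, but the GC controller only notices this on its next resync, which can come up to one resync period later. The sketch below is a simplified model with hypothetical names. It assumes a `leaseDurationSeconds` of one hour, which is the value implied by a one-to-two-hour window with a 1h resync. It also assumes that once renewals stop, the controller re-examines each Lease exactly once per resync period.

```python
from datetime import timedelta


def deletion_delay(renew_offset: timedelta, lease_duration: timedelta,
                   resync_period: timedelta) -> timedelta:
    """Time from a Lease's last renewal until the GC controller deletes it.

    The controller is modeled as examining every Lease at t = 0, resync_period,
    2 * resync_period, ... and deleting a Lease at the first check where the
    time since its last renewal exceeds lease_duration. `renew_offset` is when
    the last renewal happened within a cycle (0 <= renew_offset < resync_period).
    """
    if resync_period <= timedelta(0):
        raise ValueError("resync_period must be positive")
    tick = timedelta(0)
    while tick - renew_offset <= lease_duration:
        tick += resync_period
    return tick - renew_offset


hour = timedelta(hours=1)
# Last renewal coincides with a check: deletion waits two full resync periods.
print(deletion_delay(timedelta(0), hour, hour))           # 2:00:00
# Last renewal just before a check: deletion comes shortly after one hour.
print(deletion_delay(timedelta(minutes=59), hour, hour))  # 1:01:00
```

Under this model the delay always falls in the half-open interval (1h, 2h], matching the one-to-two-hour window. Clock skew between the kube-apiserver and kube-controller-manager would shift it.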

The refresh rate and lease duration will be configurable through kube-apiserver
flags. The resync period will be configurable through a kube-controller-manager
@@ -136,6 +131,8 @@ Alpha should provide basic functionality covered with tests described above.
#### Alpha -> Beta Graduation

- Appropriate metrics are agreed on and implemented
+ - An e2e test plan is agreed on and implemented (e.g. chaosmonkey in a regional
+ cluster)

#### Beta -> GA Graduation