`keps/sig-apps/1591-daemonset-surge/README.md` (15 additions, 7 deletions)
```diff
@@ -96,6 +96,11 @@ DaemonSet pods are slightly more constrained than Deployments when it comes to s

 In order to reduce confusion for new users, we will start by rejecting HostPort use in daemonset when MaxSurge is non-zero. A user will not be able to update a daemonset to MaxSurge != 0 if HostPort is set, or update a HostPort if MaxSurge is set, without receiving a validation error. If the MaxSurge feature gate is off, the validation rule is bypassed, and a user who turns off the gate, sets both fields, and then enables the gate will have failing pods but will be able to update their daemonset to either remove surge or remove the host port safely.

+A user who uses HostNetwork but does not declare HostPorts and attempts to use MaxSurge with processes that listen on the host network should see errors from the network stack when their process attempts to bind a port (such as `cannot bind to address: port in use`) and the new pod will crash and go into a crashloop. Users should expect to see these failures as they would any other "my application does not start on Kubernetes" error via pod status, daemonset status conditions, and pod logs.
+
+Building a daemonset that hands off between two host level processes with any degree of coordination is an advanced topic and is up to the workload author. The simplest daemonsets may use pod network without any host level sharing and will benefit significantly from maxSurge during updates by reducing downtime at the cost of extra resources. As more complex sharing (host network, disk resources, unix domain sockets, configuration) is needed, the author is expected to leverage custom readiness probes, process start conditions, and process coordination mechanisms (like disks, networking, or shared memory) across pods. Debugging those interactions will be in the domain of the workload author.
+
+
 ### Workload Implications

 There are three main workload types that seek to minimize disruption:
```
```diff
@@ -170,8 +175,8 @@ you need any help or guidance.

 * **How can this feature be enabled / disabled in a live cluster?**
   - [x] Feature gate (also fill in values in `kep.yaml`)
-    - Feature gate name:
-    - Components depending on the feature gate:
+    - Feature gate name: `DaemonSetUpdateSurge`
+    - Components depending on the feature gate: `kube-apiserver`, `kube-controller-manager`
   - [ ] Other
     - Describe the mechanism:
     - Will enabling / disabling the feature require downtime of the control
```
```diff
@@ -186,15 +191,18 @@ you need any help or guidance.
 * **Can the feature be disabled once it has been enabled (i.e. can we roll back
 the enablement)?**

-Yes, when the feature gate is disabled the field is ignored and can be cleared.
-A workload using this alpha feature would no longer be able to surge and would
-fall back to the default MaxUnavailable value (which is minimum 1).
+Yes, when the feature gate is disabled the field is ignored and can be cleared by
+an end user. A workload using this alpha feature would no longer be able to surge
+and would fall back to the default MaxUnavailable value (which is minimum 1).

 * **What happens if we reenable the feature if it was previously rolled back?**

 The field would become active and whatever new values were present would cause
-the surge feature to become active. If the field were changed the user would have
-to use the new alpha field.
+the surge feature to become active. If the field name were changed old values
+would be lost and the controller would default to using maxUnavailable 1.
+
+To clear the field from etcd, disable the gate and perform a no-op PUT on every
+daemonset.

 * **Are there any tests for feature enablement/disablement?**
```
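The clean-up step ("disable the gate and perform a no-op PUT on every daemonset") amounts to a read-then-write-back loop. The Go sketch below runs that loop against a hypothetical in-memory `fakeAPIServer`, which is not client-go and performs no resourceVersion checks. The sketch assumes, following the KEP text, that the server drops the surge field on every write while the gate is disabled. Whether a given apiserver version does so depends on its disabled-field handling. Against a real cluster, the same loop would use client-go's `AppsV1().DaemonSets("").List` followed by `Update` on each item, with retries on conflict.

```go
package main

import (
	"fmt"
	"sort"
)

// DaemonSet is a hypothetical stand-in holding only what this sketch needs.
// A nil MaxSurge means the field is absent from the stored object.
type DaemonSet struct {
	Namespace string
	Name      string
	MaxSurge  *int32
}

// fakeAPIServer is a hypothetical in-memory stand-in for kube-apiserver
// backed by etcd. It is not client-go and has no resourceVersion checks.
type fakeAPIServer struct {
	surgeGateEnabled bool
	store            map[string]DaemonSet
}

func key(ns, name string) string { return ns + "/" + name }

// List returns every stored daemonset in a stable order.
func (s *fakeAPIServer) List() []DaemonSet {
	out := make([]DaemonSet, 0, len(s.store))
	for _, ds := range s.store {
		out = append(out, ds)
	}
	sort.Slice(out, func(i, j int) bool {
		return key(out[i].Namespace, out[i].Name) < key(out[j].Namespace, out[j].Name)
	})
	return out
}

// Update persists ds. Assumption taken from the KEP text: while the gate is
// disabled, the server drops the surge field on every write.
func (s *fakeAPIServer) Update(ds DaemonSet) {
	if !s.surgeGateEnabled {
		ds.MaxSurge = nil
	}
	s.store[key(ds.Namespace, ds.Name)] = ds
}

// clearSurgeFields performs a no-op PUT on every daemonset: each object is
// written back exactly as it was read, and the server-side drop does the
// clearing. It returns the number of objects rewritten.
func clearSurgeFields(s *fakeAPIServer) int {
	n := 0
	for _, ds := range s.List() {
		s.Update(ds)
		n++
	}
	return n
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	// The gate is already disabled; two objects still carry a stored maxSurge.
	s := &fakeAPIServer{
		surgeGateEnabled: false,
		store: map[string]DaemonSet{
			"kube-system/node-agent": {Namespace: "kube-system", Name: "node-agent", MaxSurge: int32Ptr(1)},
			"monitoring/exporter":    {Namespace: "monitoring", Name: "exporter", MaxSurge: int32Ptr(2)},
		},
	}

	fmt.Println("daemonsets rewritten:", clearSurgeFields(s))
	for _, ds := range s.List() {
		fmt.Printf("%s/%s maxSurge set: %t\n", ds.Namespace, ds.Name, ds.MaxSurge != nil)
	}
}
```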