@@ -209,14 +209,14 @@ This enhancement extends the scope of those and other conditions to other contro
209
209
### Goals
210
210
211
211
The current status of a workload can be depicted via its conditions. It can be a subset of:
212
-
- Progressing
213
-
- Available
214
-
- ReplicaFailure
215
-
- Suspended
216
-
- Complete
217
-
- Failed (to progress)
218
-
- Waiting (Job only)
219
-
- Running (Job only)
212
+
- Progressing: designates the state of the latest rollout.
213
+
- Available: designates whether the required number of replicas is available.
214
+
- ReplicaFailure: a ReplicaSet failed to create or delete a Pod.
215
+
- Suspended: execution of a Job is suspended.
216
+
- Complete: the Job ran to completion, or the rollout completed (via the Progressing condition).
217
+
- Failed: the Job failed to complete, or the rollout failed to progress (via the Progressing condition).
218
+
- Waiting (Job only): waiting for at least one Pod to run.
219
+
- Running (Job only): at least one Pod of a Job is running.
220
220
221
221
Workload controllers should set the above conditions (when applicable) to reflect their state.
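For illustration only, here is a minimal Python sketch of how a controller could maintain such a list of conditions. It assumes the usual `type`/`status`/`reason`/`message`/`lastTransitionTime` shape from the Kubernetes API conventions; `Condition`, `set_condition` and `CONDITION_TYPES` are hypothetical names, not part of any Kubernetes library.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import List, Optional

# Condition types from the list above; Waiting and Running apply to Jobs only.
CONDITION_TYPES = {
    "Progressing", "Available", "ReplicaFailure", "Suspended",
    "Complete", "Failed", "Waiting", "Running",
}


@dataclass
class Condition:
    type: str
    status: str  # "True", "False" or "Unknown"
    reason: str = ""
    message: str = ""
    last_transition_time: Optional[datetime] = None


def set_condition(conditions: List[Condition], new: Condition, now: datetime) -> None:
    """Hypothetical helper: insert or update the condition with new.type.

    lastTransitionTime only changes when the status flips, so re-asserting
    the same status on every sync keeps the time of the last real transition.
    """
    if new.type not in CONDITION_TYPES:
        raise ValueError(f"unknown condition type: {new.type}")
    for i, existing in enumerate(conditions):
        if existing.type == new.type:
            changed = existing.status != new.status
            ts = now if changed else existing.last_transition_time
            conditions[i] = replace(new, last_transition_time=ts)
            return
    conditions.append(replace(new, last_transition_time=now))


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    conds: List[Condition] = []
    set_condition(conds, Condition("Available", "False"), t0)
    set_condition(conds, Condition("Available", "True"), t0.replace(minute=5))
    print(conds[0].status, conds[0].last_transition_time)
```

Keeping one entry per condition type, and only moving `lastTransitionTime` on a real status change, is what lets clients tell how long a workload has been in its current state.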
222
222
@@ -228,10 +228,11 @@ and make progress.
228
228
-->
229
229
230
230
- Modifying the existing states of deployment controller
231
-
- Changing the definition of States
231
+
- Changing the definition of statuses
232
232
- Introducing a new API for existing conditions
233
-
- To properly implement Progressing condition,`.spec.progressDeadlineSeconds` field has to be introduced in
233
+
- To properly implement the Progressing condition, the `.spec.progressDeadlineSeconds` field has to be introduced, as part of an additional KEP, in
234
234
DaemonSet and StatefulSet to describe the time when the controllers should declare the workload as `failed`.
235
+
- Considering adding a Conditions field to CronJob
235
236
236
237
## Proposal
237
238
@@ -256,11 +257,11 @@ bogged down.
256
257
-->
257
258
258
259
#### Story 1
259
-
As an end-user of Kubernetes, I'd like all my workload controllers to have consistent states
260
+
As an end-user of Kubernetes, I'd like all my workload controllers to have consistent statuses.
260
261
261
262
#### Story 2
262
263
As a developer building Kubernetes Operators, I'd like all my deployed operators to have
263
-
consistent states.
264
+
consistent statuses.
264
265
265
266
266
267
### Overview of all conditions
@@ -280,6 +281,14 @@ The following table gives an overview on what conditions each of the workload re
280
281
281
282
**\*\* CronJob does not even have a Conditions field in its Status**
282
283
284
+
### Notes/Constraints/Caveats (Optional)
285
+
286
+
As observed in some issues (https://github.com/kubernetes/website/pull/31226) and in conversations with users, there are misunderstandings about the meaning of the `Progressing` condition. These include:
287
+
- Thinking that the `Progressing` condition reflects the current state of the Deployment instead of the state of the latest rollout. This includes the expectation that the `Progressing` condition will keep checking the availability of replicas and revert to the `progressing`/`failed` state even after the `complete` state is reached, and that it will therefore also reflect any scaling changes.
288
+
- Confusion that `ProgressDeadlineExceeded` does not occur when the availability of Pods degrades after the Deployment rollout has completed, even if this happens before `.spec.progressDeadlineSeconds` times out.
289
+
290
+
To summarize, the name of the `Progressing` condition does not convey its true meaning, namely that its main responsibility is to indicate the state of rollouts, and this confuses users.
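To make the distinction concrete, the following Python sketch is a simplified model of how the Deployment controller re-evaluates `Progressing` on each sync, based on our reading of its current behavior. The reason strings are the ones the Deployment controller reports; `next_progressing_reason` and `available_status` are hypothetical names, and the inputs (`rollout_complete`, `made_progress`, `deadline_elapsed`) are assumed to be computed elsewhere. Once `Progressing` carries the `NewReplicaSetAvailable` reason, a later loss of availability shows up only in `Available`, never as `ProgressDeadlineExceeded`.

```python
from typing import Optional

# Progressing reasons as reported by the Deployment controller.
NEW_RS_AVAILABLE = "NewReplicaSetAvailable"
RS_UPDATED = "ReplicaSetUpdated"
TIMED_OUT = "ProgressDeadlineExceeded"


def next_progressing_reason(prev: Optional[str], rollout_complete: bool,
                            made_progress: bool, deadline_elapsed: bool) -> Optional[str]:
    """Simplified model: the Progressing reason after one sync."""
    if rollout_complete:
        return NEW_RS_AVAILABLE
    if made_progress:
        return RS_UPDATED
    # No progress this sync. Only an unfinished rollout can time out:
    # a condition saying the latest rollout finished is left untouched.
    if prev in (None, NEW_RS_AVAILABLE, TIMED_OUT):
        return prev
    return TIMED_OUT if deadline_elapsed else prev


def available_status(available_replicas: int, min_available: int) -> str:
    """Available tracks current availability, independently of the rollout."""
    return "True" if available_replicas >= min_available else "False"
```

For example, after a completed rollout a Pod becomes unavailable: the rollout no longer counts as complete and no progress is made, yet `Progressing` keeps `NewReplicaSetAvailable` while `Available` turns `False`.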
291
+
283
292
### Proposed Conditions
284
293
285
294
<!--
@@ -322,7 +331,7 @@ Kubernetes marks a DaemonSet, ReplicaSet or Stateful as `complete` when it has t
322
331
- No old or misscheduled replicas/Pods for the DaemonSet, ReplicaSet or StatefulSet are running.
323
332
324
333
#### Failed
325
-
In order to introduce this condition we need to come up with a new field called `.spec.progressDeadlineSeconds` which denotes the number of seconds the controller waits before indicating(in the workload controller status) that the controller progress has stalled.
334
+
In order to introduce this condition, we need a new field called `.spec.progressDeadlineSeconds` (to be introduced in an additional KEP), which denotes the number of seconds the controller waits before indicating (in the workload controller status) that the controller's progress has stalled.
326
335
327
336
Many factors can cause a failure, for example:
328
337
- Insufficient quota
@@ -332,7 +341,7 @@ There are many factors that can cause failure to happen like:
332
341
333
342
See the [Kubernetes API Conventions](https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties) for more information on status conditions.
334
343
335
-
Because of the number of changes that are involved as part of this effort, we are thinking of a phased approach where we introduce these conditions to DaemonSet controller behind a featuregate flag first in one release and then make similar changes to ReplicaSet and StatefulSet controller.
344
+
Because of the number of changes involved in this effort, we plan a phased approach: we first introduce these conditions to the DaemonSet controller and then make similar changes to the ReplicaSet and StatefulSet controllers. We would graduate the `ExtraWorkloadConditions` feature gate to beta once all the features and the `progressDeadlineSeconds` KEP are implemented.
336
345
This also requires some refactoring of the existing condition code in the Deployment controller.
337
346
338
347
#### Available
@@ -353,21 +362,12 @@ Batch workloads behaviour does not properly map to the other workloads that are
353
362
Kubernetes marks a Job as `waiting` if the following condition is true:
354
363
355
364
- There are no Pods with phase `Running` and there is at least one Pod with phase `Pending`.
356
-
- The Job is suspended.
357
365
358
366
Kubernetes marks a Job as `running` if there is at least one Pod with phase `Running`.
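A minimal Python sketch of these two rules, taking the phases of a Job's Pods as input. `job_activity_conditions` is a hypothetical helper. Since the suspended case is no longer part of the `waiting` rule, a suspended Job with no Pods yields neither condition here and is described by the separate `Suspended` condition instead.

```python
from collections import Counter
from typing import Iterable, Set

# Pod phases defined by the Kubernetes Pod API.
POD_PHASES = {"Pending", "Running", "Succeeded", "Failed", "Unknown"}


def job_activity_conditions(pod_phases: Iterable[str]) -> Set[str]:
    """Return {"Running"}, {"Waiting"} or an empty set for a Job's Pods."""
    counts = Counter(pod_phases)
    unexpected = set(counts) - POD_PHASES
    if unexpected:
        raise ValueError(f"unexpected Pod phases: {sorted(unexpected)}")
    if counts["Running"] > 0:
        # At least one Pod has phase Running.
        return {"Running"}
    if counts["Pending"] > 0:
        # No Pod is running yet, but at least one is Pending.
        return {"Waiting"}
    return set()
```

Because `running` requires a Running Pod and `waiting` requires that there is none, the two conditions are mutually exclusive under these rules.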
359
367
360
368
This KEP does not introduce CronJob conditions, as it is difficult to define conditions that would describe CronJob behaviour in a useful manner.
361
369
If the user wants to know whether there are any running Jobs, the `.status.active` field should be sufficient.
362
370
363
-
### Notes/Constraints/Caveats (Optional)
364
-
365
-
As observed in some issues (https://github.com/kubernetes/website/pull/31226) and talking to the users, there is a misunderstanding about the meaning of the `Progressing` condition. These include:
366
-
- Thinking that the `Progressing` condition reflects the state of the current Deployment instead of the last rollout. Which includes expectation that the `Progressing` condition will keep checking availability of replicas and revert to `progressing`/`failed` state even after the `complete` state is reached. And that the progressing condition will thus also reflect any changes in scaling.
367
-
- Confusion that ProgressDeadlineExceeded does not occur after the Deployment rollout completes when the availability of pods degrades before the `.spec.progressDeadlineSeconds` times out.
368
-
369
-
To summarize, the name of the `Progressing` condition doesn't indicate its true meaning that its main responsibility is the indication of rollouts, and it confuses the users.
370
-
371
371
### Risks and Mitigations
372
372
373
373
<!--
@@ -381,7 +381,7 @@ How will UX be reviewed, and by whom?
381
381
382
382
Consider including folks who also work outside the SIG or subproject.
383
383
-->
384
-
We are proposing a new field called `.spec.progressDeadlineSeconds` to DaemonSet and StatefulSet whose default value will be set to the max value of int32 (i.e. 2147483647) by default, which means "no deadline".
384
+
As part of an additional KEP, we are proposing a new field called `.spec.progressDeadlineSeconds` for DaemonSet and StatefulSet, whose default value will be the max value of int32 (i.e. 2147483647), which means "no deadline".
385
385
In this mode, the DaemonSet and StatefulSet controllers will behave exactly as they do today, with no `Failed` mode, since they are either `Progressing` or `Complete`.
386
386
This is to ensure backward compatibility with current behavior. We will default to reasonable values for the controllers in a future release.
387
387
Since a DaemonSet can make progress no faster than "healthy but not ready nodes", the default value for `progressDeadlineSeconds` can be set to 30 minutes, but this value can vary depending on the number of nodes in the cluster.
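A minimal Python sketch of the deadline check, assuming the new field behaves like the existing Deployment `progressDeadlineSeconds` check: an unset value or the int32 maximum disables it, and otherwise the workload counts as stalled once strictly more than the configured number of seconds has passed since the last observed progress. `has_progress_deadline`, `progress_stalled` and `THIRTY_MINUTES` are hypothetical names; the actual semantics will be defined in the separate `progressDeadlineSeconds` KEP.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_INT32 = 2**31 - 1     # 2147483647, the proposed default meaning "no deadline"
THIRTY_MINUTES = 30 * 60  # 1800 s, the candidate DaemonSet default mentioned above


def has_progress_deadline(progress_deadline_seconds: Optional[int]) -> bool:
    """Unset or int32 max means the controller never reports Failed."""
    return progress_deadline_seconds is not None and progress_deadline_seconds != MAX_INT32


def progress_stalled(last_progress: datetime, now: datetime,
                     progress_deadline_seconds: Optional[int]) -> bool:
    """True once strictly more than the deadline has passed without progress."""
    if not has_progress_deadline(progress_deadline_seconds):
        return False
    return now - last_progress > timedelta(seconds=progress_deadline_seconds)


# Why int32 max is effectively "no deadline": it is roughly 68 years of seconds.
YEARS_AT_MAX = MAX_INT32 / (365.25 * 24 * 3600)

if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(progress_stalled(start, start + timedelta(minutes=31), THIRTY_MINUTES))
    print(round(YEARS_AT_MAX, 1))
```

Treating the int32 maximum as a sentinel (rather than just a very large timeout) keeps the default path identical to today's behavior, which is the backward-compatibility goal stated above.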
@@ -391,12 +391,9 @@ It is possible that we introduce a bug in the implementation. The bug can cause:
391
391
- DaemonSets and StatefulSets can be marked `Failed` even though a rollout is in progress
392
392
- The statuses could be misrepresented; for example, a DaemonSet or StatefulSet can be marked `Progressing` when it is actually complete
393
393
394
-
The mitigation currently is that these features will be in Alpha stage behind featuregates for people to try out and give feedback. In Beta phase when
394
+
The current mitigation is that these features will be in the Alpha stage behind the `ExtraWorkloadConditions` feature gate for people to try out and give feedback. In the Beta phase, when
395
395
these are enabled by default, people will only see issues or bugs when `progressDeadlineSeconds` is set to something greater than 0 and we choose default values for that field.
396
396
Since people would have tried this feature in Alpha, we would have had time to fix issues.
397
-
The featuregates we use are `DaemonSetConditions` for DaemonSet controller, `StatefulSetConditions` for StatefulSet controller.
398
-
399
-
400
397
401
398
402
399
## Design Details
@@ -410,7 +407,7 @@ proposal will be implemented, this is the place to discuss them.
410
407
411
408
### Test Plan
412
409
Unit and E2E tests will be added to cover the
413
-
API validation, behavioral change of DaemonSet and StatefulSet with feature gate enabled and disabled.
410
+
API validation and the behavioral changes of the controllers with the feature gate enabled and disabled.
414
411
415
412
- Validating all possible states of old and new conditions. Checking that the changes in underlying Pod statuses correspond to the conditions.
416
413
- Testing `progressDeadlineSeconds` and feature gates.
@@ -442,6 +439,7 @@ when drafting this test plan.
442
439
#### Alpha -> Beta Graduation
443
440
- Gather feedback from end users
444
441
- Tests are in Testgrid and linked in KEP
442
+
- All new features should be implemented in the following controllers: ReplicaSet & ReplicationController, StatefulSet, DaemonSet and Job. To fully support the `failed` state of the Progressing condition, the `progressDeadlineSeconds` KEP should also be fully implemented.
445
443
446
444
#### Beta -> GA Graduation
447
445
- 2 examples of end users using this field
@@ -519,20 +517,11 @@ enhancement:
519
517
cluster required to make on upgrade, in order to make use of the enhancement?
520
518
-->
521
519
- Upgrades
522
-
When upgrading from a release without this feature, to a release with
523
-
`progressDeadlineSeconds`, we will set `progressDeadlineSeconds` to max value of int32 (i.e. 2147483647). This would give users
524
-
the same default behavior.
520
+
When upgrading from a release without this feature to a release with the `ExtraWorkloadConditions` feature,
521
+
we will set the new conditions on the workloads mentioned above.
525
522
- Downgrades
526
-
When downgrading from a release with this feature, to a release without
527
-
`progressDeadlineSeconds`, there are two cases
528
-
- If `progressDeadlineSeconds` is greater than 0 -- in this case kube-apiserver
529
-
clears the `progressDeadlineSeconds` and the existing StatefulSet and DaemonSet controllers wouldn't honor
530
-
`progressDeadlineSeconds` which is expected.
531
-
- If `progressDeadlineSeconds` is equal to 0 -- in this case user wont see any
532
-
difference in behavior.
533
-
534
-
We will ensure that the `progressDeadlineSeconds` field is properly validated
535
-
before persisting. The validation includes checking for positive number for the `progressDeadlineSeconds` field.
523
+
When downgrading from a release with this feature to a release without it,
524
+
we will remove the extra conditions from workload objects.
536
525
537
526
### Version Skew Strategy
538
527
@@ -588,15 +577,12 @@ Pick one of these and delete the rest.
588
577
-->
589
578
590
579
-[x] Feature gate (also fill in values in `kep.yaml`)