@@ -885,7 +885,9 @@ the pod is actually in the terminal phase (`Failed`), to ensure their state is
not modified while Job controller matches them against the pod failure policy.

However, there are scenarios in which a pod gets stuck in a non-terminal phase,
- but is doomed to be failed, as it is terminating (has `deletionTimestamp` set).
+ but is doomed to be failed, as it is terminating (has `deletionTimestamp` set, also
+ known as the `DELETING` state, see:
+ [The API Object Lifecycle](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/object-lifecycle.md)).
In order to workaround this issue, Job controller, when pod failure policy is
disabled, considers any terminating pod that is in a non-terminal phase as failed.
Note that, it is important that when Job controller considers such pods as failed
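To make the rule described in the hunk above concrete, here is a minimal sketch of the check, assuming a hypothetical helper name and an explicit boolean for the feature state; this is illustrative pseudologic, not the actual Job controller code:

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isConsideredFailed sketches the behavior described above; the name and
// signature are illustrative, not the real Job controller implementation.
func isConsideredFailed(pod *v1.Pod, podFailurePolicyEnabled bool) bool {
	// A pod already in the terminal Failed phase always counts as failed.
	if pod.Status.Phase == v1.PodFailed {
		return true
	}
	if podFailurePolicyEnabled {
		// With pod failure policy enabled, wait for the terminal phase so the
		// pod's state cannot change while it is matched against policy rules.
		return false
	}
	// With pod failure policy disabled, a terminating pod (deletionTimestamp
	// set) that is stuck in a non-terminal phase is also considered failed.
	return pod.DeletionTimestamp != nil && pod.Status.Phase != v1.PodSucceeded
}

func main() {
	now := metav1.Now()
	terminating := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{DeletionTimestamp: &now},
		Status:     v1.PodStatus{Phase: v1.PodRunning},
	}
	fmt.Println(isConsideredFailed(terminating, false)) // true: terminating, non-terminal, policy disabled
	fmt.Println(isConsideredFailed(terminating, true))  // false: wait for the terminal phase
}
```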
@@ -974,7 +976,7 @@ spec:
    rules: []
  backoffLimit: 0
```
- 2. delete the pod with `k delete pods -l job-name=invalid-image`
+ 2. delete the pod with `kubectl delete pods -l job-name=invalid-image`

The relevant fields of the pod:
@@ -1047,7 +1049,7 @@ spec:
    rules: []
  backoffLimit: 0
```
- 2. delete the pod with `k delete pods -l job-name=invalid-configmap-ref`
+ 2. delete the pod with `kubectl delete pods -l job-name=invalid-configmap-ref`

The relevant fields of the pod:
@@ -1099,12 +1101,12 @@ spec:
      - name: huge-image
        image: sagemathinc/cocalc # this is around 20GB
        command: ["bash"]
-       args: ["-c", 'echo "Hello world"']
+       args: ["-c", 'sleep 60 && echo "Hello world"']
  podFailurePolicy:
    rules: []
  backoffLimit: 0
```
- 2. delete the pod with `k delete pods -l job-name=huge-image`
+ 2. delete the pod with `kubectl delete pods -l job-name=huge-image`

The relevant fields of the pod:
@@ -1131,9 +1133,10 @@ The relevant fields of the pod:

Here, the pod is not stuck, however it transitions to `Running` and fails
soon after, making the interim transition to `Running` unnecessary. Also, there
- is a race condition, in some situations the running pod may complete with the
- `Succeeded` status before its containers are killed and it transitions in the
- `Failed` phase. This is already problematic for the Job controller, which might
+ is a race condition: if the container succeeds before the pod's graceful
+ termination period elapses (which the `sleep 60` in the example above prevents), the running
+ pod may complete with the `Succeeded` status before its containers are killed (and
+ before it transitions to the `Failed` phase). This is already problematic for the Job controller, which might
count the pod as failed, despite the pod eventually succeeding. With the proposed
change, in the scenario, the pod transitions directly from the `Pending` phase
to `Failed`.
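One quick way to verify which phase transitions actually occur in this scenario is to watch the example Job's pods and print each phase update. The sketch below uses client-go and assumes the `default` namespace and the `job-name=huge-image` label from the example above; `kubectl get pods -l job-name=huge-image --watch` gives a similar view.

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig from its default location (~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Watch pods created by the example Job and print every phase update.
	w, err := client.CoreV1().Pods("default").Watch(context.Background(), metav1.ListOptions{
		LabelSelector: "job-name=huge-image",
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		if pod, ok := ev.Object.(*v1.Pod); ok {
			fmt.Printf("%s\t%s\tphase=%s\n", ev.Type, pod.Name, pod.Status.Phase)
		}
	}
}
```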
@@ -2209,6 +2212,19 @@ Think through this both in small and large cases, again with respect to the
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
-->

+ ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+ No. This feature does not introduce any resource-exhaustive operations.
+
+ <!--
+ Focus not just on happy cases, but primarily on more pathological cases
+ (e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
+ If any of the resources can be exhausted, how this is mitigated with the existing limits
+ (e.g. pods per node) or new limits added by this KEP?
+ Are there any tests that were run/should be run to understand performance characteristics better
+ and validate the declared limits?
+ -->
+
### Troubleshooting

<!--