- setting the `reason` field to `OOMKilled` is not standardized, either. We have
  started an effort to standardize the handling of OOM-killed containers
  (see: [Documentation for the CRI API reason field to standardize the field for containers terminated by OOM killer](https://github.com/kubernetes/kubernetes/pull/112977)).
  However, in the process it turned out that in some configurations
  (for example CRI-O with cgroupv2, see:
  [Add e2e_node test for oom killed container reason](https://github.com/kubernetes/kubernetes/pull/113205)),
  the container's `reason` field is not set to `OOMKilled` (see the sketch
  after this list for where the field appears in the pod status).
- the OOM killer might get invoked not only when a container's limits are
  exceeded, but also when the system is running low on memory. In such a
  scenario there can be race conditions in which both the `DisruptionTarget`
  and the `ResourceExhausted` conditions could be added.
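
For illustration, this is roughly how an OOM-killed container surfaces in the
pod status when the runtime does populate the field (a sketch; the container
name is hypothetical):

```yaml
status:
  containerStatuses:
  - name: main              # hypothetical container name
    state:
      terminated:
        exitCode: 137       # 128 + 9 (SIGKILL), the exit code typical for OOM kills
        reason: OOMKilled   # not set by all runtime/cgroup configurations
```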

Thus, we decided not to annotate these scenarios with the `ResourceExhausted`
condition. While there are no known issues with detecting that a Pod has
exceeded its ephemeral storage limits, we prefer to avoid future extension of
the semantics of the new condition type. Alternatively, we could introduce a
pair of dedicated pod condition types: `OOMKilled` and
`EphemeralStorageLimitExceeded`. This approach, however, could create an
unnecessary proliferation of pod condition types.

Finally, we would like to first hear user feedback on the preferred approach,
and also on how important it is to cover the resource-limits-exceeded
scenarios.
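
For context, a job failure policy based on the `DisruptionTarget` condition and
container exit codes might look roughly as follows (a sketch; the Job and
container names, the image, and the use of exit code 137 to detect OOM kills
are illustrative assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job              # hypothetical name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    # fail the whole Job on an OOM kill (SIGKILL, 128 + 9)
    - action: FailJob
      onExitCodes:
        containerName: main      # hypothetical container name
        operator: In
        values: [137]
    # do not count pod disruptions against the backoff limit
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never       # required when using podFailurePolicy
      containers:
      - name: main
        image: registry.example.com/app:latest   # hypothetical image
```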

#### JobSpec API alternatives

@@ -996,7 +993,7 @@

When the failure is initiated by a component which deletes the pod, then the API
call to append the condition will be issued as a pod status update call before
the Pod delete request (not necessarily as the last update request before the
actual delete). For Kubelet, which does not delete the pod itself, the pod
condition is added in the same API request as the phase change to failed.
This way the Job controller will be able to see the condition and match it
against the pod failure policy when handling a failed pod.
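
To make the matching concrete, a sketch of the pod status that the Job
controller may observe in such a case (the `reason` value is illustrative; the
actual value depends on which component initiated the disruption):

```yaml
status:
  phase: Failed
  conditions:
  - type: DisruptionTarget
    status: "True"
    reason: TerminationByKubelet   # e.g. set by kubelet on node-pressure eviction
```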

@@ -2067,6 +2064,19 @@

first iteration of the feature, we intend to provide a user-friendly API
targeting the known use-cases. A more flexible API can be considered as a
future improvement.

### Possible future extensions

One possible direction of extending the feature is adding pod failure
conditions in the following scenarios (see links for discussions on the factors