
Commit e3f3fae

mimowo and soltysh committed
Cleanup the document by applying the CR remarks
Co-authored-by: Maciej Szulik <[email protected]>
1 parent 77d24ee commit e3f3fae


keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md

Lines changed: 32 additions & 22 deletions
@@ -146,6 +146,7 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Using Pod status.reason field](#using-pod-statusreason-field)
 - [Using of various PodCondition types](#using-of-various-podcondition-types)
 - [More nodeAffinity-like JobSpec API](#more-nodeaffinity-like-jobspec-api)
+- [Possible future extensions](#possible-future-extensions)
 - [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
 <!-- /toc -->

@@ -785,29 +786,28 @@ future if there is a good motivating use-case.

 ##### Admission failures

-In some scenarios a pod admission failure could be restarted and is likely to
-succeed on another node (for example a pod scheduled to a node with resource
-pressure, see: [Pod admission error](#pod-admission-error)). However, in some
-situations isn't clear as the failure can be caused by incompatible pod and node
-configurations. As node configurations are often the same within a cluster it is
-likely that the pod would fail if restarted on any node in the cluster. In that
-case, adding `DisruptionTarget` condition could cause a never-ending loop of
-retries if the pod failure policy was configured to ignore such failures. Thus,
-we decide not to add any pod condition for such failures. If there is a
-sufficient motivating use-case a dedicated pod condition might be introduced to
-annotate some of the admission failure scenarios.
+In some scenarios a pod admission failure could result in a successful pod restart on another
+node (for example a pod scheduled to a node with resource pressure, see:
+[Pod admission error](#pod-admission-error)). However, in other situations it won't be as clear,
+since the failure can be caused by incompatible pod and node configurations. Node configurations
+are often the same within a cluster, so it is likely that the pod would fail if restarted on any
+other node in the cluster. In that case, adding `DisruptionTarget` condition could cause
+a never-ending loop of retries, if the pod failure policy was configured to ignore such failures.
+Given the above, we decide not to add any pod condition for such failures. If there is a sufficient
+motivating use-case, a dedicated pod condition might be introduced to annotate some of the
+admission failure scenarios.

 ##### Resource limits exceeded

-A Pod failure-initiated by Kubelet can be caused by exceeding pod's
+A Pod failure initiated by Kubelet can be caused by exceeding pod's
 (or container's) resource (memory or ephemeral-storage) limits. We have considered
 (and prepared an initial implementation, see PR
 [Add ResourceExhausted pod condition for oom killer and exceeding of local storage limits](https://github.com/kubernetes/kubernetes/pull/113436))
 introduction of a dedicated Pod failure condition `ResourceExhausted` to
 annotate pod failures due to the above scenarios.

 However, it turned out, that there are complications with detection of exceeding
-of memory limits:
+memory limits:
 - the approach we considered is to rely on the Out-Of-Memory (OOM) killer.
 In particular, we could detect that a pod was terminated due to OOM killer based
 on the container's `reason` field being equal to `OOMKilled`. This value is set
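
For illustration, the kind of pod failure policy referred to in the admission-failures paragraph above — one configured to ignore failures that carry the `DisruptionTarget` condition — could be expressed roughly as in the Go sketch below, built against the published `k8s.io/api` batch/v1 types. This is only a sketch: the second rule and the exit code 42 are made up for the example, and the field and constant names follow the API as published, not this KEP's text.

```go
// Sketch: a pod failure policy that ignores (does not count towards
// backoffLimit) pod failures carrying the DisruptionTarget condition,
// and fails the Job outright on a hypothetical fatal exit code.
package policysketch

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

func examplePodFailurePolicy() *batchv1.PodFailurePolicy {
	return &batchv1.PodFailurePolicy{
		Rules: []batchv1.PodFailurePolicyRule{
			{
				// Retriable: the pod was a victim of a disruption
				// (preemption, node drain, taint-based eviction, ...).
				Action: batchv1.PodFailurePolicyActionIgnore,
				OnPodConditions: []batchv1.PodFailurePolicyOnPodConditionsPattern{
					{
						Type:   corev1.PodConditionType("DisruptionTarget"),
						Status: corev1.ConditionTrue,
					},
				},
			},
			{
				// Non-retriable: the application reported a fatal error
				// (exit code 42 is purely illustrative).
				Action: batchv1.PodFailurePolicyActionFailJob,
				OnExitCodes: &batchv1.PodFailurePolicyOnExitCodesRequirement{
					Operator: batchv1.PodFailurePolicyOnExitCodesOpIn,
					Values:   []int32{42},
				},
			},
		},
	}
}
```
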
@@ -818,31 +818,28 @@ for event handling and
 for the constant definition)
 and CRI-O (see
 [here](https://github.com/cri-o/cri-o/blob/edf889bd277ae9a8aa699c354f12baaef3d9b71d/server/container_status.go#L88-L89)).
-- setting of the `reason` field to equal `OOMKilled` is not standardized. We have
-started an effort to standardize the handling of handling OOM killed containers
+- setting the `reason` field to `OOMKilled` is not standardized, either. We have
+started an effort to standardize the handling of OOM killed containers
 (see: [Documentation for the CRI API reason field to standardize the field for containers terminated by OOM killer](https://github.com/kubernetes/kubernetes/pull/112977)).
-However, in the process it turned out that on some configurations
+However, in the process it turned out that in some configurations
 (for example the CRI-O with cgroupv2, see:
 [Add e2e_node test for oom killed container reason](https://github.com/kubernetes/kubernetes/pull/113205)),
 the container's `reason` field is not set to `OOMKilled`.
 - OOM killer might get invoked not only when container's limits are exceeded,
-but also when the system is running low on memory. In such a scenario there
+but also when the system is running low on memory. In such scenario there
 can be race conditions in which both the `DisruptionTarget` condition and the
 `ResourceExhausted` could be added.

 Thus, we decide not to annotate the scenarios with the `ResourceExhausted`
 condition. While there are not known issues with detection of the exceeding of
-Pod's ephemeral storage limits we prefer to avoid future extension of the
+Pod's ephemeral storage limits, we prefer to avoid future extension of the
 semantics of the new condition type. Alternatively, we could introduce a pair of
 dedicated pod condition types: `OOMKilled` and `EphemeralStorageLimitExceeded`.
 This approach, however, could create an unnecessary proliferation of the pod
 condition types.

 Finally, we would like to first hear user feedback on the preferred approach
 and also on how important it is to cover the resource limits exceeded scenarios.
-We believe to be able to collect this feedback after users start to use the
-feature - using job failure policies based on the `DisruptionTarget` condition
-and container exit codes.

 #### JobSpec API alternatives
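
To make the detection approach discussed in the hunk above concrete, the check the KEP describes — comparing the terminated container state's `reason` against `OOMKilled` — could be sketched as below. The helper name is made up for the example; as the text explains, this check is not reliable on every runtime/cgroup configuration, which is part of why the `ResourceExhausted` condition was dropped.

```go
// Sketch of the OOM-detection approach considered (and rejected) above:
// inspect each container's terminated state and compare its reason
// against "OOMKilled".
package oomsketch

import corev1 "k8s.io/api/core/v1"

// terminatedByOOM reports whether any container of the pod ended in a
// terminated state whose reason field equals "OOMKilled".
func terminatedByOOM(pod *corev1.Pod) bool {
	statuses := append([]corev1.ContainerStatus{}, pod.Status.InitContainerStatuses...)
	statuses = append(statuses, pod.Status.ContainerStatuses...)
	for _, cs := range statuses {
		if t := cs.State.Terminated; t != nil && t.Reason == "OOMKilled" {
			return true
		}
	}
	return false
}
```
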

@@ -996,7 +993,7 @@ When the failure is initiated by a component which deletes the pod, then the API
 call to append the condition will be issued as a pod status update call
 before the Pod delete request (not necessarily as the last update request before
 the actual delete). For Kubelet, which does not delete the pod itself, the pod
-condition is added in the same API request as to change the phase to failed.
+condition is added in the same API request as the phase change to failed.
 This way the Job controller will be able to see the condition, and match it
 against the pod failure policy, when handling a failed pod.
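
As a rough sketch of the ordering described above, a disruption-initiating component could first add the condition through a pod status update and only then issue the delete request, so the Job controller can observe the condition when it handles the failed pod. The function name and the `reason` string below are illustrative; a real component would typically patch the status and retry on conflicts rather than perform a blind update.

```go
// Sketch (not the actual controller code): append the DisruptionTarget
// pod condition via a status update, then issue the delete request.
package deletionsketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func markDisruptedThenDelete(ctx context.Context, c kubernetes.Interface, pod *corev1.Pod, msg string) error {
	pod = pod.DeepCopy()
	pod.Status.Conditions = append(pod.Status.Conditions, corev1.PodCondition{
		Type:               corev1.PodConditionType("DisruptionTarget"),
		Status:             corev1.ConditionTrue,
		Reason:             "DeletionByComponent", // illustrative reason
		Message:            msg,
		LastTransitionTime: metav1.Now(),
	})
	// Status update first, so the condition is visible before the delete.
	if _, err := c.CoreV1().Pods(pod.Namespace).UpdateStatus(ctx, pod, metav1.UpdateOptions{}); err != nil {
		return err
	}
	// Then the actual delete request.
	return c.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{})
}
```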

@@ -2067,6 +2064,19 @@ first iteration of the feature, we intend to provide a user-friendly API
 targeting the known use-cases. A more flexible API can be considered as a future
 improvement.

+### Possible future extensions
+
+As one possible direction of extending the feature is adding pod failure
+conditions in the following scenarios (see links for discussions on the factors
+that made us not to cover the scenarios in Beta):
+- [active deadline timeout exceeded](#active-deadline-timeout-exceeded)
+- [admission failures](#admission-failures)
+- [resource limits exceeded](#resource-limits-exceeded).
+
+We are going to re-evaluate the decisions based on the user feedback after users
+start to use the feature - using job failure policies based on the `DisruptionTarget` condition
+and container exit codes.
+
 <!--
 What other approaches did you consider, and why did you rule them out? These do
 not need to be as detailed as the proposal, but should include enough
