kubernetes
diff --git a/keps/prod-readiness/sig-node/5593.yaml b/keps/prod-readiness/sig-node/5593.yaml
@@ -0,0 +1,5 @@
+kep-number: 5593
+alpha:
+  approver: "@soltysh"
+beta:
+  approver: "@soltysh"
diff --git a/keps/sig-node/4603-tune-crashloopbackoff/README.md b/keps/sig-node/4603-tune-crashloopbackoff/README.md
@@ -219,7 +219,7 @@ are considered too conservative, especially in cases where the exit code was 0
 (Success) and the pod is transitioned into a "Completed" state or the expected
 length of the pod run is less than 10 minutes.
 
-This KEP proposes the following changes:
+This KEP proposes the following change:
 * Provide an alpha-gated change to get feedback and periodic scalability tests
   on changes to the global initial backoff to 1s and maximum backoff to 1 minute
 
@@ -228,6 +228,10 @@ CrashLoopBackOffBehavior of today, with the proposed new default, and with the
 proposed minimum per node configuration](./restarts-vs-elapsed-all.png "KEP-4603
 CrashLoopBackoff proposal comparison")
 
+Originally, this KEP included a proposal to lower the maximum CrashLoopBackOff
+duration. This has been split into [KEP-5593: Configure the max CrashLoopBackOff
+delay](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5593-configure-the-max-crashloopbackoff-delay).
+
 ## Motivation
 
 <!--
@@ -488,24 +492,15 @@ additional load, the 2x increase in apiserver cpu usage is probably not a
 particularly useful metric. Might be worth mentioning the raw numbers here
 instead.>> <<[/UNRESOLVED]>>
 
-For both of these changes, by passing these changes through the existing
-SIG-scalability tests, while pursuing manual and more detailed periodic
-benchmarking during the alpha period, we can increase the confidence in the
-changes and explore the possibility of reducing the values further in the
-future.
+By passing the proposed changes through the existing SIG-scalability tests,
+while pursuing manual and more detailed periodic benchmarking during the alpha
+period, we can increase the confidence in the changes and explore the
+possibility of reducing the values further in the future.
 
 In the meantime, during alpha, naturally the first line of defense is that the
-enhancements, even the reduced "default" baseline curve for CrashLoopBackoff,
-are not usable by default and must be opted into. In this specific case they are
-opted into separately with different alpha feature gates, so clusters will only
-be affected by each risk if the cluster operator enables the new features during
-the alpha period.
-
-Beyond this, there are two main mitigations during alpha: conservativism in
-changes to the default behavior based on prior stress testing, and limiting any
-further overrides to be opt-in per Node, and only by users with the permissions
-to modify the kubelet configuration -- in other words, a cluster operator
-persona.
+enhancement is not usable by default and must be opted into. A further
+mitigation is conservatism in changes to the default behavior, based on prior
+stress testing.
 
 The alpha changes to the _default_ backoff curve were chosen because they meet
 emerging use cases and user sentiment from the canonical feature request issue
@@ -1549,8 +1544,11 @@ Think about adding additional work or introducing new steps in between
 Maybe! As containers will be restarting more, this may affect "Startup latency
 of schedulable stateless pods", "Startup latency of schedule stateful pods".
 This is directly the type of SLI impact that a) the split between the default
-behavior change and the per node opt in is trying to mitigate, and b) one of the
-targets of the benchmarking period during alpha.
+behavior change and the per node max CrashLoopBackOff delay configuration
+proposed in [KEP-5593 - Configure the max CrashLoopBackOff
+delay](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5593-configure-the-max-crashloopbackoff-delay/README.md)
+is trying to mitigate, and b) one of the targets of the benchmarking period
+during alpha.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
@@ -1569,7 +1567,8 @@ initial manual benchmarking tests, CPU usage of kubelet increased 2x on nodes
 saturated with 110 instantly crashing single-container pods. During the alpha
 benchmarking period, we will be quantifying that amount in fully and partially
 saturated nodes with both the new default backoff curve and the minimum per node
-backoff curve.
+backoff curve proposed in [KEP-5593 - Configure the max CrashLoopBackOff
+delay](https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/5593-configure-the-max-crashloopbackoff-delay/README.md).
 
 ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?