Commit ee72b80

Transfer other comments
Signed-off-by: Laura Lorenz <[email protected]>
1 parent 767def9 commit ee72b80

File tree

1 file changed (+22, -2 lines)

  • keps/sig-node/4603-tune-crashloopbackoff


keps/sig-node/4603-tune-crashloopbackoff/README.md

Lines changed: 22 additions & 2 deletions
@@ -279,6 +279,7 @@ know that this has succeeded?
   node stability
 * Provide a simple UX that does not require changes for the majority of
   workloads
+* <<[UNRESOLVED]>> Must work for Jobs and sidecar containers <<[/UNRESOLVED]>>

 ### Non-Goals

@@ -610,9 +611,26 @@ does during pod restarts.
 > What conditions lead to a re-download of an image? I wonder if we can eliminate this, or if that's too much of a behavior change.
 > Similar question for image downloads. Although in this case, I think the kubelet should have an informer for any secrets or configmaps used, so it should just pull from cache. Is that true for EnvVarFrom values?
 > Does this [old container cleanup using containerd] include cleaning up the image filesystem? There might be room for some optimization here, if we can reuse the RO layers.
+
 <<[/UNRESOLVED]>>
 ```

+```
+<<[UNRESOLVED]>>
+It's because of the way we handle backoff:
+https://github.com/kubernetes/kubernetes/blob/a7ca13ea29ba5b3c91fd293cdbaec8fb5b30cee2/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L1336-L1349
+
+So the first time the container exits, there is no backoff delay recorded, but then it adds a backoff key at line 1348.
+
+So the actual (current) backoff implementation is:
+
+0 seconds delay for the first restart
+10 seconds for the second restart
+10 * 2^(restart_count - 2) for subsequent restarts
+
+But those numbers are all delayed by up to 10s due to kubernetes/kubernetes#123602
+<<[/UNRESOLVED]>>
+```
+
 ### Benchmarking

 Again, let it be known that by definition this KEP will cause pods to restart
@@ -628,7 +646,8 @@ terminating pods, and crashing pods whose `restartPolicy: Always`:
 * what is the load and rate of Pod restart related API requests to the API
   server?
 * what are the performance (memory, CPU, and pod start latency) effects on the
-  kubelet component?
+  kubelet component, considering the effects of different plugins (e.g. CSI,
+  CNI)?

 Today there are alpha SLIs in Kubernetes that can observe that impact in
 aggregate:
@@ -825,7 +844,8 @@ heterogeneity between "Succeeded" terminating pods, and crashing pods whose
 * what is the load and rate of Pod restart related API requests to the API
   server?
 * what are the performance (memory, CPU, and pod start latency) effects on the
-  kubelet component?
+  kubelet component, considering the effects of different plugins (e.g. CSI,
+  CNI)?

 ### Graduation Criteria

