keps/sig-node/4603-tune-crashloopbackoff/README.md
21 additions & 17 deletions
@@ -528,6 +528,10 @@ of ~5 QPS when deploying 110 mass crashing pods for our tests, even with
 instantly crashing pods and instantaneously restarting CrashLoopBackOff behavior,
 `/pods` API requests quickly normalized to ~2 QPS. In the same tests, runtime
 CPU usage increased by 10x and the API server CPU usage increased by 2x.
+<<[UNRESOLVED non blocking]If you were testing this on a small cluster without a lot of
+additional load, the 2x increase in apiserver CPU usage is probably not a
+particularly useful metric. It might be worth mentioning the raw numbers here
+instead.>> <<[/UNRESOLVED]>>
 
 For both of these changes, by passing them through the existing
 SIG-scalability tests, while pursuing manual and more detailed periodic
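
For readers who want to reproduce this kind of load, here is a minimal sketch of creating the "instantly crashing" pods described above. It assumes client-go and a reachable kubeconfig; the busybox image, the `default` namespace, and the pod count are illustrative choices, not the KEP's actual test harness.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Create 110 pods whose container exits non-zero immediately; with
	// restartPolicy Always the kubelet drives them into CrashLoopBackOff.
	for i := 0; i < 110; i++ {
		pod := &corev1.Pod{
			ObjectMeta: metav1.ObjectMeta{GenerateName: "crasher-"},
			Spec: corev1.PodSpec{
				RestartPolicy: corev1.RestartPolicyAlways,
				Containers: []corev1.Container{{
					Name:    "crash",
					Image:   "busybox",
					Command: []string{"false"}, // exits with code 1 right away
				}},
			},
		}
		if _, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
			panic(err)
		}
	}
	fmt.Println("created 110 instantly-crashing pods")
}
```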
@@ -587,8 +591,8 @@ excess restarts every 5 minutes after that; each crashing pod would be
 contributing an excess of ~1550 pod state transition API requests, and a fully
 saturated node with a full 110 crashing pods would be adding 170,500 new pod
 transition API requests every five minutes, which is an excess of ~568
-requests/10s. <<[!UNRESOLVED kubernetes default for the kubelet client rate
-limit and how this changes by machine size]>> <<[UNRESOLVED]>>
+requests/10s. <<[!UNRESOLVED non blocking: kubernetes default for the kubelet
+client rate limit and how this changes by machine size]>> <<[UNRESOLVED]>>
 
 
 ## Design Details
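
A brief sketch of the arithmetic behind these figures, using the per-pod and per-node counts from the text above; note that 170,500 requests over five minutes works out to roughly 568 per second, or about 5,683 per 10 s.

```go
package main

import "fmt"

func main() {
	const (
		excessPerPodPer5Min = 1550   // excess pod state transition API requests per crashing pod, per 5 minutes
		crashingPodsPerNode = 110    // a fully saturated node
		windowSeconds       = 5 * 60 // the 5-minute window, in seconds
	)
	totalPer5Min := excessPerPodPer5Min * crashingPodsPerNode // 170,500 requests per 5 minutes
	perSecond := float64(totalPer5Min) / windowSeconds        // ~568 requests/s
	fmt.Printf("total per 5 min: %d, per second: %.0f, per 10 s: %.0f\n",
		totalPer5Min, perSecond, perSecond*10)
}
```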
@@ -866,7 +870,7 @@ behaviors common to all pod restarts"](code-diagram-for-restarts.png "Kubelet
 and Container Runtime restart code paths")
 
 ```
-<<[UNRESOLVED answer these question from original PR]>>
+<<[UNRESOLVED non blocking: answer these questions from the original PR or file new bugs]>>
 > Does this [old container cleanup using containerd] include cleaning up the image filesystem? There might be room for some optimization here, if we can reuse the RO layers.
 To answer the question: it looks like this is per runtime. Need to check about leases. Also, part of the value of this is to restart the sandbox.
 ```
@@ -1089,11 +1093,9 @@ extending the production code to implement this enhancement.
 -->
 
 
-- <<[UNRESOLVED whats up with this]>>
-`kubelet/kuberuntime/kuberuntime_manager_test`: **could not find a successful
+- `kubelet/kuberuntime/kuberuntime_manager_test`: **could not find a successful