---
layout: blog
title: "Kubernetes 1.27: updates on speeding up Pod startup"
date: 2023-05-15T00:00:00+0000
slug: speed-up-pod-startup
---

**Authors**: Paco Xu (DaoCloud), Sergey Kanzhelev (Google), Ruiwen Zhao (Google)

How can Pod start-up be accelerated on nodes in large clusters? This is a common issue that
cluster administrators may face.

This blog post focuses on methods to speed up pod start-up from the kubelet side. It does not
cover the time spent creating pods via the controller-manager and kube-apiserver, nor does it
cover pod scheduling time or the admission webhooks executed along the way.

We mention some important factors to consider from the kubelet's perspective, but this is not
an exhaustive list. With Kubernetes v1.27 released, this blog highlights the significant
changes in v1.27 that help speed up pod start-up.

## Parallel container image pulls

Pulling images always takes some time and what's worse is that image pulls are done serially by
default. In other words, kubelet will send only one image pull request to the image service at
a time. Other image pull requests have to wait until the one being processed is complete.

To enable parallel image pulls, set the `serializeImagePulls` field to false in the kubelet
configuration. When `serializeImagePulls` is disabled, requests for image pulls are immediately
sent to the image service and multiple images can be pulled concurrently.
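
As a minimal sketch, assuming you manage the kubelet through a `KubeletConfiguration` file, disabling serialized pulls looks like this (merge the field into your existing configuration rather than replacing it):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Allow the kubelet to send multiple image pull requests to the
# container runtime's image service at the same time.
serializeImagePulls: false
```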

### Maximum parallel image pulls will help secure your node from overloading on image pulling

We introduced a new feature in kubelet that sets a limit on the number of parallel image
pulls at the node level. This limit restricts the maximum number of images that can be pulled
simultaneously. If there is an image pull request beyond this limit, it will be blocked until
one of the ongoing image pulls finishes. Before enabling this feature, please ensure that your
container runtime's image service can handle parallel image pulls effectively.

To limit the number of simultaneous image pulls, you can configure the `maxParallelImagePulls`
field in kubelet. By setting `maxParallelImagePulls` to a value of _n_, only _n_ images will
be pulled concurrently. Any additional image pulls beyond this limit will wait until at least
one ongoing pull is complete.
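
A sketch combining both fields is below; the value of 5 is purely illustrative, and the limit only makes sense when `serializeImagePulls` is set to false:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Parallel pulls must be enabled for the limit below to take effect.
serializeImagePulls: false
# Illustrative value: at most 5 images are pulled at the same time;
# further pull requests wait until one of the ongoing pulls finishes.
maxParallelImagePulls: 5
```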

You can find more details in the associated KEP: [Kubelet limit of Parallel Image Pulls](https://kep.k8s.io/3673)
(KEP-3673).

## Raised default API query-per-second limits for kubelet

To improve pod startup in scenarios with multiple pods on a node, particularly sudden scaling
situations, the kubelet needs to synchronize the pod status and prepare configmaps, secrets,
and volumes. This requires considerable bandwidth to access the kube-apiserver.

In versions prior to v1.27, the default `kubeAPIQPS` was 5 and `kubeAPIBurst` was 10. However,
the kubelet in v1.27 increased these defaults to 50 and 100 respectively for better performance
during pod startup. It's worth noting that better pod startup isn't the only reason why we've
bumped up the API QPS limits for the kubelet:

1. The kubelet has the potential to be heavily throttled now (default QPS = 5)
2. In large clusters, kubelets can generate significant load anyway, as there are a lot of them
3. Kubelets have a dedicated PriorityLevel and FlowSchema that we can easily control

Previously, we often encountered `volume mount timeout` on kubelet on nodes with more than 50 pods
during pod start-up. We suggest that cluster operators bump `kubeAPIQPS` to 20 and `kubeAPIBurst` to 40,
especially if using bare metal nodes.
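
For cluster operators who want to tune these limits by hand, both knobs are ordinary kubelet configuration fields; a minimal sketch using the values suggested above might look like this (Kubernetes v1.27 already defaults to 50 and 100, so explicit tuning mostly matters on older releases):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Sustained rate of kubelet requests to the kube-apiserver
# (value suggested above; the v1.27 default is 50).
kubeAPIQPS: 20
# Allowed short burst on top of the sustained rate
# (the v1.27 default is 100).
kubeAPIBurst: 40
```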

More details can be found in the KEP <https://kep.k8s.io/1040> and the pull request [#116121](https://github.com/kubernetes/kubernetes/pull/116121).

## Event triggered updates to container status

`Evented PLEG` (PLEG is short for "Pod Lifecycle Event Generator") is set to be in beta for v1.27.
Kubernetes offers two ways for the kubelet to detect Pod lifecycle events, such as the last
process in a container shutting down.
In Kubernetes v1.27, the _event based_ mechanism has graduated to beta but remains
disabled by default. If you do explicitly switch to event-based lifecycle change detection,
the kubelet is able to start Pods more quickly than with the default approach that relies on polling.
The default mechanism, polling for lifecycle changes, adds a noticeable overhead; this affects
the kubelet's ability to handle different tasks in parallel, and leads to poor performance and
reliability issues. For these reasons, we recommend that you switch your nodes to use
event-based pod lifecycle change detection.
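
If you want to opt in, this behavior sits behind the `EventedPLEG` feature gate; a minimal sketch of enabling it through the kubelet configuration is below. It assumes your container runtime supports CRI container events; see the task page linked below for the full procedure.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Switch the kubelet from polling container status to consuming
  # CRI container lifecycle events (beta and off by default in v1.27).
  EventedPLEG: true
```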

Further details can be found in the KEP <https://kep.k8s.io/3386> and
[Switching From Polling to CRI Event-based Updates to Container Status](/docs/tasks/administer-cluster/switch-to-evented-pleg/).

## Raise your pod resource limit if needed

During start-up, some pods may consume a considerable amount of CPU or memory. If the CPU limit is
low, this can significantly slow down the pod start-up process. To improve memory management,
Kubernetes v1.22 introduced a feature gate called `MemoryQoS` to the kubelet. This feature enables
the kubelet to set memory QoS at the container, pod, and QoS levels for better protection and
guaranteed quality of memory when running with cgroups v2. Although it has benefits, enabling this
feature gate may affect the start-up speed of the pod if the pod startup consumes a large amount
of memory.

Kubelet configuration now includes `memoryThrottlingFactor`. This factor is multiplied by
the memory limit or node allocatable memory to set the cgroupv2 `memory.high` value for enforcing
MemoryQoS. Decreasing this factor sets a lower high limit for container cgroups, increasing reclaim
pressure. Increasing this factor puts on less reclaim pressure. The default value was 0.8 initially
and changes to 0.9 in Kubernetes v1.27. This parameter adjustment can reduce the potential
impact of this feature on pod startup speed.
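
As a rough sketch, assuming a cgroup v2 node where you have chosen to enable MemoryQoS, both settings live in the kubelet configuration; the 0.9 value below simply restates the v1.27 default:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Enable memory QoS enforcement via the cgroup v2 memory.high setting.
  MemoryQoS: true
# Fraction of the memory limit (or node allocatable memory) used to set
# memory.high; 0.9 is the v1.27 default, lower values add reclaim pressure.
memoryThrottlingFactor: 0.9
```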

Further details can be found in the KEP <https://kep.k8s.io/2570>.

## What's more?

In Kubernetes v1.26, a new histogram metric `pod_start_sli_duration_seconds` was added for Pod
startup latency SLI/SLO details. Additionally, the kubelet log will now display more information
about pod start-related timestamps, as shown below:

> Dec 30 15:33:13.375379 e2e-022435249c-674b9-minion-group-gdj4 kubelet[8362]: I1230 15:33:13.375359 8362 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/konnectivity-agent-gnc9k" podStartSLOduration=-9.223372029479458e+09 pod.CreationTimestamp="2022-12-30 15:33:06 +0000 UTC" firstStartedPulling="2022-12-30 15:33:09.258791695 +0000 UTC m=+13.029631711" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2022-12-30 15:33:13.375009262 +0000 UTC m=+17.145849275" watchObservedRunningTime="2022-12-30 15:33:13.375317944 +0000 UTC m=+17.146157970"

The SELinux Relabeling with Mount Options feature moved to Beta in v1.27. This feature speeds up
container startup by mounting volumes with the correct SELinux label instead of changing each file
on the volumes recursively. Further details can be found in the KEP <https://kep.k8s.io/1710>.

To identify the cause of slow pod startup, analyzing metrics and logs can be helpful. Other
factors that may impact pod startup include the container runtime, disk speed, and the CPU and
memory resources available on the node.

SIG Node is responsible for ensuring fast Pod startup times, while addressing issues in large
clusters falls under the purview of SIG Scalability as well.