Commit 71427bc

add blog for how to speed up pod startup from kubelet side (#40156)
* add blog for how to speed up pod startup from kubelet side
* rename blog to recent devs in kubelet to speed up pod startup and update according to comments
* add pod resource limit related things that may be related to pod startup
* add SELinux Relabeling with Mount Options feature
* update per sftim's comment
1 parent 686eefe commit 71427bc

File tree

1 file changed

+120
-0
lines changed

1 file changed

+120
-0
lines changed
Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
---
layout: blog
title: "Kubernetes 1.27: updates on speeding up Pod startup"
date: 2023-05-15T00:00:00+0000
slug: speed-up-pod-startup
---

**Authors**: Paco Xu (DaoCloud), Sergey Kanzhelev (Google), Ruiwen Zhao (Google)

How can Pod start-up be accelerated on nodes in large clusters? This is a common issue that
cluster administrators may face.

This blog post focuses on methods to speed up pod start-up from the kubelet side. It does not
cover the time spent by the controller-manager creating pods through the kube-apiserver, nor
does it include pod scheduling time or the webhooks executed during pod creation.

We mention some important factors to consider from the kubelet's perspective, but this is not
an exhaustive list. With Kubernetes v1.27 released, this blog highlights the significant
changes in v1.27 that help speed up pod start-up.

## Parallel container image pulls

Pulling images always takes some time, and what's worse, image pulls are done serially by
default. In other words, the kubelet sends only one image pull request to the image service at
a time. Other image pull requests have to wait until the one being processed is complete.

To enable parallel image pulls, set the `serializeImagePulls` field to false in the kubelet
configuration. When `serializeImagePulls` is disabled, requests for image pulls are immediately
sent to the image service and multiple images can be pulled concurrently.

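For example, a minimal sketch of a kubelet configuration that enables parallel image pulls might
look like this (assuming you manage the kubelet with a `KubeletConfiguration` file; other fields
omitted):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Allow the kubelet to send multiple image pull requests to the
# container runtime's image service at the same time.
serializeImagePulls: false
```
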
### Maximum parallel image pulls help protect your node from image pull overload

We introduced a new feature in the kubelet that sets a limit on the number of parallel image
pulls at the node level. This limit restricts the maximum number of images that can be pulled
simultaneously. If there is an image pull request beyond this limit, it will be blocked until
one of the ongoing image pulls finishes. Before enabling this feature, please ensure that your
container runtime's image service can handle parallel image pulls effectively.

To limit the number of simultaneous image pulls, you can configure the `maxParallelImagePulls`
field in the kubelet configuration. By setting `maxParallelImagePulls` to a value of _n_, only
_n_ images will be pulled concurrently. Any additional image pulls beyond this limit will wait
until at least one ongoing pull is complete.

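As a sketch, the configuration above can be extended with a concurrency cap; the value 5 below is
purely illustrative and should be tuned to what your container runtime and registry can handle:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# maxParallelImagePulls only takes effect when serialized pulls are disabled.
serializeImagePulls: false
# At most 5 images are pulled at the same time; further pull requests wait.
maxParallelImagePulls: 5
```
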
You can find more details in the associated KEP: [Kubelet limit of Parallel Image Pulls](https://kep.k8s.io/3673)
(KEP-3673).

## Raised default API query-per-second limits for kubelet

To improve pod startup in scenarios with many pods on a node, particularly sudden scaling
situations, the kubelet needs to synchronize the pod status and prepare ConfigMaps, Secrets,
and volumes. This requires large bandwidth to access the kube-apiserver.

In versions prior to v1.27, the default `kubeAPIQPS` was 5 and `kubeAPIBurst` was 10. However,
the kubelet in v1.27 has increased these defaults to 50 and 100 respectively for better
performance during pod startup. It's worth noting that this isn't the only reason why we've
bumped up the API QPS limits for the kubelet:

1. It has the potential to be hugely throttled (with the old default QPS = 5).
2. In large clusters, kubelets can generate significant load anyway, as there are a lot of them.
3. Kubelets have a dedicated PriorityLevel and FlowSchema that we can easily control.

Previously, we often encountered `volume mount timeout` errors from the kubelet on nodes running
more than 50 pods during pod start-up. We suggest that cluster operators bump `kubeAPIQPS` to 20
and `kubeAPIBurst` to 40, especially if using bare metal nodes.

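For example, a kubelet configuration sketch applying the values suggested above might look like
this (the numbers are illustrative; choose values that match your cluster size and your API
server's capacity):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Sustained queries per second the kubelet may send to the kube-apiserver.
kubeAPIQPS: 20
# Short-term burst allowance on top of the QPS limit.
kubeAPIBurst: 40
```
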
More details can be found in the KEP <https://kep.k8s.io/1040> and the pull request [#116121](https://github.com/kubernetes/kubernetes/pull/116121).

## Event triggered updates to container status

`Evented PLEG` (PLEG is short for "Pod Lifecycle Event Generator") is set to be in beta for v1.27.
Kubernetes offers two ways for the kubelet to detect Pod lifecycle events, such as the last
process in a container shutting down.
In Kubernetes v1.27, the _event based_ mechanism has graduated to beta but remains
disabled by default. If you do explicitly switch to event-based lifecycle change detection,
the kubelet is able to start Pods more quickly than with the default approach that relies on polling.
The default mechanism, polling for lifecycle changes, adds a noticeable overhead; this affects
the kubelet's ability to handle different tasks in parallel, and leads to poor performance and
reliability issues. For these reasons, we recommend that you switch your nodes to use
event-based pod lifecycle change detection.

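If you want to try it, a kubelet configuration sketch might look like the following (this assumes
your container runtime supports CRI container events; the feature gate remains off by default in
v1.27):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # Opt in to event-based pod lifecycle change detection (beta in v1.27).
  EventedPLEG: true
```
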
Further details can be found in the KEP <https://kep.k8s.io/3386> and
[Switching From Polling to CRI Event-based Updates to Container Status](/docs/tasks/administer-cluster/switch-to-evented-pleg/).

## Raise your pod resource limit if needed

During start-up, some pods may consume a considerable amount of CPU or memory. If the CPU limit is
low, this can significantly slow down the pod start-up process.

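As an illustrative sketch (the pod name, image, and resource values below are hypothetical), a
container known to be CPU hungry during start-up could be given a more generous limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-heavy-app        # hypothetical example pod
spec:
  containers:
  - name: app
    image: registry.example/app:1.0   # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: 256Mi
      limits:
        cpu: "2"                 # a low CPU limit here could throttle start-up
        memory: 512Mi
```
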
To improve memory management, Kubernetes v1.22 introduced a feature gate called MemoryQoS to the
kubelet. This feature enables the kubelet to set memory QoS at container, pod, and QoS levels for
better protection and guaranteed quality of memory when running with cgroups v2. Although it has
benefits, it is possible that enabling this feature gate may affect the start-up speed of the pod
if the pod startup consumes a large amount of memory.

Kubelet configuration now includes `memoryThrottlingFactor`. This factor is multiplied by
the memory limit or node allocatable memory to set the cgroupv2 `memory.high` value for enforcing
MemoryQoS. Decreasing this factor sets a lower high limit for container cgroups, increasing reclaim
pressure; increasing it applies less reclaim pressure. The default value is 0.8 initially
and will change to 0.9 in Kubernetes v1.27. This parameter adjustment can reduce the potential
impact of this feature on pod startup speed.

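A configuration sketch with an explicit factor might look like this (0.9 matches the new v1.27
default; MemoryQoS only takes effect on nodes running cgroups v2):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # MemoryQoS must be enabled for memoryThrottlingFactor to have an effect.
  MemoryQoS: true
# memory.high is set to the memory limit (or node allocatable) multiplied by this factor.
memoryThrottlingFactor: 0.9
```
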
Further details can be found in the KEP <https://kep.k8s.io/2570>.

## What's more?

In Kubernetes v1.26, a new histogram metric `pod_start_sli_duration_seconds` was added for Pod
startup latency SLI/SLO details. Additionally, the kubelet log will now display more information
about pod start-related timestamps, as shown below:

> Dec 30 15:33:13.375379 e2e-022435249c-674b9-minion-group-gdj4 kubelet[8362]: I1230 15:33:13.375359 8362 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/konnectivity-agent-gnc9k" podStartSLOduration=-9.223372029479458e+09 pod.CreationTimestamp="2022-12-30 15:33:06 +0000 UTC" firstStartedPulling="2022-12-30 15:33:09.258791695 +0000 UTC m=+13.029631711" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2022-12-30 15:33:13.375009262 +0000 UTC m=+17.145849275" watchObservedRunningTime="2022-12-30 15:33:13.375317944 +0000 UTC m=+17.146157970"

The SELinux Relabeling with Mount Options feature moved to Beta in v1.27. This feature speeds up
container startup by mounting volumes with the correct SELinux label instead of changing each file
on the volumes recursively. Further details can be found in the KEP <https://kep.k8s.io/1710>.

To identify the cause of slow pod startup, analyzing metrics and logs can be helpful. Other
factors that may impact pod startup include the container runtime, disk speed, and the CPU and
memory resources available on the node.

SIG Node is responsible for ensuring fast Pod startup times, while addressing issues in large
clusters falls under the purview of SIG Scalability as well.
