---
layout: blog
title: 'Kubernetes 1.24: Maximum Unavailable Replicas for StatefulSet'
date: 2022-05-27
slug: maxunavailable-for-statefulset
---

**Author:** Mayank Kumar (Salesforce)

Kubernetes [StatefulSets](/docs/concepts/workloads/controllers/statefulset/), introduced in 1.5 and stable since 1.9, have been widely used to run stateful applications. They provide stable pod identity, persistent per-pod storage, and ordered, graceful deployment, scaling, and rolling updates. You can think of a StatefulSet as the atomic building block for running complex stateful applications. As the use of Kubernetes has grown, so has the number of scenarios requiring StatefulSets. Many of these scenarios require faster rolling updates than the currently supported one-pod-at-a-time updates, in the case where you're using the `OrderedReady` Pod management policy for a StatefulSet.

Here are some examples:

- I am using a StatefulSet to orchestrate a multi-instance, cache-based application where the size of the cache is large. The cache starts cold and requires a significant amount of time before the container can start, and there could be more initial startup tasks that are required. A rolling update of this StatefulSet would take a lot of time before the application is fully updated. If the StatefulSet supported updating more than one pod at a time, it would result in a much faster update.

- My stateful application is composed of leaders and followers, or one writer and multiple readers. I have multiple readers or followers, and my application can tolerate multiple pods going down at the same time. I want to update this application more than one pod at a time so that I get the new updates rolled out quickly, especially if the number of instances of my application is large. Note that my application still requires unique identity per pod.

In order to support such scenarios, Kubernetes 1.24 includes a new alpha feature to help. Before you can use the new feature, you must enable the `MaxUnavailableStatefulSet` feature flag. Once you enable that, you can specify a new field called `maxUnavailable` as part of the `spec` for a StatefulSet. For example:

```
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  namespace: default
spec:
  podManagementPolicy: OrderedReady # you must set OrderedReady
  replicas: 5
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: k8s.gcr.io/nginx-slim:0.8
        imagePullPolicy: IfNotPresent
        name: nginx
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 2 # this is the new alpha field, whose default value is 1
      partition: 0
    type: RollingUpdate
```
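
Remember that the `maxUnavailable` field above is only accepted while the alpha gate is switched on. As a rough sketch of what that could look like on a local test cluster (this assumes minikube; on a self-managed cluster you would pass the equivalent `--feature-gates` setting to the components that serve and reconcile StatefulSets, such as kube-apiserver and kube-controller-manager):

```
# Start a local test cluster with the MaxUnavailableStatefulSet alpha gate enabled.
minikube start --feature-gates=MaxUnavailableStatefulSet=true
```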

If you enable the new feature and you don't specify a value for `maxUnavailable` in a StatefulSet, Kubernetes applies a default `maxUnavailable: 1`. This matches the behavior you would see if you don't enable the new feature.

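If you want to confirm which value actually applies to a particular StatefulSet, you can read it back from the API server. A minimal sketch, assuming a StatefulSet named `web` (like the one above) has already been created:

```
# Print the effective maxUnavailable; with the gate enabled and no value set
# explicitly, this should show the default of 1.
kubectl get statefulset web -o jsonpath='{.spec.updateStrategy.rollingUpdate.maxUnavailable}'
```
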
I'll run through a scenario based on that example manifest to demonstrate how this feature works. I will deploy a StatefulSet that has 5 replicas, with `maxUnavailable` set to 2 and `partition` set to 0.
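
Creating the StatefulSet is an ordinary `kubectl apply`. This is only a sketch; `web-statefulset.yaml` is a hypothetical filename for the manifest shown earlier:

```
# Create the example StatefulSet and confirm that all 5 replicas become ready.
kubectl apply -f web-statefulset.yaml
kubectl get statefulset web
```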

I can trigger a rolling update by changing the image to `k8s.gcr.io/nginx-slim:0.9`. Once I initiate the rolling update, I can watch the pods update 2 at a time, as the current value of `maxUnavailable` is 2. The output below shows a span of time and is not complete. Note that `maxUnavailable` can be an absolute number (for example, 2) or a percentage of desired Pods (for example, 10%). The absolute number is calculated from the percentage by rounding down (for example, with 5 replicas, `40%` rounds down to 2 Pods).
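
One way to change the image is `kubectl set image`; this is just a sketch, using the `nginx` container name from the manifest above:

```
# Update the container image; this starts the rolling update.
kubectl set image statefulset/web nginx=k8s.gcr.io/nginx-slim:0.9
```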

```
kubectl get pods --watch
```

```
NAME    READY   STATUS              RESTARTS   AGE
web-0   1/1     Running             0          85s
web-1   1/1     Running             0          2m6s
web-2   1/1     Running             0          106s
web-3   1/1     Running             0          2m47s
web-4   1/1     Running             0          2m27s
web-4   1/1     Terminating         0          5m43s   ----> start terminating 4
web-3   1/1     Terminating         0          6m3s    ----> start terminating 3
web-3   0/1     Terminating         0          6m7s
web-3   0/1     Pending             0          0s
web-3   0/1     Pending             0          0s
web-4   0/1     Terminating         0          5m48s
web-4   0/1     Terminating         0          5m48s
web-3   0/1     ContainerCreating   0          2s
web-3   1/1     Running             0          2s
web-4   0/1     Pending             0          0s
web-4   0/1     Pending             0          0s
web-4   0/1     ContainerCreating   0          0s
web-4   1/1     Running             0          1s
web-2   1/1     Terminating         0          5m46s   ----> start terminating 2 (only after both 4 and 3 are running)
web-1   1/1     Terminating         0          6m6s    ----> start terminating 1
web-2   0/1     Terminating         0          5m47s
web-1   0/1     Terminating         0          6m7s
web-1   0/1     Pending             0          0s
web-1   0/1     Pending             0          0s
web-1   0/1     ContainerCreating   0          1s
web-1   1/1     Running             0          2s
web-2   0/1     Pending             0          0s
web-2   0/1     Pending             0          0s
web-2   0/1     ContainerCreating   0          0s
web-2   1/1     Running             0          1s
web-0   1/1     Terminating         0          6m6s    ----> start terminating 0 (only after 2 and 1 are running)
web-0   0/1     Terminating         0          6m7s
web-0   0/1     Pending             0          0s
web-0   0/1     Pending             0          0s
web-0   0/1     ContainerCreating   0          0s
web-0   1/1     Running             0          1s
```

Note that as soon as the rolling update starts, both 4 and 3 (the two highest ordinal pods) start terminating at the same time. Pods with ordinal 4 and 3 may become ready at their own pace. As soon as both pods 4 and 3 are ready, pods 2 and 1 start terminating at the same time. When pods 2 and 1 are both running and ready, pod 0 starts terminating.
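
If you don't want to watch individual pods, one way to wait for the whole update to finish (a sketch, using the same StatefulSet name) is:

```
# Block until every replica has been updated and is ready.
kubectl rollout status statefulset/web
```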

In Kubernetes, updates to StatefulSets follow a strict ordering when updating Pods. In this example, the update starts at replica 4, then replica 3, then replica 2, and so on, one pod at a time. When going one pod at a time, it's not possible for 3 to be running and ready before 4. When `maxUnavailable` is more than 1 (in the example scenario I set `maxUnavailable` to 2), it is possible that replica 3 becomes ready and running before replica 4 is ready, and that is OK. If you're a developer and you set `maxUnavailable` to more than 1, you should know that this outcome is possible, and you must ensure that your application is able to handle any such ordering issues that occur. When you set `maxUnavailable` greater than 1, the ordering is guaranteed between each batch of pods being updated. That guarantee means that pods in the second update batch (replicas 2 and 1) cannot start updating until the pods from the first batch (replicas 4 and 3) are ready.

Although Kubernetes refers to these as _replicas_, your stateful application may have a different view, and each pod of the StatefulSet may be holding completely different data than the other pods. The important thing here is that updates to StatefulSets happen in batches, and you can now have a batch size larger than 1 (as an alpha feature).

Also note that the above behavior is with `podManagementPolicy: OrderedReady`. If you define a StatefulSet with `podManagementPolicy: Parallel`, not only are `maxUnavailable` replicas terminated at the same time; `maxUnavailable` replicas also start in the `ContainerCreating` phase at the same time. This is called bursting.

So, now you may have a lot of questions, such as:

- What is the behavior when you set `podManagementPolicy: Parallel`?
- What is the behavior when you set `partition` to a value other than `0`?

It might be better to try it and see for yourself; the short patch sketch below is one way to start experimenting with `partition`. This is an alpha feature, and the Kubernetes contributors are looking for feedback on it. Did it help you achieve your stateful scenarios? Did you find a bug, or do you think the behavior as implemented is not intuitive or can break applications or catch them by surprise? Please [open an issue](https://github.com/kubernetes/kubernetes/issues) to let us know.
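
A minimal sketch of such an experiment, assuming the `web` StatefulSet from earlier is still running (the update strategy is mutable, so it can be patched in place):

```
# Only update pods with an ordinal greater than or equal to 3.
kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":3}}}}'

# maxUnavailable also accepts a percentage of desired Pods.
kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"rollingUpdate":{"maxUnavailable":"40%"}}}}'
```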

## Further reading and next steps {#next-steps}

- [Maximum unavailable Pods](/docs/concepts/workloads/controllers/statefulset/#maximum-unavailable-pods)
- [KEP for MaxUnavailable for StatefulSet](https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/961-maxunavailable-for-statefulset)
- [Implementation](https://github.com/kubernetes/kubernetes/pull/82162/files)
- [Enhancement Tracking Issue](https://github.com/kubernetes/enhancements/issues/961)
