---
layout: blog
title: "Kubernetes 1.27: StatefulSet Start Ordinal Simplifies Migration"
date: 2023-04-28
slug: statefulset-start-ordinal
---

**Author**: Peter Schuurman (Google)

Kubernetes v1.26 introduced a new, alpha-level feature for
[StatefulSets](/docs/concepts/workloads/controllers/statefulset/) that controls
the ordinal numbering of Pod replicas. As of Kubernetes v1.27, this feature is
now beta. Ordinals can start from arbitrary non-negative numbers. This blog post
will discuss how this feature can be used.

## Background

StatefulSet ordinals provide sequential identities for Pod replicas. When using
[`OrderedReady` Pod management](/docs/tutorials/stateful-application/basic-stateful-set/#orderedready-pod-management),
Pods are created from ordinal index `0` up to `N-1`.

With Kubernetes today, orchestrating a StatefulSet migration across clusters is
challenging. Backup and restore solutions exist, but these require the
application to be scaled down to zero replicas prior to migration. In today's
fully connected world, even planned application downtime may not allow you to
meet your business goals. You could use
[Cascading Delete](/docs/tutorials/stateful-application/basic-stateful-set/#cascading-delete)
or
[On Delete](/docs/tutorials/stateful-application/basic-stateful-set/#on-delete)
to migrate individual Pods; however, this is error-prone and tedious to manage.
You lose the self-healing benefit of the StatefulSet controller when your Pods
fail or are evicted.

Kubernetes v1.26 enables a StatefulSet to be responsible for a range of ordinals
within {0..N-1} (the ordinals 0, 1, ... up to N-1).
With it, you can scale down a range
{0..k-1} in a source cluster, and scale up the complementary range {k..N-1}
in a destination cluster, while maintaining application availability. This
enables you to retain *at most one* semantics (meaning there is at most one Pod
with a given identity running in a StatefulSet) and
[Rolling Update](/docs/tutorials/stateful-application/basic-stateful-set/#rolling-update)
behavior when orchestrating a migration across clusters.

## Why would I want to use this feature?

Say you're running your StatefulSet in one cluster, and need to migrate it out
to a different cluster. There are many reasons why you would need to do this:
* **Scalability**: Your StatefulSet has scaled too large for your cluster, and
  has started to disrupt the quality of service for other workloads in your
  cluster.
* **Isolation**: You're running a StatefulSet in a cluster that is accessed
  by multiple users, and namespace isolation isn't sufficient.
* **Cluster Configuration**: You want to move your StatefulSet to a different
  cluster to use some environment that is not available on your current
  cluster.
* **Control Plane Upgrades**: You want to move your StatefulSet to a cluster
  running an upgraded control plane, and can't handle the risk or downtime of
  in-place control plane upgrades.

## How do I use it?

Enable the `StatefulSetStartOrdinal` feature gate on a cluster, and create a
StatefulSet with a customized `.spec.ordinals.start`.

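As a minimal sketch (the StatefulSet name, labels, and container image below are
illustrative placeholders, not taken from the demo that follows), a manifest with
a customized start ordinal could look like this:

```
# Sketch only: creates a StatefulSet whose ordinals begin at 5 rather than 0.
# With start: 5 and replicas: 2, the controller manages Pods demo-5 and demo-6.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo
spec:
  serviceName: demo
  replicas: 2
  ordinals:
    start: 5   # requires the StatefulSetStartOrdinal feature gate (beta in v1.27)
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: app
        image: registry.k8s.io/pause:3.9
EOF
```
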
## Try it out

In this demo, I'll use the new mechanism to migrate a
StatefulSet from one Kubernetes cluster to another. The
[redis-cluster](https://github.com/bitnami/charts/tree/main/bitnami/redis-cluster)
Bitnami Helm chart will be used to install Redis.

Tools Required:
* [yq](https://github.com/mikefarah/yq)
* [helm](https://helm.sh/docs/helm/helm_install/)

### Pre-requisites {#demo-pre-requisites}

To do this, I need two Kubernetes clusters that can both access common
networking and storage; I've named my clusters `source` and `destination`.
Specifically, I need:

* The `StatefulSetStartOrdinal` feature gate enabled on both clusters.
* Client configuration for `kubectl` that lets me access both clusters as an
  administrator.
* The same `StorageClass` installed on both clusters, and set as the default
  StorageClass for both clusters. This `StorageClass` should provision
  underlying storage that is accessible from either or both clusters.
* A flat network topology that allows for Pods to send and receive packets to
  and from Pods in either cluster. If you are creating clusters on a cloud
  provider, this configuration may be called private cloud or private network.

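Some of the steps below run against `source` and others against `destination`.
One simple way to follow along is to keep a kubeconfig context per cluster and
switch between them; the context names here are an assumption based on my
cluster names, and yours may differ:

```
# Assumption: one kubeconfig context per cluster, named after the cluster.
kubectl config get-contexts            # list the available contexts
kubectl config use-context source      # run "source cluster" steps
kubectl config use-context destination # run "destination cluster" steps
```
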
1. Create a demo namespace on both clusters:

   ```
   kubectl create ns kep-3335
   ```

2. Deploy a Redis cluster with six replicas in the source cluster:

   ```
   helm repo add bitnami https://charts.bitnami.com/bitnami
   helm install redis --namespace kep-3335 \
     bitnami/redis-cluster \
     --set persistence.size=1Gi \
     --set cluster.nodes=6
   ```

3. Check the replication status in the source cluster:

   ```
   kubectl exec -it redis-redis-cluster-0 -- /bin/bash -c \
     "redis-cli -c -h redis-redis-cluster -a $(kubectl get secret redis-redis-cluster -o jsonpath="{.data.redis-password}" | base64 -d) CLUSTER NODES;"
   ```

   ```
   2ce30362c188aabc06f3eee5d92892d95b1da5c3 10.104.0.14:6379@16379 myself,master - 0 1669764411000 3 connected 10923-16383
   7743661f60b6b17b5c71d083260419588b4f2451 10.104.0.16:6379@16379 slave 2ce30362c188aabc06f3eee5d92892d95b1da5c3 0 1669764410000 3 connected
   961f35e37c4eea507cfe12f96e3bfd694b9c21d4 10.104.0.18:6379@16379 slave a8765caed08f3e185cef22bd09edf409dc2bcc61 0 1669764411000 1 connected
   7136e37d8864db983f334b85d2b094be47c830e5 10.104.0.15:6379@16379 slave 2cff613d763b22c180cd40668da8e452edef3fc8 0 1669764412595 2 connected
   a8765caed08f3e185cef22bd09edf409dc2bcc61 10.104.0.19:6379@16379 master - 0 1669764411592 1 connected 0-5460
   2cff613d763b22c180cd40668da8e452edef3fc8 10.104.0.17:6379@16379 master - 0 1669764410000 2 connected 5461-10922
   ```

4. Deploy a Redis cluster with zero replicas in the destination cluster:

   ```
   helm install redis --namespace kep-3335 \
     bitnami/redis-cluster \
     --set persistence.size=1Gi \
     --set cluster.nodes=0 \
     --set redis.extraEnvVars\[0\].name=REDIS_NODES,redis.extraEnvVars\[0\].value="redis-redis-cluster-headless.kep-3335.svc.cluster.local" \
     --set existingSecret=redis-redis-cluster
   ```

5. Scale down the `redis-redis-cluster` StatefulSet in the source cluster by 1,
   to remove the replica `redis-redis-cluster-5`:

   ```
   kubectl patch sts redis-redis-cluster -p '{"spec": {"replicas": 5}}'
   ```

6. Migrate dependencies from the source cluster to the destination cluster:

   The following commands copy resources from `source` to `destination`. Details
   that are not relevant in the `destination` cluster are removed (e.g. `uid`,
   `resourceVersion`, `status`).

   **Steps for the source cluster**

   Note: If using a `StorageClass` with `reclaimPolicy: Delete` configured, you
   should patch the PVs in `source` with `reclaimPolicy: Retain` prior to
   deletion, to retain the underlying storage used in `destination`. See
   [Change the Reclaim Policy of a PersistentVolume](/docs/tasks/administer-cluster/change-pv-reclaim-policy/)
   for more details.

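   A sketch of such a patch (the PV name is whatever volume backs
   `redis-data-redis-redis-cluster-5`; look it up first, it is not a fixed name):

   ```
   # Find the PV bound to the replica's PVC, then mark it Retain so that
   # deleting the PVC in the source cluster does not delete the volume.
   PV_NAME=$(kubectl get pvc redis-data-redis-redis-cluster-5 -o jsonpath='{.spec.volumeName}')
   kubectl patch pv "$PV_NAME" -p '{"spec": {"persistentVolumeReclaimPolicy": "Retain"}}'
   ```
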
   ```
   kubectl get pvc redis-data-redis-redis-cluster-5 -o yaml | yq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.annotations, .metadata.finalizers, .status)' > /tmp/pvc-redis-data-redis-redis-cluster-5.yaml
   kubectl get pv $(yq '.spec.volumeName' /tmp/pvc-redis-data-redis-redis-cluster-5.yaml) -o yaml | yq 'del(.metadata.uid, .metadata.resourceVersion, .metadata.annotations, .metadata.finalizers, .spec.claimRef, .status)' > /tmp/pv-redis-data-redis-redis-cluster-5.yaml
   kubectl get secret redis-redis-cluster -o yaml | yq 'del(.metadata.uid, .metadata.resourceVersion)' > /tmp/secret-redis-redis-cluster.yaml
   ```

   **Steps for the destination cluster**

   Note: For the PV/PVC, this procedure only works if the underlying storage system
   that your PVs use can support being copied into `destination`. Storage
   that is associated with a specific node or topology may not be supported.
   Additionally, some storage systems may store additional metadata about
   volumes outside of a PV object, and may require a more specialized
   sequence to import a volume.

   ```
   kubectl create -f /tmp/pv-redis-data-redis-redis-cluster-5.yaml
   kubectl create -f /tmp/pvc-redis-data-redis-redis-cluster-5.yaml
   kubectl create -f /tmp/secret-redis-redis-cluster.yaml
   ```

7. Scale up the `redis-redis-cluster` StatefulSet in the destination cluster by
   1, with a start ordinal of 5:

   ```
   kubectl patch sts redis-redis-cluster -p '{"spec": {"ordinals": {"start": 5}, "replicas": 1}}'
   ```

8. Check the replication status in the destination cluster:

   ```
   kubectl exec -it redis-redis-cluster-5 -- /bin/bash -c \
     "redis-cli -c -h redis-redis-cluster -a $(kubectl get secret redis-redis-cluster -o jsonpath="{.data.redis-password}" | base64 -d) CLUSTER NODES;"
   ```

   I should see that the new replica (labeled `myself`) has joined the Redis
   cluster (the IP address belongs to a different CIDR block than the
   replicas in the source cluster).

   ```
   2cff613d763b22c180cd40668da8e452edef3fc8 10.104.0.17:6379@16379 master - 0 1669766684000 2 connected 5461-10922
   7136e37d8864db983f334b85d2b094be47c830e5 10.108.0.22:6379@16379 myself,slave 2cff613d763b22c180cd40668da8e452edef3fc8 0 1669766685609 2 connected
   2ce30362c188aabc06f3eee5d92892d95b1da5c3 10.104.0.14:6379@16379 master - 0 1669766684000 3 connected 10923-16383
   961f35e37c4eea507cfe12f96e3bfd694b9c21d4 10.104.0.18:6379@16379 slave a8765caed08f3e185cef22bd09edf409dc2bcc61 0 1669766683600 1 connected
   a8765caed08f3e185cef22bd09edf409dc2bcc61 10.104.0.19:6379@16379 master - 0 1669766685000 1 connected 0-5460
   7743661f60b6b17b5c71d083260419588b4f2451 10.104.0.16:6379@16379 slave 2ce30362c188aabc06f3eee5d92892d95b1da5c3 0 1669766686613 3 connected
   ```

9. Repeat steps #5 to #7 for the remainder of the replicas, until the
   Redis StatefulSet in the source cluster is scaled to 0, and the Redis
   StatefulSet in the destination cluster is healthy with 6 total replicas.

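   As a sketch of one more iteration (for `redis-redis-cluster-4`; these
   commands simply mirror steps #5 and #7 with the ordinal shifted down by one,
   and assume you have copied that replica's PVC and PV as in step #6):

   ```
   # In the source cluster: scale down to ordinals 0..3,
   # removing redis-redis-cluster-4.
   kubectl patch sts redis-redis-cluster -p '{"spec": {"replicas": 4}}'

   # In the destination cluster: manage ordinals 4..5 by lowering the
   # start ordinal to 4 and raising replicas to 2.
   kubectl patch sts redis-redis-cluster -p '{"spec": {"ordinals": {"start": 4}, "replicas": 2}}'
   ```
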
## What's Next?

This feature provides a building block for a StatefulSet to be split up across
clusters, but does not prescribe the mechanism by which the StatefulSet should
be migrated. Migration requires coordination of StatefulSet replicas, along with
orchestration of the storage and network layer. This is dependent on the storage
and connectivity requirements of the application installed by the StatefulSet.
Additionally, many StatefulSets are managed by
[operators](/docs/concepts/extend-kubernetes/operator/), which adds another
layer of complexity to migration.

If you're interested in building enhancements to make these processes easier,
get involved with
[SIG Multicluster](https://github.com/kubernetes/community/blob/master/sig-multicluster)
to contribute!
