Commit c1386ab

scripts: add migration script from public operator to cloud operator.
Check in a reference implementation for migrating statefulsets managed by the public operator to the cloud operator. Note that this process involves some manual steps, and we may want to automate and test it further.
1 parent ebd8f6f commit c1386ab

6 files changed, +402 -0 lines changed

scripts/migration/public/README.md

Lines changed: 97 additions & 0 deletions
@@ -0,0 +1,97 @@
## Migrate from public operator to cloud operator

This guide walks you through migrating a crdb cluster managed by the public operator to the crdb cloud operator. It assumes you've already created a cluster using the public operator. The goals of this process are to migrate without affecting cluster availability, and to preserve existing disks so that we don't have to replicate data into empty volumes. Note that this process scales down the statefulset by one node before adding each operator-managed pod, so cluster capacity will be reduced by one node at times.

Pre-requisite: Install the public operator and create an operator-managed cluster:

```
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml

kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/examples/example.yaml
```

Set environment variables:

```
export CRDBCLUSTER=cockroachdb
export NAMESPACE=default
export CLOUD_PROVIDER=gcp
export REGION=us-central1
```

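Every later command and script reads these variables, and the scripts run with `set -u`, so an unset variable aborts them partway through. A minimal preflight sketch, assuming a helper named `check_env` (hypothetical, not part of this commit), could report anything missing before you start:

```shell
# check_env is a hypothetical helper, not part of the migration scripts.
# It prints the name of each required variable that is unset or empty and
# returns non-zero if any are missing.
check_env() {
  missing=0
  for name in CRDBCLUSTER NAMESPACE CLOUD_PROVIDER REGION; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_env || exit 1` in the shell where you exported the variables.
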
Back up the crdbcluster resource in case we need to revert:

```
mkdir -p backup
kubectl get crdbcluster -o yaml $CRDBCLUSTER > backup/crdbcluster-$CRDBCLUSTER.yaml
```

Next, we need to re-map and regenerate the tls certs. The crdb cloud operator uses slightly different certs than the public operator and mounts them from configmaps and secrets with different names. Run the `generate-certs.sh` script to generate the certs and upload them to your cluster.

```
./generate-certs.sh
```

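The templates in this commit refer to a configmap named `$CRDBCLUSTER-ca` and to secrets named `$CRDBCLUSTER-node-certs` and `$CRDBCLUSTER-client-certs`. As a rough sketch, you could confirm that all three exist before moving on. The function name `check_cert_objects` is made up for this example, and it assumes your current kubectl context points at the cluster's namespace:

```shell
# check_cert_objects is a hypothetical helper. It succeeds only if the CA
# configmap and both cert secrets referenced by the templates exist.
check_cert_objects() {
  cluster=$1
  kubectl get configmap "${cluster}-ca" >/dev/null &&
    kubectl get secret "${cluster}-node-certs" >/dev/null &&
    kubectl get secret "${cluster}-client-certs" >/dev/null
}
```

For example: `check_cert_objects "$CRDBCLUSTER" || echo "cert objects missing"`.
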
Next, generate manifests for each crdbnode and the crdbcluster based on the state of the statefulset. We generate a manifest for each crdbnode because we want the crdb pods and their associated pvcs to have the same names as the original statefulset-managed pods and pvcs. This means that the new operator-managed pods will use the original pvcs, and won't have to replicate data into empty volumes. The script writes its output into `manifests/`, so make sure that directory exists first.

```
mkdir -p manifests
./generate-manifests.sh
```

The public operator and cloud operator use custom resource definitions with the same names, so we have to remove the public operator before installing the cloud operator. Uninstall the public operator, without deleting its managed pods, pvcs, etc.:

```

# Ensure that operator can't accidentally delete managed k8s objects.
kubectl delete clusterrolebinding cockroach-operator-rolebinding

# Delete public operator cr.
kubectl delete crdbcluster $CRDBCLUSTER --cascade=orphan

# Delete public operator resources and crd.
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
```

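Because the crdbcluster is deleted with `--cascade=orphan`, the statefulset, its pods, and their pvcs should all still exist afterwards. The following is a hedged sketch for checking that before you install the cloud operator. The helper name `check_orphans` is hypothetical. It assumes the statefulset is named after `$CRDBCLUSTER` (as the scale command in this guide does) and that pvcs follow the standard statefulset naming of `<claim template name>-<pod name>`, with the `datadir` claim name taken from the templates in this commit:

```shell
# check_orphans is a hypothetical helper. It succeeds only if the statefulset
# and, for every ordinal below the replica count, the pod and its pvc exist.
check_orphans() {
  cluster=$1
  replicas=$2
  kubectl get statefulset "${cluster}" >/dev/null || return 1
  idx=0
  while [ "$idx" -lt "$replicas" ]; do
    kubectl get pod "${cluster}-${idx}" >/dev/null || return 1
    kubectl get pvc "datadir-${cluster}-${idx}" >/dev/null || return 1
    idx=$((idx + 1))
  done
}
```

For a three-node cluster: `check_orphans "$CRDBCLUSTER" 3`.
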
Install the cloud operator and wait for it to become ready:

```
helm upgrade --install crdb-operator ./operator
kubectl rollout status deployment/cockroach-operator --timeout=60s
```

To migrate seamlessly from the statefulset to the cloud operator, we'll scale down statefulset-managed pods and replace them with crdbnode objects, one by one. Then we'll create the crdbcluster that manages the crdbnodes. Because of this order of operations, we need to create some objects that the crdbcluster will eventually own:

```
kubectl create priorityclass crdb-critical --value 500000000
yq '(.. | select(tag == "!!str")) |= envsubst' rbac-template.yaml > manifests/rbac.yaml
kubectl apply -f manifests/rbac.yaml
```

For each crdb pod, scale the statefulset down by one replica. Statefulsets remove the highest-ordinal pod first, so for a three-node cluster, first scale the statefulset down to two replicas, which stops pod `$CRDBCLUSTER-2`:

```
kubectl scale statefulset/$CRDBCLUSTER --replicas=2
```

Then create the crdbnode corresponding to the statefulset pod you just scaled down:

```
kubectl apply -f manifests/crdbnode-$CRDBCLUSTER-2.yaml
```

Wait for the new pod to become ready. If it doesn't, check the cloud operator logs for errors.

Repeat this process for each crdb node, working down through the ordinals, until the statefulset has zero replicas.

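The scale-down and replace loop can be sketched as a pair of shell functions. This is an untested outline rather than part of the commit: the names `wait_for_ready` and `migrate_nodes` are hypothetical, the timeouts are arbitrary, and it assumes the current kubectl context points at the cluster's namespace and that each crdbnode pod reuses the name of the statefulset pod it replaces, as described above.

```shell
# Hypothetical helper: poll until the pod reports Ready=True. Gives up after
# 60 polls spaced 5 seconds apart.
wait_for_ready() {
  pod=$1
  tries=0
  while [ "$tries" -lt 60 ]; do
    status=$(kubectl get pod "$pod" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null || true)
    if [ "$status" = "True" ]; then
      return 0
    fi
    tries=$((tries + 1))
    sleep 5
  done
  echo "pod $pod did not become ready" >&2
  return 1
}

# Hypothetical helper: replace statefulset pods with crdbnodes one at a time,
# highest ordinal first, stopping at the first failure.
migrate_nodes() {
  cluster=$1
  idx=$(($2 - 1))
  while [ "$idx" -ge 0 ]; do
    kubectl scale "statefulset/${cluster}" --replicas="$idx" || return 1
    # Let the statefulset pod terminate before a same-named pod is created.
    kubectl wait --for=delete "pod/${cluster}-${idx}" --timeout=300s || return 1
    kubectl apply -f "manifests/crdbnode-${cluster}-${idx}.yaml" || return 1
    wait_for_ready "${cluster}-${idx}" || return 1
    idx=$((idx - 1))
  done
}
```

For a three-node cluster you would call `migrate_nodes "$CRDBCLUSTER" 3`. Stopping on the first failure leaves the remaining statefulset pods running, so you can inspect the cloud operator logs before continuing by hand.
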
The public operator creates a pod disruption budget that conflicts with a pod disruption budget managed by the cloud operator. Before applying the crdbcluster manifest, delete the existing pod disruption budget:

```
kubectl delete poddisruptionbudget $CRDBCLUSTER
```

Finally, apply the crdbcluster manifest:

```
kubectl apply -f manifests/crdbcluster-$CRDBCLUSTER.yaml
```

crdbcluster-template.json

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
{
  "apiVersion": "crdb.cockroachlabs.com/v1alpha1",
  "kind": "CrdbCluster",
  "metadata": {
    "name": env(CRDBCLUSTER),
    "namespace": env(NAMESPACE)
  },
  "spec": {
    "dataStore": {},
    "features": [
      "reconcile",
      "reconcile-beta"
    ],
    "mode": "MutableOnly",
    "regions": [
      {
        "cloudProvider": env(CLOUD_PROVIDER),
        "code": env(REGION),
        "namespace": env(NAMESPACE),
        "domain": "",
        "nodes": .spec.replicas
      }
    ],
    "rollingRestartDelay": "30s",
    "template": {
      "metadata": {
        "annotations": {
          "crdb.cockroachlabs.com/cloudProvider": env(CLOUD_PROVIDER)
        },
        "finalizers": [
          "crdbnode.crdb.cockroachlabs.com/finalizer"
        ],
        "labels": {
          "app": "cockroachdb",
          "crdb.cockroachlabs.com/cluster": env(CRDBCLUSTER),
          "svc": "cockroachdb"
        },
        "namespace": env(NAMESPACE)
      },
      "spec": {
        "podLabels": .spec.template.metadata.labels,
        "certificates": {
          "externalCertificates": {
            "caConfigMapName": env(CRDBCLUSTER) + "-ca",
            "nodeSecretName": env(CRDBCLUSTER) + "-node-certs",
            "rootSqlClientSecretName": env(CRDBCLUSTER) + "-client-certs"
          }
        },
        "dataStore": {
          "volumeClaimTemplate": {
            "metadata": {
              "name": "datadir"
            },
            "spec": {
              "accessModes": [
                "ReadWriteOnce"
              ],
              "resources": {
                "requests": {
                  "storage": .spec.volumeClaimTemplates[
                    0
                  ].spec.resources.requests.storage
                }
              },
              "storageClassName": .spec.volumeClaimTemplates[
                0
              ].spec.storageClassName
            }
          }
        },
        "domain": "",
        "env": [
          {
            "name": "HOST_IP",
            "valueFrom": {
              "fieldRef": {
                "apiVersion": "v1",
                "fieldPath": "status.hostIP"
              }
            }
          }
        ],
        "resourceRequirements": .spec.template.spec.containers[
          0
        ].resources,
        "image": .spec.template.spec.containers[
          0
        ].image,
        "serviceAccountName": "cockroachdb",
        "useSecurityContexts": true
      }
    },
    "tlsEnabled": true
  }
}

crdbnode-template.json

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
{
  "apiVersion": "crdb.cockroachlabs.com/v1alpha1",
  "kind": "CrdbNode",
  "metadata": {
    "annotations": {
      "crdb.cockroachlabs.com/cloudProvider": env(CLOUD_PROVIDER)
    },
    "finalizers": [
      "crdbnode.crdb.cockroachlabs.com/finalizer"
    ],
    "generateName": "",
    "name": env(crdb_node_name),
    "labels": {
      "app": "cockroachdb",
      "crdb.cockroachlabs.com/cluster": env(CRDBCLUSTER),
      "svc": "cockroachdb"
    },
    "namespace": env(NAMESPACE)
  },
  "spec": {
    "podLabels": .spec.template.metadata.labels,
    "certificates": {
      "externalCertificates": {
        "caConfigMapName": env(CRDBCLUSTER) + "-ca",
        "nodeSecretName": env(CRDBCLUSTER) + "-node-certs",
        "rootSqlClientSecretName": env(CRDBCLUSTER) + "-client-certs"
      }
    },
    "dataStore": {
      "volumeClaimTemplate": {
        "metadata": {
          "name": "datadir"
        },
        "spec": {
          "accessModes": [
            "ReadWriteOnce"
          ],
          "resources": {
            "requests": {
              "storage": .spec.volumeClaimTemplates[
                0
              ].spec.resources.requests.storage
            }
          },
          "storageClassName": .spec.volumeClaimTemplates[
            0
          ].spec.storageClassName
        }
      }
    },
    "domain": "",
    "env": [
      {
        "name": "HOST_IP",
        "valueFrom": {
          "fieldRef": {
            "apiVersion": "v1",
            "fieldPath": "status.hostIP"
          }
        }
      }
    ],
    "resourceRequirements": .spec.template.spec.containers[
      0
    ].resources,
    "image": .spec.template.spec.containers[
      0
    ].image,
    "join": env(join_str),
    "serviceAccountName": "cockroachdb",
    "useSecurityContexts": true,
    "nodeName": env(k8s_node_name)
  }
}

generate-certs.sh

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
#!/usr/bin/env bash

set -euo pipefail

mkdir -p certs

# Fetch and remap CA cert.
kubectl get secret -o yaml "$CRDBCLUSTER-ca" | yq '.data."ca.key"' | base64 -d >certs/ca.key
kubectl get secret -o yaml "$CRDBCLUSTER-node" | yq '.data."ca.crt"' | base64 -d >certs/ca.crt
kubectl create configmap "$CRDBCLUSTER-ca" --from-file=certs/ca.crt --dry-run=client -o yaml |
  kubectl apply -f -

# Fetch and update node certs. The node certs generated by the public operator
# don't include the necessary SANs for the cloud operator, so we create new
# certs with the existing SANs as well as the additional SANs required for the
# cloud operator.
hosts=()
for host in $(kubectl get secret -o yaml "$CRDBCLUSTER-node" |
  yq '.data."tls.crt"' |
  base64 -d |
  openssl x509 -noout -ext subjectAltName |
  tail -n+2 |
  sed -E 's/(DNS:)|(IP Address:)|,//g' |
  xargs); do
  hosts+=("$host")
done
hosts+=("$CRDBCLUSTER-join.$NAMESPACE.svc.cluster.local")
cockroach cert create-node --ca-key ./certs/ca.key --certs-dir ./certs --overwrite "${hosts[@]}"

kubectl create secret generic "$CRDBCLUSTER-node-certs" --from-file=tls.crt=certs/node.crt --from-file=tls.key=certs/node.key --dry-run=client -o yaml |
  kubectl apply -f -

# Root user certs. The public operator doesn't generate one, so we create new certs signed by the original CA.
cockroach cert create-client root --ca-key certs/ca.key --certs-dir ./certs --overwrite
kubectl create secret generic "$CRDBCLUSTER-client-certs" --from-file=tls.crt=./certs/client.root.crt --from-file=tls.key=./certs/client.root.key --dry-run=client -o yaml |
  kubectl apply -f -
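
The node-cert step depends on one text transformation: turning the `openssl x509 -ext subjectAltName` output into a plain list of host names. The sketch below isolates that `tail | sed | xargs` pipeline in a function (the name `san_to_hosts` is made up for this example) and feeds it sample text instead of a real certificate, so you can see what the script's loop iterates over:

```shell
# san_to_hosts is a hypothetical wrapper around the pipeline used in
# generate-certs.sh. It reads openssl's subjectAltName output on stdin.
san_to_hosts() {
  # Drop the "X509v3 Subject Alternative Name:" header line, strip the
  # "DNS:" and "IP Address:" prefixes and the commas, then let xargs
  # collapse the remaining whitespace into single spaces.
  tail -n+2 | sed -E 's/(DNS:)|(IP Address:)|,//g' | xargs
}

# Sample input shaped like openssl's output; these host names are illustrative.
printf 'X509v3 Subject Alternative Name: \n    DNS:localhost, DNS:cockroachdb-public, IP Address:127.0.0.1\n' |
  san_to_hosts
# prints: localhost cockroachdb-public 127.0.0.1
```

The script then appends `$CRDBCLUSTER-join.$NAMESPACE.svc.cluster.local` to this list before calling `cockroach cert create-node`.
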

generate-manifests.sh

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
#!/usr/bin/env bash

set -euo pipefail
set -x

# The output redirections below fail if this directory doesn't exist.
mkdir -p manifests

sts_yaml=$(kubectl get sts -o yaml "$CRDBCLUSTER")

echo "${sts_yaml}" | yq "$(cat crdbcluster-template.json)" >"manifests/crdbcluster-${CRDBCLUSTER}.yaml"

num_nodes=$(echo "${sts_yaml}" | yq '.spec.replicas')

# Build a comma-separated join list with one address per statefulset pod.
export join_str=""
for idx in $(seq 0 $((num_nodes - 1))); do
  if [[ -n "${join_str}" ]]; then
    join_str="${join_str},"
  fi
  join_str="${join_str}${CRDBCLUSTER}-${idx}.${CRDBCLUSTER}.${NAMESPACE}:26258"
done

# Generate one crdbnode manifest per pod, pinned to the pod's current k8s node.
for idx in $(seq 0 $((num_nodes - 1))); do
  export crdb_node_name=${CRDBCLUSTER}-${idx}
  # Assign before exporting so that a kubectl failure isn't masked under set -e.
  k8s_node_name=$(kubectl get pod -o yaml "${crdb_node_name}" | yq '.spec.nodeName')
  export k8s_node_name
  echo "${sts_yaml}" | yq "$(cat crdbnode-template.json)" >"manifests/crdbnode-${CRDBCLUSTER}-${idx}.yaml"
done
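
To make the join-string loop easier to check in isolation, here is the same logic as a standalone function. The name `build_join_str` is hypothetical. The address format and port 26258 are copied from the script above:

```shell
# build_join_str is a hypothetical helper mirroring the first loop in
# generate-manifests.sh. It prints a comma-separated list with one
# <pod>.<service>.<namespace>:26258 entry per statefulset ordinal.
build_join_str() {
  cluster=$1
  namespace=$2
  num_nodes=$3
  join=""
  idx=0
  while [ "$idx" -lt "$num_nodes" ]; do
    # Separate entries with commas, but not before the first one.
    if [ -n "$join" ]; then
      join="${join},"
    fi
    join="${join}${cluster}-${idx}.${cluster}.${namespace}:26258"
    idx=$((idx + 1))
  done
  echo "$join"
}

build_join_str cockroachdb default 3
# prints: cockroachdb-0.cockroachdb.default:26258,cockroachdb-1.cockroachdb.default:26258,cockroachdb-2.cockroachdb.default:26258
```

Every crdbnode manifest receives the same list, so each node can join through any of the original pod addresses.
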
