Commit 2ef1d80 (1 parent: ebd8f6f)

scripts: add migration script from public operator to cloud operator.

Check in a reference implementation for migrating from statefulsets managed by the public operator to the cloud operator. Note that this process involves some manual steps, and we may want to automate and test it further.

File tree

6 files changed: +397 -0 lines changed


scripts/migration/public/README.md

Lines changed: 92 additions & 0 deletions
## Migrate from public operator to cloud operator

This guide walks you through migrating a crdb cluster managed by the public operator to the crdb cloud operator. We assume you've created a cluster using the public operator. The goals of this process are to migrate without affecting cluster availability, and to preserve existing disks so that we don't have to replicate data into empty volumes. Note that this process scales down the statefulset by one node before adding each operator-managed pod, so cluster capacity will be reduced by one node at times.

Prerequisite: Install the public operator and create an operator-managed cluster:

```
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml

kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/examples/example.yaml
```

Set the cluster name and namespace used throughout this guide:

```
export CRDBCLUSTER=cockroachdb
export NAMESPACE=default
```

Back up the existing crdbcluster custom resource:

```
mkdir -p backup
kubectl get crdbcluster -o yaml $CRDBCLUSTER > backup/crdbcluster-$CRDBCLUSTER.yaml
```

The public operator and cloud operator use custom resource definitions with the same names, so we have to remove the public operator before installing the cloud operator. Uninstall the public operator without deleting its managed pods, PVCs, etc.:

```
# Ensure that the operator can't accidentally delete managed k8s objects.
kubectl delete clusterrolebinding cockroach-operator-rolebinding

# Delete the public operator cr.
kubectl delete crdbcluster $CRDBCLUSTER --cascade=orphan

# Delete public operator resources and crds.
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
```

Install the cloud operator and wait for it to become ready:

```
helm upgrade --install crdb-operator ./operator
kubectl rollout status deployment/cockroach-operator --timeout=60s
```

Next, we need to re-map and generate TLS certs. The crdb cloud operator uses slightly different certs than the public operator and mounts them in configmaps and secrets with different names. Run the `generate-certs.sh` script to generate and upload certs to your cluster:

```
./generate-certs.sh
```

To migrate seamlessly from the statefulset to the cloud operator, we'll scale down statefulset-managed pods and replace them with crdbnode objects, one by one. Then we'll create the crdbcluster that manages the crdbnodes. Because of this order of operations, we need to create some objects that the crdbcluster will eventually own:

```
kubectl create priorityclass crdb-critical --value 500000000
yq '(.. | select(tag == "!!str")) |= envsubst' rbac-template.yaml > rbac.yaml
kubectl apply -f rbac.yaml
```

Next, generate manifests for each crdbnode and the crdbcluster based on the state of the statefulset. We generate a manifest for each crdbnode because we want the crdb pods and their associated PVCs to have the same names as the original statefulset-managed pods and PVCs. This means that the new operator-managed pods will use the original PVCs, and won't have to replicate data into empty nodes.

```
./generate-manifests.sh
```

For each crdb pod, scale the statefulset down by one replica. For example, for a three-node cluster, first scale the statefulset down to two replicas:

```
kubectl scale statefulset/$CRDBCLUSTER --replicas=2
```

Then create the crdbnode corresponding to the statefulset pod you just scaled down:

```
kubectl apply -f crdbnode-$CRDBCLUSTER-2.yaml
```

Wait for the new pod to become ready. If it doesn't, check the cloud operator logs for errors.

Repeat this process for each crdb node until the statefulset has zero replicas.
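The per-node loop above can be automated. The following is a sketch, not part of the checked-in scripts: it assumes `CRDBCLUSTER` is set, that `generate-manifests.sh` has already written the `crdbnode-*.yaml` manifests, and that the starting replica count is known. It prints the commands only by default (`DRY_RUN=1`); set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Hypothetical automation of the scale-down/apply/wait migration loop.
set -euo pipefail

CRDBCLUSTER=${CRDBCLUSTER:-cockroachdb}
DRY_RUN=${DRY_RUN:-1}
replicas=${replicas:-3} # current statefulset replica count; adjust for your cluster

cmds=()
run() {
  cmds+=("$*")
  echo "+ $*"
  if [[ "${DRY_RUN}" != "1" ]]; then "$@"; fi
}

# Walk the statefulset down from the highest ordinal to zero, replacing each
# removed pod with the crdbnode of the same name and waiting for readiness.
for ((idx = replicas - 1; idx >= 0; idx--)); do
  run kubectl scale "statefulset/${CRDBCLUSTER}" --replicas="${idx}"
  run kubectl apply -f "crdbnode-${CRDBCLUSTER}-${idx}.yaml"
  run kubectl wait --for=condition=Ready "pod/${CRDBCLUSTER}-${idx}" --timeout=10m
done
```

Walking ordinals from highest to lowest mirrors the manual steps: the statefulset always gives up its highest-numbered pod first, so the matching crdbnode can take over that pod's name and PVC.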
The public operator creates a pod disruption budget that conflicts with a pod disruption budget managed by the cloud operator. Before applying the crdbcluster manifest, delete the existing pod disruption budget:

```
kubectl delete poddisruptionbudget $CRDBCLUSTER
```

Finally, apply the crdbcluster manifest:

```
kubectl apply -f crdbcluster-$CRDBCLUSTER.yaml
```
scripts/migration/public/crdbcluster-template.json

Lines changed: 95 additions & 0 deletions
```
{
  "apiVersion": "crdb.cockroachlabs.com/v1alpha1",
  "kind": "CrdbCluster",
  "metadata": {
    "name": env(CRDBCLUSTER),
    "namespace": env(NAMESPACE)
  },
  "spec": {
    "dataStore": {},
    "features": [
      "reconcile",
      "reconcile-beta"
    ],
    "mode": "MutableOnly",
    "regions": [
      {
        "cloudProvider": env(CLOUD_PROVIDER),
        "code": env(REGION),
        "namespace": env(NAMESPACE),
        "domain": "",
        "nodes": .spec.replicas
      }
    ],
    "rollingRestartDelay": "30s",
    "template": {
      "metadata": {
        "annotations": {
          "crdb.cockroachlabs.com/cloudProvider": env(CLOUD_PROVIDER)
        },
        "finalizers": [
          "crdbnode.crdb.cockroachlabs.com/finalizer"
        ],
        "labels": {
          "app": "cockroachdb",
          "crdb.cockroachlabs.com/cluster": env(CRDBCLUSTER),
          "svc": "cockroachdb"
        },
        "namespace": env(NAMESPACE)
      },
      "spec": {
        "podLabels": .spec.template.metadata.labels,
        "certificates": {
          "externalCertificates": {
            "caConfigMapName": env(CRDBCLUSTER) + "-ca",
            "nodeSecretName": env(CRDBCLUSTER) + "-node-certs",
            "rootSqlClientSecretName": env(CRDBCLUSTER) + "-client-certs"
          }
        },
        "dataStore": {
          "volumeClaimTemplate": {
            "metadata": {
              "name": "datadir"
            },
            "spec": {
              "accessModes": [
                "ReadWriteOnce"
              ],
              "resources": {
                "requests": {
                  "storage": .spec.volumeClaimTemplates[0].spec.resources.requests.storage
                }
              },
              "storageClassName": .spec.volumeClaimTemplates[0].spec.storageClassName
            }
          }
        },
        "domain": "",
        "env": [
          {
            "name": "HOST_IP",
            "valueFrom": {
              "fieldRef": {
                "apiVersion": "v1",
                "fieldPath": "status.hostIP"
              }
            }
          }
        ],
        "resourceRequirements": .spec.template.spec.containers[0].resources,
        "image": .spec.template.spec.containers[0].image,
        "serviceAccountName": "cockroachdb",
        "useSecurityContexts": true
      }
    },
    "tlsEnabled": true
  }
}
```
scripts/migration/public/crdbnode-template.json

Lines changed: 74 additions & 0 deletions
```
{
  "apiVersion": "crdb.cockroachlabs.com/v1alpha1",
  "kind": "CrdbNode",
  "metadata": {
    "annotations": {
      "crdb.cockroachlabs.com/cloudProvider": env(CLOUD_PROVIDER)
    },
    "finalizers": [
      "crdbnode.crdb.cockroachlabs.com/finalizer"
    ],
    "generateName": "",
    "name": env(crdb_node_name),
    "labels": {
      "app": "cockroachdb",
      "crdb.cockroachlabs.com/cluster": env(CRDBCLUSTER),
      "svc": "cockroachdb"
    },
    "namespace": env(NAMESPACE)
  },
  "spec": {
    "podLabels": .spec.template.metadata.labels,
    "certificates": {
      "externalCertificates": {
        "caConfigMapName": env(CRDBCLUSTER) + "-ca",
        "nodeSecretName": env(CRDBCLUSTER) + "-node-certs",
        "rootSqlClientSecretName": env(CRDBCLUSTER) + "-client-certs"
      }
    },
    "dataStore": {
      "volumeClaimTemplate": {
        "metadata": {
          "name": "datadir"
        },
        "spec": {
          "accessModes": [
            "ReadWriteOnce"
          ],
          "resources": {
            "requests": {
              "storage": .spec.volumeClaimTemplates[0].spec.resources.requests.storage
            }
          },
          "storageClassName": .spec.volumeClaimTemplates[0].spec.storageClassName
        }
      }
    },
    "domain": "",
    "env": [
      {
        "name": "HOST_IP",
        "valueFrom": {
          "fieldRef": {
            "apiVersion": "v1",
            "fieldPath": "status.hostIP"
          }
        }
      }
    ],
    "resourceRequirements": .spec.template.spec.containers[0].resources,
    "image": .spec.template.spec.containers[0].image,
    "join": env(join_str),
    "serviceAccountName": "cockroachdb",
    "useSecurityContexts": true,
    "nodeName": env(k8s_node_name)
  }
}
```
scripts/migration/public/generate-certs.sh

Lines changed: 35 additions & 0 deletions
```
#!/usr/bin/env bash

set -euo pipefail

mkdir -p certs

# Fetch and remap CA cert.
kubectl get secret -o yaml $CRDBCLUSTER-ca | yq '.data."ca.key"' | base64 -d >certs/ca.key
kubectl get secret -o yaml $CRDBCLUSTER-node | yq '.data."ca.crt"' | base64 -d >certs/ca.crt
kubectl create configmap $CRDBCLUSTER-ca --from-file=certs/ca.crt --dry-run=client -o yaml |
  kubectl apply -f -

# Fetch and update node certs. The node certs generated by the helm chart don't
# include the necessary SANs for the cloud operator, so we create new certs
# with the existing SANs as well as the additional SANs required for the cloud
# operator.
hosts=()
for host in $(kubectl get secret -o yaml $CRDBCLUSTER-node |
  yq '.data."tls.crt"' |
  base64 -d |
  openssl x509 -noout -ext subjectAltName |
  tail -n+2 |
  sed -E 's/(DNS:)|(IP Address:)|,//g' |
  xargs); do
  hosts+=($host)
done
hosts+=("$CRDBCLUSTER-join.$NAMESPACE.svc.cluster.local")
cockroach cert create-node --ca-key ./certs/ca.key --certs-dir ./certs --overwrite "${hosts[@]}"

kubectl create secret generic $CRDBCLUSTER-node-certs --from-file=tls.crt=certs/node.crt --from-file=tls.key=certs/node.key --dry-run=client -o yaml |
  kubectl apply -f -

# Root user certs. The public operator doesn't generate one, so we create new
# certs signed by the original CA.
cockroach cert create-client root --ca-key certs/ca.key --certs-dir ./certs --overwrite
kubectl create secret generic $CRDBCLUSTER-client-certs --from-file=tls.crt=./certs/client.root.crt --from-file=tls.key=./certs/client.root.key --dry-run=client -o yaml | kubectl apply -f -
```
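The SAN-extraction pipeline in the script can be checked in isolation. The snippet below feeds it a hypothetical `openssl x509 -ext subjectAltName` output (the hostnames are made up for illustration) instead of a real certificate:

```shell
#!/usr/bin/env bash
# Isolated check of the SAN-parsing pipeline from generate-certs.sh,
# run against a hypothetical subjectAltName block.
set -euo pipefail

san_output='X509v3 Subject Alternative Name:
    DNS:localhost, DNS:cockroachdb-public, IP Address:127.0.0.1'

hosts=()
for host in $(echo "${san_output}" |
  tail -n+2 |                           # drop the "Subject Alternative Name" header line
  sed -E 's/(DNS:)|(IP Address:)|,//g' | # strip the type prefixes and separators
  xargs); do
  hosts+=("$host")
done

echo "${hosts[@]}"
# localhost cockroachdb-public 127.0.0.1
```

Each SAN survives as a bare hostname or IP, which is the form `cockroach cert create-node` expects as positional arguments.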
scripts/migration/public/generate-manifests.sh

Lines changed: 24 additions & 0 deletions
```
#!/usr/bin/env bash

set -euo pipefail
set -x

sts_yaml=$(kubectl get sts -o yaml $CRDBCLUSTER)

echo "${sts_yaml}" | yq "$(cat crdbcluster-template.json)" >crdbcluster-${CRDBCLUSTER}.yaml

num_nodes=$(echo "${sts_yaml}" | yq '.spec.replicas')

export join_str=""
for idx in $(seq 0 $(($num_nodes - 1))); do
  if [[ -n "${join_str}" ]]; then
    join_str="${join_str},"
  fi
  join_str="${join_str}${CRDBCLUSTER}-${idx}.${CRDBCLUSTER}.${NAMESPACE}:26258"
done

for idx in $(seq 0 $(($num_nodes - 1))); do
  export crdb_node_name=${CRDBCLUSTER}-${idx}
  export k8s_node_name=$(kubectl get pod -o yaml ${crdb_node_name} | yq '.spec.nodeName')
  echo "${sts_yaml}" | yq "$(cat crdbnode-template.json)" >crdbnode-${CRDBCLUSTER}-${idx}.yaml
done
```
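As a sanity check, the join-string loop can be run standalone. Assuming `CRDBCLUSTER=cockroachdb`, `NAMESPACE=default`, and a three-node statefulset, it builds one `host:port` entry per pod:

```shell
#!/usr/bin/env bash
# Standalone sketch of the join-string construction from generate-manifests.sh,
# with the cluster name, namespace, and node count hardcoded for illustration.
set -euo pipefail

CRDBCLUSTER=cockroachdb
NAMESPACE=default
num_nodes=3

join_str=""
for idx in $(seq 0 $((num_nodes - 1))); do
  if [[ -n "${join_str}" ]]; then
    join_str="${join_str},"
  fi
  join_str="${join_str}${CRDBCLUSTER}-${idx}.${CRDBCLUSTER}.${NAMESPACE}:26258"
done

echo "${join_str}"
# cockroachdb-0.cockroachdb.default:26258,cockroachdb-1.cockroachdb.default:26258,cockroachdb-2.cockroachdb.default:26258
```

Each entry addresses a pod through the statefulset's headless service (`<pod>.<service>.<namespace>`), which is why the crdbnode pods must keep the original statefulset pod names.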