:_content-type: ASSEMBLY
[id="dr-hosted-cluster-within-aws-region"]
= Disaster recovery for a hosted cluster within an AWS region
include::_attributes/common-attributes.adoc[]
:context: dr-hosted-cluster-within-aws-region

toc::[]

In a situation where you need disaster recovery (DR) for a hosted cluster, you can recover the hosted cluster to the same region within AWS. For example, you need DR when the upgrade of a management cluster fails and the hosted cluster is in a read-only state.

:FeatureName: Hosted control planes
include::snippets/technology-preview.adoc[]

The DR process involves three main steps:

. Backing up the hosted cluster on the source management cluster
. Restoring the hosted cluster on a destination management cluster
. Deleting the hosted cluster from the source management cluster

Your workloads remain running during the process. The Cluster API might be unavailable for a period, but that will not affect the services that are running on the worker nodes.
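
For example, while the Cluster API is unavailable, you can confirm that workloads are still running by querying the hosted cluster directly through its own API server. The following commands are a minimal sketch; the `<hosted_cluster_kubeconfig>` placeholder is an assumption that stands for the hosted cluster `kubeconfig` file described later in this assembly.

.Example: Checking workloads on the hosted cluster (sketch)
[source,terminal]
----
# The kubeconfig placeholder is illustrative; use the hosted cluster kubeconfig file
$ oc --kubeconfig=<hosted_cluster_kubeconfig> get nodes
$ oc --kubeconfig=<hosted_cluster_kubeconfig> get pods -A --field-selector=status.phase=Running
----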

[IMPORTANT]
====
Both the source management cluster and the destination management cluster must have the `--external-dns` flags to maintain the API server URL, as shown in this example:

.Example: External DNS flags
[source,terminal]
----
--external-dns-provider=aws \
--external-dns-credentials=<AWS Credentials location> \
--external-dns-domain-filter=<DNS Base Domain>
----

That way, the server URL ends with `https://api-sample-hosted.sample-hosted.aws.openshift.com`.

If you do not include the `--external-dns` flags to maintain the API server URL, the hosted cluster cannot be migrated.
====
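
For context, these flags are typically passed to the `hypershift install` command on each management cluster. The following command is a minimal sketch only; the `--oidc-storage-provider-s3-*` flags and all of the placeholder values are illustrative assumptions and are not requirements of this procedure.

.Example: Installing the HyperShift Operator with external DNS flags (sketch)
[source,terminal]
----
# The OIDC S3 flags and placeholder values are illustrative assumptions
$ hypershift install \
    --external-dns-provider=aws \
    --external-dns-credentials=<AWS Credentials location> \
    --external-dns-domain-filter=<DNS Base Domain> \
    --oidc-storage-provider-s3-bucket-name=<bucket name> \
    --oidc-storage-provider-s3-credentials=<AWS Credentials location> \
    --oidc-storage-provider-s3-region=<AWS region>
----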

[id="dr-hosted-cluster-env-context"]
== Example environment and context

Consider a scenario where you have three clusters to restore. Two are management clusters, and one is a hosted cluster. You can restore either the control plane only or the control plane and the nodes. Before you begin, you need the following information:

* Source MGMT Namespace: The source management namespace
* Source MGMT ClusterName: The source management cluster name
* Source MGMT Kubeconfig: The source management `kubeconfig` file
* Destination MGMT Kubeconfig: The destination management `kubeconfig` file
* HC Kubeconfig: The hosted cluster `kubeconfig` file
* SSH key file: The SSH public key
* Pull secret: The pull secret file to access the release images
* AWS credentials
* AWS region
* Base domain: The DNS base domain to use as an external DNS
* S3 bucket name: The bucket in the AWS region where you plan to upload the etcd backup

This information is shown in the following example environment variables.

.Example environment variables
[source,terminal]
----
SSH_KEY_FILE=${HOME}/.ssh/id_rsa.pub
BASE_PATH=${HOME}/hypershift
BASE_DOMAIN="aws.sample.com"
PULL_SECRET_FILE="${HOME}/pull_secret.json"
AWS_CREDS="${HOME}/.aws/credentials"

CONTROL_PLANE_AVAILABILITY_POLICY=SingleReplica
HYPERSHIFT_PATH=${BASE_PATH}/src/hypershift
HYPERSHIFT_CLI=${HYPERSHIFT_PATH}/bin/hypershift
HYPERSHIFT_IMAGE=${HYPERSHIFT_IMAGE:-"quay.io/${USER}/hypershift:latest"}
NODE_POOL_REPLICAS=${NODE_POOL_REPLICAS:-2}

# MGMT Context
MGMT_REGION=us-west-1
MGMT_CLUSTER_NAME="${USER}-dev"
MGMT_CLUSTER_NS=${USER}
MGMT_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT_CLUSTER_NS}-${MGMT_CLUSTER_NAME}"
MGMT_KUBECONFIG="${MGMT_CLUSTER_DIR}/kubeconfig"

# MGMT2 Context
MGMT2_CLUSTER_NAME="${USER}-dest"
MGMT2_CLUSTER_NS=${USER}
MGMT2_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT2_CLUSTER_NS}-${MGMT2_CLUSTER_NAME}"
MGMT2_KUBECONFIG="${MGMT2_CLUSTER_DIR}/kubeconfig"

# Hosted Cluster Context
HC_CLUSTER_NS=clusters
HC_REGION=us-west-1
HC_CLUSTER_NAME="${USER}-hosted"
HC_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}"
HC_KUBECONFIG="${HC_CLUSTER_DIR}/kubeconfig"
BACKUP_DIR=${HC_CLUSTER_DIR}/backup

BUCKET_NAME="${USER}-hosted-${MGMT_REGION}"

# DNS
AWS_ZONE_ID="Z07342811SH9AA102K1AC"
EXTERNAL_DNS_DOMAIN="hc.jpdv.aws.kerbeross.com"
----

[id="dr-hosted-cluster-process"]
== Overview of the backup and restore process

The backup and restore process works as follows:

. On management cluster 1, which you can think of as the source management cluster, the control plane and workers interact by using the external DNS API.

. You take a snapshot of the hosted cluster, which includes etcd, the control plane, and the worker nodes. The worker nodes are moved to the external DNS, the control plane is saved in a local manifest file, and etcd is backed up to an S3 bucket, as shown in the example after this list.

. On management cluster 2, which you can think of as the destination management cluster, you restore etcd from the S3 bucket and restore the control plane from the local manifest file.

. By using the external DNS API, the worker nodes are restored to management cluster 2.

. On management cluster 2, the control plane and worker nodes interact by using the external DNS API.
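
For example, the etcd backup in step 2 might look like the following commands. This is a minimal sketch only: the etcd pod name, the certificate paths inside the pod, and the snapshot file names are assumptions, and the environment variables are the ones defined earlier in this assembly. The complete procedure is described later in this assembly.

.Example: Backing up etcd to S3 (sketch)
[source,terminal]
----
# The pod name and the certificate paths are illustrative assumptions
$ oc --kubeconfig=${MGMT_KUBECONFIG} exec -n ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME} etcd-0 -- \
    env ETCDCTL_API=3 etcdctl \
    --cacert /etc/etcd/tls/etcd-ca/ca.crt \
    --cert /etc/etcd/tls/client/etcd-client.crt \
    --key /etc/etcd/tls/client/etcd-client.key \
    snapshot save /var/lib/data/snapshot.db

# Copy the snapshot out of the pod, then upload it to the S3 bucket that the restore uses
$ oc --kubeconfig=${MGMT_KUBECONFIG} cp ${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}/etcd-0:/var/lib/data/snapshot.db ${BACKUP_DIR}/snapshot.db
$ aws s3 cp ${BACKUP_DIR}/snapshot.db s3://${BUCKET_NAME}/${HC_CLUSTER_NAME}-snapshot.db
----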

// When the updated diagram is available, I will add it here and update the first sentence in this section to read, "As shown in the following diagram, the backup and restore process works as follows:"

You can manually back up and restore your hosted cluster, or you can run a script to complete the process. For more information about the script, see "Running a script to back up and restore a hosted cluster".

// Backing up the hosted cluster
include::modules/dr-hosted-cluster-within-aws-region-backup.adoc[leveloffset=+1]

// Restoring the hosted cluster
include::modules/dr-hosted-cluster-within-aws-region-restore.adoc[leveloffset=+1]

// Deleting the hosted cluster
include::modules/dr-hosted-cluster-within-aws-region-delete.adoc[leveloffset=+1]

// Helper script
include::modules/dr-hosted-cluster-within-aws-region-script.adoc[leveloffset=+1]