
Commit ce63dcf

adding backup, restore, and dr tasks for hosted control planes
1 parent 14bbab4 commit ce63dcf

9 files changed: +859 -0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 4 additions & 0 deletions

@@ -2536,6 +2536,8 @@ Topics:
   File: backing-up-etcd
 - Name: Replacing an unhealthy etcd member
   File: replacing-unhealthy-etcd-member
+- Name: Backing up and restoring etcd on a hosted cluster
+  File: etcd-backup-restore-hosted-cluster
 - Name: Disaster recovery
   Dir: disaster_recovery
   Topics:
@@ -2545,6 +2547,8 @@ Topics:
   File: scenario-2-restoring-cluster-state
 - Name: Recovering from expired control plane certificates
   File: scenario-3-expired-certs
+- Name: Disaster recovery for a hosted cluster within an AWS region
+  File: dr-hosted-cluster-within-aws-region
 ---
 Name: Migrating from version 3 to 4
 Dir: migrating_from_ocp_3_to_4

Lines changed: 131 additions & 0 deletions

@@ -0,0 +1,131 @@
:_content-type: ASSEMBLY
[id="dr-hosted-cluster-within-aws-region"]
= Disaster recovery for a hosted cluster within an AWS region
include::_attributes/common-attributes.adoc[]
:context: dr-hosted-cluster-within-aws-region

toc::[]

In a situation where you need disaster recovery (DR) for a hosted cluster, you can recover the hosted cluster to the same region within AWS. For example, you need DR when the upgrade of a management cluster fails and the hosted cluster is in a read-only state.

:FeatureName: Hosted control planes
include::snippets/technology-preview.adoc[]

The DR process involves three main steps:

. Backing up the hosted cluster on the source management cluster
. Restoring the hosted cluster on a destination management cluster
. Deleting the hosted cluster from the source management cluster

Your workloads remain running during the process. The Cluster API might be unavailable for a period, but that does not affect the services that are running on the worker nodes.

[IMPORTANT]
====
Both the source management cluster and the destination management cluster must have the `--external-dns` flags to maintain the API server URL, as shown in this example:

.Example: External DNS flags
[source,terminal]
----
--external-dns-provider=aws \
--external-dns-credentials=<AWS Credentials location> \
--external-dns-domain-filter=<DNS Base Domain>
----

That way, the server URL ends with `https://api-sample-hosted.sample-hosted.aws.openshift.com`.

If you do not include the `--external-dns` flags to maintain the API server URL, the hosted cluster cannot be migrated.
====
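
For reference, here is a minimal sketch of how those flags might be supplied when you install the HyperShift operator on a management cluster. The `hypershift install` invocation and the placeholder values are assumptions for illustration, not part of this commit:

[source,terminal]
----
# Assumption: the external DNS flags are passed at HyperShift operator install time.
# Replace the credentials path and the domain filter with your own values.
$ hypershift install \
    --external-dns-provider=aws \
    --external-dns-credentials=${HOME}/.aws/credentials \
    --external-dns-domain-filter=hc.example.aws.com
----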

[id="dr-hosted-cluster-env-context"]
== Example environment and context

Consider a scenario where you have three clusters to restore. Two are management clusters, and one is a hosted cluster. You can restore either the control plane only or the control plane and the nodes. Before you begin, you need the following information:

* Source MGMT Namespace: The source management namespace
* Source MGMT ClusterName: The source management cluster name
* Source MGMT Kubeconfig: The source management `kubeconfig` file
* Destination MGMT Kubeconfig: The destination management `kubeconfig` file
* HC Kubeconfig: The hosted cluster `kubeconfig` file
* SSH key file: The SSH public key
* Pull secret: The pull secret file to access the release images
* AWS credentials
* AWS region
* Base domain: The DNS base domain to use as an external DNS
* S3 bucket name: The bucket in the AWS region where you plan to upload the etcd backup

This information is shown in the following example environment variables.

.Example environment variables
[source,terminal]
----
SSH_KEY_FILE=${HOME}/.ssh/id_rsa.pub
BASE_PATH=${HOME}/hypershift
BASE_DOMAIN="aws.sample.com"
PULL_SECRET_FILE="${HOME}/pull_secret.json"
AWS_CREDS="${HOME}/.aws/credentials"
AWS_ZONE_ID="Z02718293M33QHDEQBROL"

CONTROL_PLANE_AVAILABILITY_POLICY=SingleReplica
HYPERSHIFT_PATH=${BASE_PATH}/src/hypershift
HYPERSHIFT_CLI=${HYPERSHIFT_PATH}/bin/hypershift
HYPERSHIFT_IMAGE=${HYPERSHIFT_IMAGE:-"quay.io/${USER}/hypershift:latest"}
NODE_POOL_REPLICAS=${NODE_POOL_REPLICAS:-2}

# MGMT Context
MGMT_REGION=us-west-1
MGMT_CLUSTER_NAME="${USER}-dev"
MGMT_CLUSTER_NS=${USER}
MGMT_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT_CLUSTER_NS}-${MGMT_CLUSTER_NAME}"
MGMT_KUBECONFIG="${MGMT_CLUSTER_DIR}/kubeconfig"

# MGMT2 Context
MGMT2_CLUSTER_NAME="${USER}-dest"
MGMT2_CLUSTER_NS=${USER}
MGMT2_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${MGMT2_CLUSTER_NS}-${MGMT2_CLUSTER_NAME}"
MGMT2_KUBECONFIG="${MGMT2_CLUSTER_DIR}/kubeconfig"

# Hosted Cluster Context
HC_CLUSTER_NS=clusters
HC_REGION=us-west-1
HC_CLUSTER_NAME="${USER}-hosted"
HC_CLUSTER_DIR="${BASE_PATH}/hosted_clusters/${HC_CLUSTER_NS}-${HC_CLUSTER_NAME}"
HC_KUBECONFIG="${HC_CLUSTER_DIR}/kubeconfig"
BACKUP_DIR=${HC_CLUSTER_DIR}/backup

BUCKET_NAME="${USER}-hosted-${MGMT_REGION}"

# DNS
AWS_ZONE_ID="Z07342811SH9AA102K1AC"
EXTERNAL_DNS_DOMAIN="hc.jpdv.aws.kerbeross.com"
----
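
If you keep these variables in a file, you can load them into your shell before you run the backup and restore commands. This is a minimal usage sketch; the `env_vars` file name is an assumption:

[source,terminal]
----
# Load the example environment variables into the current shell session.
$ source ./env_vars

# Confirm that a value is set, for example the hosted cluster name.
$ echo ${HC_CLUSTER_NAME}
----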

[id="dr-hosted-cluster-process"]
== Overview of the backup and restore process

The backup and restore process works as follows:

. On management cluster 1, which you can think of as the source management cluster, the control plane and workers interact by using the external DNS API.

. You take a snapshot of the hosted cluster, which includes etcd, the control plane, and the worker nodes. The worker nodes are moved to the external DNS, the control plane is saved in a local manifest file, and etcd is backed up to an S3 bucket.

. On management cluster 2, which you can think of as the destination management cluster, you restore etcd from the S3 bucket and restore the control plane from the local manifest file.

. By using the external DNS API, the worker nodes are restored to management cluster 2.

. On management cluster 2, the control plane and worker nodes interact by using the external DNS API.

// When the updated diagram is available, I will add it here and update the first sentence in this section to read, "As shown in the following diagram, the backup and restore process works as follows:"

You can manually back up and restore your hosted cluster, or you can run a script to complete the process. For more information about the script, see "Running a script to back up and restore a hosted cluster".
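
After the restore completes on the destination management cluster, you might verify the outcome before you delete the hosted cluster from the source management cluster. The following commands are a hedged verification sketch that reuses the environment variables from the previous section; they are not part of the documented procedure:

[source,terminal]
----
# List the hosted cluster and its node pools on the destination management cluster.
$ oc get hostedclusters -n ${HC_CLUSTER_NS} --kubeconfig=${MGMT2_KUBECONFIG}
$ oc get nodepools -n ${HC_CLUSTER_NS} --kubeconfig=${MGMT2_KUBECONFIG}

# Confirm that the worker nodes are reachable through the hosted cluster API.
$ oc get nodes --kubeconfig=${HC_KUBECONFIG}
----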

// Backing up the hosted cluster
include::modules/dr-hosted-cluster-within-aws-region-backup.adoc[leveloffset=+1]

// Restoring the hosted cluster
include::modules/dr-hosted-cluster-within-aws-region-restore.adoc[leveloffset=+1]

// Deleting the hosted cluster
include::modules/dr-hosted-cluster-within-aws-region-delete.adoc[leveloffset=+1]

// Helper script
include::modules/dr-hosted-cluster-within-aws-region-script.adoc[leveloffset=+1]

Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
:_content-type: ASSEMBLY
[id="etcd-backup-restore-hosted-cluster"]
= Backing up and restoring etcd on a hosted cluster
include::_attributes/common-attributes.adoc[]
:context: etcd-backup-restore-hosted-cluster

toc::[]

If you use hosted control planes on {product-title}, the process to back up and restore etcd is different from xref:../../backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[the usual etcd backup process].

:FeatureName: Hosted control planes
include::snippets/technology-preview.adoc[]

// Backing up etcd on a hosted cluster
include::modules/backup-etcd-hosted-cluster.adoc[leveloffset=+1]

// Restoring an etcd snapshot on a hosted cluster
include::modules/restoring-etcd-snapshot-hosted-cluster.adoc[leveloffset=+1]

[role="_additional-resources"]
[id="additional-resources_etcd-backup-restore-hosted-cluster"]
== Additional resources
* xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/dr-hosted-cluster-within-aws-region.adoc#dr-hosted-cluster-within-aws-region[Disaster recovery for a hosted cluster within an AWS region]

Lines changed: 81 additions & 0 deletions

@@ -0,0 +1,81 @@
// Module included in the following assembly:
//
// * control_plane_backup_and_restore/etcd-backup-restore-hosted-cluster.adoc

:_content-type: PROCEDURE
[id="backup-etcd-hosted-cluster_{context}"]
= Taking a snapshot of etcd on a hosted cluster

As part of the process to back up etcd for a hosted cluster, you take a snapshot of etcd. After you take the snapshot, you can restore it, for example, as part of a disaster recovery operation.

[IMPORTANT]
====
This procedure requires API downtime.
====

.Procedure

. Pause reconciliation of the hosted cluster by entering this command:
+
[source,terminal]
----
$ oc patch -n clusters hostedclusters/${CLUSTER_NAME} -p '{"spec":{"pausedUntil":"'${PAUSED_UNTIL}'"}}' --type=merge
----
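+
For example, you might set the variables that this command expects before you run it. This is a minimal sketch under the assumption that `pausedUntil` accepts either an RFC3339 timestamp or the string `"true"`; the values shown are illustrative only:
+
[source,terminal]
----
# Name of the hosted cluster, as listed in the clusters namespace.
CLUSTER_NAME=my-hosted-cluster

# Pause reconciliation until this date, or use "true" to pause indefinitely.
PAUSED_UNTIL="2022-12-01T00:00:00Z"
----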

. Stop all etcd-writer deployments by entering this command:
+
[source,terminal]
----
$ oc scale deployment -n ${HOSTED_CLUSTER_NAMESPACE} --replicas=0 kube-apiserver openshift-apiserver openshift-oauth-apiserver
----
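+
In this command, `${HOSTED_CLUSTER_NAMESPACE}` refers to the namespace on the management cluster that holds the hosted control plane. The following lines are a hedged sketch of setting it, assuming the common `<hosted cluster namespace>-<cluster name>` naming convention:
+
[source,terminal]
----
# Assumption: the hosted control plane runs in the clusters-<cluster name> namespace.
HOSTED_CLUSTER_NAMESPACE=clusters-${CLUSTER_NAME}

# Verify the namespace by listing its pods, which include the etcd and API server pods.
$ oc get pods -n ${HOSTED_CLUSTER_NAMESPACE}
----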

. Take an etcd snapshot by using the `exec` command in each etcd container:
+
[source,terminal]
----
$ oc exec -it etcd-0 -n ${HOSTED_CLUSTER_NAMESPACE} -- env ETCDCTL_API=3 /usr/bin/etcdctl --cacert /etc/etcd/tls/client/etcd-client-ca.crt --cert /etc/etcd/tls/client/etcd-client.crt --key /etc/etcd/tls/client/etcd-client.key --endpoints=localhost:2379 snapshot save /var/lib/data/snapshot.db
$ oc exec -it etcd-0 -n ${HOSTED_CLUSTER_NAMESPACE} -- env ETCDCTL_API=3 /usr/bin/etcdctl -w table snapshot status /var/lib/data/snapshot.db
----
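+
The second command prints a summary table for the saved snapshot. The following output is hypothetical and only illustrates the shape of the table:
+
.Example output
[source,terminal]
----
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 92b8c6c4 |    12345 |       8001 |      25 MB |
+----------+----------+------------+------------+
----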

. Copy the snapshot data to a location where you can retrieve it later, such as an S3 bucket, as shown in the following example.
+
[NOTE]
====
The following example uses signature version 2. If you are in a region that supports signature version 4, such as the us-east-2 region, use signature version 4. If you use signature version 2 in a region that requires signature version 4, the upload fails. In addition, signature version 2 is deprecated.
====
+
.Example
[source,terminal]
----
BUCKET_NAME=somebucket
FILEPATH="/${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db"
CONTENT_TYPE="application/x-compressed-tar"
DATE_VALUE=`date -R`
SIGNATURE_STRING="PUT\n\n${CONTENT_TYPE}\n${DATE_VALUE}\n${FILEPATH}"
ACCESS_KEY=accesskey
SECRET_KEY=secret
SIGNATURE_HASH=`echo -en ${SIGNATURE_STRING} | openssl sha1 -hmac ${SECRET_KEY} -binary | base64`

oc exec -it etcd-0 -n ${HOSTED_CLUSTER_NAMESPACE} -- curl -X PUT -T "/var/lib/data/snapshot.db" \
  -H "Host: ${BUCKET_NAME}.s3.amazonaws.com" \
  -H "Date: ${DATE_VALUE}" \
  -H "Content-Type: ${CONTENT_TYPE}" \
  -H "Authorization: AWS ${ACCESS_KEY}:${SIGNATURE_HASH}" \
  https://${BUCKET_NAME}.s3.amazonaws.com/${CLUSTER_NAME}-snapshot.db
----
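+
If your region requires signature version 4, one alternative is to copy the snapshot out of the etcd pod and upload it with the AWS CLI, which signs requests with signature version 4 by default. This is a hedged sketch, not part of the documented procedure; it assumes that the AWS CLI is installed and configured and that the etcd container in the pod is named `etcd`:
+
[source,terminal]
----
# Copy the snapshot from the etcd pod to the local machine.
$ oc cp -c etcd ${HOSTED_CLUSTER_NAMESPACE}/etcd-0:/var/lib/data/snapshot.db ./snapshot.db

# Upload the snapshot to the S3 bucket by using signature version 4.
$ aws s3 cp ./snapshot.db s3://${BUCKET_NAME}/${CLUSTER_NAME}-snapshot.db
----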

. If you want to be able to restore the snapshot on a new cluster later, save the encryption secret that the hosted cluster references, as shown in this example:
+
.Example
[source,terminal]
----
oc get hostedcluster $CLUSTER_NAME -o=jsonpath='{.spec.secretEncryption.aescbc}'
{"activeKey":{"name":"CLUSTER_NAME-etcd-encryption-key"}}

# Save this secret, or the key that it contains, so that the etcd data can be decrypted later
oc get secret ${CLUSTER_NAME}-etcd-encryption-key -o=jsonpath='{.data.key}'
----
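+
For example, you might save the secret as a manifest so that you can recreate it on the destination cluster before you restore the snapshot. This is a usage sketch; the output file name is an assumption:
+
[source,terminal]
----
# Save the encryption secret so that it can be recreated alongside the restored etcd data.
$ oc get secret ${CLUSTER_NAME}-etcd-encryption-key -o yaml > ${CLUSTER_NAME}-etcd-encryption-key.yaml
----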

.Next steps

Restore the etcd snapshot.
