add warning about what disaster recovery does to a running cluster

deads2k · bergerhoffer · commit f556f007ec48 · 2021-08-11T12:13:32.000-04:00
diff --git a/backup_and_restore/disaster_recovery/about-disaster-recovery.adoc b/backup_and_restore/disaster_recovery/about-disaster-recovery.adoc
@@ -23,6 +23,13 @@ This also includes situations where you have lost the majority of your control p
 +
 If applicable, you might also need to xref:../../backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[recover from expired control plane certificates].
 +
+[WARNING]
+====
+Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. Use this procedure only as a last resort.
+
+Before performing a restore, see xref:../../backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-scenario-2-restoring-cluster-state-about_dr-restoring-cluster-state[About restoring cluster state] for more information on the impact on the cluster.
+====
++
 [NOTE]
 ====
 If you have a majority of your masters still available and have an etcd quorum, then follow the procedure to xref:../../backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-unhealthy-etcd-member[replace a single unhealthy etcd member].
diff --git a/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc b/backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc
@@ -7,5 +7,8 @@ toc::[]
 
 To restore the cluster to a previous state, you must have previously xref:../../backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[backed up etcd data] by creating a snapshot. You will use this snapshot to restore the cluster state.
 
+// About restoring to a previous cluster state
+include::modules/dr-restoring-cluster-state-about.adoc[leveloffset=+1]
+
 // Restoring to a previous cluster state
 include::modules/dr-restoring-cluster-state.adoc[leveloffset=+1]
diff --git a/modules/dr-restoring-cluster-state-about.adoc b/modules/dr-restoring-cluster-state-about.adoc
@@ -0,0 +1,24 @@
+// Module included in the following assemblies:
+//
+// * disaster_recovery/scenario-2-restoring-cluster-state.adoc
+
+[id="dr-scenario-2-restoring-cluster-state-about_{context}"]
+= About restoring cluster state
+
+You can use an etcd backup to restore your cluster to a previous state. This can be used to recover from the following situations:
+
+* The cluster has lost the majority of control plane hosts (quorum loss).
+* An administrator has deleted something critical and must restore to recover the cluster.
+
+[WARNING]
+====
+Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. Use it only as a last resort.
+
+If you are able to retrieve data using the Kubernetes API server, then etcd is available and you should not restore using an etcd backup.
+====
+
+Restoring etcd effectively takes a cluster back in time, and all clients will experience a conflicting, parallel history. This can impact the behavior of watching components such as kubelets, Kubernetes controller managers, SDN controllers, and persistent volume controllers.
+
+A restore can also cause Operator churn when the content in etcd does not match the actual content on disk: the Operators for the Kubernetes API server, Kubernetes controller manager, Kubernetes scheduler, and etcd can get stuck when files on disk conflict with content in etcd. Resolving these conflicts can require manual action.
+
+In extreme cases, the cluster can lose track of persistent volumes, delete critical workloads that no longer exist, reimage machines, and rewrite CA bundles with expired certificates.
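
Review note, not part of the patch: the new module treats "lost the majority of control plane hosts" as the quorum-loss condition. The arithmetic behind that can be sketched as below, assuming etcd's usual majority rule (a cluster of n members needs floor(n/2) + 1 healthy members). The function names `etcd_quorum` and `quorum_lost` are hypothetical and exist only for this illustration; they are not part of any OpenShift or etcd tooling.

```python
def etcd_quorum(members: int) -> int:
    """Smallest number of healthy members that still forms a majority.

    Assumes the standard majority rule: floor(members / 2) + 1.
    """
    if members < 1:
        raise ValueError("members must be at least 1")
    return members // 2 + 1


def quorum_lost(members: int, healthy: int) -> bool:
    """Return True when too few members remain healthy to form a quorum.

    Hypothetical helper: it only does the arithmetic and does not
    inspect a real cluster.
    """
    if not 0 <= healthy <= members:
        raise ValueError("healthy must be between 0 and members")
    return healthy < etcd_quorum(members)


if __name__ == "__main__":
    # A three-member control plane tolerates one failure, not two.
    for healthy in (3, 2, 1, 0):
        print(healthy, "healthy of 3 -> quorum lost:", quorum_lost(3, healthy))
```

With three control plane hosts, losing one still leaves a quorum, which is the case the existing NOTE routes to replacing a single unhealthy etcd member; losing two is the case the restore procedure is meant for.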
