Commit f556f00

deads2k authored and bergerhoffer committed
add warning about what disaster recovery does to a running cluster
1 parent 93b37c3 commit f556f00

3 files changed: 34 additions & 0 deletions

backup_and_restore/disaster_recovery/about-disaster-recovery.adoc

Lines changed: 7 additions & 0 deletions
@@ -23,6 +23,13 @@ This also includes situations where you have lost the majority of your control p
 +
 If applicable, you might also need to xref:../../backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-recovering-expired-certs[recover from expired control plane certificates].
 +
+[WARNING]
+====
+Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. This procedure should only be used as a last resort.
+
+Prior to performing a restore, see xref:../../backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc#dr-scenario-2-restoring-cluster-state-about_dr-restoring-cluster-state[About restoring cluster state] for more information on the impact to the cluster.
+====
++
 [NOTE]
 ====
 If you have a majority of your masters still available and have an etcd quorum, then follow the procedure to xref:../../backup_and_restore/replacing-unhealthy-etcd-member.adoc#replacing-unhealthy-etcd-member[replace a single unhealthy etcd member].

backup_and_restore/disaster_recovery/scenario-2-restoring-cluster-state.adoc

Lines changed: 3 additions & 0 deletions
@@ -7,5 +7,8 @@ toc::[]
 
 To restore the cluster to a previous state, you must have previously xref:../../backup_and_restore/backing-up-etcd.adoc#backing-up-etcd-data_backup-etcd[backed up etcd data] by creating a snapshot. You will use this snapshot to restore the cluster state.
 
+// About restoring to a previous cluster state
+include::modules/dr-restoring-cluster-state-about.adoc[leveloffset=+1]
+
 // Restoring to a previous cluster state
 include::modules/dr-restoring-cluster-state.adoc[leveloffset=+1]

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+// Module included in the following assemblies:
+//
+// * disaster_recovery/scenario-2-restoring-cluster-state.adoc
+
+[id="dr-scenario-2-restoring-cluster-state-about_{context}"]
+= About restoring cluster state
+
+You can use an etcd backup to restore your cluster to a previous state. This can be used to recover from the following situations:
+
+* The cluster has lost the majority of control plane hosts (quorum loss).
+* An administrator has deleted something critical and must restore to recover the cluster.
+
+[WARNING]
+====
+Restoring to a previous cluster state is a destructive and destabilizing action to take on a running cluster. This should only be used as a last resort.
+
+If you are able to retrieve data using the Kubernetes API server, then etcd is available and you should not restore using an etcd backup.
+====
+
+Restoring etcd effectively takes a cluster back in time, and all clients will experience a conflicting, parallel history. This can impact the behavior of watching components like kubelets, Kubernetes controller managers, SDN controllers, and persistent volume controllers.
+
+It can cause Operator churn when the content in etcd does not match the actual content on disk, causing Operators for the Kubernetes API server, Kubernetes controller manager, Kubernetes scheduler, and etcd to get stuck when files on disk conflict with content in etcd. This can require manual actions to resolve the issues.
+
+In extreme cases, the cluster can lose track of persistent volumes, delete critical workloads that no longer exist, reimage machines, and rewrite CA bundles with expired certificates.
