
Commit 346f3a8

renetapopova, jackwaudby, and NataliaIvakina authored
A short editorial review of disaster recovery page, update the misleading steps, and update the error message (#1547)
Co-authored-by: Jack Waudby <[email protected]>
Co-authored-by: NataliaIvakina <[email protected]>
1 parent 3926fb5 commit 346f3a8


2 files changed (+34, -35 lines)


modules/ROOT/pages/clustering/disaster-recovery.adoc

Lines changed: 28 additions & 29 deletions
@@ -3,21 +3,18 @@
 [[cluster-recovery]]
 = Disaster recovery
 
-Databases can become unavailable for different reasons.
-For the purpose of this section, an _unavailable database_ is defined as a database that is incapable of serving writes, while still may be able to serve reads.
-Databases not performing as expected for other reasons are not considered unavailable and cannot be helped by this section.
-//Refer to <<link to error handling section, TBD>> for more information on troubleshooting.
-This section contains a step-by-step guide on how to recover databases that have become unavailable.
-By performing the actions described here, the unavailable databases are recovered and made fully operational with as little impact as possible on the other databases in the cluster.
+A database can become unavailable due to issues on different system levels.
+For example, a data center failover may lead to the loss of multiple servers, which may cause a set of databases to become unavailable.
+It is also possible for databases to become quarantined due to a critical failure in the system, which may lead to unavailability even without the loss of servers.
 
-There are many reasons why a database becomes unavailable and it can be caused by issues on different levels in the system.
-For example, a data-center failover may lead to the loss of multiple serves which in turn may cause a set of databases to become unavailable.
-It is also possible for databases to become quarantined due to a critical failure in the system which may lead to unavailability even without loss of servers.
+This section contains a step-by-step guide on how to recover _unavailable databases_ that are incapable of serving writes, while still may be able to serve reads.
+However, if a database is not performing as expected for other reasons, this section cannot help.
+By following the steps outlined here, you can recover the unavailable databases and make them fully operational with minimal impact on the other databases in the cluster.
 
 [NOTE]
 ====
-If *all* servers in a Neo4j cluster are lost in a data-center failover, it is not possible to recover the current cluster.
-A new cluster has to be created and the databases restored.
+If *all* servers in a Neo4j cluster are lost in a data center failover, it is not possible to recover the current cluster.
+You have to create a new cluster and restore the databases.
 See xref:clustering/setup/deploy.adoc[Deploy a basic cluster] and xref:clustering/databases.adoc#cluster-seed[Seed a database] for more information.
 ====
 
@@ -31,22 +28,22 @@ Consequently, in a disaster where multiple servers go down, some databases may k
 
 == Guide to disaster recovery
 
-There are three main steps to recover a cluster from a disaster.
-Depending on the disaster scenario, some steps may not be required, but it is recommended to complete each step in order to ensure that the cluster is fully operational.
+There are three main steps to recovering a cluster from a disaster.
+Completing each step, regardless of the disaster scenario, is recommended to ensure the cluster is fully operational.
 
-The first step is to ensure that the `system` database is available in the cluster.
-The `system` database defines the configuration for the other databases and therefore it is vital to ensure that it is available before doing anything else.
+. Ensure the `system` database is available in the cluster.
+The `system` database defines the configuration for the other databases; therefore, it is vital to ensure it is available before doing anything else.
 
-Once the `system` database's availability is verified, whether it was recovered or unaffected by the disaster, the next step is to recover lost servers to make sure the cluster's topology requirements are met.
+. After the `system` database's availability is verified, whether recovered or unaffected by the disaster, recover the lost servers to ensure the cluster's topology meets the requirements.
 
-Only after the `system` database is available and the cluster topology is satisfied, can the databases be managed.
+. After the `system` database is available and the cluster's topology is satisfied, you can manage the databases.
 
 The steps are described in detail in the following sections.
 
 [NOTE]
 ====
 In this section, an _offline_ server is a server that is not running but may be _restartable_.
-A _lost_ server however, is a server that is currently not running and cannot be restarted.
+A _lost_ server, however, is a server that is currently not running and cannot be restarted.
 ====
 
 [NOTE]
@@ -66,16 +63,16 @@ The `system` database is required for clusters to function properly.
 The server may have to be considered indefinitely lost.)
 . *Validate the `system` database's availability.*
 .. Run `SHOW DATABASE system`.
-If the response doesn't contain a writer, the `system` database is unavailable and needs to be recovered, continue to step 3.
+If the response does not contain a writer, the `system` database is unavailable and needs to be recovered, continue to step 3.
 .. Optionally, you can create a temporary user to validate the `system` database's writability by running `CREATE USER 'temporaryUser' SET PASSWORD 'temporaryPassword'`.
-... Confirm that the query was executed successfully and the temporary user was created as expected, by running `SHOW USERS`, then continue to xref:clustering/disaster-recovery.adoc#recover-servers[Recover servers].
+.. Confirm that the temporary user is created as expected, by running `SHOW USERS`, then continue to xref:clustering/disaster-recovery.adoc#recover-servers[Recover servers].
 If not, continue to step 3.
 +
 . *Restore the `system` database.*
 +
 [NOTE]
 ====
-Only do the steps below if the `system` database's availability could not be validated by the first two steps in this section.
+Only do the steps below if the `system` database's availability cannot be validated by the first two steps in this section.
 ====
 +
 [NOTE]
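The writer check in the `SHOW DATABASE system` step above can be sketched as a small predicate over the query's result rows. This is a minimal Python sketch, not part of the commit: it assumes each row has already been fetched as a dict containing a boolean `writer` column (as in Neo4j 5 `SHOW DATABASES` output), and the function name `system_has_writer` and the sample addresses are hypothetical.

```python
from typing import Any, Iterable, Mapping


def system_has_writer(rows: Iterable[Mapping[str, Any]]) -> bool:
    """Return True if any `SHOW DATABASE system` row reports a writer.

    Assumption: each row is a dict carrying the boolean `writer` column,
    for example the output of `record.data()` for each returned record.
    """
    return any(row.get("writer") is True for row in rows)


# Illustrative rows shaped like `SHOW DATABASE system` output (made-up addresses).
healthy = [
    {"name": "system", "address": "server1:7687", "writer": True},
    {"name": "system", "address": "server2:7687", "writer": False},
]
lost_writer = [
    {"name": "system", "address": "server2:7687", "writer": False},
]

for label, rows in (("healthy", healthy), ("lost_writer", lost_writer)):
    if system_has_writer(rows):
        print(f"{label}: system database has a writer; continue to Recover servers")
    else:
        print(f"{label}: no writer; restore the system database (step 3)")
```

The optional temporary-user check is still worth running afterwards, since a reported writer does not by itself prove that writes succeed.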
@@ -86,7 +83,7 @@ This method prevents downtime for the other databases in the cluster.
 If this is the case, ie. if a majority of servers are still available, follow the instructions in <<recover-servers>>.
 ====
 +
-The following steps creates a new `system` database from a backup of the current `system` database.
+The following steps create a new `system` database from a backup of the current `system` database.
 This is required since the current `system` database has lost too many members in the server failover.
 
 .. Shut down the Neo4j process on all servers.
@@ -114,14 +111,16 @@ The steps here identify the lost servers and safely detach them from the cluster
 
 . Run `SHOW SERVERS`.
 If *all* servers show health `AVAILABLE` and status `ENABLED` continue to xref:clustering/disaster-recovery.adoc#recover-databases[Recover databases].
-. On each `UNAVAILABLE` server, run `CALL dbms.cluster.cordonServer("unavailable-server-id")`.
-. On each `CORDONED` server, run `DEALLOCATE DATABASES FROM SERVER cordoned-server-id`.
-. On each server that failed to deallocate with one of the following messages:
-.. `Could not deallocate server [server]. Can't move databases with only one primary [database].`
+. For each `UNAVAILABLE` server, run `CALL dbms.cluster.cordonServer("unavailable-server-id")` on one of the available servers.
+. For each `CORDONED` server, run `DEALLOCATE DATABASES FROM SERVER cordoned-server-id` on one of the available servers.
+. For each server that failed to deallocate with one of the following messages:
+.. `Could not deallocate server(s) 'serverId'. Unable to reallocate 'DatabaseId.\*'. +
+Required topology for 'DatabaseId.*' is 3 primaries and 0 secondaries. +
+Consider running SHOW SERVERS to determine what action is suitable to resolve this issue.`
 +
 or
 +
-`Could not deallocate server(s) [server].
+`Could not deallocate server(s) `serverId`.
 Database [database] has lost quorum of servers, only found [existing number of primaries] of [expected number of primaries].
 Cannot be safely reallocated.`
 +
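The cordon and deallocate steps in the hunk above lend themselves to generating one statement per affected server from `SHOW SERVERS YIELD *` rows. The sketch below is illustrative only: the column names `serverId`, `health`, and `state`, the case-insensitive comparison, and the quoting of server IDs as string literals are assumptions (the guide writes the deallocate placeholder unquoted), and the helper names are hypothetical. As the updated steps say, the generated statements would be run on one of the available servers; handling the deallocation error messages is out of scope here.

```python
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def cordon_statements(servers: Iterable[Row]) -> List[str]:
    """One cordon call per server whose health is UNAVAILABLE.

    Health values are compared case-insensitively because the guide writes
    `UNAVAILABLE` while query output may use a different casing.
    Server IDs are assumed to be UUID-like strings containing no quotes.
    """
    return [
        f"CALL dbms.cluster.cordonServer('{s['serverId']}')"
        for s in servers
        if str(s.get("health", "")).upper() == "UNAVAILABLE"
    ]


def deallocate_statements(servers: Iterable[Row]) -> List[str]:
    """One DEALLOCATE statement per server whose state is CORDONED."""
    return [
        f"DEALLOCATE DATABASES FROM SERVER '{s['serverId']}'"
        for s in servers
        if str(s.get("state", "")).upper() == "CORDONED"
    ]


# Illustrative `SHOW SERVERS YIELD *` snapshots before and after cordoning b2.
before = [
    {"serverId": "a1", "health": "Available", "state": "Enabled"},
    {"serverId": "b2", "health": "Unavailable", "state": "Enabled"},
]
after = [
    {"serverId": "a1", "health": "Available", "state": "Enabled"},
    {"serverId": "b2", "health": "Unavailable", "state": "Cordoned"},
]

print(cordon_statements(before))     # ["CALL dbms.cluster.cordonServer('b2')"]
print(deallocate_statements(after))  # ["DEALLOCATE DATABASES FROM SERVER 'b2'"]
```

Generating the statements rather than executing them keeps the sketch free of any driver dependency and lets an operator review each command before running it.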
@@ -143,7 +142,7 @@ A database can be set to `READ-ONLY`-mode before it is started to avoid updates
 .. `Could not deallocate server [server]. Reallocation of [database] not possible, no new target found. All existing servers: [existing-servers]. Actual allocated server with mode [mode] is [current-hostings].`
 +
 Add new servers and enable them and then return to step 3, see xref:clustering/servers.adoc#cluster-add-server[Add a server to the cluster] for more information.
-. Run `SHOW SERVERS YIELD *` once all enabled servers host the requested databases (`hosting`-field contains exactly the databases in the `requestedHosting` field), proceed to the next step.
+. Run `SHOW SERVERS YIELD *` once all enabled servers host the requested databases (`hosting`-field contains exactly the databases in the `requestedHosting` field), and proceed to the next step.
 Note that this may take a few minutes.
 . For each deallocated server, run `DROP SERVER deallocated-server-id`.
 . Return to step 1.
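The wait-then-drop logic in the hunk above can be expressed as a check over `SHOW SERVERS YIELD *` rows. This is a hedged sketch: it assumes each row exposes `state`, `hosting`, and `requestedHosting` as Python values, compares `hosting` and `requestedHosting` as sets, and takes the IDs of deallocated servers as explicit input instead of inferring them from server state. The function names and sample server IDs are hypothetical, and quoting the server ID as a string literal is an assumption.

```python
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]


def hosting_settled(servers: Iterable[Row]) -> bool:
    """True when every ENABLED server hosts exactly its requested databases.

    `hosting` and `requestedHosting` are compared as sets, so ordering
    differences in the returned lists do not matter.
    """
    return all(
        set(s.get("hosting") or []) == set(s.get("requestedHosting") or [])
        for s in servers
        if str(s.get("state", "")).upper() == "ENABLED"
    )


def drop_statements(deallocated_ids: Iterable[str]) -> List[str]:
    """One DROP SERVER statement per server deallocated earlier in the guide.

    The IDs are passed in explicitly rather than inferred from server state.
    """
    return [f"DROP SERVER '{server_id}'" for server_id in deallocated_ids]


# Illustrative rows: server d4 has not yet picked up `neo4j`.
pending = [
    {"serverId": "a1", "state": "Enabled",
     "hosting": ["system", "neo4j"], "requestedHosting": ["neo4j", "system"]},
    {"serverId": "d4", "state": "Enabled",
     "hosting": ["system"], "requestedHosting": ["system", "neo4j"]},
]
settled = [dict(s, hosting=list(s["requestedHosting"])) for s in pending]

print(hosting_settled(pending))  # False: keep polling SHOW SERVERS YIELD *
print(hosting_settled(settled))  # True: safe to drop deallocated servers
print(drop_statements(["b2"]))   # ["DROP SERVER 'b2'"]
```

Because reallocation "may take a few minutes", a caller would typically re-run `SHOW SERVERS YIELD *` and re-evaluate `hosting_settled` until it returns `True` before issuing any `DROP SERVER`.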
@@ -154,7 +153,7 @@ Note that this may take a few minutes.
 Once the `system` database is verified available, and all servers are online, the databases can be managed.
 The steps here aim to make the unavailable databases available.
 
-. If you have previously dropped databases as part of this guide, re-create each one from backup.
+. If you have previously dropped databases as part of this guide, re-create each one from a backup.
 See the xref:database-administration/standard-databases/create-databases.adoc[Create databases] section for more information on how to create a database.
 . Run `SHOW DATABASES`.
 If all databases are in desired states on all servers (`requestedStatus`=`currentStatus`), disaster recovery is complete.
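The final `SHOW DATABASES` check (`requestedStatus`=`currentStatus` on every server) can likewise be sketched as a filter over result rows. This assumes rows are dicts with `name`, `address`, `requestedStatus`, and `currentStatus` keys; the status values, addresses, and the helper `unsettled_databases` are illustrative, not taken from the commit.

```python
from typing import Any, Iterable, List, Mapping, Optional, Tuple


def unsettled_databases(
    rows: Iterable[Mapping[str, Any]],
) -> List[Tuple[str, Optional[str]]]:
    """Return (name, address) for every SHOW DATABASES row where
    requestedStatus differs from currentStatus.

    An empty list means every database is in its desired state on every server.
    """
    return [
        (r["name"], r.get("address"))
        for r in rows
        if r.get("requestedStatus") != r.get("currentStatus")
    ]


# Illustrative SHOW DATABASES rows (one row per database per server).
rows = [
    {"name": "neo4j", "address": "server1:7687",
     "requestedStatus": "online", "currentStatus": "online"},
    {"name": "sales", "address": "server2:7687",
     "requestedStatus": "online", "currentStatus": "quarantined"},
]

remaining = unsettled_databases(rows)
if remaining:
    print("Still recovering:", remaining)
else:
    print("Disaster recovery complete")
```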

package-lock.json

Lines changed: 6 additions & 6 deletions
Some generated files are not rendered by default.

0 commit comments
