modules/ROOT/pages/clustering/disaster-recovery.adoc
Lines changed: 19 additions & 13 deletions
Original file line number
Diff line number
Diff line change
@@ -30,14 +30,14 @@ In this guide the following terms are used:
30
30
31
31
* An _offline_ server is a server that is not running but may be restartable.
32
32
* A _lost_ server, however, is a server that is currently not running and cannot be restarted.
33
-
* A _write-available_ database is able to serve writes, while a _writeunavailable_ database is not.
33
+
* A _write-available_ database is able to serve writes, while a _write-unavailable_ database is not.
34
34
====
35
35
36
36
There are four steps to recovering a cluster from a disaster:
37
37
38
38
. Start the Neo4j process on all servers which are not _lost_.
39
39
See xref:start-the-neo4j-process[Start the Neo4j process] for more information.
40
-
. Make the `system` database able to accept write operations, so that the cluster can be modified.
40
+
. Make the `system` database able to serve write operations, so that the cluster can be modified.
41
41
See xref:make-the-system-database-write-available[Make the `system` database write-available] for more information.
42
42
. Detach any potential lost servers from the cluster and replace them by new ones.
43
43
See xref:make-servers-available[Make servers available] for more information.
@@ -85,14 +85,14 @@ The server may have to be considered indefinitely lost.
85
85
86
86
==== Objective
87
87
====
88
-
The `system` database is able to accept write operations.
88
+
The `system` database is able to serve write operations.
89
89
====
90
90
91
91
The `system` database contains the view of the cluster.
92
92
This includes which servers and databases are present, where they live and how they are configured.
93
93
During a disaster, the view of the cluster might need to change to reflect a new reality, such as removing lost servers.
94
94
Databases might also need to be recreated to regain write availability.
95
-
Because both of these steps are executed by modifying the `system` database, making the `system` database write-enabled is a vital first step during disaster recovery.
95
+
Because both of these steps are executed by modifying the `system` database, making the `system` database write-available is a vital first step during disaster recovery.
96
96
97
97
==== Verifying the state
98
98
@@ -143,7 +143,8 @@ Be aware that not replacing servers can cause cluster overload when databases ar
143
143
=====
144
144
+
145
145
. On each server, run `bin/neo4j-admin database load system --from-path=[path-to-dump] --overwrite-destination=true` to load the current `system` database dump.
146
-
. On each server, ensure that the discovery settings are correct, see xref:clustering/setup/discovery.adoc[Cluster server discovery] for more information.
146
+
. On each server, ensure that the discovery settings are correct.
147
+
See xref:clustering/setup/discovery.adoc[Cluster server discovery] for more information.
147
148
. Start the Neo4j process on all servers.
148
149
====
149
150
@@ -162,7 +163,8 @@ Therefore, informing the cluster of servers which are lost is not enough.
162
163
The databases hosted on lost servers also need to be moved onto available servers in the cluster, before the lost servers can be removed.
163
164
164
165
==== Verifying the state
165
-
The cluster's view of servers can be seen by listing the servers, see xref:clustering/servers.adoc#_listing_servers[Listing servers] for more information.
166
+
The cluster's view of servers can be seen by listing the servers.
167
+
See xref:clustering/servers.adoc#_listing_servers[Listing servers] for more information.
166
168
The state has been verified if *all* servers show `health` = `Available` and `status` = `Enabled`.
167
169
168
170
[source, cypher]
@@ -173,7 +175,7 @@ SHOW SERVERS;
173
175
==== Path to correct state
174
176
Use the following steps to remove lost servers and add new ones to the cluster.
175
177
To remove lost servers, any allocations they were hosting must be moved to available servers in the cluster.
176
-
This can be done in two different ways:
178
+
This is done in two different steps:
177
179
178
180
* Any allocations that cannot move by themselves require the database to be recreated so that they are forced to move.
179
181
* Any allocations that can move will be instructed to do so by deallocating the server.
@@ -208,7 +210,8 @@ A database can be set to `READ-ONLY` before it is started to avoid updates on th
208
210
`ALTER DATABASE database-name SET ACCESS READ ONLY`.
209
211
=====
210
212
211
-
. On each server, run `CALL dbms.cluster.statusCheck([])` to check the write availability for all databases running in primary mode on this server, see xref:clustering/monitoring/status-check.adoc#monitoring-replication[Monitoring replication] for more information.
213
+
. On each server, run `CALL dbms.cluster.statusCheck([])` to check the write availability for all databases running in primary mode on this server.
214
+
See xref:clustering/monitoring/status-check.adoc#monitoring-replication[Monitoring replication] for more information.
212
215
+
213
216
[NOTE]
214
217
=====
@@ -218,7 +221,8 @@ Instead, check that the primary is allocated on an available server and that it
218
221
219
222
. For each database that is not write-available, recreate it to move it from lost servers and regain write availability.
220
223
Go to xref:clustering/databases.adoc#recreate-databases[Recreate databases] for more information about recreate options.
221
-
Remember to make sure there are recent backups for the databases before recreating them, see xref:backup-restore/online-backup.adoc[Online backup] for more information.
224
+
Remember to make sure there are recent backups for the databases before recreating them.
225
+
See xref:backup-restore/online-backup.adoc[Online backup] for more information.
222
226
If any database has `currentStatus` = `quarantined` on an available server, recreate them from backup using xref:clustering/databases.adoc#uri-seed[Backup as seed].
223
227
+
224
228
[CAUTION]
@@ -278,16 +282,18 @@ For the stricter check, run `SHOW DATABASES` and verify that `requestedStatus` =
278
282
279
283
==== Path to correct state
280
284
Use the following steps to make all databases in the cluster write-available again.
281
-
They include recreating any databases that are not write-capable and identifying any recreations that will not complete.
285
+
They include recreating any databases that are not write-available and identifying any recreations that will not complete.
282
286
Recreations might fail for different reasons, but one example is that the checksums do not match for the same transaction on different servers.
283
287
284
288
.Guide
285
289
[%collapsible]
286
290
====
287
-
. Identify all writeunavailable databases by running `CALL dbms.cluster.statusCheck([])` as described in the xref:clustering/disaster-recovery.adoc#example-verification[Example verification] part of this disaster recovery step.
291
+
. Identify all write-unavailable databases by running `CALL dbms.cluster.statusCheck([])` as described in the xref:clustering/disaster-recovery.adoc#example-verification[Example verification] part of this disaster recovery step.
288
292
Filter out all databases desired to be stopped, so that they are not recreated unnecessarily.
289
-
. Recreate every database that is not write-available and has not been recreated previously, see xref:clustering/databases.adoc#recreate-databases[Recreate databases] for more information.
290
-
Remember to make sure there are recent backups for the databases before recreating them, see xref:backup-restore/online-backup.adoc[Online backup] for more information.
293
+
. Recreate every database that is not write-available and has not been recreated previously.
294
+
See xref:clustering/databases.adoc#recreate-databases[Recreate databases] for more information.
295
+
Remember to make sure there are recent backups for the databases before recreating them.
296
+
See xref:backup-restore/online-backup.adoc[Online backup] for more information.
291
297
If any database has `currentStatus` = `quarantined` on an available server, recreate them from backup using xref:clustering/databases.adoc#uri-seed[Backup as seed].