Commit ccc0826: WIP
1 parent 81332e9


modules/ROOT/pages/clustering/disaster-recovery.adoc

Lines changed: 82 additions & 56 deletions
@@ -6,48 +6,48 @@
 A database can become unavailable due to issues on different system levels.
 For example, a data center failover may lead to the loss of multiple servers, which may cause a set of databases to become unavailable.
 
-This section contains a step-by-step guide on how to recover _unavailable databases_ that are incapable of serving writes, while still may be able to serve reads.
+This section contains a step-by-step guide on how to recover *unavailable databases* that are incapable of serving writes, while possibly still being able to serve reads.
 However, if a database is not performing as expected for other reasons, this section cannot help.
-By following the steps outlined here, you can recover the unavailable databases and make them fully operational with minimal impact on the other databases in the cluster.
+By following the steps outlined here, you can recover the unavailable databases and make them fully operational, with minimal impact on the other databases in the cluster.
 
-[NOTE]
+[CAUTION]
 ====
-If *all* servers in a Neo4j cluster are lost in a data center failover, it is not possible to recover the current cluster.
-You have to create a new cluster and restore the databases.
-See xref:clustering/setup/deploy.adoc[Deploy a basic cluster] and xref:clustering/databases.adoc#cluster-seed[Seed a database] for more information.
+If *all* servers in a Neo4j cluster are lost in a disaster, it is not possible to recover the current cluster.
+You have to create a new cluster and restore the databases; see xref:clustering/setup/deploy.adoc[Deploy a basic cluster] and xref:clustering/databases.adoc#cluster-seed[Seed a database] for more information.
 ====
 
 == Faults in clusters
 
 Databases in clusters follow an allocation strategy.
 This means that they are allocated differently within the cluster and may also have different numbers of primaries and secondaries.
-Furthermore, some databases may not be allowed to be allocated to some servers because of user defined strategies.
-The consequence of this is that all servers may be different in which databases they are hosting and are allowed to host.
+The consequence of this is that servers may differ in which databases they are hosting.
 Losing a server in a cluster may cause some databases to lose a member while others are unaffected.
 Therefore, in a disaster where one or more servers go down, some databases may keep running with little to no impact, while others may lose all their allocated resources.
 
 == Guide structure
+[NOTE]
+====
+In this guide, an _offline_ server is a server that is not running but may be restartable.
+A _lost_ server, however, is a server that is currently not running and cannot be restarted.
+A _write available_ database is able to serve writes, while a _write unavailable_ database is not.
+====
+
 There are three main steps to recovering a cluster from a disaster.
-First, ensure the `system` database is write available i.e. able to accept writes.
-Then, detach any potential lost servers and replace them by new ones.
-Finish disaster recovery by starting or continuing to manage databases and verify that they are available.
+First, ensure the `system` database is write available.
+Then, detach any lost servers from the cluster and replace them with new ones.
+Finish disaster recovery by starting or continuing to manage databases and verifying that they are write available.
 
-Every step consists of the following four sections:
+Every step consists of the following three sections:
 
-. State that needs to be verified.
-. Example of how the state can be verified.
-. Motivation for why this state is necessary.
-. Path to correct state.
+. A state that needs to be verified, with optional motivation.
+. An example of how the state can be verified.
+. A proposed series of steps to get to the correct state.
 
 [CAUTION]
 ====
 Verifying each state before continuing to the next step, regardless of the disaster scenario, is recommended to ensure the cluster is fully operational.
-
 ====
 
-In this section, an _offline_ server is a server that is not running but may be _restartable_.
-A _lost_ server, however, is a server that is currently not running and cannot be restarted.
-
 
 == Guide to disaster recovery
 
@@ -68,14 +68,14 @@ See xref:clustering/setup/routing.adoc#clustering-routing[Server-side routing] f
 
 ==== State
 ====
-The `system` database is write available, i.e. able to accept writes.
+The `system` database is write available.
 ====
 
-==== Motivation
-The `system` database contains the view of the cluster. This includes which servers and databases are present and how they are configured.
-During a disaster, the goal is to change the view of the cluster, for example by removing and adding servers or recreating databases.
-In order for the view to be updated, the `system` database needs to be write available.
-Therefore, it is vital to ensure it is available so that the next steps are possible to execute.
+The `system` database contains the view of the cluster.
+This includes which servers and databases are present, where they are allocated, and how they are configured.
+During a disaster, the view of the cluster might need to change to reflect a new reality, for example by removing lost servers.
+Databases might also need to be recreated to regain write availability.
+Because both of these steps are executed by writing to the `system` database, making it write available is a vital first step during disaster recovery.
 
 ==== Example verification
 The `system` database's write availability can be verified by using the xref:clustering/monitoring/status-check.adoc#monitoring-replication[Status check] procedure.
@@ -93,7 +93,7 @@ CALL dbms.cluster.statusCheck(["system"]);
 ==== Path to correct state
 The following steps can be used to regain write availability for the `system` database if it has been lost.
 They create a new `system` database from the most up-to-date copy of the `system` database that can be found in the cluster.
-It is important to get a `system` database that is as up-to-date as possible, so that future commands operate on state that is as correct as possible.
+It is important to get a `system` database that is as up-to-date as possible, so that it corresponds as closely as possible to the view of the cluster before the disaster.
 
 .Guide
 [%collapsible]
@@ -110,13 +110,14 @@ This causes downtime for all databases in the cluster until the processes are st
 . On each server, run `bin/neo4j-admin database info system` and compare the `lastCommittedTransaction` to find out which server has the most up-to-date copy of the `system` database.
 . On the most up-to-date server, run `bin/neo4j-admin database dump system --to-path=[path-to-dump]` to take a dump of the current `system` database and store it in an accessible location.
 . For every _lost_ server, add a new *unconstrained* one according to xref:clustering/servers.adoc#cluster-add-server[Add a server to the cluster].
-It is important that the new servers are unconstrained, or deallocating servers might be blocked even though enough servers was added.
+It is important that the new servers are unconstrained, or deallocating servers might be blocked even though enough servers were added.
 +
 [NOTE]
 =====
-While recommended to avoid cluster overload, it is not strictly necessary to add servers in this step.
+While recommended, it is not strictly necessary to add new servers in this step.
 There is also an option to change the `system` database mode (`server.cluster.system_database_mode`) on secondary allocations to make them primary allocations for the new `system` database.
 The number of primary allocations needed is defined by `dbms.cluster.minimum_initial_system_primaries_count`, see the xref:configuration/configuration-settings.adoc#config_dbms.cluster.minimum_initial_system_primaries_count[Configuration settings] for more information.
+Not replacing servers can cause cluster overload when databases are moved from lost servers to available ones in the next step of this guide.
 =====
 +
 . On each server, run `bin/neo4j-admin database load system --from-path=[path-to-dump] --overwrite-destination=true` to load the current `system` database dump.
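The "most up-to-date copy" comparison in the steps above can be scripted once the `lastCommittedTransaction` values have been collected from each server. A minimal sketch, assuming the values are passed in as `server:transaction-id` pairs (the server names and ids below are illustrative placeholders, not output of the real tooling):

```shell
# Hypothetical helper: feed it "server:lastCommittedTransaction" pairs collected
# by running `bin/neo4j-admin database info system` on every server; it prints
# the server holding the highest (most up-to-date) transaction id.
pick_most_up_to_date() {
  # Sort numerically on the second colon-separated field, keep the last
  # (highest) line, and print only the server name.
  printf '%s\n' "$@" | sort -t ':' -k 2 -n | tail -n 1 | cut -d ':' -f 1
}

# Illustrative values only; real ids come from `neo4j-admin database info`.
pick_most_up_to_date "server-a:1200" "server-b:1534" "server-c:980"  # prints "server-b"
```

The dump is then taken on the server this prints, as described in the step above.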
@@ -133,11 +134,10 @@ The amount of primary allocations needed is defined by `dbms.cluster.minimum_ini
 All servers in the cluster's view are available and enabled.
 ====
 
-==== Motivation
-// different stuffs here
-Following the loss of one or more servers, the cluster's view of servers must be updated.
-This includes moving allocations on the lost servers onto servers which are actually in the cluster
-This includes identifying the lost servers and replacing them by new ones.
+A lost server will still be in the `system` database's view of the cluster, but in an unavailable state.
+According to the view of the cluster, these lost servers are still hosting the databases they had before they became lost.
+Therefore, removing lost servers is not as simple as informing the `system` database that they are lost.
+It also includes moving requested allocations on the lost servers onto servers that are actually in the cluster, so that those databases' topologies are still satisfied.
 
 ==== Example verification
 The cluster's view of servers can be seen by listing the servers, see xref:clustering/servers.adoc#_listing_servers[Listing servers] for more information.
@@ -149,7 +149,9 @@ SHOW SERVERS;
 ----
 
 ==== Path to correct state
-Detach lost servers and add new ones to the cluster
+The following steps can be used to remove lost servers and add new ones to the cluster.
+They include moving any potential database allocations from lost servers to available servers in the cluster.
+These steps might also recreate some databases, since a database which has lost a majority of its primary allocations cannot be moved from one server to another.
 
 .Guide
 [%collapsible]
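Put together, the cordon, deallocate, and drop sequence described in the following guide can be sketched in Cypher. This is a sketch, not official procedure text: the server id is a placeholder taken from `SHOW SERVERS` output, and the `dbms.cluster.cordonServer` procedure is assumed to be available in your Neo4j version:

[source, cypher]
----
// Find the unavailable server's id first.
SHOW SERVERS;
// Placeholder id; use the one reported by SHOW SERVERS.
CALL dbms.cluster.cordonServer('25a7efc7-d063-44b8-bdee-f23357f89f01');
DEALLOCATE DATABASES FROM SERVER '25a7efc7-d063-44b8-bdee-f23357f89f01';
// Only after the server reports Deallocated:
DROP SERVER '25a7efc7-d063-44b8-bdee-f23357f89f01';
----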
@@ -158,16 +160,19 @@ Detach lost servers and add new ones to the cluster
 This prevents new database allocations from being moved to this server.
 . For each `CORDONED` server, make sure a new *unconstrained* server has been added to the cluster to take its place, see xref:clustering/servers.adoc#cluster-add-server[Add a server to the cluster] for more information.
 If servers were added in the 'System database write availability' step of this guide, additional servers might not be needed here.
+It is important that the new servers are unconstrained, or deallocating servers might be blocked even though enough servers were added.
 
 +
 [NOTE]
 =====
 While recommended, it is not strictly necessary to add new servers in this step.
-However, not adding new servers reduces the capacity of the cluster to handle work and might require the topology for a database to be altered to make deallocations and recreations possible.
+However, not adding new servers reduces the capacity of the cluster to handle work.
+Furthermore, it might require the topology for a database to be altered to make deallocating servers and recreating databases possible.
 =====
 
 . For each `CORDONED` server, run `DEALLOCATE DATABASES FROM SERVER cordoned-server-id` on one of the available servers.
-This will try to move all database allocations from this server to another server in the cluster.
+This will try to move all database allocations from this server to an available server in the cluster.
 Once a server is `DEALLOCATED`, all allocated user databases on this server have been moved successfully.
 +
 [NOTE]
@@ -178,6 +183,7 @@ Therefore, an allocation with `currentStatus` = `DEALLOCATING` should reach the
 . If any deallocations failed, make them possible by executing the following steps:
 .. Run `SHOW DATABASES`. If a database shows `currentStatus` = `offline`, this database has been stopped.
 .. For each stopped database that has at least one allocation on any of the `CORDONED` servers, start it by running `START DATABASE stopped-db WAIT`.
+This is necessary since stopped databases cannot be moved from one server to another.
 +
 [NOTE]
 =====
@@ -188,7 +194,7 @@ A database can be set to `READ-ONLY` before it is started to avoid updates on a
 Depending on the environment, consider extending the timeout for this procedure.
 If any of the primary allocations for a database report `replicationSuccessful` = `TRUE`, this database is write available.
 
-.. For each database that is not write available, recreate it to regain write availability.
+.. For each database that is not write available, recreate it to move it off the lost servers and regain write availability.
 Go to xref:clustering/databases.adoc#recreate-databases[Recreate databases] for more information about recreate options.
 Remember to make sure there are recent backups for the databases before recreating them, see xref:backup-restore/online-backup.adoc[Online backup] for more information.
 +
@@ -199,42 +205,62 @@ Otherwise, recreating with xref:clustering/databases.adoc#uri-seed[Backup as see
 =====
 .. Return to step 3 to retry deallocating all servers.
 . For each deallocated server, run `DROP SERVER deallocated-server-id`.
-This safely removes the server from the cluster view.
+This safely removes the server from the cluster's view.
 
 ====
 
 
 [[recover-databases]]
 === Database availability
 
-Once the `system` database and all servers are available, manage and verify that all databases are in the desired state.
-
-. Run `CALL dbms.cluster.statusCheck([])` on all servers, see xref:clustering/monitoring/status-check.adoc#monitoring-replication[Monitoring replication] for more information.
-Depending on the environment, consider extending the timeout for this procedure.
-If any of the primary allocations for a database report `replicationSuccessful` = `TRUE`, this database is write available.
-If all databases are write available, disaster recovery is complete.
-+
-[NOTE]
+==== State
 ====
-Remember that previously stopped databases might have been started during this process.
+All databases are write available.
 ====
 
-. Recreate every database that is not write available and has not been recreated previously, see xref:clustering/databases.adoc#recreate-databases[Recreate databases] for more information.
-Remember to make sure there are recent backups for the databases before recreating them, see xref:backup-restore/online-backup.adoc[Online backup] for more information.
-. Run `SHOW DATABASES` and check any recreated databases which are not write available.
+Once this state is verified, disaster recovery is complete.
+However, remember that previously stopped databases might have been started during this process.
+If they should remain stopped, return them to the stopped state by running `STOP DATABASE started-db WAIT`.
 
-+
 [NOTE]
 ====
-Remember, recreating a database can take an unbounded amount of time since it may involve copying the store to a new server, as described in xref:clustering/databases.adoc#recreate-databases[Recreate databases].
+Remember, recreating a database can take an unbounded amount of time since it may involve copying the store to a new server, as described in xref:clustering/databases.adoc#recreate-databases[Recreate databases].
 Therefore, an allocation with `currentStatus` = `STARTING` might reach the `requestedStatus` given some time.
 ====
+
+==== Example verification
+All databases' write availability can be verified by using the xref:clustering/monitoring/status-check.adoc#monitoring-replication[Status check] procedure.
+The procedure should be called on all servers in the cluster, in order to provide the correct view.
+The status check procedure writes a dummy transaction, and therefore the correctness of the result depends on the given timeout.
+The default timeout is 1 second, but depending on the network latency in the environment it might need to be extended.
+If any of the primary allocations for a database report `replicationSuccessful` = `TRUE`, this database is write available.
+Therefore, the desired state has been verified when this is true for all databases.
+
+[source, shell]
+----
+CALL dbms.cluster.statusCheck([]);
+----
+
+A stricter verification can be performed to check that all databases are in their desired states on all servers.
+For the stricter check, run `SHOW DATABASES` and verify that `requestedStatus` = `currentStatus` for all database allocations on all servers.
+
+==== Path to correct state
+The following steps can be used to make all databases in the cluster write available again.
+They include recreating any databases that are not write available, as well as identifying any recreations which will not complete.
+Recreations might fail for different reasons; one example is that the checksums do not match for the same transaction on different copies.
+
+.Guide
+[%collapsible]
+====
+. Run `CALL dbms.cluster.statusCheck([])` on all servers to identify write unavailable databases, see xref:clustering/monitoring/status-check.adoc#monitoring-replication[Monitoring replication] for more information.
+. Recreate every database that is not write available and has not been recreated previously, see xref:clustering/databases.adoc#recreate-databases[Recreate databases] for more information.
+Remember to make sure there are recent backups for the databases before recreating them, see xref:backup-restore/online-backup.adoc[Online backup] for more information.
+. Run `SHOW DATABASES` and check any recreated databases which are not write available.
 Recreating a database will not complete if one of the following messages is displayed in the message field:
 ** `Seeders ServerId1 and ServerId2 have different checksums for transaction TransactionId. All seeders must have the same checksum for the same append index.`
 ** `Seeders ServerId1 and ServerId2 have incompatible storeIds. All seeders must have compatible storeIds.`
 ** `No store found on any of the seeders ServerId1, ServerId2...`
-+
-
 . For each database which will not complete recreation, recreate it from backup using xref:clustering/databases.adoc#uri-seed[Backup as seed] or define seeding servers in the recreate procedure using xref:clustering/databases.adoc#specified-servers[Specified seeders] so that problematic allocations are excluded.
-. Return to step 1 to make sure all databases are in their desired state.
 
+====
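The stricter `SHOW DATABASES` check described above can be narrowed to just the mismatching allocations. A sketch (assuming `name`, `address`, `currentStatus`, and `requestedStatus` are among the returned columns, as used elsewhere in this guide):

[source, cypher]
----
// List only allocations whose current status differs from the requested one.
SHOW DATABASES YIELD name, address, currentStatus, requestedStatus
WHERE currentStatus <> requestedStatus;
----

An empty result on every server indicates that all allocations have reached their requested state.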
