Commit c98e8c0

Further improvements
1 parent cbaef33 commit c98e8c0

1 file changed: +22 -11 lines changed
modules/ROOT/pages/clustering/multi-region-deployment/disaster-recovery.adoc

Lines changed: 22 additions & 11 deletions
@@ -20,7 +20,7 @@ You have to create a new cluster and restore the databases, see xref:clustering/
 
 Databases in clusters may be allocated differently within the cluster and may also have different numbers of primaries and secondaries.
 
-image::healthy-cluster.svg[width="400", title="Healthy cluster", role=popup]
+image::healthy-cluster.svg[width="400", title="A healthy cluster", role=popup]
 
 The consequence of this is that all servers may be different in which databases they are hosting.
 Losing a server in a cluster may cause some databases to lose a member while others are unaffected.
@@ -39,23 +39,28 @@ image::disaster.svg[width="400", title="Example of a cluster disaster", role=pop
 
 |Database A
 |All allocations are lost.
-|The database needs to be recreated from a backup.
+|The database needs to be recreated from a backup since there are no available allocations left in the cluster.
 
 |Database B
 |The primary allocation is lost, and the secondary allocation is available.
-|The database needs to be recreated, but can be based on available allocations in the cluster.
+|The database needs to be recreated since it has lost a majority of primary allocations and is therefore write-unavailable.
+However, the recreation can be based on the secondary allocation still present on a healthy server, so a backup is not required.
+The recreated database will be as up-to-date as the secondary allocation was at the time of the disaster.
 
 |Database C
 |Two primary allocations and a secondary one are lost.
-|The database needs to be recreated, but can be based on available allocations in the cluster.
+|The database needs to be recreated since it has lost a majority of primary allocations and is therefore write-unavailable.
+However, the recreation can be based on the primary and secondary allocations still present on healthy servers, so a backup is not required.
+The recreated database will reflect the state of the most up-to-date surviving primary or secondary allocation.
 
 |Database D
 |One primary allocation and two secondary allocations are lost.
-|The database will move when a server is deallocated.
+|The database remains write-available, allowing it to automatically move allocations from lost servers to available ones when the lost servers are deallocated.
+Therefore, the database does not need to be recreated even though some allocations have been lost.
 
 |Database E
 |Stays unaffected.
-|No action is required.
+|None of the database's allocations were affected by the disaster, so no action is required.
 |===
 
 Although databases C and D share the same topology, their primaries and secondaries are allocated differently, requiring distinct recovery strategies in this disaster example.
@@ -227,7 +232,7 @@ image::servers-cordoned.svg[width="400", title="Cordon unavailable servers", rol
 . For each `Cordoned` server, make sure a new *unconstrained* server has been added to the cluster to take its place.
 See xref:clustering/servers.adoc#cluster-add-server[Add a server to the cluster] for more information.
 +
-If servers were added in the <<make-the-system-database-write-available, Make the `system` database write-available>> step of this guide (like it is done in the current disaster recovery example), additional servers might not be needed here.
+If servers were added in the <<make-the-system-database-write-available, Make the `system` database write-available>> step of this guide (like it has been done in the current disaster recovery example), additional servers might not be needed here.
 It is important that the new servers are unconstrained, or deallocating servers might be blocked even though enough servers were added.
 +
 [NOTE]
@@ -269,13 +274,15 @@ If any database has `currentStatus` = `quarantined` on an available server, recr
 If you recreate databases using xref:database-administration/standard-databases/recreate-database.adoc#undefined-servers[undefined servers] or xref:database-administration/standard-databases/recreate-database.adoc#undefined-servers-backup[undefined servers with fallback backup], the store might not be recreated as up-to-date as possible in certain edge cases where the `system` database has been restored.
 =====
 +
-image::servers-cordoned-databases-moved.svg[width="400", title="Recreate databases", role=popup]
+image::servers-cordoned-databases-moved.svg[width="400", title="All write-unavailable databases were recreated", role=popup]
 
 . For each `Cordoned` server, run `DEALLOCATE DATABASES FROM SERVER cordoned-server-id` on one of the available servers.
 This will move all database allocations from this server to an available server in the cluster.
 +
 image::servers-deallocated.svg[width="400", title="Deallocate databases from unavailable servers", role=popup]
 +
+Note that the database D was still write-available, which means the allocations can be moved from lost servers to available ones when the lost servers are deallocated.
++
 [NOTE]
 =====
 This operation might fail if enough unconstrained servers were not added to the cluster to replace lost servers.
@@ -284,8 +291,11 @@ Another reason is that some available servers are also `Cordoned`.
 
 . For each deallocating or deallocated server, run `DROP SERVER deallocated-server-id`.
 This removes the server from the cluster's view.
-
-
++
+image::fully-recovered-cluster.svg[width="400", title="The fully recovered cluster", role="popup"]
++
+After dropping the deallocated servers, you still have to ensure that all moved and recreated databases are write-available.
+For this purpose, follow the steps <<write-available-databases-steps, below>>.
 
 [[make-databases-write-available]]
 === Make databases write-available
@@ -323,6 +333,7 @@ Instead, check that the primary is allocated on an available server and that it
 A stricter verification can be done to verify that all databases are in their desired states on all servers.
 For the stricter check, run `SHOW DATABASES` and verify that `requestedStatus` = `currentStatus` for all database allocations on all servers.
 
+[[write-available-databases-steps]]
 ==== Path to correct state
 
 Use the following steps to make all databases in the cluster write-available again.
@@ -350,7 +361,7 @@ Recreating a database will not complete if one of the following messages is disp
 ** `No store found on any of the seeders ServerId1, ServerId2...`
 . For each database which will not complete recreation, recreate them from backup using xref:database-administration/standard-databases/recreate-database.adoc#uri-seed[Backup as seed].
 
-image::fully-recovered-cluster.svg[width="400", title="Fully recovered cluster", role="popup"]
+
 
 
 
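
The per-database reasoning that this commit adds to the disaster table (Databases A to E) can be sketched as a small decision function. This is only an illustrative sketch, not part of the changed file or of any Neo4j API: the `Allocations` type, the `recovery_action` function, and the allocation counts in the usage lines are hypothetical, and the sketch assumes that a database stays write-available only while a strict majority of its primary allocations survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocations:
    """Hypothetical allocation counts for one database (illustrative only)."""
    primaries: int
    secondaries: int
    lost_primaries: int
    lost_secondaries: int


def recovery_action(db: Allocations) -> str:
    """Return the recovery path suggested by the table's reasoning."""
    surviving_primaries = db.primaries - db.lost_primaries
    surviving_secondaries = db.secondaries - db.lost_secondaries

    # Nothing was lost: the database is unaffected.
    if db.lost_primaries == 0 and db.lost_secondaries == 0:
        return "no action"

    # Assumption: write-availability needs a strict majority of primaries.
    if surviving_primaries * 2 > db.primaries:
        return "deallocate lost servers"

    # Write-unavailable, but at least one allocation survives to seed from.
    if surviving_primaries + surviving_secondaries > 0:
        return "recreate from surviving allocations"

    # No allocation is left anywhere in the cluster.
    return "recreate from backup"


# Illustrative counts only; the diff does not state the exact topologies.
print(recovery_action(Allocations(1, 0, 1, 0)))  # like Database A
print(recovery_action(Allocations(1, 1, 1, 0)))  # like Database B
print(recovery_action(Allocations(3, 3, 2, 1)))  # like Database C
print(recovery_action(Allocations(3, 3, 1, 2)))  # like Database D
```

With these assumed counts, the C-like and D-like cases share a topology but land on different branches, which mirrors the closing remark of the table section about distinct recovery strategies.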