
Commit 8d066fc

Author: avandras
Document how to change existing setups into multisite
1 parent 9c221af

File tree

1 file changed: +29 -1 lines changed


docs/multisite.rst

Lines changed: 29 additions & 1 deletion
@@ -25,7 +25,7 @@ Global DCS
The multisite deployment will only be as resilient as the global DCS cluster. The DCS has to maintain quorum (more than half of all nodes connected to each other and able to write the same changes). In the case of a typical 3-node DCS cluster this means the quorum is 2, and if any 2 nodes share a potential failure point (e.g. being attached to the same network component), then that failure will bring the whole multisite cluster into read-only mode within the multisite TTL timeout (see Configuration below).

-Let's consider an example where there are 2 datacenters, and two of the three DCS nodes are in datacenter A. If the whole datacenter goes offline (e.g. power outage, fire, network connection to datacenter severed) then the other site in datacenter B will not be able to promote. If that site happened to be leader at the point of the DCS failure, it will demote itself to avoid a split brain situation, thus retaining safety.
+Let's consider an example where there are 2 datacenters, and two of the three DCS nodes are in datacenter A. If the whole datacenter goes offline (e.g. power outage, fire, network connection to datacenter severed) then the other site in datacenter B will not be able to promote. If that site happened to be leader at the point of the DCS failure, it would demote itself to avoid a split brain situation, thus retaining data safety.

In short, this means that to survive a full site outage the system needs to have at least 3 sites. To simplify things, one of the 3 sites is only required to have a single DCS node. If only 2 sites are available, then hosting this third quorum node on public cloud infrastructure is a viable option.
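For example, with a three-node etcd cluster serving as the global DCS (etcd is only one possible DCS choice here, and the host names below are placeholders), the quorum members could be spread like this::

    site A: etcd-a.example.com   (also hosts Patroni cluster A)
    site B: etcd-b.example.com   (also hosts Patroni cluster B)
    site C: etcd-c.example.com   (quorum node only, e.g. a small cloud VM)

Losing any one site still leaves two DCS members connected, so the quorum of 2 is kept and the surviving Patroni site can retain or take over the multisite leader role.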

@@ -182,6 +182,34 @@ Connections to the primary
Applications should be ready to try to connect to the new primary. See 'Connecting to a multisite cluster' for more details.

+Connecting to a multisite cluster
+---------------------------------

+# TODO: multi-host connstring, HAProxy (differences from a normal Patroni), possible use of vip-manager (one endpoint per site)
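Purely as an illustration of the multi-host connection string idea mentioned in the TODO (host names, user and database are placeholders, not a documented recommendation), a libpq-style URI that always lands on the current writable primary could look like::

    postgresql://app@pg-a1.example.com:5432,pg-a2.example.com:5432,pg-b1.example.com:5432,pg-b2.example.com:5432/appdb?target_session_attrs=read-write

With `target_session_attrs=read-write`, libpq tries the listed hosts in order and keeps the first connection that accepts writes, which after a site switchover is a node of the new leader site.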
+Transforming an existing setup into multisite
+---------------------------------------------
+If the present setup consists of a standby cluster replicating from a leader site, the following steps have to be performed (a sketch of the corresponding commands follows the list):
+
+1. Set up the global DCS
+   1.1 If a separate DCS cluster is going to be used, set up the new cluster as usual (one node in each of the two Patroni sites, and a third node in a third site).
+2. Enable multisite on the leader site's Patroni cluster
+   2.1 Apply the multisite config to all nodes' Patroni config files.
+   2.2 Reload the local configuration on the leader site cluster's nodes (`patronictl reload`).
+   2.3 Check that `patronictl list` shows an extra line saying 'Multisite <leader-site> is leader'.
+3. Enable multisite on the standby cluster
+   3.1 Repeat the steps from 2. on the standby cluster.
+   3.2 After reloading the config, `patronictl list` should say 'Multisite <standby-site> is standby, replicating from <leader-site>'.
+4. Remove the `standby_cluster` specification from the dynamic config
+   4.1 Use `patronictl edit-config` to remove all lines belonging to the standby cluster definition.
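A minimal sketch of the commands behind steps 2-4, assuming a made-up cluster name `pg-a` and config file path (the multisite settings themselves are covered in the Configuration section)::

    # step 2.2: reload the local configuration on every node of the leader site
    patronictl -c /etc/patroni/patroni.yml reload pg-a

    # steps 2.3 and 3.2: the output should now include the 'Multisite ...' status line
    patronictl -c /etc/patroni/patroni.yml list

    # step 4.1: delete the standby_cluster block from the dynamic configuration
    patronictl -c /etc/patroni/patroni.yml edit-config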
+If the present setup is one Patroni cluster spanning two sites, first turn that setup into a standby cluster setup, then perform the above steps to enable multisite.
+Moving from an existing Postgres setup to multisite can be achieved by setting up a full multisite cluster that still replicates from the original primary, using the usual standby cluster specification, this time on the leader site's cluster. On cutover, simply remove the standby cluster specification, thus promoting the leader site.
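For illustration, the standby cluster specification that is removed at cutover is the usual `standby_cluster` block in the dynamic configuration; a minimal version (host and port are placeholders) might look like::

    standby_cluster:
      # connection details of the original primary that the leader site replicates from;
      # removing this block via patronictl edit-config promotes the leader site
      host: old-primary.example.com
      port: 5432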
Glossary
++++++++
