# pg_auto_failover

[![Documentation Status](https://readthedocs.org/projects/pg-auto-failover/badge/?version=main)](https://pg-auto-failover.readthedocs.io/en/main/?badge=main)

pg_auto_failover is an extension and service for PostgreSQL that monitors
and manages automated failover for a Postgres cluster. It is optimized for
simplicity and correctness and supports Postgres 10 and newer.

pg_auto_failover supports several Postgres architectures and implements a
safe automated failover for your Postgres service. It is possible to get
started with only two data nodes, which will be given the roles of primary
and secondary by the monitor.

![pg_auto_failover Architecture with 2 nodes](docs/tikz/arch-single-standby.svg?raw=true "pg_auto_failover Architecture with 2 nodes")

The pg_auto_failover Monitor implements a state machine and relies on
in-core PostgreSQL facilities to deliver HA. For example, when the
**secondary** node is detected to be unavailable, or when its replication
lag grows too large, the Monitor removes it from the
`synchronous_standby_names` setting on the **primary** node. Until the
**secondary** is again seen as healthy by the monitor, failover and
switchover operations are not allowed, preventing data loss.

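This mechanism can be observed directly on the primary node. A minimal sketch, assuming a local primary listening on port 5001 (host and port are illustrative; the standby names themselves are managed by the monitor):

```shell
# Inspect the synchronous replication setting that the pg_auto_failover
# monitor maintains on the primary. With a healthy secondary this lists
# the standby; after the monitor removes an unhealthy secondary, the
# setting is empty and the primary no longer waits on it.
psql -h localhost -p 5001 -c 'SHOW synchronous_standby_names;'
```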
pg_auto_failover consists of the following parts:

- a PostgreSQL extension named `pgautofailover`
- a PostgreSQL service to operate the pg_auto_failover monitor
- a pg_auto_failover keeper to operate your PostgreSQL instances, see `pg_autoctl run`

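As a minimal sketch of these parts working together on a single machine (the data directories, ports, and `--auth trust` are illustrative choices for a local test, not production settings):

```shell
# Terminal 1: create and run a monitor node
export PGDATA=./monitor PGPORT=5000
pg_autoctl create monitor --ssl-self-signed --hostname localhost \
    --auth trust --run

# Terminal 2: create a Postgres node registered to that monitor; the
# pg_auto_failover keeper then operates it, as with `pg_autoctl run`
export PGDATA=./node_1 PGPORT=5001
pg_autoctl create postgres \
    --hostname localhost --auth trust --ssl-self-signed \
    --monitor 'postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require' \
    --run
```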
## Multiple Standbys

It is possible to implement a production architecture with any number of
Postgres nodes, for better data availability guarantees.

![pg_auto_failover Architecture with 3 nodes](docs/tikz/arch-multi-standby.svg?raw=true "pg_auto_failover Architecture with 3 nodes")

Each node that reaches the secondary state is added to
`synchronous_standby_names` on the primary. With pg_auto_failover 1.4 it is
possible to remove a node from the _replication quorum_ of Postgres.

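For example, a standby can be taken out of the replication quorum with `pg_autoctl set node replication-quorum` (the node name `node_3` here is hypothetical):

```shell
# Exclude a standby from the replication quorum: it keeps replicating,
# but is no longer listed in synchronous_standby_names on the primary
pg_autoctl set node replication-quorum false --name node_3
```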
## Citus HA

Starting with pg_auto_failover 2.0, it is also possible to implement High
Availability for a Citus cluster.

![pg_auto_failover Architecture with Citus](docs/tikz/arch-citus.svg?raw=true "pg_auto_failover Architecture with Citus")

## Documentation

Please check out the project
[documentation](https://pg-auto-failover.readthedocs.io/en/main/) for
tutorials, manual pages, detailed design coverage, and troubleshooting
information.

## Installing pg_auto_failover from packages

pg_auto_failover packages are available from the PostgreSQL community
package repositories: register the repository for your distribution and
then install the packages from there.

### Ubuntu or Debian:

Binary packages for Debian and derivatives (Ubuntu) are available from the
[apt.postgresql.org](https://wiki.postgresql.org/wiki/Apt) repository;
install it by following the linked documentation and then:

```bash
$ sudo apt-get install pg-auto-failover-cli
$ sudo apt-get install postgresql-14-auto-failover
```

When using Debian, two packages are provided for pg_auto_failover: the
monitor Postgres extension is packaged separately and depends on the
Postgres version you want to run for the monitor itself. The monitor's
extension package is installed alongside the Postgres version of your
choice, for instance Postgres 14:

```bash
$ apt-get update
$ apt-get install -y --no-install-recommends postgresql-14
```

### Other installation methods

Please see our extended documentation chapter [Installing
pg_auto_failover](https://pg-auto-failover.readthedocs.io/en/main/install.html)
for details.

## Trying pg_auto_failover on your local computer

The main documentation for pg_auto_failover includes the following three
tutorials:

- The main [pg_auto_failover
  Tutorial](https://pg-auto-failover.readthedocs.io/en/main/tutorial.html)
  uses docker-compose on your local computer to start multiple Postgres
  nodes and implement your first failover.

- The complete [pg_auto_failover Azure VM
  Tutorial](https://pg-auto-failover.readthedocs.io/en/main/azure-tutorial.html)
  guides you through creating an Azure network and Azure VMs in that
  network, provisioning those VMs, running Postgres nodes with
  pg_auto_failover, and then introducing hard failures to witness an
  automated failover.

- The [Citus Cluster Quick
  Start](https://pg-auto-failover.readthedocs.io/en/main/citus-quickstart.html)
  tutorial uses docker-compose to create a full Citus cluster and guides
  you through a worker failover and then a coordinator failover.
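Whichever tutorial you pick, the resulting cluster state can be inspected at any time with `pg_autoctl show state`. Sample output from a simple two-node formation (node names, ports, and LSNs will differ in your setup):

```shell
# Show the current and assigned FSM state of every node, as seen by the
# monitor; run from any node's environment (or with --pgdata / --monitor)
$ pg_autoctl show state
  Name |  Node |      Host:Port |       LSN | Reachable |       Current State |      Assigned State
-------+-------+----------------+-----------+-----------+---------------------+--------------------
node_1 |     1 | localhost:5001 | 0/30000D8 |       yes |             primary |             primary
node_2 |     2 | localhost:5002 | 0/30000D8 |       yes |           secondary |           secondary
```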

## Reporting Security Issues