Commit 62f982b

Review the README, introduce a new first simple tutorial. (#942)

1 parent 69441fe
File tree: 13 files changed, +866 −674 lines

README.md

Lines changed: 41 additions & 242 deletions
@@ -1,6 +1,6 @@
 # pg_auto_failover
 
-[![Documentation Status](https://readthedocs.org/projects/pg-auto-failover/badge/?version=master)](https://pg-auto-failover.readthedocs.io/en/master/?badge=master)
+[![Documentation Status](https://readthedocs.org/projects/pg-auto-failover/badge/?version=main)](https://pg-auto-failover.readthedocs.io/en/main/?badge=main)
 
 pg_auto_failover is an extension and service for PostgreSQL that monitors
 and manages automated failover for a Postgres cluster. It is optimized for
@@ -13,22 +13,24 @@ and secondary by the monitor.
 
 ![pg_auto_failover Architecture with 2 nodes](docs/tikz/arch-single-standby.svg?raw=true "pg_auto_failover Architecture with 2 nodes")
 
-The pg_auto_failover Monitor implements a state machine and relies on in-core
-PostgreSQL facilities to deliver HA. For example. when the **secondary** node
-is detected to be unavailable, or when its lag is too much, then the
-Monitor removes it from the `synchronous_standby_names` setting on the
-**primary** node. Until the **secondary** is back to being monitored healthy,
-failover and switchover operations are not allowed, preventing data loss.
+The pg_auto_failover Monitor implements a state machine and relies on
+in-core PostgreSQL facilities to deliver HA. For example, when the
+**secondary** node is detected to be unavailable, or when it lags too far
+behind, the Monitor removes it from the `synchronous_standby_names`
+setting on the **primary** node. Until the **secondary** is back to being
+monitored as healthy, failover and switchover operations are not allowed,
+preventing data loss.
 
 pg_auto_failover consists of the following parts:
 
 - a PostgreSQL extension named `pgautofailover`
 - a PostgreSQL service to operate the pg_auto_failover monitor
 - a pg_auto_failover keeper to operate your PostgreSQL instances, see `pg_autoctl run`
 
-Starting with pg_auto_failover version 1.4, it is possible to implement a
-production architecture with any number of Postgres nodes, for better data
-availability guarantees.
+## Multiple Standbys
+
+It is possible to implement a production architecture with any number of
+Postgres nodes, for better data availability guarantees.
 
 ![pg_auto_failover Architecture with 3 nodes](docs/tikz/arch-multi-standby.svg?raw=true "pg_auto_failover Architecture with 3 nodes")
 
@@ -37,23 +39,19 @@ that reaches the secondary state is added to synchronous_standby_names on
 the primary. With pg_auto_failover 1.4 it is possible to remove a node from
 the _replication quorum_ of Postgres.
 
-## Dependencies
+## Citus HA
 
-At runtime, pg_auto_failover depends on only Postgres. Postgres versions 10,
-11, 12, 13, and 14 are currently supported.
+Starting with pg_auto_failover 2.0, it is now possible to also implement
+High Availability for a Citus cluster.
 
-At buildtime. pg_auto_failover depends on Postgres server development
-package like any other Postgres extensions (the server development package
-for Postgres 11 when using debian or Ubuntu is named
-`postgresql-server-dev-11`), and then `libssl-dev` and `libkrb5-dev` are
-needed to for the client side when building with all the `libpq`
-authentication options.
+![pg_auto_failover Architecture with Citus](docs/tikz/arch-citus.svg?raw=true "pg_auto_failover Architecture with Citus")
 
 ## Documentation
 
 Please check out project
-[documentation](https://pg-auto-failover.readthedocs.io/en/master/) for how
-to guides and troubleshooting information.
+[documentation](https://pg-auto-failover.readthedocs.io/en/main/) for
+tutorials, manual pages, detailed design coverage, and troubleshooting
+information.
 
 ## Installing pg_auto_failover from packages
 
@@ -64,16 +62,14 @@ the packages from there.
 ### Ubuntu or Debian:
 
 Binary packages for debian and derivatives (ubuntu) are available from
-`apt.postgresql.org`__ repository, install by following the linked
-documentation and then::
+the [apt.postgresql.org](https://wiki.postgresql.org/wiki/Apt) repository;
+install by following the linked documentation and then:
 
 ```bash
 $ sudo apt-get install pg-auto-failover-cli
 $ sudo apt-get install postgresql-14-auto-failover
 ```
 
-__ https://wiki.postgresql.org/wiki/Apt
-
 When using debian, two packages are provided for pg_auto_failover: the
 monitor Postgres extension is packaged separately and depends on the
 Postgres version you want to run for the monitor itself. The monitor's
@@ -97,229 +93,32 @@ $ apt-get update
 $ apt-get install -y --no-install-recommends postgresql-14
 ```
 
-### Fedora, CentOS, or Red Hat:
-
-```bash
-# Add the repository to your system
-curl https://install.citusdata.com/community/rpm.sh | sudo bash
-
-# Install pg_auto_failover
-sudo yum install -y pg-auto-failover10_11
-
-# Confirm installation
-/usr/pgsql-11/bin/pg_autoctl --version
-```
+### Other installation methods
 
-## Building pg_auto_failover from source
+Please see our extended documentation chapter [Installing
+pg_auto_failover](https://pg-auto-failover.readthedocs.io/en/main/install.html)
+for details.
 
-To build the project, make sure you have installed the build-dependencies,
-then just type `make`. You can install the resulting binary using `make
-install`.
-
-Build dependencies example on debian for Postgres 11:
-
-~~~ bash
-$ sudo apt-get install postgresql-server-dev-11 libssl-dev libkrb5-dev libncurses6
-~~~
-
-Then build pg_auto_failover from sources with the following instructions:
-
-~~~ bash
-$ make
-$ sudo make install -j10
-~~~
+## Trying pg_auto_failover on your local computer
 
-For this to work though, the PostgreSQL client (libpq) and server
-(postgresql-server-dev) libraries must be available in your standard include
-and link paths.
+The main documentation for pg_auto_failover includes the following 3 tutorials:
 
-The `make install` step will deploy the `pgautofailover` PostgreSQL extension in
-the PostgreSQL directory for extensions as pointed by `pg_config`, and
-install the `pg_autoctl` binary command in the directory pointed to by
-`pg_config --bindir`, alongside other PostgreSQL tools such as `pg_ctl` and
-`pg_controldata`.
+- The main [pg_auto_failover
+  Tutorial](https://pg-auto-failover.readthedocs.io/en/main/tutorial.html)
+  uses docker-compose on your local computer to start multiple Postgres
+  nodes and implement your first failover.
 
-## Trying pg_auto_failover on your local computer
+- The complete [pg_auto_failover Azure VM
+  Tutorial](https://pg-auto-failover.readthedocs.io/en/main/azure-tutorial.html)
+  guides you through creating an Azure network and Azure VMs in that
+  network, provisioning those VMs, running Postgres nodes with
+  pg_auto_failover, and finally introducing hard failures to witness an
+  automated failover.
 
-Once the building and installation is done, follow those steps:
-
-0. If you're building from sources, and if you've already been using tmux,
-   then try the following command:
-
-   ~~~ bash
-   $ make cluster
-   ~~~
-
-   This creates a tmux session with multiple panes that are each running a
-   node for pg_auto_failover: the monitor, a first Postgres node, a second
-   Postgres node, and then there is another tmux pane for interactive
-   commands.
-
-1. Install and run a monitor
-
-   ~~~ bash
-   $ export PGDATA=./monitor
-   $ export PGPORT=5000
-   $ pg_autoctl create monitor --ssl-self-signed --hostname localhost --auth trust --run
-   ~~~
-
-2. Get the Postgres URI (connection string) for the monitor node:
-
-   ~~~ bash
-   $ pg_autoctl show uri --formation monitor
-   postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require
-   ~~~
-
-   The following two steps are going to use the option `--monitor` which
-   expects that connection string. So copy/paste your actual Postgres URI
-   for the monitor in the next steps.
-
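As a side note on step 2 of the removed tutorial above: the monitor URI has a fixed shape, so it can be reasoned about piece by piece. A minimal pure-shell sketch, assuming the tutorial's `localhost:5000` values:

```shell
# The monitor connection string printed by `pg_autoctl show uri` is a
# standard libpq URI: the autoctl_node user, the monitor's host and port,
# and the pg_auto_failover database, with sslmode=require.
MONITOR_HOST=localhost   # tutorial value; substitute your monitor's hostname
MONITOR_PORT=5000        # tutorial value; substitute your monitor's PGPORT
MONITOR_URI="postgres://autoctl_node@${MONITOR_HOST}:${MONITOR_PORT}/pg_auto_failover?sslmode=require"
echo "${MONITOR_URI}"
```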
-3. Install and run a primary PostgreSQL instance:
-
-   ~~~ bash
-   $ export PGDATA=./node_1
-   $ export PGPORT=5001
-   $ pg_autoctl create postgres \
-       --hostname localhost \
-       --auth trust \
-       --ssl-self-signed \
-       --monitor 'postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require' \
-       --run
-   ~~~
-
-4. Install and run a secondary PostgreSQL instance, using exactly the same
-   command, but with a different PGDATA and PGPORT, because we're running
-   everything on the same host:
-
-   ~~~ bash
-   $ export PGDATA=./node_2
-   $ export PGPORT=5002
-   $ pg_autoctl create postgres \
-       --hostname localhost \
-       --auth trust \
-       --ssl-self-signed \
-       --monitor 'postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require' \
-       --run
-   ~~~
-
-4. See the state of the new system:
-
-   ~~~ bash
-   $ export PGDATA=./monitor
-   $ export PGPORT=5000
-   $ pg_autoctl show state
-   Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
-   -------+-------+----------------+-----------+-----------+---------------------+--------------------
-   node_1 | 1 | localhost:5001 | 0/30000D8 | yes | primary | primary
-   node_2 | 2 | localhost:5002 | 0/30000D8 | yes | secondary | secondary
-   ~~~
-
-That's it! You now have a running pg_auto_failover setup with two PostgreSQL nodes
-using Streaming Replication to implement fault-tolerance.
-
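The pattern in the two `pg_autoctl create postgres` steps above (identical command, different `PGDATA` and `PGPORT`) can be condensed into a small shell loop. This is a dry-run sketch that only prints the commands; node names and ports are the tutorial's local values:

```shell
# Dry-run sketch: print one `pg_autoctl create postgres` command per node,
# varying only PGDATA and PGPORT. Drop the `echo` to actually run them
# (that requires pg_autoctl on your PATH).
MONITOR_URI='postgres://autoctl_node@localhost:5000/pg_auto_failover?sslmode=require'
for node in node_1:5001 node_2:5002; do
    pgdata=${node%%:*}    # e.g. node_1
    pgport=${node##*:}    # e.g. 5001
    echo "PGDATA=./${pgdata} PGPORT=${pgport}" \
         "pg_autoctl create postgres --hostname localhost --auth trust" \
         "--ssl-self-signed --monitor '${MONITOR_URI}' --run"
done
```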
-## Your first failover
-
-Now that we have two nodes setup and running, we can initiate a manual
-failover, also named a switchover. It is possible to trigger such an
-operation without any node having to actually fail when using
-pg_auto_failover.
-
-The command `pg_autoctl perform switchover` can be used to force
-pg_auto_failover to orchestrate a failover. Because all the nodes are
-actually running fine (meaning that `pg_autoctl` actively reports the local
-state of each node to the monitor), the failover process does not have to
-carefully implement timeouts to make sure to avoid split-brain.
-
-~~~ bash
-$ pg_autoctl perform switchover
-19:06:41 63977 INFO Listening monitor notifications about state changes in formation "default" and group 0
-19:06:41 63977 INFO Following table displays times when notifications are received
-Time | Name | Node | Host:Port | Current State | Assigned State
----------+--------+-------+----------------+---------------------+--------------------
-19:06:43 | node_1 | 1 | localhost:5001 | primary | draining
-19:06:43 | node_2 | 2 | localhost:5002 | secondary | prepare_promotion
-19:06:43 | node_2 | 2 | localhost:5002 | prepare_promotion | prepare_promotion
-19:06:43 | node_2 | 2 | localhost:5002 | prepare_promotion | stop_replication
-19:06:43 | node_1 | 1 | localhost:5001 | primary | demote_timeout
-19:06:43 | node_1 | 1 | localhost:5001 | draining | demote_timeout
-19:06:43 | node_1 | 1 | localhost:5001 | demote_timeout | demote_timeout
-19:06:44 | node_2 | 2 | localhost:5002 | stop_replication | stop_replication
-19:06:44 | node_2 | 2 | localhost:5002 | stop_replication | wait_primary
-19:06:44 | node_1 | 1 | localhost:5001 | demote_timeout | demoted
-19:06:44 | node_1 | 1 | localhost:5001 | demoted | demoted
-19:06:44 | node_2 | 2 | localhost:5002 | wait_primary | wait_primary
-19:06:45 | node_1 | 1 | localhost:5001 | demoted | catchingup
-19:06:46 | node_1 | 1 | localhost:5001 | catchingup | catchingup
-19:06:47 | node_1 | 1 | localhost:5001 | catchingup | secondary
-19:06:47 | node_2 | 2 | localhost:5002 | wait_primary | primary
-19:06:47 | node_1 | 1 | localhost:5001 | secondary | secondary
-19:06:48 | node_2 | 2 | localhost:5002 | primary | primary
-~~~
-
-The promotion of the secondary node is finished when the node reaches the
-goal state *wait_primary*. At this point, the application that connects to
-the secondary is allowed to proceed with write traffic.
-
-Because this is a switchover and no nodes have failed, `node_1` that used to
-be the primary completes its cycle and joins as a secondary within the same
-operation. The Postgres tool `pg_rewind` is used to implement that
-transition.
-
-And there you have done a full failover from your `node_1`, former primary, to
-your `node_2`, new primary. We can have a look at the state now:
-
-~~~
-$ pg_autoctl show state
-Name | Node | Host:Port | LSN | Reachable | Current State | Assigned State
--------+-------+----------------+-----------+-----------+---------------------+--------------------
-node_1 | 1 | localhost:5001 | 0/3001648 | yes | secondary | secondary
-node_2 | 2 | localhost:5002 | 0/3001648 | yes | primary | primary
-~~~
-
-## Cleaning-up your local setup
-
-You can use the commands `pg_autoctl stop`, `pg_autoctl drop node
---destroy`, and `pg_autoctl drop monitor --destroy` if you want to get rid
-of everything set-up so far.
-
-## Formations and Groups
-
-In the previous example, the options `--formation` and `--group` are not
-used. This means we've been using the default values: the default formation
-is named *default* and the default group id is zero (0).
-
-It's possible to add other services to the same running monitor by using
-another formation.
-
-## Installing pg_auto_failover on-top of an existing Postgres setup
-
-The `pg_autoctl create postgres --pgdata ${PGDATA}` step can be used with an
-existing Postgres installation running at `${PGDATA}`, only with the primary
-node.
-
-On a secondary node, it is possible to re-use an existing data directory
-when it has the same `system_identifier` as the other node(s) already
-registered in the same formation and group.
-
-## Application and Connection Strings
-
-To retrieve the connection string to use at the application level, use the
-following command:
-
-~~~ bash
-$ pg_autoctl show uri --formation default --pgdata ...
-postgres://localhost:5002,localhost:5001/postgres?target_session_attrs=read-write&sslmode=require
-~~~
-
-You can use that connection string from within your application, adjusting
-the username that is used to connect. By default, pg_auto_failover edits the
-Postgres HBA rules to allow the `--username` given at `pg_autoctl create
-postgres` time to connect to this URI from the database node itself.
-
-To allow application servers to connect to the Postgres database, edit your
-`pg_hba.conf` file as documented in [the pg_hba.conf
-file](https://www.postgresql.org/docs/current/auth-pg-hba-conf.html) chapter
-of the PostgreSQL documentation.
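The multi-host URI above is worth unpacking: libpq tries the listed hosts in order, and `target_session_attrs=read-write` makes it connect only to the node that currently accepts writes, so applications keep the same connection string across failovers. A small sketch, using the tutorial's local ports:

```shell
# Sketch: assemble the application URI that `pg_autoctl show uri
# --formation default` returns. With target_session_attrs=read-write,
# libpq selects the primary from the host list at connection time.
HOSTS="localhost:5002,localhost:5001"   # the tutorial's two local nodes
APP_URI="postgres://${HOSTS}/postgres?target_session_attrs=read-write&sslmode=require"
echo "${APP_URI}"
```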
+- The [Citus Cluster Quick
+  Start](https://pg-auto-failover.readthedocs.io/en/main/citus-quickstart.html)
+  tutorial uses docker-compose to create a full Citus cluster and guides
+  you through a worker failover and then a coordinator failover.
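For a flavor of what the docker-compose based tutorials above set up, here is a minimal hypothetical sketch. The service layout (one monitor, two Postgres nodes pointing at it) reflects the architecture described in this README, but the image name and commands are illustrative assumptions, not the tutorials' actual compose files:

```yaml
# Hypothetical sketch only: image name and commands are assumptions;
# see the linked tutorials for the real compose files.
version: "3"
services:
  monitor:
    image: citusdata/pg_auto_failover        # assumed image name
    command: pg_autoctl create monitor --ssl-self-signed --auth trust --run
  node1:
    image: citusdata/pg_auto_failover        # assumed image name
    command: >
      pg_autoctl create postgres --ssl-self-signed --auth trust
      --monitor 'postgres://autoctl_node@monitor/pg_auto_failover?sslmode=require'
      --run
  node2:
    image: citusdata/pg_auto_failover        # assumed image name
    command: >
      pg_autoctl create postgres --ssl-self-signed --auth trust
      --monitor 'postgres://autoctl_node@monitor/pg_auto_failover?sslmode=require'
      --run
```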

 ## Reporting Security Issues
