-For HA, it's recommended that you run at least 3 members.
-
-Automatic failovers will only consider members residing within your primary region. The primary region is represented as an environment variable defined within the `fly.toml` file. That being said, if you're running a 3 member setup at least 2 of your members should reside within your primary region.
+For HA, it's recommended that you run at least 3 members within your primary region. Automatic failovers will only consider members residing within your primary region. The primary region is represented as an environment variable defined within the `fly.toml` file.
docs/fencing.md
5 additions & 5 deletions
@@ -1,7 +1,7 @@
 # Fencing

 ## How do we verify the real primary?
-We start out evaluating the cluster state by checking each registered standby for connectivity and asking who their primary is.
+We start out by evaluating the cluster state by checking each registered standby within the primary region for connectivity and asking who their primary is.

 The "clusters state" is represented across a few different dimensions:

@@ -24,7 +24,7 @@ map[string]int{
 }
 ```

-The real primary is resolvable so long as the majority of members can agree on who it is. Quorum being defined as `total_members / 2 + 1`.
+The real primary is resolvable so long as the majority of members can agree on who it is. Quorum being defined as `total_members_in_region / 2 + 1`.

 **Note: When the primary being evaluated meets quorum, it will still be fenced in the event a conflict is found. This is to protect against a possible race condition where an old primary comes back up in the middle of an active failover.**

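The region-scoped quorum rule in this hunk can be illustrated with a minimal Go sketch. It is not the project's implementation: the `Member` type, the helper names, the hostnames, and the region codes are all hypothetical, and the vote map is only assumed to map a reported primary to the number of members reporting it. The only thing taken from the docs is the formula `total_members_in_region / 2 + 1`, evaluated with integer division.

```go
package main

import "fmt"

// Member is a hypothetical record for a registered cluster member.
type Member struct {
	Hostname string
	Region   string
}

// inRegion counts the members that reside in the given region.
func inRegion(members []Member, region string) int {
	n := 0
	for _, m := range members {
		if m.Region == region {
			n++
		}
	}
	return n
}

// quorum applies the documented formula using integer division.
func quorum(totalMembersInRegion int) int {
	return totalMembersInRegion/2 + 1
}

// resolvePrimary returns the primary whose tally meets quorum, if any.
// votes is assumed to map a reported primary to the number of members reporting it.
func resolvePrimary(votes map[string]int, totalMembersInRegion int) (string, bool) {
	need := quorum(totalMembersInRegion)
	for primary, count := range votes {
		if count >= need {
			return primary, true
		}
	}
	return "", false
}

func main() {
	members := []Member{
		{"pg-1", "ord"}, {"pg-2", "ord"}, {"pg-3", "ord"},
		{"pg-4", "lhr"}, // outside the primary region, so not counted
	}
	total := inRegion(members, "ord")
	votes := map[string]int{"pg-1": 2, "pg-2": 1}
	primary, ok := resolvePrimary(votes, total)
	fmt.Println(total, quorum(total), primary, ok) // prints: 3 2 pg-1 true
}
```

With three in-region members the quorum is 2, so a member outside the primary region neither raises the threshold nor contributes a vote in this sketch.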
@@ -45,11 +45,11 @@ The cluster will be made read-only and the `zombie.lock` file will be created wi

 ## Monitoring cluster state

-In order to mitigate possible split-brain scenarios, it's important that cluster state is evaluated regularly and when specific events/actions take place.
+In order to mitigate possible split-brain scenarios, it's important that cluster state is evaluated regularly and when specific events/actions take place.

 ### On boot
 This is to ensure the booting primary is not a primary coming back from the dead.
-
+
 ### During standby connect/reconnect/disconnect events
 There are a myriad of reasons why a standby might disconnect, but we have to assume the possibility of a network partition. In either case, if quorum is lost, the primary will be fenced.

@@ -60,7 +60,7 @@ Cluster state is monitored in the background at regular intervals. This acts as
 ## Split-brain detection window
 **This pertains to v0.0.36+**

-When a network partition is initiated, the following steps are performed:
+When a network partition is initiated, the following steps are performed:

 1. Repmgr will attempt to ping registered members with a 5s connect timeout.
 2. Repmgr will wait up to 30 seconds for the standby to reconnect before issuing a `child_node_disconnect` event.