# Fencing

## How do we verify the real primary?
We start by evaluating the cluster state: each registered standby is checked for connectivity and asked who its primary is.

The "cluster state" is represented across a few different dimensions:

**Total members**
Number of registered members, including the primary.

**Total active members**
Number of members that are responsive. This includes the primary being evaluated, so this will never be less than one.

**Total inactive members**
Number of registered members that are non-responsive.

**Conflict map**
The conflict map is a `map[string]int` that tracks conflicting primaries reported by our standbys and the number of times each conflicting primary was referenced.

As an example, say we have a three-member cluster and both standbys indicate that their registered primary does not match. This will be recorded as:
```
map[string]int{
  "fdaa:0:2e26:a7b:8c31:bf37:488c:2": 2
}
```
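
To make this concrete, here's a rough Go sketch of how the evaluation could be structured. The `ClusterState` and `Standby` types and the `queryRegisteredPrimary` helper are illustrative names only, not the actual types or functions used by postgres-flex:

```go
package main

import (
	"context"
	"errors"
)

type Standby struct {
	Hostname string
}

// ClusterState captures the dimensions described above.
type ClusterState struct {
	TotalMembers  int            // registered members, including the primary
	TotalActive   int            // responsive members, always at least one (self)
	TotalInactive int            // registered but non-responsive members
	ConflictMap   map[string]int // conflicting primary -> number of standbys reporting it
}

// evaluateClusterState polls each registered standby and records any primary
// that differs from the member currently being evaluated (self).
func evaluateClusterState(ctx context.Context, self string, standbys []Standby) ClusterState {
	state := ClusterState{
		TotalMembers: len(standbys) + 1, // standbys plus the primary under evaluation
		TotalActive:  1,                 // self always counts as active
		ConflictMap:  map[string]int{},
	}

	for _, s := range standbys {
		primary, err := queryRegisteredPrimary(ctx, s.Hostname)
		if err != nil {
			state.TotalInactive++
			continue
		}
		state.TotalActive++
		if primary != self {
			state.ConflictMap[primary]++
		}
	}

	return state
}

// queryRegisteredPrimary would ask a standby which primary it is configured to
// follow; the real project resolves this over the network.
func queryRegisteredPrimary(ctx context.Context, hostname string) (string, error) {
	return "", errors.New("placeholder: not implemented")
}
```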

The real primary is resolvable so long as a majority of members agree on who it is. Quorum is defined as `total_members / 2 + 1`.

**There is one exception to note here: when the primary being evaluated meets quorum, it will still be fenced if a conflict is found. This protects against a possible race condition where the old primary comes back up during an active failover.**
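
Building on the hypothetical `ClusterState` type above, the resolution and fencing decision might look roughly like this (again, `resolveRealPrimary` and `shouldFence` are illustrative sketches, not the project's real functions):

```go
// resolveRealPrimary returns the real primary and whether it could be resolved,
// using quorum = total_members / 2 + 1.
func resolveRealPrimary(state ClusterState, self string) (string, bool) {
	quorum := state.TotalMembers/2 + 1

	// If a conflicting primary reported by the standbys reaches quorum, that's the real primary.
	for candidate, votes := range state.ConflictMap {
		if votes >= quorum {
			return candidate, true
		}
	}

	// Otherwise self is the real primary only if enough members agree with it.
	agreeing := state.TotalActive - totalConflicts(state.ConflictMap)
	if agreeing >= quorum {
		return self, true
	}
	return "", false
}

// shouldFence applies the rule above: fence whenever self is not the resolved
// primary, the primary can't be resolved, or any conflict was reported at all
// (even if self still meets quorum).
func shouldFence(state ClusterState, self string) bool {
	realPrimary, ok := resolveRealPrimary(state, self)
	if !ok || realPrimary != self {
		return true
	}
	return len(state.ConflictMap) > 0
}

func totalConflicts(m map[string]int) int {
	total := 0
	for _, votes := range m {
		total += votes
	}
	return total
}
```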

Tests can be found here: https://github.com/fly-apps/postgres-flex/pull/49/files#diff-3d71960ff7855f775cb257a74643d67d2636b354c9d485d10c2ded2426a7f362

## What if the real primary can't be resolved or doesn't match the booting primary?

In both of these cases the primary member will be fenced.

**If the real primary is resolvable**
The cluster will be made read-only, PGBouncer will be reconfigured to target the "real" primary, and the real primary's IP address will be written to a `zombie.lock` file. The PGBouncer reconfiguration ensures that any connections hitting this member are routed to the real primary, minimizing interruptions. Once this is complete, the process panics to force a full member restart. When the member restarts, we read the IP address from the `zombie.lock` file and use it to attempt to rejoin the cluster we diverged from. If we are successful, the `zombie.lock` file is cleared and we boot as a standby.

**Note: We will not attempt to rejoin a cluster if the resolved primary resides in a region that differs from the `PRIMARY_REGION` environment variable set on self. The `PRIMARY_REGION` will need to be updated before a rejoin is attempted.**
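
As a rough sketch of the fence-and-restart sequence, assuming placeholder helpers (`setReadOnly`, `reconfigurePGBouncer`) and an illustrative lock path rather than the project's actual ones:

```go
package main

import "os"

const zombieLockPath = "/data/zombie.lock" // illustrative path

// fenceWithKnownPrimary sketches the fencing steps when the real primary is known.
func fenceWithKnownPrimary(realPrimaryIP string) error {
	// Make the local cluster read-only so no new writes can diverge further.
	if err := setReadOnly(); err != nil {
		return err
	}
	// Route connections hitting this member to the real primary to minimize interruptions.
	if err := reconfigurePGBouncer(realPrimaryIP); err != nil {
		return err
	}
	// Record who we believe the real primary is so the next boot can attempt a rejoin.
	if err := os.WriteFile(zombieLockPath, []byte(realPrimaryIP), 0600); err != nil {
		return err
	}
	// Force a full member restart; the zombie.lock is evaluated on boot.
	panic("member fenced: restarting to attempt rejoin")
}

// Placeholders standing in for the real implementations.
func setReadOnly() error                   { return nil }
func reconfigurePGBouncer(ip string) error { return nil }
```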

**If the real primary is NOT resolvable**
The cluster will be made read-only, PGBouncer will remain disabled, and a `zombie.lock` file will be created without a value. When the member reboots, we read the `zombie.lock` file and see that it's empty. This indicates that we've entered a failure mode that can't be recovered automatically. This could be an issue where previously deleted members were not properly unregistered, or where the primary's state has diverged to the point where its registered members have been cycled out.
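
A boot-time sketch covering both cases might look like the following, with `rejoinCluster`, `resolveRegion`, and the lock path again being hypothetical stand-ins for the real logic:

```go
package main

import (
	"errors"
	"os"
	"strings"
)

const zombieLockPath = "/data/zombie.lock" // illustrative path

// handleZombieLockOnBoot decides whether this member can recover automatically.
func handleZombieLockOnBoot() error {
	data, err := os.ReadFile(zombieLockPath)
	if errors.Is(err, os.ErrNotExist) {
		return nil // no fence in progress, boot normally
	}
	if err != nil {
		return err
	}

	realPrimaryIP := strings.TrimSpace(string(data))
	if realPrimaryIP == "" {
		// An empty lock means the real primary could not be resolved before the
		// restart; this failure mode requires manual intervention.
		return errors.New("zombie.lock is empty: cluster cannot be recovered automatically")
	}

	// Don't rejoin a primary that lives outside our configured primary region.
	if resolveRegion(realPrimaryIP) != os.Getenv("PRIMARY_REGION") {
		return errors.New("resolved primary is outside PRIMARY_REGION; update it before rejoining")
	}

	if err := rejoinCluster(realPrimaryIP); err != nil {
		return err
	}
	// Rejoin succeeded: clear the lock and continue booting as a standby.
	return os.Remove(zombieLockPath)
}

// Placeholders standing in for the real implementations.
func rejoinCluster(primaryIP string) error  { return nil }
func resolveRegion(primaryIP string) string { return "" }
```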