Commit 03eeb7f

Update troubleshooting.md
1 parent 9c6bc11 commit 03eeb7f

1 file changed

+12
-10
lines changed

docs/troubleshooting.md

Lines changed: 12 additions & 10 deletions
@@ -3,40 +3,42 @@
 
 ## Member unregistration failed when removing machine
 
-```
-$ fly machines remove 9185340f4d3383 --app flex-testing
+Example failure when removing a machine:
+```bash
+fly machines remove 9185340f4d3383 --app flex-testing
 machine 9185340f4d3383 was found and is currently in stopped state, attempting to destroy...
 unregistering postgres member 'fdaa:0:2e26:a7b:7d16:cff7:9849:2' from the cluster... <insert-random-error-here> (failed)
 
 9185340f4d3383 has been destroyed
 ```
+
 Unfortionately, this can happen for a variety of reasons. If no action is taken, the member and associated replication slot will automatically be cleaned up after 24 hours. Depending on the current cluster size, problems can arise if the down member impacts the clusters ability to meet quorum. If this case, it's important to take action right away to prevent your cluster from going read-only.
 
 
 To address this, start by ssh'ing into one of your running Machines.
 
-```
+```bash
 fly ssh console --app <app-name>
 ```
 
-Switch to the postgres user and move into the home directory.
-```
+Use the `repmgr` cli tool to view the current cluster state.
+```bash
+# Switch to the postgres user and move into the home directory.
 su postgres
 cd ~
-```
 
-Use the `rempgr` cli tool to view the current cluster state.
-```
 repmgr daemon status
 
 ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
 ----+----------------------------------+---------+---------------+------------------------------------+---------+-----+---------+--------------------
 376084936 | fdaa:0:2e26:a7b:7d18:1a68:804e:2 | primary | * running | | running | 630 | no | n/a
 1349952263 | fdaa:0:2e26:a7b:7d17:4463:955d:2 | standby | ? unreachable | ? fdaa:0:2e26:a7b:7d18:1a68:804e:2 | n/a | n/a | n/a | n/a
 1412735685 | fdaa:0:2e26:a7b:c850:8f12:fb1d:2 | standby | running | fdaa:0:2e26:a7b:7d18:1a68:804e:2 | running | 617 | no | 1 second(s) ago
-```
 
-Manually unregister the unreachable standby.
 ```
+
+
+Unregister the unreachable standby.
+```bash
 repmgr standby unregister --node-id 1349952263
 ```
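
The last two steps of the procedure (read the `repmgr daemon status` table, then unregister the unreachable standby) could be sketched as a small shell helper. This is a minimal sketch, not part of the commit: the function name `unreachable_node_ids` is hypothetical, and it assumes the status table has the same pipe-separated column layout as the sample output above, with the node ID in the first column and the status in the fourth.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the node IDs that repmgr reports as unreachable.
# Reads the table printed by `repmgr daemon status` on stdin.
# Assumes the column order shown in the sample output (ID first, Status fourth).
unreachable_node_ids() {
  awk -F'|' '
    # Data rows start with a numeric ID; this skips the header and separator rows.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ && $4 ~ /unreachable/ {
      gsub(/[ \t]/, "", $1)   # strip padding around the ID
      print $1
    }
  '
}
```

On a running Machine, as the postgres user, this might be used as `repmgr daemon status | unreachable_node_ids`. Review the printed IDs before passing any of them to `repmgr standby unregister --node-id <id>`, since unregistering is not something to automate blindly.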
